The Only Human Factor
Doctoral Thesis · Teesside University · Awarded June 2024 · Supervised by J. Cordry, J.F. Ferreira, A. Mendes, P.J. Brooke
In Short
Passwords are the most persistent form of digital authentication. Despite decades of predictions that they would be replaced — by biometrics, hardware tokens, passkeys, or something else entirely — they remain the dominant way humans prove their identity to digital systems worldwide. And despite everything we know about how badly people choose them, the software and policies designed to make password-protected systems more secure have largely been built on intuition, convention, and unverified code.
This thesis applies rigorous mathematical and statistical methods to that problem end-to-end: from sourcing and cleaning real-world password data, through using that data to design better password policies, to building formally verified software that enforces those policies on live systems with machine-checkable correctness guarantees. It is, in the broadest sense, an argument that one of the most human problems in security — people choosing bad passwords — deserves, and can be given, the same rigour we apply to the systems around it.
The Breakdown
Problem
Human-chosen passwords are predictable. People cluster around common choices, follow predictable patterns when forced to change, and respond to composition rules in ways that often undermine the security those rules were intended to provide. The software enforcing password policies had never been formally verified — meaning bugs could silently invalidate security guarantees — and the process of choosing which policy to enforce was almost universally based on intuition rather than evidence. Meanwhile, the datasets available to study these problems — billions of real-world leaked passwords — came with their own hidden flaws: noise, unknown provenance, and missing context about the policies under which they were created.
Approach
The thesis constructs a complete, end-to-end workflow. It begins with the data problem: developing a statistical technique to infer the password composition policy under which a leaked dataset was created, and using that to clean the data and filter noise — work embodied in the pol-infer tool. It then uses that cleaned data as the foundation for Skeptic: a methodology and toolchain for automatically ranking password composition policies by their expected resistance to guessing attacks, modelling user behaviour probabilistically and quantifying security through distribution uniformity, without ever needing to retain the underlying password data. Finally, it closes the loop with formal verification: using the Coq proof assistant to specify, implement and prove the correctness of password quality checkers, automatically extracting them as deployable Linux PAM modules — and in doing so, discovering and reporting a real bug in one of the most widely deployed password checkers in the Linux ecosystem.
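The inference step can be illustrated with a toy sketch. The function name, the simple ratio test, and the `noise_ratio` threshold below are illustrative assumptions rather than pol-infer's actual statistical machinery, but the core idea is the same: passwords shorter than an enforced minimum can only have entered a leak as noise, so their frequencies appear as outliers in the length distribution.

```python
from collections import Counter

def infer_min_length(passwords, noise_ratio=0.01):
    """Toy inference of a minimum-length policy from leaked passwords.

    Lengths below the enforced minimum survive in a leak only as noise
    (padding, processing errors), so their share of the distribution is
    an outlier: the inferred minimum is the shortest length whose share
    exceeds a small noise threshold.
    """
    counts = Counter(len(p) for p in passwords)
    total = sum(counts.values())
    for length in sorted(counts):
        if counts[length] / total > noise_ratio:
            return length
    return None

# A dataset created under an 8-character minimum, lightly contaminated
# with short junk entries:
leak = ["pass"] * 3 + ["password"] * 600 + ["longerpassword"] * 400
print(infer_min_length(leak))  # → 8
```

The same outlier-based reasoning extends to other policy dimensions (required character classes, for instance), which is what makes inference from nothing but the breached data itself tractable.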
Key Findings
Across the body of work, several findings stand out. Password composition policy inference from breach data is tractable using outlier detection on cumulative frequency distributions, and is robust to the noise introduced by intentional dataset padding and processing errors. Stricter password policies do not improve security under all assumptions about user behaviour: under convergent reselection, where users whose passwords are banned cluster onto the most popular remaining password, restrictive policies can actively reduce security. Formally verified password checkers can be built with modest performance overhead and deployed on production systems, while unverified code running on millions of servers can contain silent, impactful bugs that only rigorous comparison against a verified implementation will surface. Longer, lower-complexity passphrase policies consistently outperform traditional character-class complexity requirements across datasets and attack magnitudes.
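The convergent reselection effect can be made concrete with a small sketch. The distribution, the policy predicate, and the single-guess success metric below are illustrative assumptions, not the thesis's actual models or datasets; they simply show how banning popular passwords can concentrate displaced users, and therefore attacker success, onto one remaining choice.

```python
def top_guess_success(dist):
    """Attacker success probability with a single optimal guess:
    a simple (non-)uniformity measure over the distribution."""
    return max(dist.values())

def apply_policy_convergent(dist, allowed):
    """Convergent reselection: every user whose password the policy
    bans reselects the most popular password still allowed."""
    kept = {p: q for p, q in dist.items() if allowed(p)}
    displaced = 1.0 - sum(kept.values())
    top = max(kept, key=kept.get)
    kept[top] += displaced
    return kept

# A small, purely illustrative password distribution:
dist = {"password": 0.30, "123456": 0.25,
        "monkey1!": 0.20, "Tr0ub4dor&3": 0.25}

# A stricter policy: require at least one digit and one symbol.
policy = lambda p: (any(c.isdigit() for c in p)
                    and any(not c.isalnum() for c in p))

before = top_guess_success(dist)
after = top_guess_success(apply_policy_convergent(dist, policy))
print(round(before, 2), round(after, 2))  # → 0.3 0.8
```

Under this assumed behavioural model, the stricter policy more than doubles the attacker's single-guess success rate, which is exactly the failure mode that ranking policies by distribution uniformity, as Skeptic does, is designed to expose.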
Real-World Implications
The thesis delivers not just findings but tools: pol-infer, Skeptic, and the verified PAM module framework are all open-source and deployable. For practitioners, the work provides a defensible, data-driven alternative to intuition-based policy selection, accessible to non-experts through a purpose-built domain-specific language. For researchers, it demonstrates a complete pipeline from messy real-world data to formally verified enforcement software — a methodology applicable well beyond passwords.
So, What?
This thesis landed in June 2024 at a moment when the credential security landscape had never been more consequential or more troubled. Passwords remain the primary authentication mechanism for the overwhelming majority of digital systems, credential theft drives a substantial proportion of all breaches, and the gap between what we know about secure password policy design and what most organisations actually deploy remains wide.
What makes this body of work distinctive isn't any single result — it's the arc. Most password security research either studies the problem statistically or addresses it formally. This thesis does both, in sequence, treating the two approaches not as alternatives but as complementary stages of the same pipeline: statistical methods to understand and clean the data, generate evidence, and rank policies; formal methods to specify, verify, and deploy the enforcement software that acts on those rankings. That end-to-end coherence is rare, and it reflects a broader principle that is increasingly relevant across AI and security: that rigour at the analysis stage is only as valuable as the rigour of the systems that act on it.
The timing also matters. The verified software and policy selection methodology developed here become more, not less, relevant as authentication infrastructure scales and as AI-driven systems begin making autonomous decisions about security policy. When an autonomous agent is selecting and enforcing password policies across thousands of systems, the difference between evidence-based selection and intuition, and between verified enforcement and unverified code, is not academic — it's the difference between security guarantees you can defend and ones you merely hope hold.
Author's note: These publication summaries are AI-assisted. I use AI to present my work in a consistent, accessible way — the research and writing behind each publication is entirely my own.
