Certified Password Quality

Integrated Formal Methods (iFM'17), Turin · J.F. Ferreira, S.A. Johnson, A. Mendes, P.J. Brooke · Teesside University

📄 Get the Paper · 📊 Get the Slides

In Short

Passwords are the most widely used form of digital authentication, and the software that enforces password quality rules is correspondingly critical infrastructure. Yet for decades, these tools have been written in C and shipped without any formal proof of correctness — we simply trust that they do what they claim. This paper asks: what if we could mathematically prove that a password policy checker behaves exactly as specified, with no edge cases, no undocumented behaviour, and no silent bugs?

To answer that, the team used a formal proof assistant called Coq to write and verify password quality checkers from the ground up, then automatically compile them into working Linux authentication modules — the same kind used by millions of servers worldwide. In doing so, they didn't just demonstrate a methodology; they discovered a real, previously unknown bug in one of the most widely deployed password checkers in the Linux ecosystem, reported it to the maintainers, and got it fixed.


The Breakdown

Problem

Linux systems enforce password quality through a module system called PAM (Pluggable Authentication Modules). Two of the most widely deployed modules — pam_cracklib and pam_pwquality, used as defaults in distributions like Red Hat Enterprise Linux and CentOS — are written in unverified C code. They implement nuanced policies involving password length, character class requirements, similarity to old passwords, and more. None of this logic had ever been formally proven correct, meaning any of these checks could contain bugs that silently accept passwords they should reject, or reject ones they should accept — without anyone ever knowing.
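To make the kind of logic at stake concrete, here is a simplified sketch in Python of the checks such modules perform — length, character-class count, and difference from the old password. This is illustrative only (the real modules are C, and all function names here are my own), but it captures the shape of the policies being discussed.

```python
def char_classes(password):
    """Return the set of character classes present in the password."""
    classes = set()
    for ch in password:
        if ch.islower():
            classes.add("lower")
        elif ch.isupper():
            classes.add("upper")
        elif ch.isdigit():
            classes.add("digit")
        else:
            classes.add("other")
    return classes

def check_policy(password, old_password,
                 min_length=9, min_classes=3, min_new_chars=5):
    """Simplified composite policy: accept only if length, class-count,
    and difference-from-old-password requirements are all met."""
    if len(password) < min_length:
        return False
    if len(char_classes(password)) < min_classes:
        return False
    # Crude difference check: require several characters not in the old
    # password (the real modules use a more nuanced similarity measure).
    new_chars = sum(1 for ch in set(password) if ch not in set(old_password))
    return new_chars >= min_new_chars
```

Every branch here is a place where an unnoticed bug could silently weaken the policy — which is exactly the risk the paper addresses.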

Approach

The team implemented an equivalent set of password checkers in Coq — a proof assistant that lets you write code and mathematical proofs side by side, ensuring one matches the other. Each checker was specified either as an executable function, a formal theorem, or a set of proven properties, depending on what made the logic clearest. Once verified, Coq's code extraction mechanism automatically translated the checked logic into Haskell, which was then linked via a C driver into a fully functional Linux PAM module. The result: a deployable password checker whose core logic carries a machine-verified proof of correctness.
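A loose analogy for the Coq workflow, sketched in Python (names and property are my own, not from the paper): an executable checker is paired with an explicitly stated property about its behaviour. The crucial difference is that Coq *proves* the property for all inputs, whereas ordinary code can only spot-check it on samples.

```python
def meets_min_length(password, n):
    """Executable checker: does the password satisfy a minimum length?"""
    return len(password) >= n

# Property: monotonicity. If a password passes a stricter length bound,
# it passes any weaker one. In Coq this would be a theorem, roughly:
#   forall pw m n, m <= n -> meets_min_length pw n -> meets_min_length pw m
def property_monotone(pw, m, n):
    if m <= n and meets_min_length(pw, n):
        return meets_min_length(pw, m)
    return True  # premise fails, so the property holds vacuously

# Spot-check over a small sample -- a Coq proof covers every case instead.
samples = ["", "a", "correct horse", "x" * 40]
assert all(property_monotone(pw, m, n)
           for pw in samples for m in range(20) for n in range(20))
```

Once properties like this are proven, Coq's extraction mechanism turns the checker into Haskell that carries those guarantees with it.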

Key Findings

When the verified module was run against 100,000 real-world leaked passwords and compared against the original modules, it behaved identically under the default policy — validating the approach. But when testing a stricter policy (limiting consecutive characters of the same class), the original pam_cracklib silently ignored the setting entirely due to an off-by-one error in a single conditional check. The verified module correctly enforced it. The bug was reported, reviewed, and patched in the official Linux PAM repository.

A secondary finding was that the verified approach is also more flexible: switching to a completely different password policy (such as a simple 16-character minimum, which current research suggests is highly effective) required changing a single line of Coq, whereas the original modules couldn't even support this without recompiling from modified source.

Real-World Implications

The immediate implication is that critical security-enforcement code running on production infrastructure had a verifiable defect — and conventional testing didn't catch it. The broader implication is a proof of concept: formally verified security components can be built, extracted into deployable artefacts, and validated against real-world data. The performance overhead was modest (around 28% slower than the unverified C module), and while the compiled binary was significantly larger, neither represents a barrier to adoption on modern systems.

So, What?

This paper landed at a curious moment. In 2017, the security industry was beginning to seriously question the entire framework of complexity-based password policies — research was emerging showing that length matters far more than mandatory special characters, and that forcing frequent resets produces weaker passwords, not stronger ones. NIST's updated guidelines, finalised in 2024 and made federal standard in August 2025, now explicitly tell organisations to stop enforcing mandatory periodic password changes and to prioritise length over complexity — a direct vindication of the flexible, research-driven policy approach this paper advocated for.

But the deeper relevance isn't about password rules at all. It's about a question that is now central to the entire cybersecurity industry: can we trust the code that enforces our security policies?

The answer this paper demonstrates is that we often can't — and that we have the tools to do better. Formal verification has moved steadily from academic curiosity toward practical deployment. Verified cryptographic libraries now underpin real products. Amazon's s2n TLS implementation uses formal verification. The seL4 microkernel — used in safety-critical systems — is fully formally verified. The research thread started in this paper has continued: co-author João Ferreira's group published follow-on work through 2022 and as recently as 2026, extending the framework to verified password generation and examining user trust in formally verified security tools.

For anyone building security automation — and particularly for organisations operating in regulated environments like finance, healthcare, or critical national infrastructure — the argument this paper makes is increasingly hard to ignore: authentication logic is too important to leave unverified. In 2024, 88% of basic web application attacks involved stolen credentials, and the enforcement layer is precisely where correctness guarantees matter most. A bug in a password quality checker isn't a minor inconvenience — it's a silent gap in your security perimeter, running undetected on millions of systems.

The takeaway for AI-driven security specifically is pointed: as autonomous systems take on more of the enforcement and policy layer in cybersecurity, the question of how we verify that these systems do exactly what we intend becomes existential. This paper is an early, practical answer to that question — and the methodology it demonstrated is one the field is still catching up to.

Author's note: These publication summaries are AI-assisted. I use AI to present my work in a consistent, accessible way — the research and writing behind each publication is entirely my own.