Skeptic: Automatic Password Composition Policy Selection

ACM ASIA Conference on Computer and Communications Security (ASIACCS '20), Taipei · S. Johnson, J.F. Ferreira, A. Mendes, J. Cordry · Teesside University / University of Lisbon / University of Beira Interior


In Short

When an organisation needs to choose a password policy (minimum length, required character types, complexity rules), how does it actually make that decision? In practice, the answer is usually intuition. An administrator picks something that feels secure, copies what another organisation does, or follows a guideline that may be years out of date. For a decision that directly determines how vulnerable a system is to password attacks, that's a surprisingly weak foundation.

This paper introduces Skeptic: a software toolchain that replaces gut feeling with a rigorous, data-driven methodology for automatically ranking and selecting password composition policies. It draws on over 205 million real-world passwords from three major breach datasets, mathematically models how users would behave under different policies, and quantifies the likely security outcome of each. Policies are compared using fitted models rather than the passwords themselves, so once those models are built the password data never needs to be stored or shared. The methodology is validated against two established empirical studies and shipped as open-source tooling, with a custom domain-specific language that makes the output accessible to non-expert practitioners.

The Breakdown

Problem

Choosing a password composition policy is a critical security decision, yet research had consistently shown it was rarely made rigorously. Studies found little to no correlation between how restrictive a policy was and the value of the assets it protected: organisations with highly sensitive data were no more likely to have well-chosen policies than those without. Existing research methods for evaluating policies were themselves impractical for most organisations. Some required expensive, time-consuming user studies with recruited participants (good ecological validity, but completely inaccessible to a typical sysadmin). Others relied on direct analysis of leaked password datasets, which raised serious privacy and legal concerns. Neither offered a practical way to evaluate novel policies for which no breach data yet existed.

Approach

Skeptic's core insight is that you can simulate the effect of a password policy on a user population without needing to collect new passwords from anyone. Starting from a cleaned probability distribution derived from a real breach dataset, the tool mathematically models what happens when a policy is imposed: some passwords become forbidden, and users who would have chosen them must pick something else. Four distinct "macrobehaviour" models capture different assumptions about how users reselect — from worst-case (everyone converges on the most common remaining password) to best-case (everyone picks a unique new one, as if using a password manager). After redistribution, the tool fits a power-law curve to the resulting distribution and extracts a single parameter — its steepness — as a proxy for security: a flatter distribution means passwords are more spread out, making guessing attacks less effective. Crucially, once the curve is fitted, the underlying password data can be discarded entirely. Policy comparisons and rankings are performed using only the fitted equations, eliminating privacy concerns. A purpose-built domain-specific language called Pacpal makes the results actionable for practitioners, and a Coq integration layer allows policies to be formally verified — including proving that specific policies render a system immune to known botnet malware attack dictionaries.
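The redistribute-then-fit pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Skeptic's implementation. Everything in it is hypothetical: the toy frequency table, the `min_length` policy helper, the behaviour names `"convergent"` and `"unique"` (standing in for the worst-case and best-case macrobehaviours), and the `fit_alpha` estimator. The exponent is estimated with an ordinary least-squares fit on log-log axes, a simple swapped-in technique that is cruder than the maximum-likelihood fitting usually recommended for power laws.

```python
import math
from collections import Counter


def min_length(n):
    """Hypothetical policy: a password complies if it has at least n characters."""
    return lambda pw: len(pw) >= n


def redistribute(counts, complies, behaviour):
    """Impose a policy on a frequency table mapping password -> number of users.

    "convergent": every displaced user moves to the most common compliant password
                  (the worst case described in the text).
    "unique":     every displaced user picks a brand-new password nobody else uses
                  (the best case, as if each used a password manager).
    """
    kept = Counter({pw: c for pw, c in counts.items() if complies(pw)})
    displaced = sum(c for pw, c in counts.items() if not complies(pw))
    if behaviour == "convergent":
        if not kept:
            raise ValueError("policy forbids every password in the sample")
        top, _ = kept.most_common(1)[0]
        kept[top] += displaced
    elif behaviour == "unique":
        # Placeholder keys stand in for fresh passwords, each chosen by one user.
        for i in range(displaced):
            kept[f"<unique-{i}>"] = 1
    else:
        raise ValueError(f"unknown behaviour: {behaviour}")
    return kept


def fit_alpha(counts):
    """Estimate alpha in frequency ~ rank**(-alpha) by least squares on log-log axes.

    A larger alpha means a steeper curve: more users share the top passwords,
    so guessing attacks do better.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct passwords to fit a curve")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx  # negate the slope so that alpha is positive


# Toy frequency table (invented for illustration, not breach data).
sample = Counter({"123456": 40, "password": 25, "qwerty": 15, "football": 8,
                  "iloveyou1": 5, "sunshine12": 3, "trustno1x": 2, "letmein123": 1})

print("original", round(fit_alpha(sample), 2))
for behaviour in ("convergent", "unique"):
    after = redistribute(sample, min_length(8), behaviour)
    print(behaviour, round(fit_alpha(after), 2))
```

On this toy table, the convergent result is steeper than the original sample and the unique result is much flatter. Only the fitted exponents need to be kept for comparing policies, which is the property the equation-only comparison layer relies on.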

Key Findings

Skeptic was tested across 28 distinct password composition policies and validated against two independent empirical studies. Its rankings correlated strongly with real-world cracking results, with Pearson coefficients as strong as -0.969 against offline attack scenarios. The methodology proved most accurate for large-scale offline attacks and less precise for small online attacks, which was expected given that it is attack-independent by design. A particularly notable finding concerns the convergent user behaviour model, where users forced off their preferred passwords converge on the next most popular permitted one. Under this model, stricter policies can actually decrease security, because they funnel a larger proportion of the user base onto a smaller number of predictable passwords. This is a counterintuitive but important result with direct implications for policy design. The full ranking of all 28 policies consistently placed long-password policies (3class16, 2word16, basic20) at the top, ahead of traditional complexity-heavy approaches like comp8.
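The funnelling effect can be seen on a toy example. The sketch below is a minimal illustration, not the paper's experiment. The frequency table and the minimum-length policies are invented. In place of a fitted power-law exponent, it uses a simpler, swapped-in measure: the share of users an attacker would compromise with a single guess at the most common password.

```python
from collections import Counter


def impose_convergent(counts, min_len):
    """Hypothetical minimum-length policy under convergent reselection:
    users whose password is too short all move to the most common compliant one."""
    kept = Counter({pw: c for pw, c in counts.items() if len(pw) >= min_len})
    if not kept:
        raise ValueError("policy forbids every password in the sample")
    displaced = sum(counts.values()) - sum(kept.values())
    top, _ = kept.most_common(1)[0]
    kept[top] += displaced
    return kept


def top_guess_share(counts):
    """Fraction of users whose password is the single most common one."""
    return max(counts.values()) / sum(counts.values())


# Invented frequency table (password -> number of users), not breach data.
sample = Counter({"123456": 40, "password": 25, "qwerty": 15, "football": 8,
                  "iloveyou1": 5, "sunshine12": 3, "trustno1x": 2, "letmein123": 1})

for min_len in (6, 8, 10):
    share = top_guess_share(impose_convergent(sample, min_len))
    print(f"min length {min_len}: one guess compromises {share:.0%} of users")
```

On this table, raising the minimum length from 6 to 10 increases the one-guess share from about 40% to about 99%. The stricter policy displaces more users, and the convergent model sends all of them to the one surviving popular password. The table is deliberately tiny, so the point is the mechanism rather than the magnitude.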

Real-World Implications

The immediate implication is a practical tool that any organisation can use to make a defensible, data-informed password policy selection without conducting user studies or handling sensitive data. The broader implication is a reframing of how security decisions should be made: not by convention or intuition, but by modelling expected outcomes under explicit assumptions about user behaviour. The Coq immunity proofs add a further dimension — the ability to formally certify that a given policy prevents compromise by specific known attacks, which is directly relevant to product security certification and compliance.

So, What?

This paper sits at an interesting intersection that has only grown more relevant since 2020: the intersection of automated security tooling, formal reasoning, and the question of how AI systems should make security decisions.

The credential threat landscape has continued to worsen dramatically. The underground market for stolen passwords has industrialised — in 2024 alone, 2.8 billion passwords were put up for sale on criminal forums, with average prices for stolen credentials dropping to around $10. At that scale and price point, password guessing attacks are not a sophisticated threat reserved for nation-state actors; they are a commodity. The quality of password policies deployed at scale has never mattered more, and the gap between evidence-based policy selection and intuition-based selection has never been more costly.

What makes Skeptic significant in this context is less the specific rankings it produces — password policy research has continued to evolve — and more the methodology it demonstrates. It shows that you can build a decision-support system for a critical security choice that is simultaneously data-driven, formally justified, privacy-preserving, and accessible to non-experts. That combination is rare, and the engineering choices required to achieve it — the macrobehaviour abstraction, the power-law proxy, the equation-only comparison layer, the DSL — each represent a design pattern with applicability well beyond password policies.

For AI-driven offensive and defensive security, the relevance is direct. One of the central unsolved problems in autonomous security systems is how they should make policy decisions — not just execute known procedures, but select between competing approaches under uncertainty, with justifiable reasoning. Skeptic is a small but concrete example of that problem being solved in one domain. The user behaviour modelling framework it introduces maps naturally onto the kind of probabilistic reasoning that autonomous agents need to do when anticipating how targets will respond to different attack or defence strategies.

The formal verification layer also points forward. The demonstration that you can encode a policy in Coq and prove immunity to a specific botnet's attack dictionary in a handful of lines is the kind of capability that becomes increasingly important as security automation scales. When an autonomous system is making thousands of policy decisions, you need more than empirical confidence — you need proofs. This work shows that's achievable, at least for well-defined subproblems, without prohibitive complexity.
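The immunity property can also be stated as an executable check. A policy is immune to an attack dictionary if no entry in the dictionary complies with it, so every guess the attacker tries is a password no compliant user could hold. The sketch below is a hypothetical Python illustration of that property, not the paper's Coq development. The `min_length` predicate and the short credential list are placeholders, not the real botnet wordlist. A Coq proof establishes the fact formally, whereas this check simply enumerates a finite list.

```python
def min_length(n):
    """Hypothetical policy predicate: at least n characters."""
    return lambda pw: len(pw) >= n


def is_immune(complies, dictionary):
    """True if no dictionary entry satisfies the policy, i.e. the attacker
    cannot guess any password that a compliant user could hold."""
    return not any(complies(guess) for guess in dictionary)


def compliant_guesses(complies, dictionary):
    """Dictionary entries the policy still allows (counterexamples to immunity)."""
    return [guess for guess in dictionary if complies(guess)]


# Placeholder list in the style of a default-credential dictionary (not the real one).
dictionary = ["admin", "root", "123456", "password", "default", "12345678", "support"]

print(is_immune(min_length(16), dictionary))
print(compliant_guesses(min_length(8), dictionary))
```

Returning the offending entries, rather than only a boolean, shows which guesses a weaker policy would still let through.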

Viewed across the arc of the research programme it belongs to — from verified password checkers through policy inference to automatic policy selection — Skeptic represents the most mature and complete contribution: a working tool, validated against real data, that closes the loop from theoretical rigour to practical deployment.

Author's note: These publication summaries are AI-assisted. I use AI to present my work in a consistent, accessible way — the research and writing behind each publication is entirely my own.