How to estimate the entropy of a password

algorithms, passwords

Having read various resources about password strength, I'm trying to create an algorithm that will provide a rough estimate of how much entropy a password has.

I'm trying to create an algorithm that's as comprehensive as possible. At this point I only have pseudocode, but the algorithm covers the following:

  • password length
  • repeated characters
  • patterns (logical)
  • different character spaces (LC, UC, Numeric, Special, Extended)
  • dictionary attacks

It does NOT yet cover the following, though it SHOULD (even if not perfectly):

  • ordering (passwords can be strictly ordered by output of this algorithm)
  • patterns (spatial)

Can anyone provide some insight on what this algorithm might be weak to? Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength? Underestimations are less of an issue.

The algorithm:

// the password to test
password = ?
length = length(password)

// unique character counts from password (duplicates discarded)
uqlca = number of unique lowercase alphabetic characters in password
uquca = number of unique uppercase alphabetic characters
uqd   = number of unique digits
uqsp  = number of unique special characters (anything with a key on the keyboard)
uqxc  = number of unique extended characters (alt codes, extended-ASCII stuff)

// algorithm parameters, total sizes of alphabet spaces
Nlca = total possible number of lowercase letters (26)
Nuca = total uppercase letters (26)
Nd   = total digits (10)
Nsp  = total special characters (32 or something)
Nxc  = total extended ASCII characters that don't fit into the other categories (idk, 50?)

// algorithm parameters, pw strength growth rates as percentages (per character)
flca = entropy growth factor for lowercase letters (.25 is probably a good value)
fuca = EGF for uppercase letters (.4 is probably good)
fd   = EGF for digits (.4 is probably good)
fsp  = EGF for special chars (.5 is probably good)
fxc  = EGF for extended ascii chars (.75 is probably good)

// repetition factors.  few unique letters == low factor, many unique == high
rflca = (1 - (1 - flca) ^ uqlca)
rfuca = (1 - (1 - fuca) ^ uquca)
rfd   = (1 - (1 - fd  ) ^ uqd  )
rfsp  = (1 - (1 - fsp ) ^ uqsp )
rfxc  = (1 - (1 - fxc ) ^ uqxc )

// total strength: summed per-character strengths, raised to the password length
strength =
( rflca * Nlca + 
  rfuca * Nuca +
  rfd   * Nd   +
  rfsp  * Nsp  +
  rfxc  * Nxc    ) ^ length

entropybits = log_base_2(strength)
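
For reference, here's the pseudocode transcribed into runnable Python. The 32 and 50 alphabet sizes are the same rough guesses as above, and treating every non-ASCII character as "extended" is just my reading of the categories:

import math

# (alphabet size N, growth factor f, membership test) per character space;
# 32 and 50 are the rough guesses from the parameters above
SPACES = [
    (26, 0.25, lambda c: "a" <= c <= "z"),                  # lowercase
    (26, 0.40, lambda c: "A" <= c <= "Z"),                  # uppercase
    (10, 0.40, lambda c: "0" <= c <= "9"),                  # digits
    (32, 0.50, lambda c: c.isascii() and not c.isalnum()),  # keyboard specials
    (50, 0.75, lambda c: not c.isascii()),                  # extended characters
]

def entropy_bits(password: str) -> float:
    per_char_strength = 0.0
    for size, f, member in SPACES:
        uniques = {c for c in password if member(c)}
        rf = 1 - (1 - f) ** len(uniques)  # repetition factor
        per_char_strength += rf * size
    # strength = per_char_strength ** length, so take the log directly
    return len(password) * math.log2(per_char_strength)

print(round(entropy_bits("aaa"), 1))           # 8.1, matching the table below
print(round(entropy_bits("H0ley$Mol3y_"), 1))  # 72.2, matching the table below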

A few inputs and their desired and actual entropybits outputs:

INPUT           DESIRED        ACTUAL
aaa             very pathetic  8.1
aaaaaaaaa       pathetic       24.7
abcdefghi       weak           31.2
H0ley$Mol3y_    strong         72.2
s^fU¬5ü;y34G<   wtf            88.9
[a^36]*         pathetic       97.2
[a^20]A[a^15]*  strong         146.8
xkcd1**         medium         79.3
xkcd2**         wtf            160.5

* these 2 passwords use shortened notation, where [a^N] expands to N a's.
** xkcd1 = "Tr0ub4dor&3", xkcd2 = "correct horse battery staple"

The algorithm does realize (correctly) that increasing the alphabet size, even by a single character, vastly strengthens long passwords, as shown by the difference in entropybits for the 6th and 7th passwords, which both consist of 36 a's except that the second one's 21st a is capitalized. However, it does not account for the fact that a password of 36 a's is a bad idea: it's easily broken by even a weak password cracker (and anyone who watches you type it will see it), and the score doesn't reflect that.

It does, however, reflect the fact that xkcd1 is a weak password compared to xkcd2, despite having greater complexity density (is this even a thing?).

How can I improve this algorithm?

Addendum 1

Dictionary attacks and pattern based attacks seem to be the big thing, so I'll take a stab at addressing those.

I could perform a comprehensive search through the password for words from a word list and replace each word with a token unique to the word it represents. Word tokens would then be treated as characters with their own weight system, and would add their own weights to the password. I'd need a few new algorithm parameters (I'll call them lw, Nw ~= 2^11, fw ~= .5, and rfw) and I'd factor the word weight into the score as I would any of the other weights.

This word search could be specially modified to match both lowercase and uppercase letters, as well as common character substitutions such as 3 for E. If I didn't add extra weight to such matched words, the algorithm would underestimate their strength by a bit or two per word, which is OK. Otherwise, a general rule would be: for each imperfect character match, give the word a bonus bit.
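
A minimal sketch of that word pass in Python, with a placeholder word list and substitution table (a real implementation would load a full dictionary):

# Placeholder data for illustration; a real word list would be much larger.
WORDS = {"kitty", "herp", "derp", "correct", "horse", "battery", "staple"}
SUBS = {"3": "e", "0": "o", "4": "a", "$": "s", "1": "l", "@": "a"}

def normalize(s: str) -> str:
    """Lowercase and undo common character substitutions."""
    return "".join(SUBS.get(c, c) for c in s.lower())

def find_words(password: str, min_len: int = 3):
    """Greedy left-to-right word matching, longest match first.
    Returns (start, end, word, bonus_bits) tuples."""
    tokens, i = [], 0
    while i < len(password):
        for j in range(len(password), i + min_len - 1, -1):
            chunk = password[i:j]
            if chunk.lower() in WORDS or normalize(chunk) in WORDS:
                # one bonus bit per substituted character, per the rule above
                bonus = sum(a != b for a, b in zip(chunk.lower(), normalize(chunk)))
                tokens.append((i, j, normalize(chunk), bonus))
                i = j
                break
        else:
            i += 1
    return tokens

print(find_words("1234kitty$$$$$herpderp"))
# [(4, 9, 'kitty', 0), (14, 18, 'herp', 0), (18, 22, 'derp', 0)]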

I could then perform simple pattern checks, such as searches for runs of repeated characters and derivative tests (taking the difference between consecutive characters), which would identify patterns such as 'aaaaa' and '12345', and replace each detected pattern with a pattern token unique to the pattern and its length. The algorithmic parameters (specifically, entropy per pattern) could be generated on the fly based on the pattern.
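
And a sketch of that pattern pass, which reports the length/order/base triples used in the notation below:

def find_patterns(password: str, min_len: int = 3):
    """Detect repeats (order 0, e.g. 'aaaa') and constant-step runs
    (order 1, e.g. '1234' or 'dcba'), reported as l/o/b triples."""
    patterns, i = [], 0
    while i <= len(password) - min_len:
        step = ord(password[i + 1]) - ord(password[i])  # the "derivative"
        if abs(step) <= 1:
            j = i + 1
            while j + 1 < len(password) and ord(password[j + 1]) - ord(password[j]) == step:
                j += 1
            if j - i + 1 >= min_len:
                patterns.append({"start": i, "l": j - i + 1, "o": abs(step), "b": password[i]})
                i = j + 1
                continue
        i += 1
    return patterns

print(find_patterns("1234kitty$$$$$herpderp"))
# [{'start': 0, 'l': 4, 'o': 1, 'b': '1'}, {'start': 9, 'l': 5, 'o': 0, 'b': '$'}]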

At this point, I'd take the length of the tokenized password. Each word token and pattern token would count as one character, replacing the characters it symbolically represents.

I've made up some pattern notation for the examples below; it includes the pattern length l, the pattern order o, and the base element b. This information could be used to compute an arbitrary weight for each pattern. I'd do something better in actual code.
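
Building on the find_words and find_patterns sketches above (and assuming their matches never overlap, which real code would have to handle), the tokenized length could be computed like this:

def effective_length(password: str) -> int:
    """Password length after each word or pattern collapses to one token.
    Assumes word and pattern matches do not overlap."""
    covered = [False] * len(password)
    tokens = 0
    for start, end, _word, _bonus in find_words(password):
        covered[start:end] = [True] * (end - start)
        tokens += 1
    for p in find_patterns(password):
        covered[p["start"]:p["start"] + p["l"]] = [True] * p["l"]
        tokens += 1
    # tokens count as one character each; untouched characters keep their own
    return tokens + sum(not c for c in covered)

print(effective_length("1234kitty$$$$$herpderp"))  # 5 tokens, matching the example below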

Modified Example:

Password:          1234kitty$$$$$herpderp
Tokenized:         1 2 3 4 k i t t y $ $ $ $ $ h e r p d e r p
Words Filtered:    1 2 3 4 @W5783 $ $ $ $ $ @W9001 @W9002
Patterns Filtered: @P[l=4,o=1,b='1'] @W5783 @P[l=5,o=0,b='$'] @W9001 @W9002

Breakdown:         3 small, unique words and 2 patterns
Entropy:           about 45 bits, as per modified algorithm

Password:          correcthorsebatterystaple
Tokenized:         c o r r e c t h o r s e b a t t e r y s t a p l e
Words Filtered:    @W6783 @W7923 @W1535 @W2285

Breakdown:         4 small, unique words and no patterns
Entropy:           43 bits, as per modified algorithm

The exact semantics of how entropy is calculated from patterns is up for discussion. I was thinking something like:

entropy(b) * l * (o + 1) // o will be either zero or one

The modified algorithm would find flaws with and reduce the strength of each password in the original table, with the exception of s^fU¬5ü;y34G<, which contains no words or patterns.

Best Answer

Appendix A on p. 46 of NIST SP 800-63 draws on the work of Claude Shannon to estimate password entropy as a number of bits. Indeed, this is the document the XKCD cartoon uses to calculate its entropy bits. Specifically:

  • the entropy of the first character is taken to be 4 bits;
  • the entropy of the next 7 characters are 2 bits per character; this is roughly consistent with Shannon’s estimate that “when statistical effects extending over not more than 8 letters are considered the entropy is roughly 2.3 bits per character;”
  • for the 9th through the 20th character the entropy is taken to be 1.5 bits per character;
  • for characters 21 and above the entropy is taken to be 1 bit per character;
  • A “bonus” of 6 bits of entropy is assigned for a composition rule that requires both upper case and non-alphabetic characters. This forces the use of these characters, but in many cases these characters will occur only at the beginning or the end of the password, and it reduces the total search space somewhat, so the benefit is probably modest and nearly independent of the length of the password;
  • A bonus of up to 6 bits of entropy is added for an extensive dictionary check. If the attacker knows the dictionary, he can avoid testing those passwords, and will, in any event, be able to guess much of the dictionary, which will, however, be the most likely selected passwords in the absence of a dictionary rule. The assumption is that most of the guessing entropy benefits for a dictionary test accrue to relatively short passwords, because any long password that can be remembered must necessarily be a “pass-phrase” composed of dictionary words, so the bonus declines to zero at 20 characters.

The idea is that an authentication system would pick certain entropy levels as thresholds. For example, 10 bits may be weak, 20 medium and 30 strong (numbers picked arbitrarily as an example, not a recommendation). Unfortunately, the document does not recommend such thresholds, probably because the computational power available to brute force or guess passwords increases over time:

As an alternative to imposing some arbitrary specific set of rules, an authentication system might grade user passwords, using the rules stated above, and accept any that meet some minimum entropy standard. For example, suppose passwords with at least 24-bits of entropy were required. We can calculate the entropy estimate of “IamtheCapitanofthePina4” by observing that the string has 23 characters and would satisfy a composition rule requiring upper case and non-alphabetic characters.
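
Those rules are straightforward to turn into code; here's a minimal sketch (the dictionary-check bonus is omitted, since the appendix describes its decline only qualitatively):

def nist_entropy(password: str, composition_rule: bool = False) -> float:
    """Entropy estimate per NIST SP 800-63 Appendix A for user-chosen passwords."""
    bits = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            bits += 4       # first character
        elif position <= 8:
            bits += 2       # characters 2 through 8
        elif position <= 20:
            bits += 1.5     # characters 9 through 20
        else:
            bits += 1       # characters 21 and above
    if composition_rule:
        bits += 6           # upper case and non-alphabetic required
    # An extensive dictionary check would add up to 6 more bits,
    # declining to zero at 20 characters (omitted here).
    return bits

print(nist_entropy("IamtheCapitanofthePina4", composition_rule=True))  # 45.0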

This may or may not be what you are looking for but is not a bad reference point, if nothing else.

[Edit: Added the following.]

The paper Testing Metrics for Password Creation Policies by Attacking Large Sets of Revealed Passwords (by Matt Weir, Sudhir Aggarwal, Michael Collins and Henry Stern) demonstrated that the Shannon model described above is not an accurate model of entropy for human-generated passwords. I would recommend looking at "Section 5 Generating New Password Creation Policies" for more accurate proposals.
