The National Institute of Standards and Technology (NIST) recently released a new recommendation on authentication, including best practices for constructing passwords.
DISCLAIMER: I am not a password security expert. But I can do some math.
You are already familiar with the previous/old NIST recommendations because these are the recommendations that drive you crazy:
- Use upper case and lower case
- Use numbers
- Use special characters (!@#$% etc)
One way or another those recommendations have worked their way into almost every system in use today, with the corresponding rules that you curse at when you are setting up a new account.
The new rules say that it’s better to just use some number of words in a phrase. No digits or special characters needed.
Let’s look at the history of password technology and do some math. Don’t be scared: we won’t be doing anything more difficult than raising a number to a power, which, in a throwback to the old days of Fortran, I will write in this note using **, as in:
2 ** 3 = 2 * 2 * 2 = 8
If I happen to know that your password is only two characters long, perhaps because I heard how many keyclicks there were when you typed it in, and I can guess that (like most people) you picked your password only from lowercase letters from a to z, then how many passwords would I have to try to guess yours? The answer is that there are 26 letters to choose from, therefore:
N = 26 ** 2 = 676
There are only 676 two-character lowercase passwords I have to try if I want to search all the possibilities to break your password. I can break your password by simply trying every combination “aa”, “ab”, “ac” … “zx”, “zy”, “zz” until I find the one that works.
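For the curious, here is a minimal Python sketch of that "try every combination" loop. Conveniently, Python uses the same ** notation for powers. The function name all_passwords is just something I made up for illustration.

```python
from itertools import product
from string import ascii_lowercase

def all_passwords(length, alphabet=ascii_lowercase):
    """Yield every string of the given length over the alphabet, in order."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)

candidates = list(all_passwords(2))
print(len(candidates))                   # 676, the same as 26 ** 2
print(candidates[:3], candidates[-3:])   # ['aa', 'ab', 'ac'] ['zx', 'zy', 'zz']
```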
In the old days passwords were usually limited to 8 characters. This limit can be traced all the way back to late 1970s Unix implementations of the DES-based password encryption algorithm. In the early days of the web most web site servers were running on Unix boxes that still used the same password code from the 1970s and often still had the eight character limit.
Obviously, 676 passwords won’t take very long for someone to try (by computer), which is why password software usually required you to use more characters, often making you use an eight character password. A dirty little secret of some of those older systems is that they’d let you set a longer password, but in fact only ever looked at the first eight characters. The old NIST recommendations were written during a time when that was still a consideration.
If I still know that you only used lowercase letters and there is a maximum of 8 characters, there are:
N = 26 ** 8 = approximately 208 billion
When crackers “steal password files” from hacked web sites, what they get is not the passwords themselves, but rather their encrypted forms (strictly speaking these are one-way hashes, but I will keep saying “encrypted” in this note). This looks like a bunch of gibberish characters. When a web site checks your password, it asks you for your password, encrypts it, and sees if it gets the same gibberish it got back when you first set your password.
Web sites generally never store your original password and there is no way to recover the original password from this encrypted gibberish. Thus, when the bad guys steal a “password” file what they really have to do is just guess every possible password, putting each guess through the encryption software, until they find one that matches the gibberish string they have gotten their hands on.
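Here is a rough Python sketch of that guess-and-compare process. It is only an illustration under loud assumptions: I am using SHA-256 from Python's hashlib as a stand-in for whatever the web site really uses (a well-run site should use a salted, deliberately slow algorithm instead), and the names scramble and crack are invented for this note.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def scramble(password):
    # Assumption: SHA-256 stands in for the site's real one-way function.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def crack(stolen, length, alphabet=ascii_lowercase):
    """Try every candidate of the given length against the stolen gibberish.

    Returns (password, guesses_made), or (None, guesses_made) if nothing matched.
    """
    guesses = 0
    for combo in product(alphabet, repeat=length):
        guess = "".join(combo)
        guesses += 1
        if scramble(guess) == stolen:
            return guess, guesses
    return None, guesses

stolen = scramble("zz")          # hypothetical entry from a stolen password file
found, guesses = crack(stolen, 2)
print(found, guesses)            # zz 676 (the worst case: the very last guess)
```

Note that the cracker never reverses the gibberish; it only runs the scrambling forward, once per guess, which is why the number of possible passwords matters so much.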
So we can see the advantage of an 8 character password, instead of a 2 character password, is that they will have to try roughly 208 billion guesses to find your password. Technically, on average, they will have to try half of that before they get lucky and find yours, but for the rest of this memo I will ignore that factor of 2 because it’s not really significant and just clutters the discussion.
When computers were slower, running the DES algorithm 208 billion times would take long enough that it wasn’t much of a threat. The calculations could take weeks, but as computers got faster and faster that time gradually came down, and with modern machines this is now a practical method of attack.
This is why the old password recommendations suggested that you use more characters than just lowercase a to z. If, for example, you randomly picked from uppercase and lowercase characters, there would be 52 possibilities for each position in your password, and the number of guesses required to crack your password went up dramatically:
N = 52 ** 8 = 53.4 TRILLION
Simply by adding upper case into the equation, the number of possible passwords increases by a factor of 256. (Those of you who are insightful with math will note that we doubled the choices, from 26 to 52, and since there are 8 password characters the possibilities increased by a factor of 2 ** 8 = 256.)
If digits (another 10 possible characters) and special characters (!@#$% etc) are added, the possible choices go up to 80 or more. Let’s take 80 possible characters and see what we get:
N = 80 ** 8 = 1677 TRILLION
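If you want to check these numbers yourself, a few lines of Python will do it. This is just a sketch of the arithmetic in this note; search_space is a name I made up, and the sizes 26, 52 and 80 are the ones used above.

```python
def search_space(alphabet_size, length):
    """Number of possible passwords: alphabet_size ** length."""
    return alphabet_size ** length

alphabets = [
    ("lowercase only", 26),
    ("upper and lower case", 52),
    ("plus digits and specials", 80),
]

for label, size in alphabets:
    print(f"{label:26s} {search_space(size, 8):,}")

# Doubling the alphabet multiplies the count by 2 ** 8.
print(search_space(52, 8) // search_space(26, 8))   # 256
```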
That looks like a lot of possibilities. And it could be even higher because there are actually more than 80 choices of possible characters people could use in their passwords. But there are some problems. In reality, humans get annoyed by all those rules and usually pick passwords that aren’t really randomly selected from all possible characters. They also do other things that reduce the number of passwords that have to be guessed.
Let’s go back to the upper and lower case combinations (and ignore digits and special characters for now). I said there were
N = 52 ** 8 = 53.4 TRILLION
possible combinations when choosing from 52 characters (upper and lower case a to z) eight times. But when most people see this message:
Password must contain at least one upper case character
what do they do in reality?
They take their lame password, and capitalize one letter of it to get past this rule.
How many combinations of passwords are there, if as a bad guy I am reasonably assured that your password only has one uppercase character? Now instead of 52 possibilities for each character, there are still only 26 possibilities, and then there are 8 choices for which one of the positions is going to be upper case. Therefore, instead of:
N = 52 ** 8 = 53.4 TRILLION
possibilities, there are really only:
N = 26 ** 8 * 8 = 1.6 TRILLION
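If you don't trust the formula, you can check it by brute force on a shorter password. The Python sketch below (the function name is my own invention) builds every 3-letter password with exactly one capital letter and confirms the count matches 26 ** 3 * 3, then applies the same formula to 8 characters.

```python
from itertools import product
from string import ascii_lowercase

def count_one_upper(length, alphabet=ascii_lowercase):
    """Count strings of the given length with exactly one uppercase letter."""
    seen = set()
    for combo in product(alphabet, repeat=length):
        for position in range(length):
            chars = list(combo)
            chars[position] = chars[position].upper()
            seen.add("".join(chars))
    return len(seen)

# Enumeration agrees with the formula for a short password...
print(count_one_upper(3), 26 ** 3 * 3)   # 52728 52728

# ...so use the formula for the 8 character case.
print(f"{26 ** 8 * 8:,}")                # 1,670,616,516,608
```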
A similar problem occurs with the digits and special character rules. Many people just substitute numbers for letters in a fairly predictable way, e.g., using the digit zero for the letter “o”, the digit 3 for the letter “e”, and similar things like that. We all do this, so many passwords in the real world are just ordinary words with a few of these substitutions applied.
The bad guys know that people do this, and when they write their guessing software they don’t have to go through all of the character possibilities. The real number of strings they have to guess is much, much, lower than the simple exponentiation math would imply. This knowledge dramatically decreases the number of possibilities that have to be computed to try to crack your password, and the sophisticated cracking software incorporates knowledge such as “try ordinary words but substitute the number 3 for e” and similar tendencies.
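To get a feel for how cheap this trick is for the attacker, here is a toy Python sketch. The substitution table contains only the two swaps mentioned above, and the function name variants is hypothetical; I am not claiming this is how any particular cracking tool is written.

```python
from itertools import product

# Only the two substitutions mentioned in this note (an assumption;
# a real attacker would presumably try many more).
SUBSTITUTIONS = {"o": "0", "e": "3"}

def variants(word):
    """Return every spelling of word with each o/e optionally swapped for 0/3."""
    options = [
        (ch, SUBSTITUTIONS[ch]) if ch in SUBSTITUTIONS else (ch,)
        for ch in word
    ]
    return ["".join(combo) for combo in product(*options)]

print(variants("love"))   # ['love', 'lov3', 'l0ve', 'l0v3']
```

A word with k swappable letters only produces 2 ** k spellings, so each dictionary word costs the attacker a handful of extra guesses rather than trillions.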
Over time the eight character limit went away, so longer passwords became possible. Many web sites now allow fairly long passwords but still encourage you to use all sorts of random characters in an attempt to make that exponentiation math work out to a large number.
But people still pick bad passwords because a truly random password like “x@8Q-99!va@:d” is just impossible to remember; no one picks passwords like that.
The new recommendation from NIST takes that into account, and instead recommends that you just pick a phrase that you can remember and no one else would know. This assumes that modern password systems can accept much longer passwords – which most can (it is likely that there is no practical limit in most software these days, though sometimes the web designers impose limits on the login screens).
So let’s look at some math. Suppose you picked a four word phrase from the vocabulary of an 8 year old child. How many passwords are possible?
According to various studies, the average 8 year old native speaker has a vocabulary of about 10,000 words. This means that there are:
N = 10000 ** 4 = 10,000 TRILLION
This number is already about 6 times higher than the count for a fully random 8 character password drawn from 80 possible characters, and keep in mind that we already debunked that math as overly generous because no real human being ever actually picks those gibberish characters randomly. This implies that the real advantage of the four word random phrase is far greater than “just” the factor of six calculated here.
Most adults will have even larger vocabularies, in the neighborhood of 20,000 to 35,000 words, so the number of four-word phrases you might pick for your password becomes even larger.
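The comparison is easy to reproduce in Python. This sketch assumes, as the math in this note does, that each of the four words is picked independently from the whole vocabulary; phrase_count is a name I made up.

```python
def phrase_count(vocabulary, n_words=4):
    """Number of possible phrases: vocabulary ** n_words."""
    return vocabulary ** n_words

gibberish = 80 ** 8   # fully random 8 character password, 80 possible characters

for vocabulary in (10_000, 20_000, 35_000):
    phrases = phrase_count(vocabulary)
    ratio = phrases / gibberish
    print(f"{vocabulary:>6,} words: {phrases:,} phrases, {ratio:,.1f} times the gibberish count")
```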
Now, of course, people are still people, and they might still pick bad passwords even if they are made out of multiple words:
this is my password
I hate password rules
you can't guess this
and so forth. But if you pick a password that:
- is selected from a wide range of words
- uses at least one “unusual” word
- isn’t obviously based on something people might know about you
- but is still easy for you to remember
then simply combining four words into a phrase and using that as your password is likely to be more secure than eight characters of gibberish. So, as systems around the web start getting updated to conform to the new password recommendations, hopefully you’ll be able to use passwords like these:
lemon blue flying campfire
tree eating pickle moon
disintegrating alien cheese sundae
It would be best if you tried to include some unusual words; remember, you are trying to make the bad guys have to guess from as many words as possible. Though, even if you stick to “just words an eight year old would know” there are roughly 10,000 choices and that already makes your password harder to guess than a realistic eight character “old style” password. Personally I can type pretty well, so “disintegrating alien cheese sundae” is something I could potentially envision using as a password (ooops, ok, not now that I’ve published this haha).
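One way to avoid accidentally picking an obvious phrase is to let the computer choose the words for you. To be clear, this is my own illustration and not something spelled out above. The sketch uses Python's secrets module, and WORDS is a tiny hypothetical stand-in built from the example phrases in this note; a real word list would need thousands of entries for the math to work in your favor.

```python
import secrets

# Tiny hypothetical word list, for illustration only.
WORDS = [
    "lemon", "blue", "flying", "campfire",
    "tree", "eating", "pickle", "moon",
    "disintegrating", "alien", "cheese", "sundae",
]

def random_phrase(n_words=4, words=WORDS):
    """Join n_words words, each chosen independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(n_words))

phrase = random_phrase()
print(phrase)
print(len(WORDS) ** 4)   # 20736 possible phrases from this toy list: far too few
```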
The beauty of the new NIST recommendations is that most people should be able to come up with memorable passwords that are difficult to guess and draw from between 10,000 and 20,000 words for each word in the phrase. The math is inexorable: there are more combinations for these passwords than there are for shorter gibberish passwords.
Of course, if you pick an obvious phrase that a bad guy can guess, that’s your fault. Don’t set your new password to “I love my cat” if everyone knows you love your cat.
If you are paying attention, you will note that the new NIST recommendations are somewhat equivalent to saying “hey, just use a longer password”. My example of “disintegrating alien cheese sundae” is actually a password of length 34 (including the spaces). Thus in some sense the NIST recommendation isn’t really anything new or earth-shattering. We already know that every time you add one character to a password, it gets harder to guess by a factor equal to the number of possible characters. In fact, a 34 character random password made out of only lowercase letters would have:
N = 26 ** 34 = an enormously large number (roughly 10 to the 48th)
possibilities. But, of course, no one is going to have a 34 character random password because it would be impossible to remember. So the NIST recommendation is actually a sneaky way to get us to have longer passwords, at the cost of choosing from a less-than-random set of characters (i.e., those that combine into actual words). There’s no magic here; it’s simply the observation that the longer the password is the better it is, and if we have to give up some randomness (fewer character choices than totally random) to get to this longer password length, the math still works out favorably.
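You can let Python do the counting here too. This little sketch measures the example phrase (spaces included), raises 26 to that power, and compares the result with the four-words-from-10,000 count used earlier in this note. The variable names are mine.

```python
phrase = "disintegrating alien cheese sundae"

length = len(phrase)                 # characters, spaces included
random_letters = 26 ** length        # fully random lowercase string of that length
four_words = 10_000 ** 4             # four words from a 10,000 word vocabulary

print(length)
print(f"{random_letters:.1e}")
print(random_letters > four_words)   # True: the random string has far more possibilities
```

The last line is the trade-off in a nutshell: the phrase gives up a huge amount of randomness compared with a random string of the same length, and still comes out ahead of eight characters of gibberish.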
I’m looking forward to getting rid of my ridiculous eight character gibberish passwords and replacing them with easier to remember phrases, though I imagine it may take many years for the tedious old NIST suggestions to become thoroughly debunked and for the newer methodology to find its way into account password rules.