If you’re reading this, it’s very likely that you know how to use the internet. It’s also likely you’ve made an account on the internet somewhere. When you created your last account, what kind of requirements were you forced to use? For a number of web services, these requirements still follow the 2003 NIST SP 800-63 Appendix A standards that recommend an 8-character minimum, containing one uppercase, one lowercase, one digit, and one special character (Ex: Procircular1!).
We’re going to look at the math behind this format of password and determine how difficult cracking that password would be.
But first, a few assumptions:
- We only care about the English dictionary/alphabet. We’re not going to worry about French/Spanish words, but using other languages is a great way to decrease the effectiveness of some attacks.
- We assume an attacker knows that we are using the English dictionary/alphabet and the exact set that we are using. This is following Shannon’s Maxim.
- The attacker has the password hash and can try to break it locally.
- The hash that was used is perfectly secret and collision resistant, the only way to break it is to hash the exact plaintext and compare it.
One more note - If an attacker is forced to try and break the password by making a web request, then they will be bottlenecked by their network connection. If the hash is not secure, then there are plenty of other ways an attacker may be able to break our password, which does happen in the wild constantly.
Now back to the math:
When we combine all lower and uppercase characters, numbers one through ten, and 33 various special characters, we have 95 printable characters available on your typical keyboard. If we assume that we have no idea about the order of any characters in this password , then the amount of entropy (a measure of the amount of potential data in a password) in our password follows the following formula:
95⁸ = 6,634,204,312,890,625
That’s 6.63 quadrillion possibilities, enough that a single GPU would likely require days to brute force. But how many people do you know that can memorize a unique eight random character password for every account they have? Since humans only have so much RAM in their heads to keep track of all these things, we tend to follow a pattern. This pattern is almost invariably 6-10 letters (upper or lower case), 1 number, and 1 special character. At the lower bound, this is how many combinations we have to brute force.
52 x 52 x 52 x 52 x 52 x 52 x 10 x 33 = 6,524,301,189,120
A modest six and a half trillion different combinations. We’ve now effectively cut our key space to one tenth of the original size (this estimate is a bit of a clobber but fine for our purposes). This is still a massive number to brute force, but brute forcing is an inefficient method of password cracking, often only used as a last resort or if an attacker is only working to crack a single password (such as a domain administrator hash) and can point all resources at a single target.
So how can we perform this cracking more efficiently? While there’s an entire field of study devoted to developing methods to more effectively crack passwords, we’re going to focus on the most common method, dictionary attacks.
If we assumed that our first 6 characters were random letters either upper or lower case, we have a total of 19,770,609,664 (52^6) possible combinations for the first 6 characters. But as we know, humans tend to use something memorable as opposed to just using random strings of letters. So, what if we use words in our password cracking attempts instead of individual characters?
According to the Oxford English Dictionary’s Second Edition published in 1989 there are about 170,000 words that are currently used along with another 47,000 obsolete words. If we include the various forms of each of these words, (for example: ran, run, running) as well as words that are colloquially known but haven’t yet been added, this number approaches 750,000. This isn’t a very accurate number, but it will serve as a general representation of what words people might use in a password and will serve well enough for our purposes here.
By using words rather than individual letters, we have cut our possibilities to a manageable .00379% of the original possibilities for the letters portion of the password. Keep in mind this does not account for proper nouns, slang/regional dialect, or letter substitutions (Ex: pr0circular). We are still left with the portion of the password that contains the number and special character.
Let’s assume that the first portion of the password exists within our 750,000 English word list and we follow that with a second portion containing a number and a special character. The number of possibilities for a brute force attack is:
750,000 * 10 * 33 = 247,500,000 possible combinations
Even using modest modern hardware, such as the Nvidia GTX 1070 graphics card, an attacker can attempt to guess the password at a truly staggering rate. In certain situations, this rate can theoretically reach as high as billions or even tens of billions of attempts every second. There are factors that make guessing more difficult and can dramatically affect the rate an attacker can guess the password, so we are going to keep things simple and just pick a rate that any average computer should be able to achieve in most situations. That rate will be 100,000 guesses/second.
2,475,000,000 ÷ 100,000 = 2,475 seconds
2,475 seconds works out to 41 minutes to try every single possibility. This represents the maximum amount of time it will take to guess a password that follows our format. There are still a few considerations to be made.
This scenario did not include additional characters at the beginning or end or letter/number substitutions. When these types of elements are added to the basic password format that we have been using, the time it takes to guess the password theoretically increases with the Hollywood drama of an exponential curve.
However, this exponential increase to cracking difficulty does not directly reflect what is seen in the wild. Attackers still seem to be able to get around these mathematical barriers. There are two things that spring to mind that can be done to make cracking passwords worthwhile. The first is really simple: get more powerful hardware. This little experiment assumed what at the time would be considered middle market in terms of processing power. This option still takes some funding, so an attacker would want to make sure that the reward is worth the investment, which at least has the possibility of discouraging an attacker.
Unfortunately, the second possibility has no barrier at all to discourage an attacker and is much more effective. It is possible to dramatically reduce the time it takes to crack a password by taking advantage of people’s humanity. This isn’t humanity in any noble or aspiring sort of way. Simply knowing that humans are pattern seeking creatures means we can look at password data and find what tendencies are followed when adding letters, numbers or an exclamation point to the end of passwords. Rules can then be written to tell the computer that when it makes a guess of “Password19!” that the computer should go ahead and try “P@s$w0rd1!”. The second attempt appears that it should be a dramatically stronger password but since people share the tendency to substitute a letter for something that looks similar it is effectively no stronger than the first.
The purpose of this has been to gain a basic starting point to understanding the relative strength of our passwords by investigating the barebone format of “word+number+character”. So how can we combat bad passwords? Adding random numbers and special characters make things difficult to remember and on their own don’t add an incredible amount to the overall strength. But that is what has been taught for decades now. What is a possible solution to the password panic?
New guidelines are becoming more common and align to a much different view of passwords. Before, the complexity of passwords is what made passwords difficult for humans to remember while becoming easier for machines to guess. Now, let’s look at the inverse, a password that is easy for a human to remember but difficult for a computer to guess. Since password complexity adds so much confusion for people that can’t quite remember if they made the E a 3, why not just stop requiring password complexity. I can hear the low grumble of disagreement and distrust now... But there was a time that people thought that not using complex password would break the internet and force people to go to the dangerous land “Outside” or even more terrifying… the possibility of having no other choice but to “talk to someone “in person”. Nerd jokes aside, there are other ways to add entropy.
The easiest way is by adding length. This is actually a great thing. Without complexity it becomes much easier create a passphrase rather than a password. It’s much easier to remember, and you can create stories or games to help (see https://xkcd.com/936/). It could be that I am just have terrible grammar or maybe I am a terrible typist (both is treu), but I have a real problem hitting those special character keys. Now I can do what is easier for me and simply make passwords longer. If it’s easier for you to use complexity, that’s ok! You can still use it. It’s our habits of how extra characters were used that made passwords weaker than they appeared. Complexity is still a good thing to add.
If we take 4 random words from our space of 750,000 then our space now looks like:
750,000⁴ = 316,406,250,000,000,000,000,000
This is significantly larger than our original key space, and it’s now much easier to remember!
Please keep in mind that these calculations are not exact figures (3 is basically Pi, right?) and some of the assumptions that we made often do not hold in the real world.
Here’s to more password entropy!