Counting Possible Passwords
Published on Saturday, February 25th, 2017
Jack showed “the rules for password naming” in a website to me. Here’s an excerpt:
- The password must be exactly 8 characters long.
- It must contain at least one letter, one number, and one of the following special characters.
- The only special characters allowed are: @ # $
- A special chaacter must not be located in the first or last position.
- Two of the same characters sitting next to each other are considered to be a “set.” No “sets” are allowed. Example: rr, tt
- Avoid using names, such as your name, user ID, or the name of your company or employer.
- Other words that cannot be used are Texas, child, and the months of the year.
A new password cannot be too similar to the previous password.
- Example: previous password - abc#1234; unacceptable new password - acb$1243
- Characters in the first, second, and third positions cannot be identical. (abc*****)
- Characters in the second, third, and fourth positions cannot be identical. (*bc#****)
- Characters in the sixth, seventh, and eighth positions cannot be identical. (*****234)
- A password can be changed voluntarily (no Help Desk assistance needed) once in a 15-day period. If needed, the Help Desk can reset the password at any time.
- The previous 8 passwords cannot be reused.
One way to create a password is creative spelling and substitution. Examples:
- phuny#2s
- fish#1ng
- t0pph@tsAhem... To quote from somewhere... No “sets” are allowed.
- run$4you
- ba#3ries
I’m terrified.
This reminds me of an xkcd comic “Password Strength”. As the comic states, through 20 years of effort, we’ve successfully trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess. For this website, it’s even worse because the website itself forces users to abide these nonsensical rules.
What we are more interested, however, is to count how many possible passwords there are! We will consider only rule 1 to 5, as the rest is about specific names or previous passwords which would be hard to define mathematically.
It might be possible to use combinatorics to solve the problem. However, as I study computer science, I will use computer to solve the problem!
First, write a naive program to count the answer:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | import string as st def count_naive(num_symbols, num_digits, num_letters, length): symbol_chars = "@#$!%^&*()_+=-`~{}[]:?/>.<,;"[:num_symbols] digit_chars = st.digits[:num_digits] letter_chars = st.ascii_letters[:num_letters] all_chars = symbol_chars + digit_chars + letter_chars string = [-1] def generate(n): if n == 0: passed = (any(c in string for c in symbol_chars) and any(c in string for c in digit_chars) and any(c in string for c in letter_chars)) if passed: # uncomment next line to print all passwords # print(''.join(string[1:])) pass return passed total = 0 for c in all_chars: if (n == 1 or n == length) and c in symbol_chars: continue if string[-1] == c: continue string.append(c) total += generate(n - 1) string.pop() return total return generate(length) |
And then we can run:
1 2 3 | $ python3 -i count.py >>> count_naive(2, 3, 4, 5) # 2 symbols, 3 numbers, 4 letters, length 5 11472 |
Pretty much, this program goes over the entire possible stringsAll continue
statements would do some pruning, but that won’t help much and chooses only valid ones. Now, we can compute the answer to the original question. It’s just count_naive(3, 10, 52, 8)
, as the number of symbols is 3 (#, $, and @); the number of digits, 10 (0-9); number of letters, 52 (a-z, A-Z), right?
The answer is yes, but as the program goes over the entire possible strings, it would take a really long time to compute an answer. In particular, the algorithm’s runtime is
To reduce the running time, notice that all numbers/letters/symbols look the same. Instead of iterating over
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | def count(num_symbols, num_digits, num_letters, length): SYMBOL = 0 DIGIT = 1 LETTER = 2 groups = [0, 1, 2] sizes = [num_symbols, num_digits, num_letters] string = [-1] total = 0 cnt = 1 def generate(n): nonlocal cnt, total if n == 0: if {0, 1, 2} <= set(string): total += cnt return for g in groups: if (n == 1 or n == length) and g == SYMBOL: continue mult = sizes[g] - (1 if string[-1] == g else 0) if mult == 0: continue string.append(g) cnt *= mult generate(n - 1) cnt //= mult string.pop() generate(length) return total |
The above algorithm has runtime nonlocal
.
Computing the answer is now feasible (only one second!).
1 2 3 4 5 | $ python3 -i count.py >>> count(2, 3, 4, 5) 11472 # great! this matches our previous implementation >>> count(3, 10, 52, 8) 46082343914880 |
Assuming that this answer is correct, we can see that all these weird rules reduce
How can we be sure that this new program is indeed correct? We can create an oracle to test it on smaller inputs, using the slower but definitely correct algorithm as the tester.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | def oracle(f1, f2): lim = 5 n = 100 for trial in range(n): print('Trial #{}'.format(trial)) num_symbols = random.randint(1, lim) num_digits = random.randint(1, lim) num_letters = random.randint(1, lim) length = random.randint(1, lim) args = [num_symbols, num_digits, num_letters, length] answer1 = f1(*args) answer2 = f2(*args) if answer1 != answer2: print(("symbols: {}, digits: {}, letters: {}, len: {}\n" + "f1: {}, f2: {}").format(*(args + [answer1, answer2]))) return False return True print(oracle(count, count_naive)) #=> True |
Now we have an
Let
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | def count_dp(num_symbols, num_digits, num_letters, length): SYMBOL = 0 DIGIT = 1 LETTER = 2 sizes = [num_symbols, num_digits, num_letters] groups = [0, 1, 2] binary = [0, 1] dp_template = {(g, hs, hd, hl): 0 for g in groups for hs in binary for hd in binary for hl in binary} dp_new = dict(dp_template) dp_new[SYMBOL, 1, 0, 0] = 0 # the first character can't be a symbol dp_new[DIGIT, 0, 1, 0] = sizes[DIGIT] dp_new[LETTER, 0, 0, 1] = sizes[LETTER] for i in range(1, length): dp_old = dp_new dp_new = dict(dp_template) for g in groups: space = [(hs, hd, hl) for hs in binary for hd in binary for hl in binary] for hs, hd, hl in [t for t in space if t[g] != 0]: for prevg in groups: dp_new[g,hs,hd,hl] += ( dp_old[prevg, hs, hd, hl] + (dp_old[prevg, 0, hd, hl] if g == SYMBOL and hs else 0) + (dp_old[prevg, hs, 0, hl] if g == DIGIT and hd else 0) + (dp_old[prevg, hs, hd, 0] if g == LETTER and hl else 0) ) * (sizes[g] - (1 if g == prevg else 0)) return dp_new[DIGIT, 1, 1, 1] + dp_new[LETTER, 1, 1, 1] print(oracle(count_dp, count)) #=> True |
This yields a linear time algorithm. Additionally, the algorithm takes constant amount of memory!
FIN