Ha. This is an interesting discussion.
We actually already had a couple of similar topics and discussions on this forum.
It's interesting that some people do not realize how many possibilities of combining x words exist.
The short answer is: given even a small number of words x and passphrases that must each contain n words, the number of ways to combine the words (even with the "small" restriction that a word can't be present within the passphrase more than once) is huge and grows very fast with both x and n (roughly like x^n as long as x is much larger than n).
Actually the simplified mathematical formula should look something like this:
1. given x different words
2. given that the passphrase must consist of *exactly* n words
(x-0) * (x-1) * (x-2) * (x-3) * ... * (x-k) (where the value k is n-1 and therefore the counter "i" goes from 0 to k)
an example: x = 11, n = 10
therefore we compute k = 10 - 1 = 9 (therefore the last number in the formula is x-k = 11-9=2)
11 * 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 = 39916800
(as you can see, we have n = 10 numbers within the multiplication. Some might also notice that this looks very close to the factorial, and it is: in general the result is x! / (x-n)!, i.e. if x > n you divide the factorial of x by the factorial of the number of words that are left over)
You can also think of it like this: for the first word you can choose any of the x words, for the second one (I think you called it "column") you can choose any word except the one that was chosen previously, for the third column you can only pick from the remaining ones (all minus the 2 previous ones), etc.
As already mentioned, the number of combinations grows very fast when x and/or n increase.
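To make the simplified formula concrete, here is a minimal Perl sketch that just multiplies (x-0) * (x-1) * ... * (x-k). The function name count_passphrases is made up for this illustration (it is not part of any existing tool), and it uses the core module Math::BigInt so that the result does not overflow for larger values of x and n.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Math::BigInt;

# Hypothetical helper (name made up for this sketch):
# number of ordered passphrases of exactly $n distinct words out of $x words

sub count_passphrases
{
  my ($x, $n) = @_;

  # not enough words for a passphrase of $n words:
  return Math::BigInt->new (0) if ($n > $x);

  my $count = Math::BigInt->new (1);

  for (my $i = 0; $i < $n; $i++)
  {
    $count->bmul ($x - $i); # factors x-0, x-1, ..., x-(n-1)
  }

  return $count;
}

my $small = count_passphrases (11, 10);
my $large = count_passphrases (1500, 4);

print "x = 11, n = 10: $small\n";  # 39916800
print "x = 1500, n = 4: $large\n"; # 5042274741000
```

Both numbers match the examples in this post. Note that this only counts the candidates, it does not generate them.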
This of course is a simplified formula and you added this additional requirement:
- generate also the capitalized variant of each word (but do not allow the same capitalized and non-capitalized word within the same passphrase)
This almost doubles the value x (the number of words), but not always/exactly, because there could be some words that can't be capitalized (e.g. if they start with a symbol or a number). We also need to subtract (or let's say "filter out") the candidates in which the same word occurs both capitalized and non-capitalized. As you can see, this makes the formula a little bit more complicated. If every word has a distinct capitalized variant, the result is simply the simplified formula multiplied by 2^n (2 variants for each of the n positions). When x is much larger than n, the filtering only eliminates a small subset of the candidates; when x is close to n it eliminates most of them, but the number that remains is still huge.
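As a rough illustration of how much the capitalization requirement changes the count, the following Perl sketch compares the number of candidates with and without the "same word can't occur in both variants" filter. It assumes that every word has a distinct capitalized variant (which, as said, is not always true), and the function name falling_product is made up for this example.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Math::BigInt;

# Hypothetical helper (name made up for this sketch):
# multiplies $n factors: start, start - step, start - 2 * step, ...

sub falling_product
{
  my ($start, $step, $n) = @_;

  my $count = Math::BigInt->new (1);

  for (my $i = 0; $i < $n; $i++)
  {
    $count->bmul ($start - $i * $step);
  }

  return $count;
}

my ($x, $n) = (11, 10);

# ASSUMPTION: all $x words have a distinct capitalized variant (2 * $x variants)

# with the filter: choosing a word removes both of its variants from the pool
my $filtered = falling_product (2 * $x, 2, $n);

# without the filter: all variants are treated as unrelated words
my $unfiltered = falling_product (2 * $x, 1, $n);

print "with filter:    $filtered\n"; # 40874803200 (= 2^10 * 39916800)
print "without filter: $unfiltered\n";
```

With x much larger than n (e.g. x = 1500, n = 4) the two numbers are almost identical; with x close to n they differ a lot, but both are far too large to be "small".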
The idea of combining x words was already explained, for instance here: https://hashcat.net/forum/thread-7131-po...l#pid38278
As you can see, even with "only" n = 4 words (but admittedly a high value of x = 1500) the list of password candidates is huge (in that case it was exactly 1500 * 1499 * 1498 * 1497 = 5042274741000 combinations).
Now let's speak about how you would write the code to generate such password candidates:
It would be very similar to the example mentioned here: https://hashcat.net/forum/thread-7131-po...l#pid38278 ... BUT with the exception that you have freaking 10 (or even 16) nested loops, plus the additional code needed to try both the capitalized and the non-capitalized version of each word (but only if they are not the same, and with the additional restriction that the same word must not occur within a passphrase in both the capitalized and the non-capitalized variant).
As you can see, this would make the code a little bit complicated! The 10 nested loops alone would be quite horrible to write and to read!
You could however come up with a recursive version of it, something like this (perl code):
Code:
#!/usr/bin/env perl

# Author: philsmd
# Date: January 2018
# License: public domain (CC0)

use strict;
use warnings;

#
# CONSTANTS
#

my $WORDS_IN_PASSPHRASE = 10;

#
# Example
#

my @LOWER_CASE_WORDS = (
  "this",
  "is",
  "the",
  "list",
  "of",
  "words",
  "that",
  "you",
  "want",
  "to",
  "try",
);

my $AMOUNT_OF_WORDS = scalar (@LOWER_CASE_WORDS);

#
# Helper functions
#

# normally we would just print it, but we need to also print the variants with capitalized (upper cased first letter):

sub print_passphrase
{
  my $recursive_step = shift; # number of positions that still need a variant
  my $words_in_pass  = shift; # reference to the list of chosen word indices
  my $passwords      = shift; # reference to the list of variants chosen so far

  if ($recursive_step < 1)
  {
    for (my $i = 0; $i < $WORDS_IN_PASSPHRASE; $i++)
    {
      print $passwords->[$i];
    }

    print "\n";
  }
  else
  {
    my $loop_index = $WORDS_IN_PASSPHRASE - $recursive_step;

    my $word_index = $words_in_pass->[$loop_index];

    my $pass = $LOWER_CASE_WORDS[$word_index];

    # always generate the lower-case variant:

    my @tmp_passwords = @$passwords;

    push (@tmp_passwords, $pass);

    print_passphrase ($recursive_step - 1, $words_in_pass, \@tmp_passwords);

    # upper-case (capitalized) variant (if needed):

    my $pass_capitalized = ucfirst ($pass);

    if ($pass_capitalized ne $pass)
    {
      my @tmp_passwords_capitalized = @$passwords;

      push (@tmp_passwords_capitalized, $pass_capitalized);

      print_passphrase ($recursive_step - 1, $words_in_pass, \@tmp_passwords_capitalized);
    }
  }
}

sub generate_word_list
{
  my $recursive_step = shift; # number of words that still need to be chosen
  my $words_in_pass  = shift; # reference to the list of chosen word indices

  if ($recursive_step < 1)
  {
    my @passwords = ();

    print_passphrase ($WORDS_IN_PASSPHRASE, $words_in_pass, \@passwords);
  }
  else
  {
    # recursive step:

    for (my $i = 0; $i < $AMOUNT_OF_WORDS; $i++)
    {
      # search in array (or use grep):

      my $found = 0;

      foreach my $item (@$words_in_pass)
      {
        if ($item == $i)
        {
          $found = 1;

          last;
        }
      }

      next if ($found == 1);

      # if the word is NOT within the passphrase already:

      # make a copy of the word list:

      my @tmp_words_in_pass = @$words_in_pass;

      # add the new one:

      push (@tmp_words_in_pass, $i);

      generate_word_list ($recursive_step - 1, \@tmp_words_in_pass);
    }
  }
}

#
# Start
#

if ($AMOUNT_OF_WORDS < $WORDS_IN_PASSPHRASE)
{
  print "ERROR: not enough words (the amount is $AMOUNT_OF_WORDS) for the passphrase (of $WORDS_IN_PASSPHRASE words)\n";

  exit (1);
}

my @words_in_pass = ();

generate_word_list ($WORDS_IN_PASSPHRASE, \@words_in_pass);

exit (0);
Note: you could just change the WORDS_IN_PASSPHRASE and LOWER_CASE_WORDS variables to fit your needs (it's quite flexible for any x amount of words in LOWER_CASE_WORDS and value n).
Of course you could argue that running this recursive code might not be the best idea, since it could be slow etc. But rewriting the same thing in a non-recursive way (similar to the code provided here: https://hashcat.net/forum/thread-7131-po...l#pid38278 , but adapted to 10+ words and the capitalized variant), or even in ANSI C, would probably only speed it up by a constant factor.
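Just to illustrate what a non-recursive variant could look like, here is a minimal Perl sketch that simulates the nested loops with an explicit stack of word indices. This is not the code from the linked thread: the names each_passphrase and the callback parameter are made up for this example, and the capitalized variants are left out to keep it short. The number of candidates it has to walk through is exactly the same as with the recursive version.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (names made up for this sketch):
# calls $callback once for every ordered selection of $n distinct words.
# NOTE: capitalized variants are NOT handled here.

sub each_passphrase
{
  my ($words, $n, $callback) = @_;

  my $amount = scalar (@$words);

  return if ($n < 1 || $n > $amount);

  my @stack = (-1); # one word index per position, -1 = nothing chosen yet
  my %used  = ();   # word indices that are currently part of the passphrase

  while (scalar (@stack) > 0)
  {
    my $pos = $#stack;

    # give back the word that was used at this position before:
    delete ($used{$stack[$pos]}) if ($stack[$pos] >= 0);

    # advance to the next word that is not in the passphrase yet:
    my $i = $stack[$pos] + 1;

    $i++ while ($i < $amount && exists ($used{$i}));

    if ($i >= $amount)
    {
      pop (@stack); # this "loop" is finished, go back to the outer one

      next;
    }

    $stack[$pos] = $i;
    $used{$i}    = 1;

    if (scalar (@stack) == $n)
    {
      $callback->(join ("", map { $words->[$_] } @stack));
    }
    else
    {
      push (@stack, -1); # enter the next inner "loop"
    }
  }
}

my @words = ("a", "b", "c");

# prints ab, ac, ba, bc, ca, cb (one candidate per line):
each_passphrase (\@words, 2, sub { print $_[0], "\n"; });
```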
The real "cost" is running through such a huge number of combinations. As said, in the worst case (where n = x) that is the factorial of the number of words, multiplied by 2 for every position whose word also has a capitalized variant.
Therefore, even if you have a perfect password candidate generator, the total number of combinations might be too large (depending on x and n).
The short version is: it's not complicated to come up with code for this, but the complexity (the "keyspace", i.e. the number of password candidates) might be too large even if you just want to "combine each word from a known set of words".