Wordlist optimisation based on ruleset
There has been some work in this space, but it's a challenging problem, and a highly idiosyncratic one: the usefulness of any given wordlist and ruleset depends heavily on the nature of the passwords being attacked.

Generally, getting base words from passwords is sometimes called 'stemming' (a term borrowed from linguistics). Some basic stemming can be done with rurapenthe's rurasort (https://github.com/bitcrackcyber/rurasort). IIRC Matt Weir (lakiw) has done work in this space as well.
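To give a rough idea of what basic stemming means here (this is a simplified sketch, not rurasort's actual implementation; the `stem` helper is hypothetical), you can strip the leading/trailing digits and specials that rules typically append, then lowercase what's left:

```python
import re

def stem(password):
    # Hypothetical, very rough 'stemming': drop leading and trailing
    # non-letter characters (digits, specials), then lowercase the core.
    # Real tools handle far more cases (leet substitutions, etc.).
    core = re.sub(r'^[^a-zA-Z]+|[^a-zA-Z]+$', '', password)
    return core.lower()

for pw in ["Password123!", "$ummer2019", "1337hax0r"]:
    print(pw, "->", stem(pw))
# Password123! -> password
# $ummer2019   -> ummer
# 1337hax0r    -> hax0r
```

Feeding the stemmed output back in as base words, with rules reapplying the mangling, is the general idea.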

Also, testing whether a given ruleset works well for a given wordlist and a given hashlist is an art in itself, and again depends heavily on the source and target material. hashcat's --debug-mode / --debug-file parameters (which record which base word and rule produced each crack) and the --outfile* parameters are useful for this.
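As a sketch of how you might mine that debug output (assuming --debug-mode=3, where each line is 'original_word:rule'; the `top_basewords` helper is hypothetical), counting which base words actually produced cracks tells you which ones are pulling their weight:

```python
from collections import Counter

def top_basewords(debug_lines, n=5):
    # Tally cracks per base word from hashcat --debug-mode=3 output,
    # where each line is 'original_word:rule'. Note: a base word
    # containing a literal ':' would confuse this naive split.
    counts = Counter()
    for line in debug_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        base, _, _rule = line.partition(":")
        counts[base] += 1
    return counts.most_common(n)

sample = ["password:$1", "password:c", "summer:$2 $0 $1 $9", "love:l"]
print(top_basewords(sample))
```

Sorting a wordlist by that kind of hit count is one simple way to front-load the productive base words.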

In general, you've got a much larger accumulation of base passwords than is likely to be efficient for most attack types, so your instincts are good. But before trying to extract base words from 400GB of raw wordlists, I'd start with a smaller corpus (like the hashes.org founds) and build up from there. For general password targets, that approach is likely to be more time-efficient.

Messages In This Thread
RE: Wordlist optimisation based on ruleset - by royce - 10-29-2019, 06:24 AM