hashcat Forum

Using autoregressive character-level language model to augment wordlists?
If I am not mistaken, when a new database is breached and its password hashes are obtained, people first try to crack the hashes using passwords from previous data breaches. This sometimes works because people reuse their passwords, or independently choose the same ones, since humans are not very good at generating random data. Exactly because of that, I think AI might be able to pick up on whatever patterns exist in human-generated passwords.

A tool like makemore seems to be able to do just that. So my question is: if wordlist "A" is used to crack hashlist "B" and it successfully cracks "p" percent of the hashes, will a synthetically augmented wordlist "A*", 1.5x or 2x the size of wordlist "A", improve on "p" percent, and by how much?
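
To make the "A*" idea concrete, here is a minimal sketch of the simplest model in that family: a character-level bigram sampler, which is roughly where makemore starts before moving to neural nets. This is not makemore's code. The function names (train_bigram, sample, augment) and the toy wordlist are made up for illustration, and a real experiment would use a far larger list and a stronger model.

```python
import random
from collections import defaultdict

END = "\n"  # sentinel marking both the start and the end of a password


def train_bigram(words):
    """Count character -> next-character transitions over the wordlist."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        chars = [END] + list(w) + [END]
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    return counts


def sample(counts, rng, max_len=32):
    """Generate one candidate by walking the transition table until END."""
    out = []
    cur = END
    while len(out) < max_len:
        next_chars = list(counts[cur].keys())
        weights = list(counts[cur].values())
        cur = rng.choices(next_chars, weights=weights, k=1)[0]
        if cur == END:
            break
        out.append(cur)
    return "".join(out)


def augment(wordlist, factor=1.5, seed=0, max_tries=100_000):
    """Return the wordlist followed by new sampled candidates.

    Stops once the list reaches factor * len(wordlist) entries,
    or after max_tries samples, whichever comes first.
    """
    rng = random.Random(seed)
    counts = train_bigram(wordlist)
    seen = set(wordlist)
    result = list(wordlist)
    target = int(len(wordlist) * factor)
    tries = 0
    while len(result) < target and tries < max_tries:
        tries += 1
        cand = sample(counts, rng)
        if cand and cand not in seen:  # keep only non-empty, unseen strings
            seen.add(cand)
            result.append(cand)
    return result


# Toy wordlist "A" (illustrative only)
wordlist_a = ["password", "dragon", "monkey", "letmein", "octopus", "sunshine"]
wordlist_a_star = augment(wordlist_a, factor=2.0)
print(wordlist_a_star[len(wordlist_a):])  # the synthetic part of "A*"
```

A bigram model only looks at one previous character, so most of what it produces will look like noise. That is part of what the question comes down to: whether a better model produces candidates that real people actually chose.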
This is one of the main functions of rules. Instead of just trying previously seen passwords like "octopus", we can try "octopus1", "Octopus", "OcToPus", etc. As for your question, it's impossible to predict by how much. There are many models and applications that try to do this kind of thing, like the PRINCE processor, but by far the most effective approach is simply to run rules. With flat wordlist files alone, we can process a few million candidates per second; with rules, we can process tens to hundreds of billions of candidates per second, because GPUs can generate their own work instead of constantly having to request it from the CPU. There have been many, many ideas for integrating AI models into Hashcat, but they're often simply not good enough, or too slow or complex, to be added directly into Hashcat, especially directly on the GPU, which would be preferable.
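
For anyone unfamiliar with rules, the "octopus" examples above can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Hashcat implements it. It emulates a tiny subset of the rule functions (":" no-op, "c" capitalize, "$X" append character X, "TN" toggle case at position N, digits 0-9 only), and the helper names apply_rule and expand are made up for illustration.

```python
def apply_rule(word, rule):
    """Apply one rule line (space-separated functions) to a word.

    Supports only ":", "c", "$X" and "TN" with N in 0-9.
    Because the line is split on spaces, "$ " (append a space)
    is not handled by this sketch.
    """
    out = word
    for fn in rule.split():
        if fn == ":":
            pass  # no-op: pass the word through unchanged
        elif fn == "c":
            # capitalize the first character, lowercase the rest
            out = out[:1].upper() + out[1:].lower()
        elif len(fn) == 2 and fn[0] == "$":
            out += fn[1]  # append a single character
        elif len(fn) == 2 and fn[0] == "T" and fn[1] in "0123456789":
            i = int(fn[1])
            # Choice made for this sketch: an out-of-range position
            # leaves the word unchanged.
            if i < len(out):
                out = out[:i] + out[i].swapcase() + out[i + 1:]
        else:
            raise ValueError(f"unsupported rule function: {fn!r}")
    return out


def expand(words, rules):
    """Yield every word mutated by every rule, in wordlist order."""
    for w in words:
        for r in rules:
            yield apply_rule(w, r)


rules = [":", "$1", "c", "T0 T2 T4"]
candidates = list(expand(["octopus"], rules))
print(candidates)  # ['octopus', 'octopus1', 'Octopus', 'OcToPus']
```

The point is that one base word fans out into many candidates, and in Hashcat that fan-out happens on the GPU rather than in a Python loop like this one.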

Related links:
https://hashcat.net/wiki/doku.php?id=rule_based_attack
https://github.com/hashcat/princeprocessor
https://github.com/hashcat/hashcat/issues/3923
Makes sense, and of course there is no analytical solution to my question. Someone has to conduct an experiment and perhaps compare the results with wordlist A plus a rule-based attack. It would be quite interesting to see what happens.
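
Something like the following could be the skeleton of such an experiment. It is a rough sketch under loud assumptions: the hashes are plain unsalted SHA-256 (real hashlists vary), everything is toy data I made up, and the "A*" entries are hand-written placeholders standing in for model output, so the printed numbers say nothing about which approach is actually better.

```python
import hashlib


def digest(password):
    """Unsalted SHA-256 hex digest (an assumption made for this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def crack_rate(candidates, target_hashes):
    """Fraction of distinct target hashes matched by at least one candidate."""
    targets = set(target_hashes)
    if not targets:
        return 0.0
    cracked = {h for h in map(digest, candidates) if h in targets}
    return len(cracked) / len(targets)


# Hashlist "B": toy plaintexts, hashed so the test only sees digests
hashlist_b = [digest(p) for p in ["octopus", "Octopus1", "dragon99", "hunter2"]]

# Wordlist "A"
wordlist_a = ["octopus", "dragon", "monkey"]

# "A" plus one simple rule-like mutation (capitalize, then append "1")
a_plus_rules = wordlist_a + [w.capitalize() + "1" for w in wordlist_a]

# "A*": placeholders standing in for model-generated candidates
a_star = wordlist_a + ["dragon99", "monkey12"]

rates = {
    "A": crack_rate(wordlist_a, hashlist_b),
    "A + rules": crack_rate(a_plus_rules, hashlist_b),
    "A*": crack_rate(a_star, hashlist_b),
}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of hashes cracked")
```

For a fair comparison the candidate counts should be matched, since a bigger list will usually crack more simply by being bigger.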
And regarding the "slowness": I know this does not apply directly to the problem we are trying to solve here, but when AlphaZero beat Stockfish it was only searching 80k positions per second, while Stockfish was searching 70 million positions per second.
And now Torch rules the entire chess world :) I know what you're saying, though, and I agree that brains are often better than brawn, but with something as relatively unpredictable as passwords, it's simply a very difficult thing to predict. You're not just guessing what one person will choose, you're searching an almost random space of billions of people's brain RNGs. There isn't a pattern from one person to another; you just have to crush your way through it. You're absolutely welcome to experiment with the many models out there, though, and I'll have to add it to my own to-do list.