Using autoregressive character-level language model to augment wordlists?
#5
(12-27-2023, 06:27 AM)penguinkeeper Wrote: And now Torch rules the entire Chess world :) I know what you're saying though and I agree that often brains is better than brawn but with something as relatively unpredictable as passwords, it's simply just a very difficult thing to predict. You're not just guessing what someone will choose, you're searching an almost random space of billions of people's brain RNGs. There isn't a pattern between one person to another, you just have to crush your own way through it. You're absolutely welcome to experiment with the many models out there, though and I'll have to add it to my own to-do list

So it's been a while but recently this idea popped back into my mind.

I took the wordlist hashmob.net.small (2 million passwords) and trained a small 25-million-parameter model on it for less than half an epoch.
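For anyone wondering what "autoregressive character-level" means in practice, here is a minimal sketch. To be clear, this is not the 25-million-parameter model described above: it swaps the neural network for a simple count-based n-gram (Markov) model so that it runs with only the standard library. The names `train_char_model`, `sample_password` and the `order` parameter are assumptions made up for illustration.

```python
import random
from collections import Counter, defaultdict

END = "\n"  # end-of-password marker, also used as start-of-sequence padding

def train_char_model(passwords, order=3):
    """Count next-character frequencies for every context of `order` characters."""
    counts = defaultdict(Counter)
    for pw in passwords:
        padded = END * order + pw + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts

def sample_password(counts, order=3, max_len=32, rng=random):
    """Generate one password character by character.

    Each character is drawn from the distribution observed after the
    previous `order` characters, which is the autoregressive part.
    `order` must be >= 1 and match the value used for training.
    """
    context, out = END * order, []
    while len(out) < max_len:
        options = counts.get(context)
        if not options:  # empty model or unseen context
            break
        chars = list(options)
        weights = list(options.values())
        nxt = rng.choices(chars, weights=weights)[0]
        if nxt == END:  # the model decided the password is finished
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)
```

A neural model replaces the count table with a learned network that generalizes to contexts it has never seen, but the sampling loop has the same shape: predict a distribution over the next character, sample, append, repeat.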

Then I generated 10,000 passwords. Of those, only 6,921 were not present in the training set, so these are the unique AI-generated passwords.
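The filtering step can be sketched roughly like this. It is only an illustration: the post does not say how the filtering was done or whether duplicates among the generated passwords were also dropped, so the deduplication here is an assumption, and `novel_candidates` is a hypothetical helper name.

```python
def novel_candidates(generated, training):
    """Return generated passwords that are absent from the training set.

    Duplicates within `generated` are also dropped (an assumption, not
    something the post states); first-seen order is preserved.
    """
    seen = set(training)
    novel = []
    for pw in generated:
        if pw not in seen:
            seen.add(pw)
            novel.append(pw)
    return novel

# Tiny example: "hunter2" is in the training data, "letmein9" repeats.
print(novel_candidates(["hunter2", "letmein9", "letmein9", "qwe!23"], ["hunter2"]))
```

With 2 million training passwords a plain Python `set` fits comfortably in memory, so nothing more elaborate should be needed for this step.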

To check the quality of these passwords, I took a bigger wordlist, hk_hlm_founds (38.6 million passwords). It turns out my AI-generated wordlist contains 678 passwords that can be found in that wordlist. Then I checked an even bigger wordlist, weakpass_4.latin (2.16 billion passwords). My AI-generated wordlist shares 3,313 passwords with that list, so roughly 48% (3,313 / 6,921 ≈ 47.9%) of my newly generated unique passwords are actually useful, which to be honest is way more than I expected, especially since I did not do any hyperparameter tuning and did not even train for a single epoch.
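A rough sketch of how such an overlap check can be done without loading a multi-billion-line list into memory: keep the small generated set in memory and stream the big wordlist past it. The function names are hypothetical, and the percentages assume that both 678 and 3,313 are counted out of the 6,921 novel passwords, which the post implies but does not state outright.

```python
def count_hits(candidates, reference_lines):
    """Count how many distinct candidates occur in an iterable of reference lines."""
    remaining = set(candidates)
    total = len(remaining)
    for line in reference_lines:
        remaining.discard(line.rstrip("\r\n"))
        if not remaining:  # every candidate already found, stop early
            break
    return total - len(remaining)

def hit_rate(hits, total):
    """Fraction of candidates found in the reference list."""
    return hits / total if total else 0.0

# Plugging in the counts reported in the post:
print(f"{hit_rate(678, 6921):.1%}")   # hk_hlm_founds
print(f"{hit_rate(3313, 6921):.1%}")  # weakpass_4.latin
```

Since an open file object yields lines one at a time, something like `with open(path, encoding="utf-8", errors="replace") as f: count_hits(novel, f)` would work for the large list, where `path` and `novel` are placeholders.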

So I guess this suggests that augmenting wordlists with AI is a reasonable thing to do and can meaningfully increase the effectiveness of those wordlists.

It would be interesting to compare these results with 10k new passwords generated by the best rule-based approach.
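One way such a comparison could be set up is to give a rule-based generator the same budget of novel candidates and score it against the same reference wordlists. The sketch below is a toy baseline only: the five rules are made up for illustration and are nowhere near a tuned rule set, and `rule_candidates` is a hypothetical helper.

```python
from itertools import islice

# Hypothetical mangling rules for illustration; not taken from any real rule set.
RULES = [
    lambda w: w.capitalize(),
    lambda w: w + "1",
    lambda w: w + "123",
    lambda w: w + "!",
    lambda w: w[::-1],
]

def rule_candidates(base_words, known, budget):
    """Return up to `budget` distinct rule-mangled words not already in `known`.

    `base_words` must be a re-iterable sequence (e.g. a list), because it
    is walked once per rule.
    """
    seen = set(known)

    def generate():
        for rule in RULES:
            for word in base_words:
                cand = rule(word)
                if cand not in seen:
                    seen.add(cand)
                    yield cand

    return list(islice(generate(), budget))

print(rule_candidates(["pass", "word"], ["pass", "word", "pass1"], 3))
```

For a fair comparison both approaches would start from the same training wordlist, produce the same number of novel candidates, and be checked against the same larger lists.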

I have also attached the full 10k generated passwords below.


Attached Files
.txt   ai_pass.txt (Size: 86.7 KB / Downloads: 4)


Messages In This Thread
RE: Using autoregressive character-level language model to augment wordlists? - by Complexoctopus - 11-03-2024, 10:08 PM