500M hashes
#1
Hi,

On the page https://haveibeenpwned.com/Passwords you can download the Pwned Passwords list, which is a list of 500 million SHA-1-hashed passwords.
Hashcat tells me "Insufficient memory" when I load this hashlist. I guess 500M is quite a lot.

Is the only solution to split this list into chunks?

Thanks.
#2
Yes. If your current platform isn't big enough to hold them all, you have no choice but to split them into chunks. Just splitting the list in half or in thirds might be enough. When splitting it, be sure to use GNU split's "-n l/N" option ("split into N files without splitting lines/records") so that you don't cut a hash in half.
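If GNU split isn't available (on Windows, for example), the same "whole lines only" split is easy to script. This is a minimal Python sketch, not part of hashcat; the function name split_lines and the part_NN output names are hypothetical. One difference: GNU split's l/N balances chunks by byte size, while this sketch balances by line count, which amounts to the same thing for fixed-length hashes.

```python
import os
import tempfile


def split_lines(path: str, n: int, prefix: str) -> list[str]:
    """Split `path` into `n` files with roughly equal line counts.

    Lines are never cut, so each chunk holds only whole hashes
    (the same guarantee GNU split's "-n l/N" gives).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # First pass: count lines so the chunks can be balanced.
    with open(path, "rb") as f:
        total = sum(1 for _ in f)
    per_chunk = -(-total // n)  # ceiling division
    outputs = []
    # Second pass: copy whole lines into each chunk, in order.
    with open(path, "rb") as src:
        for i in range(n):
            out_path = f"{prefix}{i:02d}"
            with open(out_path, "wb") as dst:
                for _ in range(per_chunk):
                    line = src.readline()
                    if not line:
                        break
                    dst.write(line)
            outputs.append(out_path)
    return outputs


# Small demo on a throwaway file of fake 40-hex-digit "hashes".
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hashes.txt")
demo_lines = [f"{i:040X}\n".encode() for i in range(10)]
with open(source, "wb") as f:
    f.writelines(demo_lines)
parts = split_lines(source, 3, os.path.join(workdir, "part_"))
for p in parts:
    with open(p, "rb") as f:
        print(os.path.basename(p), len(f.readlines()), "lines")
```

The file is read in binary mode on purpose, so stray non-ASCII bytes in a hashlist can't trip up a text decoder.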

There is a high startup cost when cracking such large lists, so if your first set of attacks can quickly reduce the overall size, your future attacks will be faster to start up and more efficient to run.

You'll probably want to use --remove, which removes hashes from the hashlist file as they are cracked (obviously you'd be working from split copies at that point, not the original; keep the original intact). Once you get the hashlist size down a bit, you can also move that first large potfile out of the way (so that you don't have to wait for it to be parsed each time).
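hashcat's --remove does this pruning itself during a run. If you end up with a chunk that still contains hashes cracked earlier (say, it was made before you moved the old potfile aside), a one-off offline cleanup is roughly this Python sketch. It assumes potfile lines of the form hash:plain and hashlist lines holding one hash, optionally followed by ":" and extra fields (the HIBP downloads append a prevalence count that way); remove_cracked and the file names are hypothetical. The set of cracked hashes lives in RAM, so run it per chunk rather than on the full 500M list.

```python
import os
import tempfile


def load_cracked(potfile: str) -> set[bytes]:
    """Collect the hash field (text before the first ':') of every potfile line."""
    cracked = set()
    with open(potfile, "rb") as f:
        for line in f:
            digest, sep, _plain = line.partition(b":")
            if sep:
                cracked.add(digest.strip().lower())
    return cracked


def remove_cracked(hashlist: str, potfile: str, out_path: str) -> int:
    """Write the still-uncracked hashes of `hashlist` to `out_path`.

    Anything after the first ':' on a hashlist line (e.g. a prevalence
    count) is dropped. Returns the number of hashes left.
    """
    cracked = load_cracked(potfile)
    left = 0
    with open(hashlist, "rb") as src, open(out_path, "wb") as dst:
        for line in src:
            digest = line.strip().split(b":", 1)[0]
            # Compare case-insensitively: HIBP uses uppercase hex, potfiles lowercase.
            if digest and digest.lower() not in cracked:
                dst.write(digest + b"\n")
                left += 1
    return left


# Demo with fake short "hashes": two of the three are already in the potfile.
workdir = tempfile.mkdtemp()
chunk = os.path.join(workdir, "chunk_00")
pot = os.path.join(workdir, "old.potfile")
remaining = os.path.join(workdir, "chunk_00.left")
with open(chunk, "wb") as f:
    f.write(b"AAAA:5\nBBBB:3\nCCCC:1\n")
with open(pot, "wb") as f:
    f.write(b"aaaa:password\ncccc:pass:word\n")
left = remove_cracked(chunk, pot, remaining)
print(left, "hash(es) left")
```

Splitting potfile lines only at the first ":" matters because cracked plains can themselves contain colons.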

Normally, a fast attack using a small ruleset (like best64) and wordlists like rockyou or hashes.org founds would probably do a good job of culling a lot of it up front for you.

But in the case of the Pwned Passwords v2 list specifically, be prepared for most of your usual cracking techniques to simply stop working after a certain point, because Troy's description of the list ("Each password is stored as a SHA-1 hash of a UTF-8 encoded password") is ... inaccurate. There are *millions* of passwords in there that contain non-UTF-8 binary data, HTML encodings, null bytes, non-UTF-8 character sets, trailing newlines, etc. A lot of it is utter crap with no practical value as an "is this password in a leak somewhere" dataset. You can use this information to crack more of the hashes, but the end results make for a pretty crappy wordlist (unless you're trying to crack another crappy hashlist ;) )
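Why clean UTF-8 wordlists stop hitting comes down to SHA-1 hashing bytes, not characters. This minimal Python sketch (standard hashlib only) hashes one visible password under several of the encodings mentioned above; the "café" variants are made-up illustrations, not entries taken from the list. Every variant produces a different digest, so a candidate only cracks the hash if it has exactly those bytes.

```python
import hashlib


def sha1_upper(data: bytes) -> str:
    """SHA-1 as uppercase hex, the way the HIBP downloads list hashes."""
    return hashlib.sha1(data).hexdigest().upper()


# The "same" password as it might end up in different leaks.
variants = {
    "UTF-8": "café".encode("utf-8"),
    "Latin-1": "café".encode("latin-1"),
    "UTF-8 + newline": "café\n".encode("utf-8"),
    "HTML entity": b"caf&eacute;",
    "UTF-8 + null byte": "café".encode("utf-8") + b"\x00",
}
digests = {label: sha1_upper(raw) for label, raw in variants.items()}

for label, raw in variants.items():
    # $HEX[...] lets a hashcat wordlist carry arbitrary bytes such as these.
    print(f"{label:18} $HEX[{raw.hex()}]  {digests[label]}")
```

Generating such variants on purpose is how you crack more of the list, and it is also why the resulting "founds" make a poor general-purpose wordlist.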

Also be aware that HIBPv2 has already been largely cracked - 99.06% as of this writing, and the found and left lists can be downloaded from the "leaks" list at hashes.org (https://hashes.org/leaks.php). The vast majority of actually useful passwords are probably already in the found list there ... but there are probably still a few interesting ones!
#3
Hi,
Thank you for this detailed explanation.
The goal was to test/benchmark a few wordlists and rules against a known database.
When you say "your current platform isn't big enough to hold them all", do you mean that the RAM on my motherboard (system RAM) or the memory on my GPUs is not enough?
#4
GPU memory is probably the first bottleneck that you'll hit when processing a very large hashlist.
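A back-of-envelope way to see why GPU memory runs out first: a raw SHA-1 digest is 20 bytes, so 500M of them need about 10 GB before any overhead. The Python sketch below computes that floor and the chunk count for a given budget. Treat it as a lower bound only: hashcat also keeps bookkeeping such as lookup bitmaps on the device, and some OpenCL runtimes cap a single buffer at a fraction of total VRAM, so in practice you may need more chunks. The 4 GiB budget is a hypothetical figure, not a measured hashcat limit.

```python
import math

SHA1_DIGEST_BYTES = 20  # a raw SHA-1 digest is 160 bits


def raw_digest_bytes(num_hashes: int, digest_bytes: int = SHA1_DIGEST_BYTES) -> int:
    """Bytes needed just to hold the binary digests, with no overhead."""
    return num_hashes * digest_bytes


def chunks_needed(num_hashes: int, usable_bytes: int,
                  digest_bytes: int = SHA1_DIGEST_BYTES) -> int:
    """Smallest number of equal chunks whose raw digests fit in usable_bytes."""
    return max(1, math.ceil(raw_digest_bytes(num_hashes, digest_bytes) / usable_bytes))


total = raw_digest_bytes(500_000_000)
print(f"raw digests: {total / 2**30:.2f} GiB")  # raw digests: 9.31 GiB
# Hypothetical budget: 4 GiB of one card's memory set aside for hashes.
print("chunks:", chunks_needed(500_000_000, 4 * 2**30))  # chunks: 3
```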