How to efficiently manage huge (>100 GB) wordlists?
#1
Heya,

For my special use case, brute force doesn't work as well as a wordlist. My list is already dozens of GB, and every now and then I add new lists of a few GB to the old one and run a simple "sort -u oldlist.txt > newlist.txt" to remove the duplicates.

Hashcat works great with such big lists, but managing the list (adding new entries without storing all the duplicates) is a pain and takes a lot of time.

Are there some best practices to manage wordlists of this size? Maybe using a NoSQL-DB like LevelDB?
#2
ULM may help: http://unifiedlm.com/Home

Also, someone on the hashkiller forum is working on something more powerful:
http://forum.hashkiller.co.uk/topic-view...7742#37742
#3
ULM isn't suitable for such big collections.

Adding new entries to your dict, however, can be done much faster. Since olddict is already sorted, you only need to sort the new list and then merge the two:

sort newdict -o newdict && sort -m -u olddict newdict -o mergeddict
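
In case it helps, here is the same idea wrapped in a small function. This is only a sketch: the name merge_wordlist and the file names are made up for illustration. It assumes the old list was already sorted and de-duplicated under the same locale (LC_ALL=C here), because "sort -m" does not re-sort its inputs and will produce wrong output otherwise.

```shell
#!/bin/sh
# Hypothetical helper: merge a new, unsorted wordlist into an existing
# master list that is already sorted and free of duplicates.
merge_wordlist() {
    old=$1      # existing list, assumed sorted with LC_ALL=C
    new=$2      # new candidates, in any order
    merged=$3   # output file

    # Sort only the (comparatively small) new list and drop its
    # internal duplicates. The original new list is left untouched.
    LC_ALL=C sort -u -o "$new.sorted" "$new" || return 1

    # Merge the two sorted files in one pass; -u drops entries that
    # appear in both, so no full re-sort of the old list is needed.
    LC_ALL=C sort -m -u -o "$merged" "$old" "$new.sorted" || return 1

    # Remove the intermediate file.
    rm -f "$new.sorted"
}
```

It would be called like "merge_wordlist oldlist.txt additions.txt newlist.txt". Using the C locale makes sort compare raw bytes, which is usually faster and keeps the ordering identical between runs. With GNU sort, -T (temp directory) and -S (buffer size) may also be worth a look if the new lists themselves are several GB.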
#4
How does the MST (Multiple Sort-Tools) on SmallUtilities.org (related to Hashes.org) compare?