How to filter duplicate content in large dictionary files and multiple dictionaries?
#3
(09-18-2024, 05:15 PM)Snoopy Wrote:
(09-18-2024, 10:26 AM)mima8cn Wrote: The large dictionary I downloaded online is about 150GB. How can I remove from it the entries that are already in my existing dictionary files? My computer has 80GB of memory, and when I run rli.exe on files larger than 20GB, it reports insufficient memory. Is there any other way to filter out content that duplicates other files?
The operating system is Windows

In addition, there is a single file of 150GB. Is there any way to filter out duplicate content in a single file?

depending on the number of passwords / size of your dictionaries and the attacked hash type, i would just leave it as it is

when attacking fast hashes like NTLM or MD5, small dictionaries will be processed almost instantly, so filtering your dictionary files would take more time than just hashing the duplicates again

anyway, you could use the Windows Subsystem for Linux (WSL) and tools like sort and comm for this, but you need to sort your input beforehand, so preparing all of your input files will also take some time. not quite sure whether sort can handle files that big or not

jfyi

big.txt (after sort)
Code:
1
10
2
3
4
5
6
7
8
9

small.txt
Code:
3
5
7

Code:
comm -23 big.txt small.txt > uniq-big.txt

would result in the lines that are unique to big.txt (big.txt minus small.txt)
Code:
1
10
2
4
6
8
9
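
For anyone without comm at hand, the same subtraction can be sketched in Python, which also runs natively on Windows. This is only a minimal sketch: the function name subtract_sorted is made up for illustration, and it assumes both files are already sorted in plain byte order (what `LC_ALL=C sort` produces). It streams both files line by line, so memory use stays small regardless of file size. One difference from comm -23: a line that appears in small.txt is dropped from the output every time it occurs in big.txt, not just once.

```python
def subtract_sorted(big_path, small_path, out_path):
    """Write the lines of big_path that do not occur in small_path.

    Assumption: both inputs are sorted in byte order. Returns the number
    of lines written. Line endings are normalized to "\n".
    """
    kept = 0
    with open(big_path, "rb") as big, \
            open(small_path, "rb") as small, \
            open(out_path, "wb") as out:
        s = small.readline()  # b"" means end of file; a blank line is b"\n"
        for raw in big:
            key = raw.rstrip(b"\r\n")
            # Advance the small file until its current line is >= key.
            while s and s.rstrip(b"\r\n") < key:
                s = small.readline()
            if s and s.rstrip(b"\r\n") == key:
                continue  # also present in the small file, so drop it
            out.write(key + b"\n")
            kept += 1
    return kept
```

Called as subtract_sorted("big.txt", "small.txt", "uniq-big.txt") on the two example files, it should write the same seven lines as the comm output shown above.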

The comm command is for Linux systems, and my system is Windows. I want to know how to filter out duplicate content on Windows, both within a single file and across multiple files.
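
One way this could be approached on Windows without Linux tools is an external merge sort in Python: sort the file in chunks that fit in memory, write each chunk to a temporary file, then merge the chunks while skipping repeated lines. The following is a rough sketch under stated assumptions, not a tested tool for 150GB inputs. The names dedupe_large_file, tmp_dir and max_chunk_bytes are made up for illustration. Python keeps every line as a separate object, so a chunk needs several times its raw size in RAM, and the temporary files need about as much free disk space as the input.

```python
import heapq
import os
import tempfile


def _write_sorted_chunk(lines, tmp_dir):
    """Sort one in-memory chunk, drop its duplicates, write it to a temp file."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
    with os.fdopen(fd, "wb") as f:
        for line in sorted(set(lines)):
            f.write(line + b"\n")
    return path


def _read_lines(path):
    """Yield the lines of a chunk file without their line endings."""
    with open(path, "rb") as f:
        for line in f:
            yield line.rstrip(b"\r\n")


def dedupe_large_file(in_path, out_path, tmp_dir, max_chunk_bytes=1 << 30):
    """Write each distinct line of in_path to out_path once, in byte order.

    Returns the number of lines written. The original line order is lost.
    """
    chunk_paths, chunk, size = [], [], 0
    with open(in_path, "rb") as src:
        for raw in src:
            line = raw.rstrip(b"\r\n")
            chunk.append(line)
            size += len(line) + 1
            if size >= max_chunk_bytes:
                chunk_paths.append(_write_sorted_chunk(chunk, tmp_dir))
                chunk, size = [], 0
    if chunk:
        chunk_paths.append(_write_sorted_chunk(chunk, tmp_dir))

    written, previous = 0, None
    with open(out_path, "wb") as out:
        # heapq.merge reads the sorted chunk files lazily, so equal lines
        # arrive next to each other and can be skipped.
        for line in heapq.merge(*(_read_lines(p) for p in chunk_paths)):
            if line != previous:
                out.write(line + b"\n")
                written += 1
                previous = line
    for p in chunk_paths:
        os.remove(p)
    return written
```

Because the output is sorted in byte order, it can be fed directly into a comm-style subtraction against other dictionaries that were sorted the same way. With the default chunk size, a 150GB input would produce roughly 150 temporary files that are all open during the merge, so a much smaller chunk size may run into open-file limits.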


Messages In This Thread
RE: How to filter duplicate content in large dictionary files and multiple dictionaries? - by mima8cn - 09-19-2024, 05:12 AM