Posts: 2
Threads: 1
Joined: Sep 2024
The large dictionary I downloaded online is about 150GB. How can I filter out the entries that are already in my existing dictionary files? My computer has 80GB of memory, and rli.exe reports insufficient memory when the files exceed 20GB. Is there another way to remove content that is duplicated in other files?
The operating system is Windows.
In addition, there is a single 150GB file. Is there any way to remove duplicate lines within a single file?
Posts: 889
Threads: 15
Joined: Sep 2017
(09-18-2024, 10:26 AM)mima8cn Wrote: The large dictionary I downloaded online is about 150GB. How can I filter out the entries that are already in my existing dictionary files? My computer has 80GB of memory, and rli.exe reports insufficient memory when the files exceed 20GB. Is there another way to remove content that is duplicated in other files?
The operating system is Windows.
In addition, there is a single 150GB file. Is there any way to remove duplicate lines within a single file?
Depending on the number of passwords, the size of your dictionaries and the hash type you are attacking, I would just leave it as it is.
When attacking fast hashes like NTLM or MD5, small dictionaries are processed almost instantly, so filtering your dictionary files would take more time than just hashing the duplicates again.
Anyway, you could use the Windows Subsystem for Linux (WSL) and tools like sort and comm for this, but you need to sort your input beforehand, so it will also take some time to prepare all of your input files. I'm not quite sure whether sort can handle files that big or not.
jfyi
big.txt (after sort)
Code:
1
10
2
3
4
5
6
7
8
9
small.txt
Code:
comm -23 big.txt small.txt > uniq-big.txt
would result in the lines that are unique to big.txt, i.e. big.txt minus small.txt
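For anyone who would rather not install WSL, the same sorted-merge idea can be sketched in Python. This is only an illustration of the technique, not what comm does internally: the function names are made up for this example, the small.txt contents in the demo are hypothetical, and both inputs are assumed to be sorted in plain byte order (what `LC_ALL=C sort` produces). Unlike comm, which pairs duplicates one-to-one, this drops every copy of a line that appears in the second input.

```python
from typing import Iterable, Iterator


def only_in_first(big: Iterable[bytes], small: Iterable[bytes]) -> Iterator[bytes]:
    """Yield lines of `big` that do not occur in `small`.

    Both inputs must already be sorted in byte order and must not carry
    trailing newlines. Only one line per input is held in memory.
    """
    small_iter = iter(small)
    current = next(small_iter, None)
    for line in big:
        # Advance `small` until it has caught up with the current `big` line.
        while current is not None and current < line:
            current = next(small_iter, None)
        if current != line:
            yield line


def read_lines(path: str) -> Iterator[bytes]:
    """Stream a file line by line as bytes, without the trailing newline."""
    with open(path, "rb") as handle:
        for raw in handle:
            yield raw.rstrip(b"\n")


def write_difference(big_path: str, small_path: str, out_path: str) -> int:
    """Write big minus small to out_path and return the number of lines kept."""
    kept = 0
    with open(out_path, "wb") as out:
        for line in only_in_first(read_lines(big_path), read_lines(small_path)):
            out.write(line + b"\n")
            kept += 1
    return kept


if __name__ == "__main__":
    # big.txt as listed above; the small.txt contents are hypothetical.
    big = [b"1", b"10", b"2", b"3", b"4", b"5", b"6", b"7", b"8", b"9"]
    small = [b"10", b"3", b"7"]
    print(list(only_in_first(big, small)))
```

Because both files are streamed, memory use stays small no matter how large the inputs are. The expensive part is still sorting them first.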
Posts: 2
Threads: 1
Joined: Sep 2024
(09-18-2024, 05:15 PM)Snoopy Wrote: anyway, you could utilize the windows subshell for linux and tools like sort and comm ...
That command is for Linux, and my system is Windows. I want to know how to filter out duplicate content within and across files on Windows.
Posts: 889
Threads: 15
Joined: Sep 2017
09-19-2024, 01:32 PM
(This post was last modified: 09-19-2024, 01:35 PM by Snoopy.)
That's why I said WSL -> Windows Subsystem for Linux. It is software designed by Microsoft for running a Linux distribution seamlessly integrated into Windows 10 and 11.
On newer Windows 10/11 you can install it via PowerShell (run as administrator), just run
wsl --install
see https://learn.microsoft.com/de-de/windows/wsl/install for more information
Windows doesn't have such built-in tools. You have to use a programming language like Python or third-party programs, but I don't know of a program for your problem.
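As a rough idea of what the Python route could look like for the single-file case, here is a minimal sketch that removes duplicate lines without loading the whole file into memory. It is an assumption-laden illustration, not a tested tool for 150GB inputs: the function name, the bucket count and the `tmp_root` parameter are all invented for this example. It splits the input into bucket files by hash, so identical lines always land in the same bucket, and then deduplicates each bucket with an in-memory set.

```python
import os
import tempfile
import zlib
from typing import Optional


def dedupe_large_file(src_path: str, dst_path: str, buckets: int = 256,
                      tmp_root: Optional[str] = None) -> int:
    """Write each distinct line of src_path to dst_path once; return the count.

    The output is NOT in input order: lines come out grouped by bucket.
    Trailing CR/LF is stripped, so "abc\r\n" and "abc\n" count as the same line.
    tmp_root must have roughly as much free space as the input file.
    """
    written = 0
    with tempfile.TemporaryDirectory(dir=tmp_root) as tmp_dir:
        paths = [os.path.join(tmp_dir, f"bucket_{i}.tmp") for i in range(buckets)]
        handles = [open(path, "wb") for path in paths]
        try:
            # Pass 1: route every line to a bucket chosen by its CRC32.
            with open(src_path, "rb") as src:
                for raw in src:
                    line = raw.rstrip(b"\r\n")
                    handles[zlib.crc32(line) % buckets].write(line + b"\n")
        finally:
            for handle in handles:
                handle.close()

        # Pass 2: each bucket is small enough to deduplicate with a set.
        with open(dst_path, "wb") as dst:
            for path in paths:
                seen = set()
                with open(path, "rb") as bucket:
                    for line in bucket:
                        if line not in seen:
                            seen.add(line)
                            dst.write(line)
                            written += 1
    return written
```

Only one bucket has to fit in memory at a time, so a larger `buckets` value lowers the peak memory use at the cost of more open temporary files.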
Posts: 119
Threads: 1
Joined: Apr 2022
I recommend using rling:
https://github.com/Cynosureprime/rling
It is very fast.
From the examples section in the repo:
There are many common, and not so common uses for rling.
rling big-file.txt new-file.txt /path/to/old-file.txt /path/to/others/*
This will read in big-file.txt, remove any duplicate lines, then check /path/to/old-file.txt and all files matching /path/to/others/*. Any line that is found in these files that also exists in big-file.txt will be removed. Once all files are processed, new-file.txt is written with the lines that don't match. This is a great way to remove lines from a new dictionary file if you already have them in your existing dictionary lists.
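To make the described result concrete, here is a tiny in-memory Python model of it. This is not how rling works internally and it would not scale to a 150GB file. The function name is hypothetical, and keeping first occurrences in their original order is a choice made for this sketch, not a statement about rling's output order.

```python
from typing import Iterable, List


def remove_known_lines(new_lines: Iterable[str], *known: Iterable[str]) -> List[str]:
    """Deduplicate new_lines, then drop every line that appears in any `known` input."""
    # dict.fromkeys keeps insertion order, so first occurrences stay in place.
    remaining = dict.fromkeys(new_lines)
    for source in known:
        for line in source:
            remaining.pop(line, None)
    return list(remaining)


if __name__ == "__main__":
    big_file = ["a", "b", "a", "c", "d"]   # stands in for big-file.txt
    old_file = ["b"]                       # stands in for /path/to/old-file.txt
    others = ["d", "x"]                    # stands in for /path/to/others/*
    print(remove_known_lines(big_file, old_file, others))
```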