How to filter duplicate content in large dictionary files and multiple dictionaries?
#5
I recommend using rling:

https://github.com/Cynosureprime/rling

It is very fast.

From the examples section in the repo:

There are many common, and not so common uses for rling.
rling big-file.txt new-file.txt /path/to/old-file.txt /path/to/others/*
This will read in big-file.txt, remove any duplicate lines, then check /path/to/old-file.txt and all files matching /path/to/others/*. Any line found in those files that also exists in big-file.txt will be removed. Once all files are processed, new-file.txt is written with the lines that don't match. This is a great way to remove lines from a new dictionary file if you already have them in your existing dictionary lists.
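
If it helps to see the logic spelled out, here is a rough Python sketch of the behavior described above. It is only an illustration of the semantics, not how rling is implemented, and it will be far slower and more memory-hungry on big lists. The function name filter_new_lines is made up for this example, and keeping lines in first-occurrence order is an assumption of the sketch.

```python
from typing import Iterable


def filter_new_lines(input_path: str, output_path: str,
                     remove_paths: Iterable[str]) -> int:
    """Hypothetical helper: dedupe input_path, drop any line that also
    appears in remove_paths, write the rest to output_path.
    Returns the number of lines written."""
    seen = set()   # unique lines from the input file
    kept = []      # same lines, in first-occurrence order (assumption)

    # Step 1: read the input and remove duplicate lines.
    with open(input_path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if line not in seen:
                seen.add(line)
                kept.append(line)

    # Step 2: scan the other files and note which input lines they contain.
    to_remove = set()
    for path in remove_paths:
        with open(path, "rb") as f:
            for raw in f:
                line = raw.rstrip(b"\r\n")
                if line in seen:
                    to_remove.add(line)

    # Step 3: write out only the lines that were not found elsewhere.
    written = 0
    with open(output_path, "wb") as out:
        for line in kept:
            if line not in to_remove:
                out.write(line + b"\n")
                written += 1
    return written
```

Calling filter_new_lines("big-file.txt", "new-file.txt", ["/path/to/old-file.txt"]) would mirror the command above; a pattern like /path/to/others/* would need to be expanded first, for example with glob.glob. Lines are handled as bytes so odd encodings in wordlists don't cause decode errors.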


Messages In This Thread
RE: How to filter duplicate content in large dictionary files and multiple dictionaries? - by b8vr - 09-20-2024, 09:27 PM