Trying to understand RLI and RLI2 better
#1
I've been reading about RLI and doing a little testing, and there's a result I'm trying to understand:

I have wordlist1:
word1
word2
word3
word4

And wordlist2:
camp1
word1
word2
word3
word5

When I run this command:
./rli wordlist1 wordlist-rli wordlist2

This is what ends up in wordlist-rli:
word4

What I was hoping for is this (in no particular order):
camp1
word1
word2
word3
word4
word5


Long story short, I have wordlists that I would like to merge while making sure the entries are unique. Perhaps there's a better way than using RLI?
#2
The purpose of rli is to diff two lists: the output file only gets the lines of the first list that aren't in the other list, which is why you ended up with just word4. It's not a dedupe tool.
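
To make that behaviour concrete, here is a rough stand-in using plain grep instead of rli itself. It assumes the two lists from post #1; the temp directory and the output name only-in-wordlist1 are made up for the example. It should reproduce the result above, but it is just an illustration of the idea, not how rli works internally.

```shell
# Work in a throwaway directory so nothing real gets overwritten
cd "$(mktemp -d)"

# Recreate the two example lists from post #1
printf 'word1\nword2\nword3\nword4\n' > wordlist1
printf 'camp1\nword1\nword2\nword3\nword5\n' > wordlist2

# Keep only the lines of wordlist1 that do not appear in wordlist2:
#   -f wordlist2  read the patterns from wordlist2
#   -F            treat patterns as fixed strings, not regexes
#   -x            match whole lines only
#   -v            invert: print the lines that did NOT match
grep -vxFf wordlist2 wordlist1 > only-in-wordlist1

cat only-in-wordlist1
```

camp1 and word5 never show up because they only exist in the second list, and word1 to word3 are dropped because both lists have them.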

For general dedupe, sort -u is your go-to. I use this alias (adjust the parameters to your hardware):

Code:
LC_ALL=C sort --parallel=4 -S 4000M -T /storage-hdd/tmp/ -u
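
Applied to the two lists from post #1, a minimal sketch of the merge could look like the following. The temp directory and the output name wordlist-merged are just examples, and the tuning flags from the alias are left out because they only matter for large files.

```shell
# Work in a throwaway directory so nothing real gets overwritten
cd "$(mktemp -d)"

# Recreate the two example lists from post #1
printf 'word1\nword2\nword3\nword4\n' > wordlist1
printf 'camp1\nword1\nword2\nword3\nword5\n' > wordlist2

# sort accepts several input files: merge them and drop duplicate lines in one pass.
# LC_ALL=C forces plain byte-wise comparison.
LC_ALL=C sort -u wordlist1 wordlist2 > wordlist-merged

cat wordlist-merged
```

The result holds camp1 and word1 through word5 once each, which is the merged, unique list asked for in the first post.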

What people usually do is run sort -u on both lists before feeding them to rli (or rather rli2, which assumes sorted and uniq'd input).

If your original inputs are in frequency order and you want to keep that order, you can use rli instead of sorting, but the lists should at least be deduped first.
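
For deduping without losing the order, one common option is the awk idiom below. This is my assumption of a suitable tool, not something rli needs specifically, and the file names and sample words are made up. Note that it keeps every unique line in memory, so it may not fit very large lists.

```shell
# Work in a throwaway directory so nothing real gets overwritten
cd "$(mktemp -d)"

# Hypothetical frequency-ordered list with repeats
printf 'password\n123456\npassword\nqwerty\n123456\n' > freq-list

# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++ is
# true only for first occurrences; awk prints those and skips later repeats.
awk '!seen[$0]++' freq-list > freq-list-dedup

cat freq-list-dedup
```

The first occurrence of each line stays where it was, so the most frequent candidates remain at the top.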