07-10-2020, 08:06 AM
Thanks for replying and answering my question. I appreciate your help.
Yes, my purpose is de-duplication followed by splitting, but even with 4 splits it's still slow, so I need to make more chunks, as you said.
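To make the workflow concrete, here is a rough sketch of what I mean by de-duplicating first and then splitting into chunks. It is plain Python, not rli or hashcat; the function names `dedupe_lines` and `split_into_chunks` are made up for illustration, and it keeps every unique line in memory, so it assumes the wordlist fits in RAM.

```python
from typing import Iterable, List


def dedupe_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for line in lines:
        word = line.rstrip("\n")  # compare words without the trailing newline
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


def split_into_chunks(words: List[str], n_chunks: int) -> List[List[str]]:
    """Split words into n_chunks contiguous chunks of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(words), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional word.
        end = start + size + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks
```

With a real file, something like `split_into_chunks(dedupe_lines(open(path)), 16)` would give 16 lists that could each be written out as a separate wordlist; the chunk count of 16 is only an example.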
I wasn't aware of the rli tool; I used to use the sort command with the -u and -m options to remove duplicates. So rli will be more efficient because I don't need to rewrite the whole file as the sort command does, right?
Regarding the cache file, dictstat2: if I always use the same machine (same hardware), will it be fine?
I think you mean that you are worried about the caching mechanism, and that it might be affected by the hardware type?
Thanks again!