Compress file size
#1
Good afternoon. While building a list of candidate passwords, two questions came up. I have a .txt file of about 1 TB containing 15 billion lines. Each line is a 64-character hash [a-f0-9], which is itself also a candidate password.
1. Is it possible to reduce the file size somehow?
2. How can I check the file for duplicates?
#2
Compress it into an archive with 7z, zip, or rar; the size will decrease. The archive cannot be read directly, though: you have to extract it before you can use the file.

Another option is to keep the file's contents unchanged but store it in gzip format: the file is compressed yet still usable through streams on Linux.
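
A minimal sketch of the gzip approach, run on a tiny stand-in file rather than the real 1 TB one. The file name hashes.txt and the sample lines are made up for illustration. Since hashes are effectively random hex digits, I would not expect the compressed file to get much below roughly half the original size.

```shell
# Work in a scratch directory with a tiny stand-in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' aaa bbb aaa ccc > hashes.txt   # hypothetical sample data

# Compress to a separate .gz file; -9 favors compression ratio over speed.
gzip -9 -c hashes.txt > hashes.txt.gz

# Stream the compressed file through other tools without unpacking it to disk.
line_count=$(gzip -dc hashes.txt.gz | wc -l)
match_count=$(gzip -dc hashes.txt.gz | grep -c '^aaa$')
echo "lines: $line_count, lines equal to aaa: $match_count"
```

On the real file the same pipelines apply unchanged; only the decompression time grows with the file size.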

To remove duplicates, sort the file and then run it through uniq, using the Linux commands "sort" and "uniq". uniq only detects duplicates when the identical lines are adjacent, which is why the file has to be sorted first.
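
The sort and uniq steps can be sketched like this, again on a small made-up sample (hashes.txt and unique.txt are hypothetical names). LC_ALL=C is my own addition: it makes sort compare raw bytes, which is usually faster and is sufficient for hex strings.

```shell
# Work in a scratch directory with a tiny stand-in for the real file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' bbb aaa ccc aaa bbb aaa > hashes.txt   # hypothetical sample data

# Check for duplicates: after sorting, uniq -d prints one copy of each repeated line.
dupes=$(LC_ALL=C sort hashes.txt | uniq -d)

# Remove duplicates: sort -u gives the same result as "sort | uniq".
LC_ALL=C sort -u hashes.txt > unique.txt
unique_count=$(wc -l < unique.txt)

echo "duplicated values:"
echo "$dupes"
echo "unique lines: $unique_count"
```

For a file of about 1 TB, sort spills to temporary files on disk, so it needs a lot of free space. With GNU sort, -T selects the temporary directory and -S sets the memory buffer size; treat the right values as something to tune for your machine.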