03-29-2013, 03:54 PM
Are you sure about the format of the hashes.txt list?
It seems to be number[8 digits]:hash[MD5, 32 hex characters]
Therefore, a command like:
cut -b 10- hashes.txt|sort -u|wc -l
gives a MUCH smaller number than your 551638, so the hashes are *not* unique (only the lines are, since each one is numbered).
Furthermore, the answer is already in your question: *426429 unique digests*
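P.S. It would be better to split the two parts with cut -d ":" -f 2, which does not depend on the byte offset,
and also to test cut .....|sort|uniq -d, which lists a lot of duplicates (uniq only compares adjacent lines, so the input has to be sorted first).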
BTW (the output of the split):
$ cut -d: -f2 hashes.txt|sort -u|wc -l
426429
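If you want to check the three counts side by side, here is a minimal sketch. It assumes the number:hash layout described above and runs on a tiny made-up sample file rather than on your real hashes.txt; the variable names (sample, total, unique, dupes) are only for illustration.

```shell
# Hypothetical 3-line sample in the assumed number:hash layout
# (two of the three lines carry the same hash).
sample=$(mktemp)
printf '%s\n' \
  '00000001:d41d8cd98f00b204e9800998ecf8427e' \
  '00000002:900150983cd24fb0d6963f7d28e17f72' \
  '00000003:d41d8cd98f00b204e9800998ecf8427e' > "$sample"

# $(( ... )) strips the padding some wc implementations add.
total=$(( $(wc -l < "$sample") ))                               # all lines
unique=$(( $(cut -d: -f2 "$sample" | sort -u | wc -l) ))        # distinct hashes
dupes=$(( $(cut -d: -f2 "$sample" | sort | uniq -d | wc -l) ))  # hashes seen more than once

echo "lines=$total unique=$unique duplicated=$dupes"
rm -f "$sample"
```

On your file, replace "$sample" with hashes.txt and drop the mktemp/printf/rm lines; the unique count should then match the 426429 shown above.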
Hope this solves your problem