[solved] Problem sorting dictionary file
#1
I'm trying to sort a huge (2 GB) dictionary file.

I tried the hashcat tools, but got no result.
With the GnuWin utilities, I tried sort.exe, gsort.exe, and cat.exe.

Code:
cat 20_04found.dic | sort | uniq > 20_04founduniq.dic
sort 20_04found.txt | uniq > 20_04foundu.txt
gsort 20_04found.txt | uniq -u >20_04foundu.txt

They all stopped after 163 MB of the 2 GB file.
I changed the file encoding to UTF-8, to ASCII, and Windows ...
but it didn't help.

How can I prevent the sorting process from stopping at certain characters?
Or is there a good alternative that works on 64-bit Windows?

Thank you for replying
#2
It sounds like you are hitting an EOF character, but I could be wrong.
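
One way to check this is sketched below. It assumes the character in question is the Ctrl-Z byte (0x1A), which some Windows programs treat as end-of-file when they read a file in text mode. The function names `find_ctrl_z` and `strip_ctrl_z` are made up for this illustration. The file is opened in binary mode so that no byte is interpreted specially.

```python
def find_ctrl_z(path, chunk_size=1 << 20):
    """Yield the byte offset of every 0x1A (Ctrl-Z) byte in the file."""
    offset = 0
    # Binary mode: bytes are returned as-is, 0x1A does not end the read.
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            start = chunk.find(b"\x1a")
            while start != -1:
                yield offset + start
                start = chunk.find(b"\x1a", start + 1)
            offset += len(chunk)


def strip_ctrl_z(src, dst, chunk_size=1 << 20):
    """Copy src to dst without any 0x1A bytes; return how many were removed."""
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x1a")
            fout.write(chunk.replace(b"\x1a", b""))
    return removed
```

If the first offset reported by `find_ctrl_z` is close to the point where the sort stopped, this byte is a likely suspect, and `strip_ctrl_z` can write a cleaned copy to try again. If nothing is found, the cause is probably something else.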
#3
How much free disk space do you have?
#4
First of all, thank you for your help!

Code:
it sounds like you are hitting an EOF character
If this is the reason, how can I remove these 'End Of File' characters, or other keyboard characters that cannot be read? How can I recognise them?

Code:
how much free disk space do you have?
I have 16 GB of RAM, but a primary drive (C:) of only 128 GB (SD) with 22 GB free, and a second drive of 1 TB split in two (D: 200 GB with 130 GB free, E: 800 GB with 250 GB free).
#5
Somebody usually suggests ULM at this point.
http://unifiedlm.com/
#6
Thanks for solving my problem, "Kgx Pnqvhm".

ULM could handle my 2 GB file.
I downloaded the latest version of sort64.exe (15.2.2014) from the CLI package on unifiedlm, which handled the task.
It also managed to remove the duplicates.

Code:
Sort64.exe -i file_in -u -o file_out
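
For anyone without ULM at hand, the same sort-and-deduplicate job can be sketched in Python as a chunked external sort. This is only an illustration and not a description of how Sort64.exe works internally. The names `sort_unique` and `_write_run` and the `max_lines` default are assumptions made for this example. Lines are compared as raw bytes, so the order can differ from a locale-aware sort, and line endings are normalised to LF.

```python
import heapq
import os
import tempfile


def _write_run(lines, tmp_dir):
    """Sort one chunk, drop its duplicates, and write it to a temp file."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "wb") as run:
        run.writelines(sorted(set(lines)))
    return path


def sort_unique(src, dst, tmp_dir=None, max_lines=1_000_000):
    """Write the unique lines of src to dst, sorted by raw byte value."""
    runs = []
    try:
        # Pass 1: split the input into sorted runs that each fit in memory.
        with open(src, "rb") as fin:
            chunk = []
            for line in fin:
                # Normalise CRLF/LF (and a missing final newline) to LF.
                chunk.append(line.rstrip(b"\r\n") + b"\n")
                if len(chunk) >= max_lines:
                    runs.append(_write_run(chunk, tmp_dir))
                    chunk = []
            if chunk:
                runs.append(_write_run(chunk, tmp_dir))

        # Pass 2: merge the sorted runs, skipping repeated lines.
        handles = [open(path, "rb") for path in runs]
        try:
            with open(dst, "wb") as fout:
                previous = None
                for line in heapq.merge(*handles):
                    if line != previous:
                        fout.write(line)
                        previous = line
        finally:
            for handle in handles:
                handle.close()
    finally:
        # Remove the temporary run files even if something failed.
        for path in runs:
            os.remove(path)
```

A call such as `sort_unique("file_in", "file_out", tmp_dir="E:\\temp")` keeps the temporary runs on a drive with enough free space. The runs need roughly as much room as the input file, and `tmp_dir` must already exist.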

Finally a nice cleaned dic.

Thank you
#7
(04-21-2014, 02:42 AM)tibit Wrote: Finally a nice cleaned dic.

Thank you

You might want to reword that, lol.