hashcat Forum
Sorting utf-8 wordlists



Sorting utf-8 wordlists - fizikalac - 06-11-2012

Hi!

On my Ubuntu VPS, the locale is set to en_US.utf8, but when I run the sort command on a UTF-8 wordlist in a non-English language, all special characters like č get converted to c. It looks like a collation issue. What settings do I need to change for this to work? Do I have to install a different locale and switch to it? That would be really bad. I searched Google for a solution, but without success.

Thanks!


RE: Sorting utf-8 wordlists - undeath - 06-12-2012

What does the sort command you run look like?


RE: Sorting utf-8 wordlists - fizikalac - 06-12-2012

(06-12-2012, 01:16 AM)undeath Wrote: What does the sort command you run look like?

It is the standard unix sort.

I run it like this:

cat wordlist.txt | sort -u > sorted.txt


RE: Sorting utf-8 wordlists - undeath - 06-12-2012

Cannot confirm.

Code:
[ undeath@p4home: /tmp ] % ~> cat test
öasdf
đhg4sb5t56
čwegver
Àsdrvgßsd
đhg4sb5t56
è weü46zgbe4z
[ undeath@p4home: /tmp ] % ~> sort -u test
Àsdrvgßsd
čwegver
đhg4sb5t56
è weü46zgbe4z
öasdf
[ undeath@p4home: /tmp ] % ~> echo $LANG,$LC_ALL
de_DE.UTF-8,de_DE.UTF-8
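
One way to check whether sort really rewrites characters such as č is to count the non-ASCII bytes before and after sorting. This is only a sketch: the helper names `count_non_ascii_bytes` and `sort_preserves_bytes` are made up for illustration, and it uses plain `sort` (without `-u`) so that dropped duplicate lines do not change the count.

```shell
# Hypothetical helper: count the bytes above 0x7F in a file.
# Every UTF-8 character outside ASCII (such as č) contributes at least two.
count_non_ascii_bytes() {
    # tr deletes all ASCII bytes; the second tr strips padding some wc versions add.
    LC_ALL=C tr -d '\001-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: succeeds if sorting in the current locale
# leaves the number of non-ASCII bytes unchanged.
sort_preserves_bytes() {
    before=$(count_non_ascii_bytes "$1")
    after=$(sort -- "$1" | LC_ALL=C tr -d '\001-\177' | wc -c | tr -d ' ')
    [ "$before" = "$after" ]
}
```

Usage: `sort_preserves_bytes test && echo "byte count unchanged"`. If this succeeds, sort did not replace any multi-byte character with a plain ASCII one.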



RE: Sorting utf-8 wordlists - fizikalac - 06-13-2012

Strange, I guess it's all about locale... I will post again if I encounter such problems.
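
If the locale is the suspect, it can be inspected and overridden for a single command without changing the system locale. A minimal sketch, assuming a POSIX shell; the function names `effective_collation` and `sort_with_locale` are made up for illustration. sort takes its collation from LC_ALL first, then LC_COLLATE, then LANG.

```shell
# Hypothetical helper: print the locale name that decides sort order,
# following the precedence LC_ALL > LC_COLLATE > LANG.
effective_collation() {
    printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-(none set)}}}"
}

# Hypothetical helper: sort and deduplicate a file under a chosen locale.
# The override applies to this one sort invocation only.
# $1 = locale name (for example C or en_US.UTF-8), $2 = input file
sort_with_locale() {
    LC_ALL="$1" sort -u -- "$2"
}
```

Usage: `sort_with_locale C wordlist.txt > sorted.txt`. The C locale compares raw bytes, so `-u` merges only byte-identical lines; the resulting order is by byte value rather than by any language's alphabet.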


RE: Sorting utf-8 wordlists - NeonFlash - 11-15-2012

Did you find a solution to this?

Can you extract 10 example lines from your wordlist (ones that contain accents, umlauts, and other non-ASCII UTF-8 characters), run the same commands undeath did, and post the output here?

Then we can test the same on our *nix systems :)


RE: Sorting utf-8 wordlists - epixoip - 11-15-2012

Please do not revive dead threads.


RE: Sorting utf-8 wordlists - NeonFlash - 11-15-2012

Just wanted to know the solution and have some discussion around it.

Point noted, thank you.