Thread Closed 
 
Sorting utf-8 wordlists
06-11-2012, 02:08 PM (This post was last modified: 06-11-2012 02:10 PM by fizikalac.)
Post: #1
Sorting utf-8 wordlists
Hi!

On my Ubuntu VPS, the locale is set to en_US.utf8, but when I use the sort command on a UTF-8 wordlist in my own language, all special characters like č get converted to c. It looks like a collation issue. What settings do I have to apply for this to work? Do I have to install a different locale and switch to it? That would be really bad. I tried to find a solution on Google, but without success.

Thanks!
06-12-2012, 01:16 AM
Post: #2
RE: Sorting utf-8 wordlists
what does the sort command you run look like?
06-12-2012, 01:18 AM
Post: #3
RE: Sorting utf-8 wordlists
(06-12-2012 01:16 AM)undeath Wrote:  what does the sort command you run look like?

It is the standard unix sort.

I run it like this:

cat wordlist.txt | sort -u > sorted.txt
06-12-2012, 01:25 AM
Post: #4
RE: Sorting utf-8 wordlists
cannot confirm.

Code:
[ undeath@p4home: /tmp ] % ~> cat test
öasdf
đhg4sb5t56
čwegver
Àsdrvgßsd
đhg4sb5t56
è weü46zgbe4z
[ undeath@p4home: /tmp ] % ~> sort -u test
Àsdrvgßsd
čwegver
đhg4sb5t56
è weü46zgbe4z
öasdf
[ undeath@p4home: /tmp ] % ~> echo $LANG,$LC_ALL
de_DE.UTF-8,de_DE.UTF-8
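
For anyone comparing results across machines, a minimal sketch of a locale-independent run may help. It assumes a made-up sample file (the temporary path and the words are placeholders, not taken from the original wordlist) and forces byte-order collation with LC_ALL=C, which works regardless of which locales are installed. In that mode every non-ASCII UTF-8 character sorts after all ASCII letters instead of next to its base letter, and the characters themselves come out unchanged:

```shell
# Hypothetical sample file; the words are made up for illustration.
tmpdir=$(mktemp -d)
printf '%s\n' 'čaj' 'cat' 'dom' 'čaj' 'car' > "$tmpdir/wordlist.txt"

# Report the locale variables that sort would otherwise pick up.
echo "LANG=${LANG:-unset},LC_ALL=${LC_ALL:-unset}"

# LC_ALL=C overrides LANG and all other LC_* variables for this one command,
# so lines are compared byte by byte. -u drops the duplicate line.
LC_ALL=C sort -u "$tmpdir/wordlist.txt" > "$tmpdir/sorted.txt"

cat "$tmpdir/sorted.txt"
# car
# cat
# dom
# čaj
```

If the output should instead follow a language's alphabet, the same command can be run with LC_ALL set to a UTF-8 locale that `locale -a` lists as installed; the resulting order then depends on that locale's collation rules.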
06-13-2012, 07:32 PM
Post: #5
RE: Sorting utf-8 wordlists
Strange, I guess it's all about locale... I will post again if I encounter such problems.
11-15-2012, 09:59 AM
Post: #6
RE: Sorting utf-8 wordlists
did you find a solution to this?

can you extract 10 example lines from your wordlist (which contain accents, umlauts, and other utf-8 unicode characters), run the commands as undeath has done and post the output here?

then we can test the same on our *nix systems
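
One possible way to pull such a sample, as a rough sketch: the file below is a made-up stand-in for the real wordlist, and the bracket expression assumes a grep that treats every byte as a character in the C locale (recent GNU grep does).

```shell
# Hypothetical sample file standing in for the real wordlist.
tmpdir=$(mktemp -d)
printf '%s\n' 'cat' 'čaj' 'dom' 'öl' 'car' > "$tmpdir/wordlist.txt"

# In the C locale, [^ -~] matches any byte outside printable ASCII. That
# covers every byte of a multi-byte UTF-8 character (and also control
# characters such as tabs). head keeps at most the first 10 matching lines.
LC_ALL=C grep '[^ -~]' "$tmpdir/wordlist.txt" | head -n 10 > "$tmpdir/sample.txt"
cat "$tmpdir/sample.txt"
# čaj
# öl

# Same steps as in the earlier post: sort the sample and report the locale.
sort -u "$tmpdir/sample.txt"
echo "LANG=${LANG:-unset},LC_ALL=${LC_ALL:-unset}"
```

The order printed by the final sort depends on the active locale, so it is left without an expected result here.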
11-15-2012, 03:37 PM
Post: #7
RE: Sorting utf-8 wordlists
please do not revive dead threads.
11-15-2012, 04:59 PM
Post: #8
RE: Sorting utf-8 wordlists
Just wanted to know the solution and have some discussion around it.

Point noted, thank you.