Posts: 21
Threads: 5
Joined: Jun 2012

		06-11-2012, 02:08 PM 
(This post was last modified: 06-11-2012, 02:10 PM by fizikalac.)
		
	 
	
Hi!
On my Ubuntu VPS, the locale is set to en_US.utf8, but when I run the sort command on a UTF-8 wordlist in another language, all special characters like č get converted to c. It looks like a collation issue. What settings do I have to apply for this to work? Do I have to install a different locale and switch to it? That would be really bad. I tried to find a solution on Google, but without success.
Thanks!

Posts: 2,301
Threads: 11
Joined: Jul 2010

what does the sort command you run look like?

Posts: 21
Threads: 5
Joined: Jun 2012

(06-12-2012, 01:16 AM) undeath wrote: what does the sort command you run look like?
It is the standard Unix sort.
I run it like this:
cat wordlist.txt | sort -u > sorted.txt
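A minimal sketch of one thing to try, assuming the problem comes from locale collation rules: run sort with LC_ALL=C so that it compares raw bytes instead. The sample words are made up for illustration, and the cat is dropped because sort can read the file directly.

```shell
# Hypothetical sample data; replace with your own wordlist.txt.
printf '%s\n' 'öasdf' 'đhg4sb5t56' 'Äwegver' 'Àsdrvgßsd' 'đhg4sb5t56' > wordlist.txt

# LC_ALL=C makes sort compare raw bytes, so no locale collation rules apply
# and two lines count as duplicates only if they are byte-for-byte identical.
LC_ALL=C sort -u wordlist.txt > sorted.txt

cat sorted.txt
```

The result is in byte order rather than dictionary order, which is usually acceptable for a wordlist that only needs its duplicates removed.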
	
 
	
	
	
	
 
 
	
	
	
		
Posts: 2,301
Threads: 11
Joined: Jul 2010

		cannot confirm.
Code:
[ undeath@p4home: /tmp ] % ~> cat test 
öasdf
đhg4sb5t56
Äwegver
Àsdrvgßsd
đhg4sb5t56
è weü46zgbe4z
[ undeath@p4home: /tmp ] % ~> sort -u test 
Àsdrvgßsd
Äwegver
đhg4sb5t56
è weü46zgbe4z
öasdf
[ undeath@p4home: /tmp ] % ~> echo $LANG,$LC_ALL 
de_DE.UTF-8,de_DE.UTF-8
 
	 
	
	
	
	
 
 
	
	
	
		
Posts: 21
Threads: 5
Joined: Jun 2012

		Strange, I guess it's all about locale... I will post again if I encounter such problems.
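Since the difference seems to come down to locale, here is a small sketch for checking which setting sort would pick up. It follows the POSIX precedence (LC_ALL, then LC_COLLATE, then LANG). The function name effective_collation is made up for this example, and the fallback to C when nothing is set is an assumption about typical systems.

```shell
# effective_collation is a hypothetical helper, not a standard command.
# POSIX precedence: a non-empty LC_ALL wins, then LC_COLLATE, then LANG.
# Falling back to "C" when all three are empty is an assumption; the real
# default is implementation-defined.
effective_collation() {
  printf '%s\n' "${LC_ALL:-${LC_COLLATE:-${LANG:-C}}}"
}

effective_collation
```

Running `locale` prints the same settings as resolved by the system, and `locale -a` lists the locales that are actually installed.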
	
	
	
	
	
 
 
	
	
	
		
Posts: 82
Threads: 26
Joined: Oct 2011

did you find a solution to this?
can you extract 10 example lines from your wordlist (ones that contain accents, umlauts, and other non-ASCII UTF-8 characters), run the same commands as undeath did, and post the output here?
then, we can test the same on our *nix systems 
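One way to pull such a sample, sketched under the assumption that the wordlist is UTF-8 and that any line containing a byte outside printable ASCII is of interest. The file names and sample words are placeholders.

```shell
# Hypothetical sample wordlist so the sketch runs on its own.
printf '%s\n' 'plain' 'öasdf' 'ascii only' 'Äwegver' 'đhg4sb5t56' > wordlist.txt

# In the C locale, [ -~] covers exactly the printable ASCII bytes.
# -v keeps the lines that are NOT made up entirely of those bytes,
# -a stops grep from treating the file as binary.
LC_ALL=C grep -av '^[ -~]*$' wordlist.txt | head -n 10 > sample.txt

cat sample.txt

# Report the locale variables alongside the sample, as in the transcript above.
echo "LANG=${LANG:-} LC_ALL=${LC_ALL:-}"
```

Lines containing tabs or other control characters are also selected by this pattern, so the sample may need a quick manual check before posting.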
	 
	
	
	
	
 
 
	
	
	
		
Posts: 2,935
Threads: 12
Joined: May 2012

		please do not revive dead threads.
	
	
	
	
	
 
 
	
	
	
		
Posts: 82
Threads: 26
Joined: Oct 2011

		Just wanted to know the solution and have some discussion around it.
Point noted, thank you.