Problems with non-English hashes
#1
Hello everyone and happy new year!

Currently I'm working on a pretty large list (3.9M hashes, trimmed down to 2.2M unique hashes) made of plain, un-salted MD5 hashes.
I know for sure that all the plains were created by Turkish users.
The problem is... I'm recovering very few of them (just a little less than 200 words).
Honestly, that's quite a bummer: I'm far from being a top cracker, but with such a large collection I hoped I could recover a lot more of them before running into problems.
At the beginning I thought I had misidentified the hash type or was trying to crack something meaningless (i.e. a reset token or something like that), but then some plains started to pop out, so those hashes are indeed passwords.

This is what I tried so far:
  • Plain run of rockyou wordlist + all available stock rules
  • Plain run of hashes.org wordlist + all available stock rules
  • Plain run of several different wordlists of previous leaks + all available stock rules
  • Download the Turkish dictionary + all available stock rules
  • Combining Turkish dictionary with wordlists
  • Download list of names and celebrities + all available stock rules
  • Started using mask attack (at the moment processing the rockyou-3 mask file)
  • Replacing default charset with Turkish one (lower and upper chars)
  • Started generating random rules
Sadly, I got very little improvement: only 2-3 passwords every time. However, I noticed that none of the recovered plains contain chars from the Turkish alphabet (ğ, ş ...), so my question is: am I doing it right?
Am I missing anything like file encoding or something else?
Maybe the original chars were not encoded properly?

Any suggestions?
#2
The problem with different scripts is that there are loads of different encoding standards. English fits entirely within ASCII, but everything beyond that is complicated. Try finding a few more Turkish dictionaries, see what encoding they use, and look at the few plains you have cracked.

Strange that it's so few though, I'd expect it to be better than that.
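
One rough way to "see what encoding they use" is to decode the raw bytes of a dictionary under each candidate encoding and count how many Turkish-specific letters come out. This is only a sketch: the candidate list (UTF-8, Windows-1254, ISO-8859-9) and the function name are assumptions for illustration, and a tie between cp1254 and iso8859_9 is expected because they share the same byte values for these letters.

```python
# Letters that exist in the Turkish alphabet but not in plain ASCII.
TURKISH_LETTERS = set("çğıöşüÇĞİÖŞÜ")

# Assumed candidate encodings; extend the tuple if other codepages are suspected.
CANDIDATES = ("utf-8", "cp1254", "iso8859_9")


def score_encodings(raw, candidates=CANDIDATES):
    """Count Turkish-specific letters obtained when decoding `raw` under each encoding.

    Undecodable bytes are replaced rather than raising, so every candidate gets a score.
    The encoding with the highest count is the most plausible one for this data.
    """
    scores = {}
    for name in candidates:
        text = raw.decode(name, errors="replace")
        scores[name] = sum(ch in TURKISH_LETTERS for ch in text)
    return scores


# Made-up sample, encoded as Windows-1254 purely for demonstration.
sample = "şifre güçlü".encode("cp1254")
print(score_encodings(sample))
```

For a real wordlist, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in; a score of zero under UTF-8 but a high score under cp1254 would suggest the list is in a single-byte Turkish codepage.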
#3
(01-01-2016, 07:29 PM)FlippingGerman Wrote: Strange that it's so few though, I'd expect it to be better than that.

That makes two of us :(
#4
Sorry for the double posting, but I tried some new things.
Looking around, trying to crack anything that isn't English seems to be a big PITA.
To be on the safe side, I tried hex-encoding the charset.
This means that the Turkish alphabet goes from this:
Code:
abcçdefgğhıijklmnoöprsştuüvyz
to this:
Code:
616263c3a764656667c49f68c4b1696a6b6c6d6e6fc3b6707273c59f7475c3bc76797a

And I did the same for the uppercase letters:
Code:
ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ
Code:
414243c38744454647c49e4849c4b04a4b4c4d4e4fc396505253c59e5455c39c56595a


Then I took the rockyou-1... mask file and replaced all the occurrences of l (lower L) with 1 and u with 2.
Then I ran oclHashcat using the following command:

Code:
./cudaHashcat-2.01/cudaHashcat64.bin -a 3 -m 0 -o output.txt --hex-charset -1 turkish_lower.hcchr -2 turkish_upper.hcchr hashes/hashes.txt turkish-1-60.hcmask
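
The mask substitution described above (?l becomes ?1, ?u becomes ?2) can be sketched as below. The function name is made up for illustration. The mask is walked token by token rather than with a blind search-and-replace, so that an escaped question mark (`??`) followed by a literal `l` or `u` is left untouched.

```python
def retarget_mask(mask):
    """Rewrite one hashcat mask line: ?l -> ?1 and ?u -> ?2, leaving everything else as-is."""
    swaps = {"?l": "?1", "?u": "?2"}
    out = []
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            # A '?' always starts a two-character token (?l, ?d, ??, ...).
            token = mask[i:i + 2]
            out.append(swaps.get(token, token))
            i += 2
        else:
            # Literal character in the mask.
            out.append(mask[i])
            i += 1
    return "".join(out)


print(retarget_mask("?u?l?l?l?d?d"))
```

Applied to every line of a mask file, this yields masks that draw letters from the custom charsets passed with -1 and -2 instead of the built-in ASCII ones.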

And....

nothing happened.
I'm currently running out of ideas...
#5
Have you looked at the ones that have been cracked, and found any patterns that might explain this, such as everything being capitalised or similar?
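
A quick way to look for such patterns is to turn every cracked plain into its mask and count the most common shapes. The sketch below assumes the outfile holds `hash:plain` lines and that plains with unusual bytes may be wrapped as `$HEX[...]`; both are assumptions about the output format, and the sample lines are made up.

```python
import binascii
from collections import Counter


def parse_plain(line):
    """Extract the plaintext bytes from one 'hash:plain' line (assumed format)."""
    _, _, plain = line.rstrip(b"\r\n").partition(b":")
    if plain.startswith(b"$HEX[") and plain.endswith(b"]"):
        return binascii.unhexlify(plain[5:-1])
    return plain


def mask_of(plain):
    """Map each byte to a hashcat-style class: ?l, ?u, ?d, ?s, or ?b for anything else."""
    out = []
    for b in plain:
        if 0x61 <= b <= 0x7A:
            out.append("?l")
        elif 0x41 <= b <= 0x5A:
            out.append("?u")
        elif 0x30 <= b <= 0x39:
            out.append("?d")
        elif 0x20 <= b <= 0x7E:
            out.append("?s")
        else:
            out.append("?b")  # non-ASCII or control byte
    return "".join(out)


def summarize(lines):
    """Count how often each mask shape occurs among the cracked plains."""
    return Counter(mask_of(parse_plain(line)) for line in lines)


# Made-up example lines; the hashes are placeholders, not real data.
fake = b"0" * 32
demo = [fake + b":ankara06\n", fake + b":ISTANBUL\n", fake + b":$HEX[c59f6966]\n"]
for shape, count in summarize(demo).most_common():
    print(count, shape)
```

Any `?b` in the result is worth a closer look: the raw byte values there show which encoding the original passwords were hashed in.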
#6
You're probably using the wrong charset.
#7
Encoding matters.
#8
(01-01-2016, 11:36 PM)undeath Wrote: You're probably using the wrong charset.
That's possible. Any suggestions on how I could find the error?

(01-02-2016, 12:05 AM)epixoip Wrote: Encoding matters.
Does it matter even if I'm using a hex charset and a mask attack?
Can you please just point me in the right direction, at least to narrow down the source of the problem? Do I need to worry about the encoding of the charset file itself, even if its contents are given as hex?
Moreover, since it's my first time using them, did I convert it correctly, i.e. as a single string of all the consecutive chars?
#9
Yes, it absolutely matters when using a hex charset, as the same character in two different encodings likely has two different hex values.

You need to figure out which encodings/codepages are used in that locale, and try each one.
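
To illustrate, the sketch below hex-encodes the same Turkish alphabets from earlier in the thread under three encodings. The choice of cp1254 (Windows-1254) and iso8859_9 as candidates is an assumption about which codepages are common for Turkish; others may apply. Note that in UTF-8 the Turkish-specific letters take two bytes each, while in the single-byte codepages every letter is exactly one byte. As far as I know, a mask position picks one byte from the charset, so a single ?1 can only produce a whole Turkish letter in the single-byte encodings.

```python
LOWER = "abcçdefgğhıijklmnoöprsştuüvyz"
UPPER = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"

# Assumed candidate encodings for the Turkish locale.
ENCODINGS = ("utf-8", "cp1254", "iso8859_9")


def hex_charset(chars, encoding):
    """Return the hex string of `chars` encoded with `encoding`, as one consecutive run."""
    return chars.encode(encoding).hex()


for enc in ENCODINGS:
    print(enc)
    print("  lower:", hex_charset(LOWER, enc))
    print("  upper:", hex_charset(UPPER, enc))
    # Byte length differs from the 29 letters when an encoding is multi-byte.
    print("  bytes per alphabet:", len(LOWER.encode(enc)))
```

The UTF-8 output matches the hex strings posted earlier; the other two give different values for ç, ğ, ı, ö, ş and ü, and each would need its own run.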