How do I set UTF-8 charset encoding in an .hcchr file?
#1
I need to use the Russian alphabet in UTF-8 encoding.
As a test, I created a file "ru_utf8.hcchr" containing two letters, "CYRILLIC CAPITAL LETTER A" and "CYRILLIC CAPITAL LETTER BE". Hex content of the file: D0 90 D0 91. File size: 4 bytes. This means the file is UTF-8 encoded.
Then I calculated the MD5 of this file and saved it to "secret.md5". This MD5 is the same as the MD5 of the UTF-8 string "АБ" (Russian letters, of course).
Then I ran hashcat (version 0.47) with the following command line:
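To double-check the setup, here is a small Python sketch (not part of the original post) that encodes "АБ" as UTF-8 and computes its MD5. The bytes should match the 4-byte file, and the digest is what secret.md5 would contain, assuming the file has no trailing newline.

```python
import hashlib

word = "АБ"  # CYRILLIC CAPITAL LETTER A + CYRILLIC CAPITAL LETTER BE
encoded = word.encode("utf-8")

# Each Cyrillic letter takes 2 bytes in UTF-8, so the file is 4 bytes long.
print(encoded.hex(" ").upper())  # D0 90 D0 91
print(len(encoded))              # 4

# MD5 of the raw UTF-8 bytes, i.e. the same digest as the 4-byte .hcchr file.
digest = hashlib.md5(encoded).hexdigest()
print(digest)
```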
Code:
hashcat-cli32 -m 0 secret.md5 -a 3 -1 ru_utf8.hcchr ?1?1
The original string wasn't found. Hashcat's output:
Code:
Input.Mode: Mask (?1) [1]
Index.....: 0/1 (segment), 3 (words), 0 (bytes)   <-- why 3 words? There are only 2 (the Russian letters 'A' and 'Be')
Recovered.: 0/1 hashes, 0/1 salts
Speed/sec.: - plains, - words
Progress..: 3/3 (100.00%)
Running...: --:--:--:--
Estimated.: --:--:--:--

Input.Mode: Mask (?1?1) [2]
Index.....: 0/1 (segment), 9 (words), 0 (bytes)
Recovered.: 0/1 hashes, 0/1 salts
Speed/sec.: - plains, - words
Progress..: 9/9 (100.00%)
Running...: --:--:--:--
Estimated.: --:--:--:--
I kept increasing the mask length, and the original word was found only when the length reached 4:
Code:
hashcat-cli32 -m 0 secret.md5 -a 3 -1 ru_utf8.hcchr ?1?1?1?1
All hashes have been recovered
Input.Mode: Mask (?1?1?1?1) [4]
Index.....: 0/1 (segment), 81 (words), 0 (bytes)
Recovered.: 1/1 hashes, 1/1 salts
Speed/sec.: - plains, - words
Progress..: 79/81 (97.53%)
Running...: 00:00:00:01
Estimated.: --:--:--:--
I understand that the character codes "D0 90" and "D0 91" in the .hcchr file are merged into just 3 distinct bytes: D0, 90 and 91 (which explains the 3, 9 and 81 words above). But I need each 2-byte sequence to be treated as a single UTF-8 character. How can I do this?
Splitting each character into its 2 separate bytes is not a satisfactory solution, because it adds many byte combinations that are not real characters, and therefore many extra brute-force attempts.
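The word counts in the output follow from that interpretation. If the .hcchr file is treated as a set of distinct bytes (an assumption that matches the observed 3/9/81), the 4-byte file yields only 3 charset entries, and a mask with n positions has 3^n candidates. A minimal Python sketch of that model:

```python
from itertools import product

hcchr_content = bytes([0xD0, 0x90, 0xD0, 0x91])  # "АБ" in UTF-8

# Treat the charset file as a set of single bytes; the repeated D0 collapses.
charset = sorted(set(hcchr_content))
print([f"{b:02X}" for b in charset])  # ['90', '91', 'D0']

# Keyspace of a mask with n positions drawn from this charset.
for n in (1, 2, 4):
    print(n, len(charset) ** n)  # prints "1 3", "2 9", "4 81"

# Only with 4 positions does the byte sequence D0 90 D0 91 become reachable.
target = hcchr_content
candidates = {bytes(c) for c in product(charset, repeat=4)}
print(target in candidates)  # True
```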
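To put a number on that waste: of the 81 four-byte candidates built from the bytes D0, 90 and 91, only those made of D0 90 / D0 91 pairs decode as the intended Cyrillic letters. A quick Python check (an illustration only, not something hashcat itself does):

```python
from itertools import product

charset = [0xD0, 0x90, 0x91]
letters = {"А", "Б"}  # the two intended characters

total = 0
valid = []
for combo in product(charset, repeat=4):
    total += 1
    raw = bytes(combo)
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        continue  # not valid UTF-8: hashing it is a wasted attempt
    if set(text) <= letters:
        valid.append(text)

print(total)          # 81
print(sorted(valid))  # ['АА', 'АБ', 'БА', 'ББ']
```

So 77 of the 81 attempts can never match a Russian word.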
#2
I think what you're trying to achieve has already been done: https://github.com/Rub3nCT/perl-hashcat-utils
#3
Thank you for the link, but it doesn't solve my problem. It did, however, help me understand hashcat's algorithm more deeply.
I downloaded the file "Russian.charset" from the link above and tried brute-forcing with it. Hashcat reports:
Code:
Input.Mode: Mask (?1) [1]
Index.....: 0/1 (segment), 67 (words), 0 (bytes)
67 symbols (33 uppercase + 33 lowercase + the symbol "№")! Wow, I thought, this is exactly what I need! But I was wrong...
These are not 67 characters; they are 67 distinct bytes in the file. Quite a coincidence.
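The coincidence is easy to confirm. Assuming the charset file contains the 66 Russian letters (А through я plus Ё and ё) and "№" in UTF-8, the number of distinct bytes also comes out to 67: the lead bytes D0, D1 and E2 plus all 64 continuation bytes from 80 to BF. A Python sketch:

```python
# U+0410..U+044F covers А..Я and а..я (64 letters); add Ё, ё and №.
chars = [chr(cp) for cp in range(0x0410, 0x0450)] + ["Ё", "ё", "№"]
encoded = "".join(chars).encode("utf-8")

distinct = sorted(set(encoded))
print(len(chars))     # 67 characters
print(len(distinct))  # 67 distinct bytes

# Bytes >= 0xC0 start a multi-byte sequence; the rest are continuation bytes.
lead_bytes = [b for b in distinct if b >= 0xC0]
print([f"{b:02X}" for b in lead_bytes])  # ['D0', 'D1', 'E2']
```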
This means that for each byte position of the target word, hashcat tries all 67 bytes from the custom charset. That is very wasteful with UTF-8. How did you solve this problem?
Ideally, hashcat would take a character represented by multiple bytes as a single charset entry, and insert all of its bytes into the candidate word when computing the hash.
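For scale: a two-letter Russian word is 4 bytes in UTF-8, so a byte-level mask needs 4 positions over the 67-byte charset, whereas a hypothetical character-level charset would need only 2 positions over 67 characters. The arithmetic, in Python:

```python
charset_size = 67     # distinct bytes == distinct characters here, by coincidence
letters = 2           # length of the word in characters
bytes_per_letter = 2  # Cyrillic letters are 2 bytes each in UTF-8

byte_level = charset_size ** (letters * bytes_per_letter)
char_level = charset_size ** letters

print(byte_level)                # 20151121
print(char_level)                # 4489
print(byte_level // char_level)  # 4489 times more candidates
```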
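The requested behaviour can be illustrated outside hashcat. The following Python sketch is not a hashcat feature or API; it simply brute-forces at the character level: each position takes one whole character, the candidate is encoded to UTF-8 only when it is hashed, and so no invalid byte sequences are ever tried. The function name `crack_md5_utf8` is hypothetical.

```python
import hashlib
from itertools import product
from typing import Iterable, Optional


def crack_md5_utf8(target_hex: str, charset: Iterable[str], length: int) -> Optional[str]:
    """Try every string of `length` characters from `charset` (hypothetical helper).

    Each charset entry is a whole character, even if its UTF-8 form is
    several bytes; the bytes are produced only when hashing.
    """
    chars = sorted(set(charset))  # one entry per character, not per byte
    for combo in product(chars, repeat=length):
        word = "".join(combo)
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target_hex:
            return word
    return None


if __name__ == "__main__":
    target = hashlib.md5("АБ".encode("utf-8")).hexdigest()
    # Two characters, two positions: 2**2 = 4 candidates instead of 81.
    print(crack_md5_utf8(target, "АБ", 2))  # АБ
```

A pure-Python loop like this is much slower than hashcat on real workloads; it only shows the candidate-generation idea.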
I also read this thread: https://hashcat.net/forum/thread-2613.html
It's a good solution, but it is very difficult to apply to Russian.