Unicode == FUN
#1
Hello, I am trying to get a better understanding of how to crack passwords that contain Unicode characters. I am fairly new to attacking non-English passwords outside of dictionary attacks, so my apologies if I'm missing something obvious.

This topic is interesting to me in general, but the specific passwords I'm testing with right now are UTF-8 encoded, and some of their characters take up to 4 bytes each.
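To make the "up to 4 bytes" point concrete, here is a minimal Python sketch that prints the UTF-8 byte length of a few sample characters. The sample characters are my own picks for illustration, not taken from the actual target passwords.

```python
# Sample characters chosen for illustration (an assumption, not real targets):
# ASCII "a", U+00E9, U+3042 (hiragana A), U+1D400 (mathematical bold capital A).
samples = ["a", "\u00e9", "\u3042", "\U0001d400"]

# UTF-8 byte length of each sample character.
lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, n in zip(samples, lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex()}")
```

Each byte is a separate mask position as far as a byte-oriented mask is concerned, so a single 4-byte character already consumes four positions.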

For a mask attack, what is the best way to address a situation like this?
#2
See the ./charsets/ subdirectory for character sets that you can use. If you don't know what the target languages might be, apply some wordlists first to determine language frequency, and then target the most common languages first.

For bruteforce/masks and multibyte characters, this may be informative:

http://blog.bitcrack.net/2013/09/crackin...guage.html
#3
Ah, I was reading too quickly. If you're especially interested in multibyte characters, the stock charsets won't help, but Rurapenthe's post about bruteforcing should still be helpful. The essential insight is that hashcat treats words as sequences of bytes, so if your source dictionaries are UTF-8, hybrid and combinator attacks should be mostly straightforward. If your dictionaries are encoded in something other than UTF-8, you'll need to convert them, which can be tricky. I can dig up some notes on that if you need them.
#4
For a mask attack, see Rurapenthe's blog above ^

For everything else, iconv is your friend.
#5
Huh, yeah: I went back through my notes/scripts and all the iconv items were just -f [source-format] -t [dest-format]. I thought it was harder than that, but I guess not!
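For anyone who wants to see what that conversion amounts to, here is a rough Python sketch of the same decode-then-encode step. The function name reencode and the sample word are made up for illustration. It also shows why the conversion matters: the same word hashes differently depending on which byte encoding was hashed.

```python
import hashlib


def reencode(data: bytes, source: str, dest: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and re-encode them as dest.

    Hypothetical helper; roughly what a -f [source] -t [dest] conversion does.
    """
    return data.decode(source).encode(dest)


# Illustrative word "cafe" with an accented e, stored as Latin-1 (one byte per char).
latin1_word = b"caf\xe9"
utf8_word = reencode(latin1_word, "latin-1")

# The hash is computed over raw bytes, so the two encodings give different digests.
latin1_md5 = hashlib.md5(latin1_word).hexdigest()
utf8_md5 = hashlib.md5(utf8_word).hexdigest()

print(utf8_word.hex())
print(latin1_md5 != utf8_md5)
```

So the wordlist has to be in whatever encoding the password was in when it was hashed, not just whatever displays correctly in your terminal.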
#6
I did find that blog in my searches before asking, and that method did work, but it got ugly pretty quickly for 4-byte characters. Basically, it ended up using all four of the custom charsets just to define a single Unicode character range, which left me none to use for other purposes.

For example, if there were passwords that might contain Mathematical Alphanumeric Symbols (code points U+1D400 to U+1D7FF), capturing that range requires something like the following:

hashcat -a 3 -m 0 crackme --hex-charset -1 f0 -2 9d -3 909192939495969798999a9b9c9d9e9f -4 808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf ?1?2?3?4?1?2?3?4?1?2?3?4?1?2?3?4?1?2?3?4?1?2?3?4 -i

This is fine if the passwords are made up solely of characters from this set, but if they may also contain more typical ASCII-range characters, then we're out of luck.
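In case it helps anyone reproduce the hex values in that command, here is a small Python sketch that derives the per-position byte sets for a code point range. The function name utf8_position_charsets is my own. Note the assumption baked into this approach: combining the per-position sets gives the Cartesian product of the bytes, which happens to match this range exactly (16 x 64 = 1024 code points) but can over-cover other ranges.

```python
def utf8_position_charsets(first, last):
    """Collect the distinct byte values seen at each UTF-8 byte position.

    Hypothetical helper. Assumes every code point in [first, last] encodes
    to the same number of bytes; raises ValueError otherwise.
    """
    positions = None
    for cp in range(first, last + 1):
        encoded = chr(cp).encode("utf-8")
        if positions is None:
            positions = [set() for _ in encoded]
        if len(encoded) != len(positions):
            raise ValueError("range spans more than one UTF-8 length")
        for i, byte in enumerate(encoded):
            positions[i].add(byte)
    # One hex string per byte position, suitable for --hex-charset.
    return [bytes(sorted(s)).hex() for s in positions]


charsets = utf8_position_charsets(0x1D400, 0x1D7FF)
for n, cs in enumerate(charsets, start=1):
    print(f"-{n} {cs}")
```

This makes the underlying problem visible: a 4-byte range needs one custom charset per byte position, and there are only four custom charsets to go around.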

Overall, it would be much better/cleaner to have a way to reference Unicode ranges more generally, kind of like hcchr files but with multi-byte values (let's say hcchrmb files). That way you could reference ranges and encodings more freely, such as:

hashcat -a 3 -m 0 crackme --hex-charset -1 charsets/unicode/3040_hiragana_utf8.hcchrmb -2 charsets/unicode/30a0_katakana_utf16le.hcchrmb -3 ?l?u?d?1?2 ?u?3?3?3?3?d?d -i

The above would provide a fairly intuitive and powerful way to crack against different ranges and encoding schemes that can't easily be achieved now.
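To sketch what such a file might hold, here is a rough Python illustration. To be clear, hcchrmb is not an existing hashcat format. The layout below (every character of a range, encoded in one chosen encoding and simply concatenated) and the helper names build_charset and load_charset are assumptions of mine.

```python
from typing import List


def build_charset(first: int, last: int, encoding: str) -> bytes:
    """Concatenate every code point in [first, last], encoded as `encoding`.

    Hypothetical file layout; unassigned code points in the range are included.
    """
    return "".join(chr(cp) for cp in range(first, last + 1)).encode(encoding)


def load_charset(blob: bytes, encoding: str) -> List[bytes]:
    """Split a charset blob back into one byte string per character."""
    return [ch.encode(encoding) for ch in blob.decode(encoding)]


# Hiragana block as UTF-8 (3 bytes per character) and
# Katakana block as UTF-16LE (2 bytes per character).
hiragana_utf8 = build_charset(0x3040, 0x309F, "utf-8")
katakana_utf16le = build_charset(0x30A0, 0x30FF, "utf-16-le")

hiragana_entries = load_charset(hiragana_utf8, "utf-8")
katakana_entries = load_charset(katakana_utf16le, "utf-16-le")

print(len(hiragana_entries), hiragana_entries[1].hex())
print(len(katakana_entries), katakana_entries[0].hex())
```

The point is that each mask position would then stand for one whole multi-byte entry rather than one byte, which is what would free up the other custom charsets.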

Just a thought, anyway.
#7
Bumping my own thread that no one seemed that interested in the first time...

I don't have a sense of how much work it would be to add something like the "hcchrmb" files I've outlined above, but if I were to code this up (and I'm not committing to that right now! :) ), would this even be something people are interested in?

atom, others? Would such a patch even be accepted, or is this not the right way to solve this problem?
#8
It depends on how it's implemented. Also note that I've recently added iconv support to the latest beta, which could help you with loading the .hcchrmb file.
#9
(04-22-2017, 10:11 AM)atom Wrote: It depends on how it's implemented. Also note that I've recently added iconv support to the latest beta, which could help you with loading the .hcchrmb file.

Any thoughts on an approach that would make sense?
#10
Only a fast one would make sense.