Office international character problem
#1
Some non-ASCII characters cannot be cracked with the Office algorithms. Characters like the German 'ß' cannot be cracked by hashcat but can be cracked by JTR.

How to reproduce:
1- Use the following hash, which is from a Word 2013 file with the single-letter password 'ß':
$office$*2013*100000*256*16*23aef9881a73987bc2522ee38a7a4254*163f450664fbf4a40b32edbc75049dd3*5878d4a8ad67b2107b13c087a39b79252bf439e3fae32e465ed3c71ff790397b

2- Hashcat attack:
hashcat -a3 -m9600 doc.hash ?b?b
---> No result!

3- JTR attack:
john --mask=?b?b doc.hash
---> Correctly cracked as 'ß' (UTF-8 0xc39f)

The problem applies to all Office versions: 2003/2007/2010/2013.
#2
(06-30-2019, 06:52 PM)hansel Wrote: Some non-ASCII characters cannot be cracked with the Office algorithms. Characters like the German 'ß' cannot be cracked by hashcat but can be cracked by JTR. [...]

use instead
hashcat -a3 -m9600 doc.hash ?b
#3
(06-30-2019, 08:26 PM)3Pi0 Wrote: use instead
hashcat -a3 -m9600 doc.hash ?b

Sorry, I gave the wrong example. You're right about the one in my previous message.

But there actually is a problem for some characters, like 'ş'.
The password of the Word document is just 'ş'. The hash is as follows:

$office$*2010*100000*128*16*3dab60c78beaeac33901cb587f2f7c1c*f4edb827473160c0eb9c1d07cb4c2a83*d1372637fd0b2c45d41452c9667086cd7acee09da3858c9a6d12bd70ecd94675

JTR can crack it, as I said before, using either of the following command lines:
john --mask=?b?b doc.hash
john --wordlist=utf-8.dict doc.hash

Hashcat cannot crack it. None of the following attacks work: single-byte mask, double-byte mask, UTF-8 dictionary, ISO-8859-9 dictionary. The command lines I tried are as follows:

hashcat -a3 -m9500 doc.hash ?b
hashcat -a3 -m9500 doc.hash ?b?b
hashcat -a0 -m9500 doc.hash utf-8.dict
hashcat -a0 -m9500 doc.hash iso-8859-9.dict

None of the above works.

utf-8.dict contains a single line with the character 'ş' (hex C5 9F) on it.
iso-8859-9.dict contains a single line with the character 'ş' (hex FE) on it.
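
For reference, the two dictionaries can be created, for example, with bash's printf (which supports \x hex escapes); the file names match the commands above:

printf '\xc5\x9f\n' > utf-8.dict
printf '\xfe\n' > iso-8859-9.dict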
#4
Does no one know the solution?
Is this a bug?
#5
I'm pretty sure this is a known problem with how the OpenCL kernels handle UTF-16 characters.

As far as I know, the kernel code for UTF-16LE encoding is (also for performance reasons) very "elementary" and just inserts a zero byte after every input byte. Of course this is a problem if a character's code point is outside the 0x00-0xff range, because, for instance, the UTF-8 bytes 0xc5 0x9f ('ş') are converted to 0xc5 0x00 0x9f 0x00 instead of the correct UTF-16LE encoding 0x5f 0x01.
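
To illustrate, here is a minimal C sketch of that zero-byte expansion (not the actual kernel code; the function name is made up):

#include <stdio.h>
#include <stddef.h>

/* Sketch of the "elementary" utf16le expansion described above:
 * every input byte is simply followed by a zero byte. */
static size_t naive_utf16le (const unsigned char *in, size_t len, unsigned char *out)
{
  for (size_t i = 0; i < len; i++)
  {
    out[i * 2 + 0] = in[i];
    out[i * 2 + 1] = 0x00; // high byte is always forced to zero
  }
  return len * 2;
}

int main (void)
{
  const unsigned char utf8_s[] = { 0xc5, 0x9f }; // 'ş' in UTF-8
  unsigned char buf[8];

  const size_t n = naive_utf16le (utf8_s, sizeof (utf8_s), buf);

  for (size_t i = 0; i < n; i++) printf ("%02x ", buf[i]);

  printf ("\n"); // prints "c5 00 9f 00", but 'ş' (U+015F) is "5f 01" in UTF-16LE

  return 0;
}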
#6
phil, thanks for your response.

However, I do not understand what UTF-16 has to do with it. I am not specifying UTF-16 anywhere. Are you saying that the Office kernels use UTF-16 by default? Besides, other non-ASCII characters like 'ß' (UTF-8 0xc39f) are cracked OK. I can find more examples of non-ASCII characters that can and cannot be cracked if that would make debugging easier.

As you can see in one of the trials, I am using BINARY mode and testing all single bytes and double bytes, and it still cannot crack the hash.
Actually, I tested the entire 1-byte, 2-byte, 3-byte and 4-byte binary space and still nothing!
hashcat -a3 -m9500 doc.hash ?b
hashcat -a3 -m9500 doc.hash ?b?b
hashcat -a3 -m9500 doc.hash ?b?b?b
hashcat -a3 -m9500 doc.hash ?b?b?b?b

The only thing that cracks is a 5-byte password for Office 2003, and that is because of the 40-bit collider issue specific to Office 2003. Obviously the resulting 5-byte password is meaningless.
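
To make the failure concrete, here is a minimal C sketch (assuming the zero-byte expansion phil described above) of why a ?b mask finds 'ß' but no candidate of any length can ever match 'ş':

#include <stdio.h>

int main (void)
{
  // UTF-16LE targets as the Office kernels hash them:
  const unsigned char eszett[2] = { 0xdf, 0x00 }; // 'ß' (U+00DF)
  const unsigned char scedil[2] = { 0x5f, 0x01 }; // 'ş' (U+015F)

  for (int b = 0; b < 256; b++)
  {
    // naive expansion of a single ?b candidate: { b, 0x00 }
    const unsigned char cand[2] = { (unsigned char) b, 0x00 };

    if (cand[0] == eszett[0] && cand[1] == eszett[1]) printf ("?b hits 'ß' at 0x%02x\n", b);
    if (cand[0] == scedil[0] && cand[1] == scedil[1]) printf ("?b hits 'ş' at 0x%02x\n", b);
  }

  // Longer masks do not help: every odd (high) byte is forced to 0x00,
  // so the 0x01 high byte of 'ş' can never be produced.

  return 0;
}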
#7
yep yep yep

that's how the algorithm works (and actually most of the Microsoft-invented algos).

also see this perl module to understand the algo: https://github.com/hashcat/hashcat/blob/...600.pm#L26

btw: the Perl module of course works "correctly" because encode ("UTF-16LE", $word) doesn't try to be as fast as possible and respects all the encoding rules for UTF-16LE (while, as mentioned above, the kernel does not).
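
For comparison, a correct conversion of a two-byte UTF-8 sequence to UTF-16LE looks roughly like this (a C sketch covering only the two-byte case, unlike Perl's encode, which handles all of them):

#include <stdio.h>

int main (void)
{
  const unsigned char utf8[2] = { 0xc5, 0x9f };   // 'ş' in UTF-8

  // decode a two-byte UTF-8 sequence (U+0080..U+07FF) to a code point
  const unsigned int cp = ((utf8[0] & 0x1f) << 6) // 5 payload bits
                        |  (utf8[1] & 0x3f);      // 6 payload bits

  // emit the code point as UTF-16LE (low byte first)
  const unsigned char utf16le[2] = { cp & 0xff, cp >> 8 };

  printf ("U+%04X -> %02x %02x\n", cp, utf16le[0], utf16le[1]);
  // prints "U+015F -> 5f 01" -- exactly what the kernel's expansion misses

  return 0;
}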

You can also test with something like NTLM and you will experience the same problem...

Actually, we should add this limitation to docs/limits.txt (can somebody request this on GitHub?) to mention this "problem": the kernels do not do the UTF-16 conversion with all of the complicated encoding rules, because of performance issues and because it would make the conversion very complex.
#8
(07-18-2019, 12:01 PM)philsmd Wrote: Actually, we should add this limitation to docs/limits.txt (can somebody request this on GitHub?)

https://github.com/hashcat/hashcat/issues/2121
#9
phil, thanks for the explanation and pointers. I looked into the OpenCL kernel code and saw the places where the UTF-16 conversion takes place.

The first misconception I had was to assume that when I used a mask attack (with the binary character set), each byte would make it into the base algorithm as-is. Thinking more deeply about it, I see this cannot be true. The interface contract between a kernel and hashcat assumes "characters", NOT raw bytes, go into the algorithm. From what I understand, hashcat always uses single-byte "characters" internally. A kernel that needs UTF-16 characters must convert the single-byte characters supplied to it into 2-byte UTF-16 characters, and there is nothing it can do but append a zero as the high byte, because it cannot know what input encoding the user intended. Such a kernel cannot take input characters two-by-two as the raw bytes of UTF-16 characters, since then normal attacks using ASCII characters would not work.

Actually, an interesting case occurs for kernels that use UTF-8 characters (like RAR). Since single-byte characters are the same in both ASCII and UTF-8, such kernels take the characters in as they are, and everything works OK. But here I can also do multi-byte UTF-8 attacks using my UTF-8 dictionaries, AND I can do slightly inefficient mask attacks by constructing special custom charsets for my target language, as shown below.
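
For example (a hypothetical command line: mode 13000 is RAR5, standing in for "a UTF-8 kernel", and the byte sets are hand-built from the UTF-8 encodings of the Turkish letters ç ğ ı ö ş ü), a one-character Turkish mask could look like:

hashcat -a3 -m13000 --hex-charset -1 c3c4c5 -2 9fa7b1b6bc doc.hash ?1?2

This generates 15 two-byte combinations, of which only 6 are valid Turkish letters, hence "slightly inefficient".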

The question I have is how JTR manages to crack it. From my tests I found that the UTF-8 version of the character does the crack. My guess is that JTR does not use single-byte characters but uses UTF-8 internally, so its kernels can do the proper conversion.

I understand hashcat's speed reasoning for using single-byte characters. But all the speed in the world does not matter if it cannot crack. Most of the popular algorithms (like Office 2013) have such slow hashes that the encoding conversion would not be a bottleneck; not every hash is as fast as MD5. So I think the developers should have provided an alternate code path where, by user preference, multi-byte characters could be used, taking the performance hit. But I also understand that in the current state of the code this is next to impossible.

One suggestion I have is to add a "raw" version of the UTF-16 kernels. For example, in addition to 9500 (Office 2010) we could have 9501 (Office 2010, raw/advanced version). In 9501, the kernel would take all characters as-is and would not attempt the UTF-16LE conversion; the user must be aware that he needs to supply UTF-16LE input. This way I could use the UTF-16LE dictionary I have, and I could also do mask attacks using a combination of special hex custom charsets built for my target language (see the sketch below). At least then I could mount an attack, and the change is easy, as far as I can tell from the code.
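
To sketch the idea (9501 does not exist; it is only the proposal above), a raw UTF-16LE mask for characters in the U+0100-U+01FF block, which includes 'ş' (U+015F), might look like:

hashcat -a3 -m9501 --hex-charset -2 01 doc.hash ?b?2

Here ?b supplies the low byte 0x00-0xff and the custom charset -2 pins the high byte to 0x01.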
#10
You can try it yourself.

Create a clean directory with the hashcat download/binaries (e.g. from https://hashcat.net/beta or GitHub; even a release should work). Make sure that the "kernels/" directory does not exist (it certainly does not exist if hashcat was freshly downloaded). Modify the file OpenCL/m09600-pure.cl and change
sha512_update_global_utf16le_swap to sha512_update_global_swap
https://github.com/hashcat/hashcat/blob/...ure.cl#L50

(after each change to the kernel file, the kernels/ folder needs to be cleaned)

This should now work if you use a UTF-16LE-encoded wordlist or the --encoding-to utf16le parameter.
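
For example (assuming the kernel edit above has been made, and utf-8.dict is the one-line dictionary from earlier in the thread):

rm -rf kernels/
hashcat -a0 -m9600 doc.hash utf-8.dict --encoding-to utf16le

--encoding-to utf16le makes hashcat convert each candidate to proper UTF-16LE on the host, and the patched kernel then hashes those bytes as-is.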