Office international character problem
#9
phil, thanks for the explanation and pointers. I looked into the OpenCL kernel code and found the places where the UTF-16 conversion takes place.

The first misconception I had was to assume that when I used a mask attack (with a binary character set), each byte would make it into the base algorithm as-is. Thinking about it more deeply, I see this cannot be true. The interface contract between a kernel and hashcat assumes "characters", NOT raw bytes, going into the algorithm. From what I understand, hashcat always uses single-byte "characters" internally. A kernel that needs UTF-16 characters must convert the single-byte characters supplied to it into 2-byte UTF-16 characters. And there is nothing it can do but pad each character with a zero high byte, because it cannot know what input encoding the user intended. Such a kernel cannot consume input characters two at a time as the raw bytes of UTF-16 characters, since then normal attacks using ASCII characters would not work.
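To make that concrete, here is a minimal sketch of the zero-extension as I understand it from reading the kernel code. This is illustrative C only, not the actual hashcat OpenCL source:

Code:
/* Minimal sketch, not the actual hashcat kernel code. */
#include <stddef.h>
#include <stdint.h>

/* Expand single-byte "characters" into UTF-16LE code units by adding
   a zero high byte. Only correct if the user happened to mean
   ASCII/Latin-1 input. */
size_t expand_to_utf16le(const uint8_t *in, size_t in_len,
                         uint8_t *out /* must hold 2 * in_len bytes */)
{
    for (size_t i = 0; i < in_len; i++) {
        out[2 * i]     = in[i]; /* low byte: the input character */
        out[2 * i + 1] = 0x00;  /* high byte: always zero, since the
                                   kernel cannot know the encoding */
    }
    return 2 * in_len;
}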

Actually, an interesting case occurs for kernels that use UTF-8 characters (like RAR). Since single-byte characters are the same in both ASCII and UTF-8, such kernels take the characters in as they are, and everything works fine. But here I can also do multi-byte UTF-8 attacks using my UTF-8 dictionaries, AND I can do slightly inefficient mask attacks by constructing special custom charsets for my target language.
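To show what I mean by "slightly inefficient": every 2-byte UTF-8 character burns two mask positions, one for the lead byte and one for the continuation byte. A tiny C illustration (the Turkish 'ç' is my own example, nothing specific to RAR):

Code:
/* Illustration only: why a multi-byte UTF-8 mask attack costs
   extra positions. 'ç' is just an example character. */
#include <stdio.h>

int main(void)
{
    const unsigned char ch[] = "\xc3\xa7"; /* 'ç' encoded as UTF-8 */

    /* Each byte must come from its own single-byte charset position
       in the mask, hence the inefficiency. */
    for (int i = 0; ch[i] != '\0'; i++)
        printf("0x%02x ", ch[i]);
    printf("\n"); /* prints: 0xc3 0xa7 */
    return 0;
}

So in practice one charset holds the possible lead bytes and another holds the possible continuation bytes for each position where a national character may appear.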

The question I have is how JtR manages to crack it. From my tests I found that the UTF-8 version of the character is what does the crack. My guess is that JtR does not use single-byte characters but uses UTF-8 internally, so the kernel can do the proper conversion.

I understand hashcat's speed reasoning for using single-byte characters. But all the speed in the world does not matter if it cannot crack. Many of the popular algorithms (like Office 2013) have such slow hashes that encoding conversion would not be the bottleneck; not every hash is as fast as MD5. So I think the developers should have provided an alternate code path where, by user preference, multi-byte characters could be used at the cost of a performance hit. But I also understand that, in the current state of the code, that is next to impossible.

One suggestion I have is to add a "raw" version of the UTF-16 kernels. For example, in addition to 9500 (Office 2010) we could have 9501 (Office 2010, raw/advanced version). In 9501, the kernel would take all characters as-is and would not attempt any UTF-16LE conversion. The user must be aware that he needs to supply UTF-16LE input. This way I can use the UTF-16LE dictionary I have, and I can also do mask attacks using a combination of special hex custom charsets built for my target language. This way at least I can mount an attack, and the change is easy, as far as I can tell from the code.
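For anyone without a ready UTF-16LE dictionary, here is a rough sketch of the re-encoding the proposed raw mode would require (assuming plain ASCII input and newline-separated words; a real converter such as iconv would be needed for multi-byte UTF-8 input, and 9501 is only my suggested number, it does not exist):

Code:
/* Rough sketch only: re-encode an ASCII wordlist to UTF-16LE
   for the proposed raw mode. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];

    while (fgets(line, sizeof(line), stdin)) {
        size_t len = strcspn(line, "\r\n"); /* drop the line ending */
        for (size_t i = 0; i < len; i++) {
            putchar((unsigned char)line[i]); /* low byte */
            putchar(0x00);   /* zero high byte -> UTF-16LE */
        }
        putchar('\n'); /* keep the wordlist newline-separated */
    }
    return 0;
}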