Polish letters - office
#1
I was recently checking how hashcat handles encrypted Office files.

I generated two docx files (one with Office 2010, the other with Office 2016), both encrypted with a simple password: ąąą. That is the Polish letter ą three times, Unicode code point U+0105.

What I've tried so far:
1. Brute force with default settings - Exhausted
2. Brute force with a custom charset and mask ?1?1?1
2.1 Tried every Polish charset shipped in the hashcat package - Exhausted
2.2 Generated .txt files (saved as .hcchr) in Notepad++ with every available encoding (ANSI, UTF-8 without BOM, UTF-8, UCS-2 Big Endian, UCS-2 Little Endian) - Exhausted
2.3 Generated .txt files in Word with a specified encoding (Unicode, UTF-8, UTF-7), then saved them as .hcchr in Notepad++ - Exhausted


My command:

hashcat64.exe -a 3 -m 9500 -w 3  $office$*2010*100000*128*16*[..] --powertune-enable -o found-[..].txt -4 C:\[..]\hashcat-3.20\hashcat-3.20\charsets\<<charsetfile>> ?4?4?4


I cannot figure out what I'm missing. Any ideas?

Thanks in advance.
#2
No idea, mate, but v3.30 was released over a month ago. You should at least try that first.
#3
Maybe the confusion here is just that you are thinking that this string consists of 3 bytes, while it (probably) is at least 6 bytes long:
See here (hexdump):
Code:
echo -n ąąą | xxd -g 1
0000000: c4 85 c4 85 c4 85

So a mask of length 3 is too short for a password of length 6, as simple as that.

(note: not sure if the bytes mentioned above are correct, that depends on the encoding!)
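
To make the point about encodings concrete, here is a small Python sketch that prints the bytes of ąąą in a few encodings. The list of encodings is just my pick for illustration (UTF-8, UTF-16LE and the two common Polish single-byte code pages), and the helper name is made up; it says nothing about what hashcat itself does with the candidate.

```python
# Byte representations of the password in a few encodings.
# The choice of encodings is an assumption made for illustration.
PASSWORD = "\u0105\u0105\u0105"  # ąąą

ENCODINGS = ["utf-8", "utf-16-le", "cp1250", "iso8859-2"]


def encoded_forms(text):
    """Return {encoding name: encoded bytes} for each encoding in ENCODINGS."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


for enc, raw in encoded_forms(PASSWORD).items():
    hex_bytes = " ".join(f"{b:02x}" for b in raw)
    print(f"{enc:10} {len(raw)} bytes  {hex_bytes}")
```

Only the single-byte code pages give 3 bytes; UTF-8 and UTF-16LE both give 6, so the mask length needed depends on which byte string the cracker actually has to reproduce.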
#4
Thank you for the replies.
So, I've updated hashcat (I had somehow missed the new version) and re-ran all the tests on 3.30 with mask ?4?4?4?4?4?4 (that seemed like a good point) and got nothing. I also tested with the charsets provided with hashcat: nothing.

I have attached the Office file to this post (o10a.docx, Office 2010, test file) so anybody can check it for themselves. The password is ąąą.


Attached Files
.docx   o10a.docx (Size: 18.5 KB / Downloads: 2)
#5
After trying your example, I came to the conclusion that you might have hit one specific limitation of hashcat: it doesn't perform a "perfectly correct" UTF-16 conversion of the password candidate. Instead (for performance reasons) it widens each candidate byte to two bytes and always sets the 2nd byte to 0.

This is actually a known limitation and of course only affects the hash types which use utf16 within the algorithm.
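
As a rough illustration of why no mask or charset can help here, the following Python sketch models the behaviour described above. It is only a model under my own assumptions (the function names are made up, and this is not hashcat's actual code): each candidate byte is widened to a 16-bit little-endian unit whose high byte is 0, and the result is compared with a real UTF-16LE encoding of ąąą.

```python
# Model of the "2nd byte is always 0" widening described in this post.
# Assumption: this mirrors the described behaviour, not hashcat's real code.

def naive_utf16le(candidate: bytes) -> bytes:
    """Widen every candidate byte to two bytes: the byte itself, then 0x00."""
    out = bytearray()
    for b in candidate:
        out += bytes([b, 0])
    return bytes(out)


def reachable_by_naive(target: bytes) -> bool:
    """True if some byte string widens to target (all high bytes are 0)."""
    return len(target) % 2 == 0 and all(b == 0 for b in target[1::2])


PASSWORD = "\u0105\u0105\u0105"  # ąąą
real = PASSWORD.encode("utf-16-le")                   # what Office hashes
from_utf8 = naive_utf16le(PASSWORD.encode("utf-8"))   # 6 candidate bytes
from_cp1250 = naive_utf16le(PASSWORD.encode("cp1250"))  # 3 candidate bytes

print("real       :", real.hex())
print("from utf-8 :", from_utf8.hex())
print("from cp1250:", from_cp1250.hex())
print("reachable? :", reachable_by_naive(real))
```

For pure ASCII passwords the naive widening and the real UTF-16LE encoding are identical, which is why the shortcut normally goes unnoticed. For ą (U+0105) the real encoding needs a high byte of 0x01, which this model can never produce, whatever the charset file contains.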

I think there is currently no open issue on GitHub about this limitation (please double-check). Feel free to request this as a new feature, but of course we need to think about its impact on performance.