Colliding password protected MS office 97-2003 documents
#1
I recently worked on adding support to oclHashcat in order to crack the different versions of password protected MS Office documents. So far I've finished MS Office 2013, 2010, 2007 and the 97-2003 versions (there are different ones). The next version of oclHashcat (v1.31) will ship with support for all these versions.

While I was working on the 97-2003 version I found out that there's a weakness in the scheme that I want to share. I'm not sure if this technique is already known, but I've never seen a cracker that does what is neccessary to exploit it. OTH, I don't know all of them. Also, as far as I understood it, this attack is different to the flaw discovered back in 2005, when Office is using the same keystream to encrypt two different versions / revisions of the same document, but don't nail me on that!

The following explanations describe the vulnerability to a meet-in-the-middle attack. If you don't know what I mean with meet-in-the-middle attack, see here: https://hashcat.net/events/p13/js-ocohaaaa.pdf

With a single hd7970 I'm able to collide every password protected MS Office 97-2003 documented using MD5/RC4 in less than 6 hours! And yes, you can open the document "normally" with a collided password. In order to explain it you first need to understand the KDF MS office uses.

KDF
  1. Generate 16 byte random salt
  2. Calculate MD5 of unicode version of the password
  3. Truncate 16 byte result to 5 byte
  4. Generate a string of length 336 byte by repeating the string "$digest$salt" 16 times -- (16 * (5 + 16)) = 336
  5. MD5 the 336 bytes
  6. Truncate 16 byte result to 5 byte
  7. Append 4 byte zeros to result
  8. MD5 the 9 bytes
  9. Use 16 byte result as 128 bit RC4 Key
  10. Decrypt encryptedVerifier with RC4 to decryptedVerifier
  11. Decrypt encryptedVerifierHash with RC4 to decryptedVerifierHash
  12. MD5 the decrypted encryptedVerifier
  13. Compare 16 byte result with decrypted encryptedVerifierHash

OK, what is this encryptedVerifier and encryptedVerifierHash? From oclHashcat's view it's just two blocks of random data of length 16 bytes. Why is MS office doing this? The answer is it enables a key validation. The idea is the following: generate 16 bytes of random data, then run it through MD5. Then store both, the original random data and the MD5 result after both were encrypted with RC4. Now, if one has the correct RC4 Key, he can decrypt back to the original random data and MD5 it. If the result matches the decryptedVerifierHash data we know we use the correct RC4 Key. Office will work the same way.

MS original description: http://msdn.microsoft.com/en-us/library/...12%29.aspx

Problem

The problem here is the truncation of the MD5 digest in step 6. OK, the key for the RC4 is 128 bit, but it is calculated out of a MD5 which uses just a 40 bit input (because the last 32 bit are just zeros).

Exploitation

The idea is to iterate through those 2^40 combinations, beginning from step 8. Once we find the correct RC4 Key, which is the case when step 13 is true, we do not need to do those steps ever again. From now on, in oclHashcat, we will just calculate steps 1-5 and then compare the first 5 byte with our pre-cracked intermediate hash. That's the meet-in-the-middle attack.

To make it more comfortable, this technique requires two seperate oclHashcat modes. The first to brute-force the intermediate hash and once you cracked it just store it. Then you can use oclHashcat "normally" with the second mode. This mode works with the previosly cracked key and therefore only needs to calculates steps 1-5. Note that in this second mode you are not stick to Brute-Force, you can use any attack-mode you want. Actually you will not need to do that unless you try to find the real password that was used (forensic stuff). Just continue reading...

So I made some real life tests how the cracking performance changes. One need to know RC4 isn't very GPU friendly. If we can avoid to compute it we will speed up the performance alot. The current speed, on a single hd7970, to crack password protected MS office 97-2003 documents is ~65 MH/s. That's still faster than any other cracker but if we exploit the weakness the numbers changes as following (on a single hd7970):
  • The first mode to pre-crack 2^40 combinations with a speed of ~80MH/s
  • The first mode is guaranteed to succeed and will take a maximum of 4 hours (2 hours avg)
  • The second mode to calculate steps 1-5. I tried it with a hand-hacked kernel, as a form of a prototype, it will do ~ 390MH/s

So we can speed up from 66MH/s to 390MH/s on a single hd7970. Not too bad, eh?

Colliding

With 390MH MH/s, all we need to find is a MD5 digest where the first 5 byte matches the precomputed intermediate hash. Problem is that in theory even by iterating through 2^40 combinations it's not guaranteed to find at least one matching one. But I believe there's still a good chance as it's only the first 5 byte we need to collide. After that, the collision will work as a valid password in MS Office.

In the following experiment below I found my first collision after 14 seconds :-)



Here's the log from my experiment that I made to ensure all this theoretical stuff is really working:

Note: If you're wondering what this code section is about, ignore it. It's just info for myself if I want to reproduce it

First mode, precompute intermediate hash

Code:
// kernel hacked at position step 5, check final hash to get out the intermediate hash
// comment out everything up to here
// use ?b?b?b?b?b mask to simulate countercracker
// --outfile-format 5
// --markov-disable (easier to debug)
// remove unicode in host

// w0_t[0]  = digest[0];
// w0_t[1]  = digest[1] & 0xff;
w0_t[0]  = w0[0];
w0_t[1]  = w0[1] & 0xff;

Quote:
root@et:~/oclHashcat-1.31# ./oclHashcat64.bin -m 9700 hash -a 3 ?b?b?b?b?b -w 3 --potfile-disable
oclHashcat v1.31 starting...

Device #1: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #2: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #3: Tahiti, 2967MB, 1000Mhz, 32MCU

Hashes: 1 hashes; 1 unique digests, 1 unique salts
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Applicable Optimizers:
* Zero-Byte
* Precompute-Init
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #1: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #1: Kernel ./amd/bzero.cl (887 bytes)
Device #2: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #2: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #2: Kernel ./amd/bzero.cl (887 bytes)
Device #3: Kernel ./amd/m9700_a3.cl (31816 bytes)
Device #3: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #3: Kernel ./amd/bzero.cl (887 bytes)

$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:$HEX[f2ab1219ae]

Session.Name...: oclHashcat
Status.........: Cracked
Input.Mode.....: Mask (?b?b?b?b?b) [5]
Hash.Target....: $oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d
Hash.Type......: MS Office <= 2003 MD5 + RC4, oldoffice$0, oldoffice$1
Time.Started...: Mon Sep 8 19:44:59 2014 (52 mins, 9 secs)
Speed.GPU.#1...: 79666.8 kH/s
Speed.GPU.#2...: 79675.2 kH/s
Speed.GPU.#3...: 79660.5 kH/s
Speed.GPU.#*...: 239.0 MH/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 747777294336/1099511627776 (68.01%)
Skipped........: 0/747777294336 (0.00%)
Rejected.......: 0/747777294336 (0.00%)
HWMon.GPU.#1...: 92% Util, 43c Temp, 37% Fan
HWMon.GPU.#2...: 88% Util, 43c Temp, 37% Fan
HWMon.GPU.#3...: 92% Util, 44c Temp, 37% Fan

Started: Mon Sep 8 19:44:59 2014
Stopped: Mon Sep 8 20:37:12 2014


Now that we have the intermediate hash $HEX[f2ab1219ae], we can move forward to our second module:

Second mode, collide intermediate hash

Code:
// kernel hacked at Position step 5, check intermediate Hash:
// switch back to unicode

uintx a = 0;
uintx b = 0;
uintx c = 0;
uintx d = 0;

if ((digest[0] == swap_workaround (0xf2ab1219)) && ((digest[1] & 0xff) == swap_workaround (0xae000000)))
{
  a = search[0];
  b = search[1];
  c = search[2];
  d = search[3];
}

// comment out everything from here

Quote:
root@et:~/oclHashcat-1.31# ./oclHashcat64.bin -m 9700 hash -a 3 ?a?a?a?a?a?a -w 3 --potfile-disable
oclHashcat v1.31 starting...

Device #1: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #2: Tahiti, 2967MB, 1000Mhz, 32MCU
Device #3: Tahiti, 2967MB, 1000Mhz, 32MCU

Hashes: 1 hashes; 1 unique digests, 1 unique salts
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Applicable Optimizers:
* Zero-Byte
* Precompute-Init
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #1: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #1: Kernel ./amd/bzero.cl (887 bytes)
Device #2: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #2: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #2: Kernel ./amd/bzero.cl (887 bytes)
Device #3: Kernel ./amd/m9700_a3.cl (32100 bytes)
Device #3: Kernel ./amd/markov_le_v1.cl (10294 bytes)
Device #3: Kernel ./amd/bzero.cl (887 bytes)

$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:zvDtu!

Session.Name...: oclHashcat
Status.........: Cracked
Input.Mode.....: Mask (?a?a?a?a?a?a) [6]
Hash.Target....: $oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d
Hash.Type......: MS Office <= 2003 MD5 + RC4, oldoffice$0, oldoffice$1
Time.Started...: Mon Sep 8 21:31:16 2014 (14 secs)
Speed.GPU.#1...: 392.6 MH/s
Speed.GPU.#2...: 392.6 MH/s
Speed.GPU.#3...: 392.6 MH/s
Speed.GPU.#*...: 1177.8 MH/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 15860301824/735091890625 (2.16%)
Skipped........: 0/15860301824 (0.00%)
Rejected.......: 0/15860301824 (0.00%)
HWMon.GPU.#1...: 98% Util, 43c Temp, 37% Fan
HWMon.GPU.#2...: 98% Util, 43c Temp, 39% Fan
HWMon.GPU.#3...: 98% Util, 43c Temp, 41% Fan

Started: Mon Sep 8 21:31:16 2014
Stopped: Mon Sep 8 21:31:34 2014

Collision found in 14 seconds :)

Of course, there's no guarantee always to find a collision in 14 seconds, but it will not take to long...



Test

Yes, you can use the collision to open the document. If you want to try, here's the .doc I've been using:

Document: https://hashcat.net/misc/DocOld2010.doc

You can use both passwords:
  • hashcat
  • zvDtu!
I'm going to add the above mentioned two special modes to oclHashcat v1.31. Feedback welcome!

Jens "atom" Steube
#2
[Image: we.png]
#3
atom > all
#4
Awesome ! Thank you !
Does it also work for Excel, PowerPoint 97-2003 password protected document ? Or only Word ?
#5
works for all MS Office 97-2003 files
#6
This is awesome Big Grin u so totally rock dude!
#7
Thanks Guys!

I've finished the special modes 9710 and 9720 for in BF mode for AMD cards.

Beta-Tester can download latest beta version from beta site. Make sure you download beta17 or higher.

To use the collider mode, you first need to crack the RC4 key as described above.

Cracking the RC4 is done by mode 9710. I'd recommend you to use -o as it will save you some copy paste stuff.

Code:
./oclHashcat64.bin -m 9710 hash -a 3 -w 3 ?b?b?b?b?b -o hash.rc4

Indeed it will be possible later on to not BF the RC4 key but use all attack-modes as you know them.

BF is just cool as it's guaranteed to success in a short time.

However I'd recomment yout to run your dictionaries first, with a small ruleset and if it does not hit, redo it with -a 3.

Once you have the RC4 key, the output-line in hash.rc4 looks like this:

Quote:$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:f2ab1219ae

The RC4 key (in hex) was just apppend to the line.

This line is exactly the format for the input-line for the collider mode 9720. That's why I recommend to use -o because you can use that file as hashfile for -m 9720 now.

Code:
./oclHashcat64.bin -m 9720 hash.rc4 -a 3 -w 3 ?a?a?a?a?a?a

If it exhausts, just add another ?a, it will collide sooner or later, for sure.

Quote:$oldoffice$1*d6aabb63363188b9b73a88efb9c9152e*afbbb9254764273f8f4fad9a5d82981f*6f09fd2eafc4ade522b5f2bee0eaf66d:f2ab1219ae:zvDtu!

Some other notes:
  • Beta17 only supports AMD cards and only in BF mode, other will follow soon
  • You can already multi-collide (as in multi-hash) this mode
  • The collider will work *only* for $0 and $1 of MS Office 97-2003 using MD5/RC4 as described above.
  • To crack $3 or $4 of MS Office 97-2003 using SHA1/RC4 you need to use -m 9800

--
atom
#8
Cool. I don't quite see the gain with the mitm in this case though. If you skipped that and just made a "naive" yet optimized format, would you not get nearly 80Mp/s anyway since the significant work is RC4 anyway? And at first hit of that (within 2h on average), you'd be done. (Edit: you said 65Mp/s so 2:20h on average but I still don't see much point with mitm)

That is unless you want to find more candidates for the same documents though (trying to spot the "real" password). In that case the mitm is a definitive speedup.

Anyway, what really impress me is the RC4 speed you achieve!
#9
You have a sharp eye!

If you just want to crack the document you could stop the process after mode #1 and use the RC4 key to decrypt it. That would require special decrypting tools that know how to decrypt the .doc or .xls or whatever with only the RC4 key. I could imagine such tools exist. But that's another reason why I split the one mode 9700 into two modes 9710 and 9720. If you just want to crack the document and you have such a tool, all you need is 9710.

But, as you said as well, there's often the case that you are not interessted in the document data itself. In that case you're only interessted in the real password that was used. That was actually my intention when I first thought about splitting the kernels, because I wanted to get rid of the RC4 stuff. This is where mitm idea came in. By removing RC4 stuff from the kernel, that would speed up th MD5 kernel. Afterwards, when I did that, I realized that it's not the full 128 bit MD5 digest that we need to hit, it's just 40 bits. This is when the Idea to collide it came in.
#10
Another thing what I've just added, but only for the 9720 kernels, is that those hashes never get marked as "cracked" internally in oclHashcat. That means a matching password will be displayed or stored to outfile as normal, but afterwards oclHashcat can crack it again in case another candidate matches.

This behavior is very unique to this hashmode. If you use this mode, you usually want to know all of the regular password that collide with the document password. I let it run for 20 minutes using the original document linked above. Here's a list of collisions that all do match each other, it's really funny to watch that:
  • hashcat
  • zvDtu!
  • xPWl?:
  • bE>4Eb
  • WURSTSALAT7745
  • rhUrho2020

--
atom