Agilebits 1Password support and Design Flaw?
#1
In the last weeks I spend a lot time adding TrueCrypt to oclHashcat-plus since there have been lots of requests for it and I can see how forensic world will benefit from it.

This week I finally finished the first milestone, the hashing part of TrueCrypt. That is PBKDF2-HMAC-Whirlpool, -RipeMD160 and -SHA512. Unfortunately there is another milestone. To finally crack TrueCrypt, you have to decrypt a 448 byte block of data using AES, Serpent or Twofish or some obscure combinations of them. The important word is AES here. To finish support for efficiently cracking TrueCrypt there is no way around adding AES to GPU. Problem is, I never used AES before and it was totally a new world for me.

How to start digging into AES? I thought the best way is to combine the learning phase with something that uses AES and so add some nice new algorithm. At this time, some guy posted a request on hashcat forum asking for support to crack 1Password Agilebits keychain. Guess what, it uses AES Smile

I did not pay much attention to Agilebits 1Password before. I was happy with keepass and never thought about changing. I quickly downloaded the tool and was impressed how easy it worked. It installed without problems, has nice icons, seemed to be very intuitive and it there is also a browser integration so I thought I should finally move from keepass to 1Password.

However, thats the story so far why I started to add 1Password to oclHashcat-plus. So I started to dig into AES and experimented a bit with it and the 1Password keychain. I finally got it working and then ported it to GPU. So what you see here is worlds first 100% GPU implementation of 1Password keychain.

There are other solutions out but they are using CPU to do the AES part.

Quote:
root@sf:~/oclHashcat# ./oclHashcat-plus64.bin -m 6600 testhash2 -a 3 ?l?l?l?l?l?l?l! -u 1000 -n 128
oclHashcat-plus v0.15 by atom starting...

Hashes: 1 total, 1 unique salts, 1 unique digests
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Workload: 1000 loops, 128 accel
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
Device #1: Cayman, 1024MB, 830Mhz, 24MCU
Device #2: Cayman, 1024MB, 830Mhz, 24MCU
Device #3: Cayman, 1024MB, 830Mhz, 24MCU
Device #4: Cayman, 1024MB, 830Mhz, 24MCU
Device #1: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #2: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #3: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)
Device #4: Kernel ./kernels/4098/m6600.Cayman_1084.4_1084.4.kernel (974136 bytes)

testhash2:hashcat!

Session.Name...: oclHashcat-plus
Status.........: Cracked
Input.Mode.....: Mask (?l?l?l?l?l?l?l!)
Hash.Target....: File (testhash2)
Hash.Type......: 1Password Agile Keychain
Time.Started...: Tue Apr 16 15:30:55 2013 (7 mins, 3 secs)
Speed.GPU.#1...: 744.0k/s
Speed.GPU.#2...: 744.0k/s
Speed.GPU.#3...: 743.9k/s
Speed.GPU.#4...: 743.5k/s
Speed.GPU.#*...: 2975.5k/s
Recovered......: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.......: 1257111552/8031810176 (15.65%)
Rejected.......: 0/1257111552 (0.00%)
HWMon.GPU.#1...: 99% Util, 59c Temp, 29% Fan
HWMon.GPU.#2...: 99% Util, 64c Temp, N/A Fan
HWMon.GPU.#3...: 99% Util, 58c Temp, 29% Fan
HWMon.GPU.#4...: 99% Util, 55c Temp, N/A Fan

Started: Tue Apr 16 15:30:55 2013
Stopped: Tue Apr 16 15:38:00 2013


As you can see, oclHashcat-plus is running with nearly 3Mhash/s using my two hd6990's which is ~ the speed of two hd7970 (a bit faster).

The reason for the high speed is what I think this might be a design flaw. Here is why:

1Password uses PBKDF2-HMAC-SHA1 to derive a 256 bit key. Actually we are generating a 320 bit key using PBKDF2-HMAC-SHA1 this way but its then truncated to 256 bit. No problem so far, many algorithms like WPA are doing the same thing.

The PBKDF2-HMAC-SHA1 part is what makes the entire calculation slow. For each iteration of PBKDF2-HMAC-SHA1 you call 4 times the SHA1 transform. But this is only to produce a 160 bit key. To produce the required 320 bit key, you call it 8 times. So if you have 1000 iterations, you call it 8000 times. Due to some simple optimizations you can do with HMAC you can precompute ipad and opad, so you end up in 2 + (2 * iterations) = 2002 for 160 bit or 4004 calls to SHA1 transform for 320 bit.

1Password then uses AES in CBC mode to decrypt 1040 byte of data. To be exact, it takes the first 128 bit of the derived key to setup the AES key and takes another 128 bit as an IV for the CBC.

The goal is match the final padding block decrypted 1040 byte of data. If you find the last four 32-bit integers at 0x10101010 the padding is correct and you know your key was correct.

In CBC mode, you take the IV only for the first decryption. You then replace it with the ciphertext of current block (which is then used for the next block).

But if you take a close look now you see these both mechanisms do not match in combination. To find out if the masterkey is correct, all we need is to match the padding, so all we need to satisfy the CBC is the previous 16 byte of data of the 1040 byte block. This 16 byte data is provided in the keychain! In other words, there is no need to calculate the IV at all. Instead of calculating a 256 bit key in the PBKDF2, we just need to calculate 128 bit. Since SHA1 gives us 160 bit, we can save exactly twice the number of calls to sha1 transform. This way I was able the reduce the calls to SHA1 transform from 8000 to 2002 Smile

Stay tuned for oclHashcat-plus v0.15!
#2
Rolf approves this and eagerly waits for a NV version.
#3
AES on GPU so far so good today is a really good day. Thank you for your work, need to install 1Password now to try Big Grin
#4
OMG, amazing! Well done atom
#5
[Disclosure: I work for AgileBits, the makers of 1Password]

This design flaw is certainly real, and is one of the many reasons why we have started migrating to a new design. In short, when the Agile Keychain Format was designed (in 2008), we weren't aware of all of the various problems that come from using unauthenticated CBC mode encryption.

I could plead that we were in reasonably good company in making that kind of error, but as I've since learned, research in academic cryptography had been telling people not to use unauthenticated encryption for more than a decade. This is why today we aren't just looking at the kinds of attacks that seem practical, but we are also paying attention to security theorems.

The new data format which we are tentatively calling the 1Password4 Cloud Keychain Format (until we can come up with a better name) was introduced in December 2012 for 1Password 4 on iOS and it will be rolled out to all platforms in the not so distant future.

We still use CBC in the new format, but padding is random (the length of the pad is stored outside of the ciphertext), and we use an Encrypt-then-MAC construction for authenticated encryption with additional data. Key derivation now involves PBKDF2-HMAC-SHA512.

You can read the full details of the encryption and key derivation in

http://learn.agilebits.com/1Password4/Se...esign.html

Although our source isn't open, we've tried to document this well enough that people can develop their own tools for decrypting the data. Indeed, several have done so.

And thanks for posting your results. We need to be able to advise 1Password users on their selection of Master Passwords, and your crack rates play a central role is devising that advice.

Cheers,

-j

–-
Jeffrey Goldberg
Chief Defender Against the Dark Arts @ AgileBits
http://agilebits.com
#6
Nice work atom Smile
#7
I don't have a lot of experience reading hashcat output, could someone confirm that

Code:
Speed.GPU.#1...: 744.0k/s
Speed.GPU.#2...: 744.0k/s
Speed.GPU.#3...: 743.9k/s
Speed.GPU.#4...: 743.5k/s
Speed.GPU.#*...: 2975.5k/s
means that you are getting roughly 3000 guesses per second using the 4 CPUs?

Also the number of PBKDF2 iterations used by 1Password with the Agile Keychain depends on a number of factors. From the text, it sounds like that test is using only 1000 PBKDF2 iterations. Is that correct?

Thanks. This will help me figure out how best to advise users about choice of password strength.

-j

–-
Jeffrey Goldberg
Chief Defender Against the Dark Arts @ AgileBits
http://agilebits.com
#8
300 000 gueses / s.
#9
(04-16-2013, 05:28 PM)jpgoldberg Wrote: This design flaw is certainly real, [...]

I don't want to quibble about the meaning of the word "flaw", so I'll try to summarize the key points in my own terms to see if I have it right.

First you have speed up The PBKDF2 optimization of decomposing the HMAC calls, so that you don't calculate the SHA1(opad XOR key) and SHA1(ipad XOR key) separately for each round. Instead you calculated those only once.

If that is what you mean by cutting the hash calls in quarter, then it should be noted that this is a general PBKDF2 optimization and nothing specific to 1Password. Also unless you have specific reason to believe that our use of PBKDF2 doesn't make use of that optimization, then this is a bit of a wash. This optimization is available to defenders as well as attackers. (Although, I don't actually know whether we make use of this optimization on all platforms, as it depends on the innards of the crypto libraries we use.)

The second point is that you've found that by checking the padding, you can avoid having to to the full CBC decryption on the data. That is, you can use one AES operation instead of one per each block + IV. I believe that this is a fairly standard optimization. And any system that has a "verify if this decrypted correctly" is going to give attackers a fairly quick test once they have a candidate derived key. Again, I'm not familiar with hashcat or how these things are done, but I wouldn't have assumed that this was all "standard practice."

In the new format that I mentioned, we actually make this last bit easier for you (by design). Because we use an Encrypt-then-MAC construction, you can check a derived hmac key from your candidate password to just check the HMAC tag, without having to perform any AES operations.

There are definitely flaws with using unauthenticated CBC mode as we do. But I'm not sure that the optimizations you point out reflect what I might call a "flaw". Have I misunderstood the nature of what you are doing?

Cheers,

-j

–-
Jeffrey Goldberg
Chief Defender Against the Dark Arts @ AgileBits
http://agilebits.com
#10
as written in atoms main post this means 3m tries per second. It reads "2975.5k/s" -> 2975.5*1000

Nice find atom!