06-16-2015, 11:34 AM
I've the feeling I need to clarify the following:
Quote:Speed.GPU.#1...: 1980 H/s
You are clearly missing -w 3
and
Quote:for SHA256 it's 16904 MH/s so 16,904,000,000 / 8 GPUs / 2 HMAC / 100,000 rounds = 10,565 H/s
Things are a lot different. As a rough estimate it's OK, but from these numbers it looks like oclHashcat would be inefficient and lose something like 10 times the speed just through the pbkdf2 operations between the sha256 transformation calls. That's not the case!
As a very rough reference, this is the speed of the 290x cracking raw sha256 with oclHashcat:
Quote:Speed.GPU.#1...: 1550.6 MH/s
But to reach such a high guessing speed, oclHashcat uses a lot of optimizations. The thing is that only some of them can be used within pbkdf2-hmac-sha256, and the most important ones cannot.
When oclHashcat starts up you get a list of the optimizations applied to raw sha256:
Quote:* Zero-Byte
* Precompute-Init
* Precompute-Merkle-Demgard
* Early-Skip
* Not-Salted
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
* Scalar-Mode
* Raw-Hash
The most important ones are:
Quote:* Zero-Byte
* Precompute-Init
* Precompute-Merkle-Demgard
* Early-Skip
For details on what they do, please refer to: https://hashcat.net/events/p13/js-ocohaaaa.pdf
But none of them can be used for pbkdf2. Well, Zero-Byte can partially, but not as efficiently as in raw sha256. This is mostly because of the HMAC, which fills a first buffer the size of the hash's block size. This data is not fixed; it depends on the password, so we cannot do Precompute-Init, and because of this we also cannot do Precompute-Merkle-Demgard. We could do Early-Skip, but whereas in raw sha256 it saves a bit over 3 steps of 64 on every hash, here it would only apply to the last of the 200000 iterations, so it's nearly nothing.
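To illustrate why the first block is not fixed data, here is a minimal Python sketch of how HMAC-SHA256 builds its first 64-byte blocks from the key (the password, in PBKDF2). The function names are hypothetical and this is plain CPU code following the standard HMAC construction, not oclHashcat's kernel:

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block size in bytes


def hmac_first_blocks(password: bytes):
    """Return the (ipad, opad) blocks that HMAC-SHA256 hashes first."""
    # Keys longer than one block are hashed first; shorter ones are zero-padded.
    key = hashlib.sha256(password).digest() if len(password) > BLOCK_SIZE else password
    key = key.ljust(BLOCK_SIZE, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    return ipad, opad


def hmac_sha256(password: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out by hand: H(opad || H(ipad || message))."""
    ipad, opad = hmac_first_blocks(password)
    inner = hashlib.sha256(ipad + message).digest()
    return hashlib.sha256(opad + inner).digest()


# Two candidates that differ in a single byte yield different first blocks,
# so the state after the first transform cannot be shared between candidates.
ipad_a, _ = hmac_first_blocks(b"password1")
ipad_b, _ = hmac_first_blocks(b"password2")
print(ipad_a != ipad_b)
```

Because every password candidate changes that first block, anything that relies on a known, constant start of the message has to be redone per candidate.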
We can do a special attack to remove the effect of the zero byte optimizations like this:
Quote:/oclHashcat64.bin -m 1400 hash.sha256 -a 3 -w 3 ?b?b?b?b?b?b?baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa -d 1
Now the speed drops from 1550 MH/s to the following:
Quote:Speed.GPU.#1...: 1466.0 MH/s
That still doesn't take out the early exit (Early-Skip), which is worth 3 of 64 steps. Precompute-Init also saves roughly 2 of 64 steps at the start, so we have to scale the 1466 by 59/64, which is ~1351 MH/s. Now with this value (which still includes the Precompute-Merkle-Demgard optimization) we can divide by 200000. That gives a final speed of about 6750 H/s.
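The arithmetic above can be reproduced with a few lines of Python. The constants are the figures quoted in this post; the variable and function names are made up for illustration, and reading the divisor 200000 as 2 transform calls times 100,000 rounds is an assumption based on the PS:

```python
RAW_SHA256_MHS = 1466.0       # raw sha256 speed without the Zero-Byte benefit (MH/s)
STEPS_PER_TRANSFORM = 64      # steps in one sha256 transform
EARLY_SKIP_STEPS = 3          # steps saved by Early-Skip (figure from the post)
PRECOMPUTE_INIT_STEPS = 2     # steps saved by Precompute-Init (figure from the post)
TRANSFORM_CALLS = 200_000     # divisor used in the post (assumed: 2 * 100,000 rounds)


def estimate_pbkdf2_speed(raw_mhs: float) -> tuple:
    """Return (adjusted raw speed in MH/s, estimated pbkdf2 speed in H/s)."""
    usable_steps = STEPS_PER_TRANSFORM - EARLY_SKIP_STEPS - PRECOMPUTE_INIT_STEPS
    adjusted_mhs = raw_mhs * usable_steps / STEPS_PER_TRANSFORM
    estimate_hs = adjusted_mhs * 1_000_000 / TRANSFORM_CALLS
    return adjusted_mhs, estimate_hs


adjusted, estimate = estimate_pbkdf2_speed(RAW_SHA256_MHS)
print(f"adjusted raw speed: {adjusted:.0f} MH/s")
print(f"estimated pbkdf2 speed: {estimate:.0f} H/s")
```

Without intermediate rounding this comes out at roughly 6757 H/s, which the post rounds to 6750 H/s.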
Now, if we take a look at what oclHashcat actually achieves:
Quote:root@et:~/oclHashcat-1.37# ./oclHashcat64.bin -m 10900 hash -w 3 -a 3 ?b?b?b?b?b?b -d 1
...
Speed.GPU.#1...: 6556 H/s
In other words, at 6556 H/s against an estimate of about 6750 H/s (roughly 97%), it's really close to the maximum performance. Also note that the candidate generation already includes markov-chain based optimizations, so it's not just AAA, AAB, etc. That also takes a bit of time.
PS: Actually, for pbkdf2-hmac we have 4 calls to the hashing function per iteration, but we can precompute the first block of each of the two hashes (ipad and opad) once per password, so that we end up with 2 * number of iterations.
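The precomputation from the PS can be sketched in Python with `hashlib`, using `copy()` to reuse the hash state after the pad block has been absorbed. This is only an illustration of the idea on the CPU, with a hypothetical function name, limited to a single 32-byte output block; it is not how oclHashcat implements it on the GPU:

```python
import hashlib

BLOCK_SIZE = 64  # SHA-256 block size in bytes


def pbkdf2_sha256_precomputed(password: bytes, salt: bytes, iterations: int) -> bytes:
    """PBKDF2-HMAC-SHA256 (first 32-byte block only) with pad states computed once."""
    key = hashlib.sha256(password).digest() if len(password) > BLOCK_SIZE else password
    key = key.ljust(BLOCK_SIZE, b"\x00")

    # Absorb the ipad and opad blocks once per password and keep the states.
    inner_base = hashlib.sha256(bytes(b ^ 0x36 for b in key))
    outer_base = hashlib.sha256(bytes(b ^ 0x5C for b in key))

    def prf(data: bytes) -> bytes:
        # Each HMAC call resumes from the saved states, so for a 32-byte input
        # only one further block per hash remains: 2 transforms instead of 4.
        inner = inner_base.copy()
        inner.update(data)
        outer = outer_base.copy()
        outer.update(inner.digest())
        return outer.digest()

    u = prf(salt + (1).to_bytes(4, "big"))  # U_1 = PRF(password, salt || INT(1))
    result = int.from_bytes(u, "big")
    for _ in range(iterations - 1):
        u = prf(u)                          # U_i = PRF(password, U_{i-1})
        result ^= int.from_bytes(u, "big")
    return result.to_bytes(32, "big")


derived = pbkdf2_sha256_precomputed(b"password", b"salt", 1000)
print(derived == hashlib.pbkdf2_hmac("sha256", b"password", b"salt", 1000, dklen=32))
```

The comparison against `hashlib.pbkdf2_hmac` is there as a sanity check that reusing the pad states does not change the result.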