AES implementation not working on GPUs for small keyspaces?
#1
I'm trying to add an algorithm that involves AES, but the AES implementation does not seem to work for small keyspaces when running on GPUs.

For this test case, I'm modifying the CL file for hash mode 400 (phpass) and adding AES calls to it from inc_cipher_aes.cl.

When I invoke hashcat with the exact password like so, it says "Exhausted" without finding the password:

Code:
> hashcat64 -m 400 -a 3 --potfile-disable $P$5xxxxUB/Ntxxxjg/ hashcat

Session..........: hashcat
Status...........: Exhausted
Hash.Type........: phpass, WordPress (MD5), phpBB3 (MD5), Joomla (MD5)
   . . .
Guess.Mask.......: hashcat [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#2.....:        0 H/s (0.14ms)
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 1/1 (100.00%)
Rejected.........: 0/1 (0.00%)
Restore.Point....: 1/1 (100.00%)
Candidates.#2....: hashcat -> hashcat
HWMon.Dev.#2.....: Temp: 39c Fan: 20% Util: 64% Core:1100MHz Mem:1500MHz Bus:16

I get the same result when I make one of the characters a wildcard:

Code:
Guess.Mask.......: hashca?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#2.....:        0 H/s (0.14ms)
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 26/26 (100.00%)
Rejected.........: 0/26 (0.00%)
Restore.Point....: 26/26 (100.00%)
Candidates.#2....: hashcan -> hashcaq

However, if I widen the mask to two wildcard characters (676 combinations), it works correctly:

Code:
$P$5xxxxUB/Ntxxxjg/:hashcat

Session..........: hashcat
Status...........: Cracked
Hash.Type........: phpass, WordPress (MD5), phpBB3 (MD5), Joomla (MD5)
  . . .
Guess.Mask.......: hashc?l?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#2.....:   223.9 kH/s (0.15ms)
Recovered........: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.........: 676/676 (100.00%)
Rejected.........: 0/676 (0.00%)
Restore.Point....: 0/676 (0.00%)
Candidates.#2....: hashcha -> hashcqg
HWMon.Dev.#2.....: Temp: 39c Fan: 20% Util: 49% Core:1100MHz Mem:1500MHz Bus:16

Another way to get it to work is to force hashcat to use CPUs only, using the --opencl-device-type flag like so:

Code:
> hashcat64 -m 400 -a 3 --potfile-disable --opencl-device-type 1 $P$5xxxxUB/Ntxxxjg/ hashcat

. . .

Approaching final keyspace - workload adjusted.

$P$5xxxxUB/Ntxxxjg/:hashcat

Session..........: hashcat
Status...........: Cracked
Hash.Type........: phpass, WordPress (MD5), phpBB3 (MD5), Joomla (MD5)
 . . . .
Guess.Mask.......: hashcat [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:        0 H/s (0.03ms)
Recovered........: 1/1 (100.00%) Digests, 1/1 (100.00%) Salts
Progress.........: 1/1 (100.00%)
Rejected.........: 0/1 (0.00%)
Restore.Point....: 0/1 (0.00%)
Candidates.#1....: hashcat -> hashcat
HWMon.Dev.#1.....: N/A

The code modifications are relatively straightforward:

In the m00400.cl file, add an #include for the AES cipher, then add a call to aes128_set_encrypt_key followed by a call to aes128_encrypt. The definitions for s_td0..4 and s_te0..4 also have to be added before the calls.

Of course, accordingly, the phpass hash function has to be modified to make this AES call before encoding the 128-bit raw output into the "hash" output.

I used the binary download of 3.6.0 from the main site and just modified the CL file.

I tried it on both Windows 10 and Windows 8.1, using both an AMD and an NVIDIA GPU. The problem shows up in all of these setups.
#2
That sounds like you did not make sure the shared memory is filled before exiting the kernel early (returning too soon).
#3
(08-31-2017, 10:09 AM)atom Wrote: That sounds like you did not make sure the shared memory is filled before exiting the kernel early (returning too soon).

Thanks for the reply.

How would I go about doing that?

The code for populating s_teN and s_tdN variables already has a "barrier" after the for-loop:

Code:
  barrier (CLK_LOCAL_MEM_FENCE);

Is that sufficient? Most of the other kernels I've seen do pretty much the same thing: fill the variables and perform AES immediately afterwards.

Sorry, I'm a n00b in GPU/OpenCL programming.
#4
Yeah, it's the barrier I mean. What matters is that no work-item executes a return statement before the kernel reaches this point in the code. If one does, there is no guarantee that all data from the constant buffer has been copied to shared memory.

For example see here: https://github.com/hashcat/hashcat/blob/...#L502-L510

This is the typical use of an early return, when no __local memory is used.

In contrast, see here: https://github.com/hashcat/hashcat/blob/...#L490-L524

Make sure yours is like this, where the first return comes after the barrier.
#5
Thanks! Your clarification with an example of local memory use was particularly helpful. 

My bug was that I put the if (gid >= gid_max) return; statement before the memory initialization. I assumed that if a work-item was going to return early, there was no point in setting up the memory at all.