I started gluing those functions together today. (It was my introduction to OpenCL, and I have to say it was interesting.)
I'm more or less getting the performance I was expecting, except for the AES256 key inversion function, which makes everything else pointless. I get ~70 M/s on my laptop's Intel 6200 with the AES key inversion function commented out, and roughly an eighth of that (9 M/s) with it enabled. Here is the function:
https://github.com/hashcat/hashcat/blob/...es.cl#L960
It doesn't look like anything special to me, so I don't know whether I'm doing something wrong in my code, messing up my memory spaces, etc. (I tried setting the ks parameter, which gets updated a lot in the key inversion function, to __local, but it wouldn't compile; I'm assuming the compiler is already putting it in private memory anyway.) Or maybe the function is just slow and I need to live with it.
Here is my ugly code just for reference: https://pastebin.com/JjhDmpyC
I'm making up my own defines for the OpenCL compiler. That might be the problem and I'm checking it right now, but I don't think so:
Code:
-D DGST_ELEM=4 -D DGST_R0=0 -D DGST_R1=1 -D DGST_R2=2 -D DGST_R3=3 -D _unroll -D DEVICE_TYPE=4
Any ideas?
Thank you.
Edit: would it make sense that the same code performs well on a 1050, with no gigantic bottleneck there? Enabling AES only cuts performance to about a third? I'm now seeing 200 M/s on a 1050 (I fixed a bug where the compiler was optimising out some functions, because I'm terrible at programming).