Custom OpenCL kernel question
#1
Hi.

I'm trying to write a custom kernel to implement an algorithm like this:

PHP Code:
$static_salt1 = "123jjdsfhjrhfjrhdkedjkewdjdwjdwkjdkewjdkewjdkewjkdjwefjrhfjrhfjrhfjrhfjhrjfhrjsbdnsbdfrfjeh4jh43jhj34hjd3hdjbnbedbwedkjewbdkjedwkejdefre\n\rs";
$static_salt2 = "djkedjewkdewjkdjk32bj3432h4o3240324h32432hjsdnbkjdnakdsadhldkhlhd3ljlk3dl4kdj43lkdj43ld3";
$hash = md5($static_salt2.md5(md5(md5($pass).$static_salt1)));

$static_salt1 is longer than 64 characters and contains special characters such as "\r\n". $static_salt2 is also longer than 64 characters. Both are fixed for every hash.
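A CPU-side reference for the chain above can help when checking what a custom kernel produces. The following is a minimal sketch, not part of hashcat: the function names are hypothetical, and the two salt values are short placeholders that would need to be replaced with the real bytes. It assumes PHP's md5() behaviour of returning 32 lowercase hex characters.

```python
import hashlib

# Placeholder salts (assumption): NOT the values from the post.
STATIC_SALT1 = b"example-salt-one\n\r"
STATIC_SALT2 = b"example-salt-two"


def md5_hex(data: bytes) -> bytes:
    """Return the lowercase hex MD5 digest as ASCII bytes, like PHP's md5()."""
    return hashlib.md5(data).hexdigest().encode("ascii")


def custom_hash(password: bytes,
                salt1: bytes = STATIC_SALT1,
                salt2: bytes = STATIC_SALT2) -> str:
    """Compute md5(salt2 . md5(md5(md5(pass) . salt1)))."""
    inner = md5_hex(password)        # md5($pass)
    salted = md5_hex(inner + salt1)  # md5(md5($pass) . $static_salt1)
    outer = md5_hex(salted)          # md5(md5(md5($pass) . $static_salt1))
    return md5_hex(salt2 + outer).decode("ascii")


if __name__ == "__main__":
    print(custom_hash(b"password"))
```

Each intermediate value is the hex string, not the raw 16-byte digest, so every later MD5 call hashes 32 ASCII characters plus any salt.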

Obviously, because of the big fixed salts I can't use the optimized kernel feature, so I'm trying to figure out how to write a pure OpenCL kernel.
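One way to see why the salt size matters: MD5 processes its input in 64-byte blocks, and the padding needs at least 9 extra bytes, so anything longer than 55 bytes spills into a second block. The sketch below only illustrates that arithmetic. The helper name is hypothetical, and whether this is the exact limit that rules out the optimized kernel here is an assumption.

```python
def md5_block_count(message_len: int) -> int:
    """Number of 64-byte blocks MD5 processes for a message of message_len bytes."""
    # Padding appends one 0x80 byte and an 8-byte length field,
    # then the total is rounded up to a multiple of 64.
    return (message_len + 1 + 8 + 63) // 64


if __name__ == "__main__":
    print(md5_block_count(55))       # 1: largest message that fits in one block
    print(md5_block_count(56))       # 2
    print(md5_block_count(32 + 65))  # 2: 32 hex chars plus a hypothetical 65-byte salt
```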

I'm a total noob in C and hashcat. I previously implemented a similar algorithm in hashcat-legacy and it works there, but the new hashcat has a completely different architecture. It was easier in hashcat-legacy because there were multiple examples of hash algorithms using fixed salts, but I can't find any proper examples in the new hashcat code base.

Where and how should I pass these two static salts?

Do I do this directly in the OpenCL kernel?

Should I put both salt definitions in include/interface.h?

Maybe you can point me to an example algorithm that passes multiple static salts.

Sorry for the stupid questions, I'm just trying to figure it out.

Thanks.
#2
If both salts are fixed you can hardcode them into the kernel code, which saves you some work. See kernel 2610 to get an idea of how to start.
#3
(06-22-2018, 09:52 AM)atom Wrote: If both salts are fixed you can hardcode them into the kernel code, which saves you some work. See kernel 2610 to get an idea of how to start.

Interestingly, I can't find any mention of algorithm 2610 in src/interface.c

Thanks for the tip, looking into it!

EDIT: OK, I think I got it. It's just a general-purpose kernel that is used by multiple algorithms.
#4
Finally, after a day of struggling, I was able to implement what I needed!

It's very tricky to pass multiple salts to a kernel. I had to use a concatenated salt in the parsing function (src/interface.c) and then pad every single salt with 0 bytes at the end so that it aligns properly to 4 bytes when read in the kernel.
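The padding and concatenation step can be sketched outside hashcat as follows. This is only an illustration of the idea, not the code used in src/interface.c. The function names are hypothetical, and the little-endian 32-bit word view is an assumption about how the kernel reads the buffer.

```python
import struct
from typing import List, Tuple


def pad4(data: bytes) -> bytes:
    """Append zero bytes until the length is a multiple of 4."""
    return data + b"\x00" * (-len(data) % 4)


def pack_salts(*salts: bytes) -> Tuple[bytes, List[int]]:
    """Concatenate zero-padded salts; return the buffer and each salt's word offset."""
    buf = b""
    offsets = []
    for salt in salts:
        offsets.append(len(buf) // 4)  # index of this salt's first 32-bit word
        buf += pad4(salt)
    return buf, offsets


def as_u32_words(buf: bytes) -> List[int]:
    """View a 4-byte-aligned buffer as little-endian 32-bit words (assumed kernel view)."""
    return list(struct.unpack("<%dI" % (len(buf) // 4), buf))


if __name__ == "__main__":
    buf, offsets = pack_salts(b"abcde", b"xy")
    print(buf, offsets, as_u32_words(buf))
```

Because each salt starts on a word boundary, it can be read as whole 32-bit values. The real salt lengths still have to be tracked separately, since the zero padding must not be hashed as part of the salt.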

It's not an easy task to understand everything that is going on in hashcat.

Implementing the algorithm was fun, though!

The speed I get with the pure, unoptimized kernel is:
Code:
Session..........: hashcat
Status...........: Running
Hash.Type........: Custom
Hash.Target......: test
Time.Started.....: Sat Jun 23 14:48:37 2018 (3 mins, 38 secs)
Time.Estimated...: Sat Jun 23 15:36:59 2018 (44 mins, 44 secs)
Guess.Base.......: File (wordlist.01)
Guess.Mod........: Rules (testrules.all.rule)
Guess.Queue......: 2/7 (28.57%)
Speed.Dev.#2.....:   251.6 MH/s (86.18ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#3.....:   185.2 MH/s (82.36ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#4.....:   258.9 MH/s (82.06ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#*.....:   695.8 MH/s
Recovered........: 31/100000 (0.00%) Digests, 0/1 (0.00%) Salts
Recovered/Time...: CUR:2,N/A,N/A AVG:2,132,3178 (Min,Hour,Day)
Progress.........: 151133356032/2019302727279 (7.48%)
Rejected.........: 0/151133356032 (0.00%)
Restore.Point....: 0/9122133 (0.00%)
Candidates.#2....: 2u -> f0shw
Candidates.#3....: 8vf0gHW717 -> ar246801hotel
Candidates.#4....: jh -> 2urx6kgv.hn
HWMon.Dev.#2.....: Temp: 56c Fan:100% Util:  0% Core:1885MHz Mem:5005MHz Bus:8
HWMon.Dev.#3.....: Temp: 59c Fan:100% Util:  0% Core:1860MHz Mem:4513MHz Bus:16
HWMon.Dev.#4.....: Temp: 54c Fan:100% Util:  0% Core:1847MHz Mem:5005MHz Bus:16

Still better speed than what I was getting with hashcat-legacy!
#5
It seems, though, that I'm not getting full utilization of my GPUs for some reason.

Is it because of the pure kernel?

NOTE: I'm using a big wordlist with a huge rule set, so there is no warning about providing more work, etc.