Custom OpenCL kernel question

I'm trying to write a custom kernel to implement an algorithm like this:

PHP Code:
$static_salt1 = "123jjdsfhjrhfjrhdkedjkewdjdwjdwkjdkewjdkewjdkewjkdjwefjrhfjrhfjrhfjrhfjhrjfhrjsbdnsbdfrfjeh4jh43jhj34hjd3hdjbnbedbwedkjewbdkjedwkejdefre\n\rs";
$static_salt2 = "djkedjewkdewjkdjk32bj3432h4o3240324h32432hjsdnbkjdnakdsadhldkhlhd3ljlk3dl4kdj43lkdj43ld3";
$hash = md5($static_salt2.md5(md5(md5($pass).$static_salt1)));

$static_salt1 is longer than 64 characters and contains special characters like "\r\n"; $static_salt2 is longer than 64 characters as well. Both are fixed for every hash.
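
For checking a kernel's output it can help to have an independent host-side reference. Below is a minimal Python sketch of the formula above. It assumes PHP's md5() semantics (a 32-character lowercase hex digest), so the inner results are concatenated as hex text rather than raw bytes. The function names and the short salts in the usage line are placeholders, not taken from the post.

```python
import hashlib


def md5_hex(data: bytes) -> bytes:
    """Lowercase hex MD5 digest as ASCII bytes, matching what PHP's md5() returns."""
    return hashlib.md5(data).hexdigest().encode("ascii")


def custom_hash(password: bytes, salt1: bytes, salt2: bytes) -> str:
    """md5(salt2 . md5(md5(md5(pass) . salt1))), all intermediates as hex text."""
    h1 = md5_hex(password)    # md5($pass)
    h2 = md5_hex(h1 + salt1)  # md5(md5($pass).$static_salt1)
    h3 = md5_hex(h2)          # md5() of the previous hex string
    return hashlib.md5(salt2 + h3).hexdigest()


if __name__ == "__main__":
    # Placeholder salts for illustration only; substitute the real fixed salts.
    print(custom_hash(b"password", b"example-salt-1", b"example-salt-2"))
```

Feeding a handful of known passwords through this and through the kernel should give identical digests if the kernel is correct.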

Obviously, because of the big fixed salts I can't use the optimized kernel feature, so I'm trying to figure out how to write a pure OpenCL kernel.
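
To put a number on "big": MD5 works on 64-byte blocks, and its padding (one 0x80 byte plus an 8-byte length field) leaves room for at most 55 message bytes in a single block. The sketch below counts the blocks each md5() call in the formula needs. That multi-block input is what rules out an optimized kernel is an assumption here; the thread does not state the exact limit. The helper names are hypothetical.

```python
def md5_blocks(message_len: int) -> int:
    """Number of 64-byte blocks MD5 processes for a message of this length.

    MD5 padding appends one 0x80 byte and an 8-byte length field, so at
    most 55 message bytes fit in a single block.
    """
    return (message_len + 8) // 64 + 1


def blocks_per_call(salt1_len: int, salt2_len: int, pass_len: int) -> dict:
    """Block counts for each md5() call; inner results are 32 hex characters."""
    return {
        "md5(pass)": md5_blocks(pass_len),
        "md5(hex . salt1)": md5_blocks(32 + salt1_len),
        "md5(hex)": md5_blocks(32),
        "md5(salt2 . hex)": md5_blocks(salt2_len + 32),
    }
```

With any salt longer than 64 characters, both salted calls need at least two blocks, while md5(pass) and the unsalted middle call still fit in one for short passwords.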

I'm a total noob in C / hashcat. I previously implemented a similar algorithm in hashcat-legacy and it works there, but the new hashcat has a completely different architecture. In hashcat-legacy it was easier because there were multiple examples of hash algorithms using fixed salts, but I can't find any proper examples in the new hashcat code base.

Where/how should I pass both of these static salts?

Do I do this directly in the OpenCL kernel?

Should I put both salt definitions in include/interface.h?

Maybe you can point me to an example algorithm that passes multiple static salts.

Sorry for the stupid questions, I'm just trying to figure it out.

If both salts are fixed you can hardcode them into the kernel code, which saves you some work. See kernel 2610 to get an idea of how to start.
(06-22-2018, 09:52 AM)atom Wrote: If both salts are fixed you can hardcode them into the kernel code, which saves you some work. See kernel 2610 to get an idea of how to start.

Interestingly, I can't find any mention of algorithm 2610 in src/interface.c

Thanks for the tip, looking into it!

EDIT: OK, I think I got it: it's just a general-type kernel that is used by multiple algorithms.
Finally, after a day of struggling, I was able to implement what I needed!

It's very tricky to pass multiple salts to a kernel. I had to use a concatenated salt in the parsing function (src/interface.c) and then pad every single salt with 0 bytes at the end so that it aligns to 4 bytes when read in the kernel.
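
The padding scheme described above can be illustrated on the host side. This is only a Python sketch of the idea, not the actual parsing code in src/interface.c: each salt is zero-padded to a multiple of 4 bytes before concatenation, so every salt starts on a 32-bit word boundary. The function names are hypothetical, and little-endian word order is an assumption about how the kernel reads the buffer.

```python
import struct


def pad4(salt: bytes) -> bytes:
    """Append zero bytes until the length is a multiple of 4."""
    return salt + b"\x00" * (-len(salt) % 4)


def pack_salts(*salts: bytes):
    """Concatenate zero-padded salts.

    Returns the combined buffer and, per salt, a tuple of
    (offset in 32-bit words, original byte length).
    """
    buf = b""
    layout = []
    for salt in salts:
        layout.append((len(buf) // 4, len(salt)))
        buf += pad4(salt)
    return buf, layout


def as_u32_words(buf: bytes):
    """View the buffer as little-endian 32-bit words (assumed kernel read order)."""
    return list(struct.unpack("<%dI" % (len(buf) // 4), buf))
```

Keeping the original byte length next to each word offset matters because the zero padding must not be hashed as part of the salt.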

It's not an easy task to understand everything that is going on in hashcat.

Implementing the algorithm was fun, though!

The speed I get with the pure, unoptimized kernel is:
Session..........: hashcat
Status...........: Running
Hash.Type........: Custom
Hash.Target......: test
Time.Started.....: Sat Jun 23 14:48:37 2018 (3 mins, 38 secs)
Time.Estimated...: Sat Jun 23 15:36:59 2018 (44 mins, 44 secs)
Guess.Base.......: File (wordlist.01)
Guess.Mod........: Rules (testrules.all.rule)
Guess.Queue......: 2/7 (28.57%)
Speed.Dev.#2.....:   251.6 MH/s (86.18ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#3.....:   185.2 MH/s (82.36ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#4.....:   258.9 MH/s (82.06ms) @ Accel:64 Loops:32 Thr:384 Vec:1
Speed.Dev.#*.....:   695.8 MH/s
Recovered........: 31/100000 (0.00%) Digests, 0/1 (0.00%) Salts
Recovered/Time...: CUR:2,N/A,N/A AVG:2,132,3178 (Min,Hour,Day)
Progress.........: 151133356032/2019302727279 (7.48%)
Rejected.........: 0/151133356032 (0.00%)
Restore.Point....: 0/9122133 (0.00%)
Candidates.#2....: 2u -> f0shw
Candidates.#3....: 8vf0gHW717 -> ar246801hotel
Candidates.#4....: jh ->
HWMon.Dev.#2.....: Temp: 56c Fan:100% Util:  0% Core:1885MHz Mem:5005MHz Bus:8
HWMon.Dev.#3.....: Temp: 59c Fan:100% Util:  0% Core:1860MHz Mem:4513MHz Bus:16
HWMon.Dev.#4.....: Temp: 54c Fan:100% Util:  0% Core:1847MHz Mem:5005MHz Bus:16

Still a better speed than what I was getting with hashcat-legacy!
It seems, though, that I'm not getting full utilization on my GPUs for some reason?

Is it because of the pure kernel?

NOTE: I'm using a big wordlist with huge rules, so there is no warning about "provide more work", etc.