MD5 and v3.00
#1
Hello,

I'm facing an issue with the new release of hashcat v3.00.

In theory, MD5 is 18.63% faster in v3.00 than in v2.01, according to the comparison table (GTX 980).

Quote: On my GTX 980:
v2.01 bench @ MD5 : 11182 MH/s
v3.00 bench @ MD5 : 13196 MH/s
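
As a quick sanity check on the two benchmark figures quoted above, the relative gain can be computed directly. This is only a sketch: `percent_gain` is a helper name made up for illustration, and the inputs are the two MH/s values from this post, not the comparison table.

```python
def percent_gain(old_mhs, new_mhs):
    """Relative speed-up of new_mhs over old_mhs, in percent."""
    return (new_mhs / old_mhs - 1.0) * 100.0

# Benchmark figures quoted in the post (GTX 980, MD5).
gain = percent_gain(11182, 13196)
print(f"v3.00 vs v2.01 benchmark: {gain:+.2f}%")
```

On these two numbers the gain comes out at roughly +18%, close to the comparison table's figure.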

But in practice, with 2448 hashes, a dictionary of 37534505 words, and ~90000 rules, the run takes roughly 1.5 times as long:

Quote: Command lines are:
(v2.01) cudaHashcat64.exe   -m 0 -w 3 hashes.txt  dic.dic -r 90k.rule
(v3.00) hashcat.exe           -m 0 -w 3 hashes.txt  dic.dic -r 90k.rule

Output:
Code:
cudaHashcat v2.01 starting...

Speed.GPU.#1...:  1867.9 MH/s

Started: Sat Jul 02 11:26:58 2016
Stopped: Sat Jul 02 11:54:55 2016

Duration => 28 min
Code:
hashcat (v3.00-1-g67a8d97) starting...

Speed.Dev.#1...:  1199.8 MH/s (83.60ms)

Started: Sat Jul 02 11:54:56 2016
Stopped: Sat Jul 02 12:38:04 2016

Duration => 43 min

We notice:
- the speed is lower in v3.00: 1199 MH/s vs. 1867 MH/s in v2.01, for the same hashes
- the total cracking time is longer in v3.00: 43 min vs. 28 min in v2.01
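
The two observations above can be cross-checked with a rough calculation. The sketch below makes several assumptions: the rule count is taken as exactly 90000 (the post only says "~90000"), every word/rule pair is counted as one candidate, the reported MH/s is treated as candidates per second, and the speed line is treated as constant for the whole run. The helper names are made up for illustration.

```python
from datetime import datetime

# Matches lines like "Sat Jul 02 11:26:58 2016" (English day/month names assumed).
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"

WORDS = 37534505
RULES = 90000  # assumption: the post only gives "~90000"

# (Started, Stopped, reported speed in MH/s), copied from the output above.
RUNS = {
    "v2.01": ("Sat Jul 02 11:26:58 2016", "Sat Jul 02 11:54:55 2016", 1867.9),
    "v3.00": ("Sat Jul 02 11:54:56 2016", "Sat Jul 02 12:38:04 2016", 1199.8),
}

def wall_seconds(started, stopped):
    """Elapsed wall-clock time between two hashcat timestamps, in seconds."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    stop = datetime.strptime(stopped, TIMESTAMP_FORMAT)
    return int((stop - start).total_seconds())

def estimated_seconds(speed_mhs):
    """Time to try every word/rule pair at a constant speed (simplification)."""
    return WORDS * RULES / (speed_mhs * 1e6)

for version, (started, stopped, speed) in RUNS.items():
    measured = wall_seconds(started, stopped)
    estimate = estimated_seconds(speed)
    print(f"{version}: measured {measured} s, estimated {estimate:.0f} s")

slowdown = wall_seconds(*RUNS["v3.00"][:2]) / wall_seconds(*RUNS["v2.01"][:2])
print(f"v3.00 took {slowdown:.2f}x as long as v2.01")
```

With the timestamps from the post, the measured runs are 1677 s and 2588 s, so v3.00 took about 1.54 times as long. That is in line with the ratio of the two reported speeds, which suggests the slowdown comes from the kernel speed itself rather than from startup or I/O overhead.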

Is there a reason for that? Is there an option (-w?) that I missed in v3.00?

Thank you.
#2
I have found similar results with -m 2611 and -m 2711 (the only ones I tested). With single hashes or in benchmarks, v3.00 far outperforms v2.01, but as soon as you have a sizeable hash list with a straight wordlist, it can no longer keep up. It seems to be slower at transferring data to the GPU, because by using rules I was able to get excellent performance again. For now, I will keep v2.01 for straight wordlist cracking and v3.00 for rules.

This may have something to do with switching over to OpenCL from CUDA for Nvidia, but this goes way over my head so I am just guessing here. But if someone with an AMD card would like to test and see if the same holds true for them, that would be nice. :)

Note: I hoped it had something to do with the new workload profiles, so I tried setting -n, -u, and --opencl-vector-width manually, trying values until I found the best result for my list and setup. v3.00 still could not keep up, although I did get closer: ~740 MH/s (v3.00) vs. ~900 MH/s (v2.01) with -m 2711. I have a GTX 970, just for the record.
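
For anyone who wants to repeat this kind of manual tuning, a small script can generate the command lines to try instead of typing them by hand. This is only a sketch: the value lists are placeholders rather than recommendations, `sweep_commands` is a name made up for illustration, and the file names are borrowed from the first post. It only prints the commands and does not run hashcat.

```python
from itertools import product

# Placeholder values only; useful ranges depend on the GPU, driver and hash mode.
ACCEL_VALUES = (32, 64, 128)   # candidates for -n
LOOP_VALUES = (128, 256, 512)  # candidates for -u
VECTOR_WIDTHS = (1, 2, 4)      # candidates for --opencl-vector-width

def sweep_commands(hash_mode, hash_file, wordlist):
    """Return one hashcat argument list per combination of tuning values."""
    commands = []
    for accel, loops, width in product(ACCEL_VALUES, LOOP_VALUES, VECTOR_WIDTHS):
        commands.append([
            "hashcat.exe", "-m", str(hash_mode),
            "-n", str(accel), "-u", str(loops),
            "--opencl-vector-width", str(width),
            hash_file, wordlist,
        ])
    return commands

if __name__ == "__main__":
    # File names are illustrative; substitute your own hash list and wordlist.
    for cmd in sweep_commands(2711, "hashes.txt", "dic.dic"):
        print(" ".join(cmd))
```

Each printed line can then be run by hand, noting the reported speed, to find the best combination for a given list and setup.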

Edit: I totally missed the part where you already used rules! Try tuning manually, and see if that helps: https://hashcat.net/forum/thread-5184.html
Edit2: I use NVIDIA driver version 368.39.

Update: After more testing, it seems that rules alone are not quicker either; it is really the bruteforce and hybrid modes that are quicker. I will be sticking with v2.01 for most things, but will definitely perform hybrid and bruteforce attacks with v3.00, because the speed improvement is over 100% in some cases.
#3
So... pure bruteforcing on v3.00 is faster than on v2.01, but rule and wordlist attacks are slower.
Weird?
#4
(07-02-2016, 10:19 PM)Mem5 Wrote: So... pure bruteforcing on v3.00 is faster than on v2.01, but rule and wordlist attacks are slower.
Weird?

I am really just guessing here but I think it has something to do with the fact that bruteforcing can be done with little communication between the GPU and the rest of the system. So the actual algorithms are quicker but the PCI-e interface is slower for some reason. You would think rules would improve the situation as they could be transferred over to video RAM and all the rule modifications would be done on the GPU, but perhaps that is not how it is done?

I really have no idea about how all this works though, this is just my very uneducated guess. Hopefully atom can pop in and enlighten us. :)
#5
I don't think so.
I am pretty sure the wordlist is transferred to the GPU before the crack starts.
#6
I guess smaller wordlists could be transferred, so you are probably right there. The big ones are probably transferred in chunks too. This is really odd then!
#7
The reason is that it's not really slower. I had to fix a number of bugs in the rule engine, which is executed on the GPU, and those fixes led to a slower kernel. So yes, 2.01 has a faster rule engine, but a defective one.
#8
(07-04-2016, 01:10 PM)atom Wrote: The reason is that it's not really slower. I had to fix a number of bugs in the rule engine, which is executed on the GPU, and those fixes led to a slower kernel. So yes, 2.01 has a faster rule engine, but a defective one.

Thank you for your response, atom; that explains it. If only the kernel is slower, I think I will try to port the 2.01 kernels that I use to 3.00 for straight wordlist attacks, since there was no defect in those, right?
#9
Straight wordlist with fast hashes like MD5? It's just a matter of seconds, isn't it?
#10
So, let me summarize a bit:

- Pure bruteforcing: v3.00 is faster
- Straight wordlist (no rules): v3.00 is faster
- Wordlists + rules: v2.01 is faster BUT buggy

Am I right?