Hashcat vs OclHashcat - Netntlm v1
#1
Getting some interesting results for NETNTLMv1 ... It would seem hashcat with dictionary and rules is much faster than oclhashcat (brute or dic+rules) for a large number of hashes. Also, using hashcat with dictionary and rules is the only scenario where the speed actually increases when adding more (salted) hashes!
Can someone explain why I am seeing these results? What's the bottleneck?

I understand that some optimization has been done to speed up NETNTLMv1 by making use of the last DES chunk, whose key comes from only the last 2 bytes of the NTLM hash, i.e. you first crack the last DES chunk and then only check NTLMs which match it to get the resulting netntlm.

Has this optimization been done for both hashcat and oclHashcat and for all attack modes, and could this partially explain these results?
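
A minimal sketch of why the last DES chunk is cheap to attack, assuming the standard NTLMv1 response layout (the 16-byte NTLM hash is zero-padded to 21 bytes and cut into three 7-byte DES key blocks). The function names are hypothetical, the DES step itself is left out, and this is not taken from the hashcat source; it only illustrates the key split and the pre-filter idea.

```python
def split_into_des_key_material(nt_hash: bytes) -> list[bytes]:
    """Split a 16-byte NTLM hash into the three 7-byte DES key blocks."""
    if len(nt_hash) != 16:
        raise ValueError("NTLM hash must be 16 bytes")
    padded = nt_hash + b"\x00" * 5          # 21 bytes in total
    return [padded[0:7], padded[7:14], padded[14:21]]


def third_block_candidates():
    """Yield every possible third key block: 2 unknown bytes + 5 zero bytes."""
    for value in range(0x10000):            # only 65536 possibilities
        yield value.to_bytes(2, "big") + b"\x00" * 5


def survives_prefilter(nt_hash: bytes, recovered_last2: bytes) -> bool:
    """Cheap check done before any DES work on the first two blocks."""
    return nt_hash[14:16] == recovered_last2


# Example 16-byte value used purely for illustration.
example = bytes.fromhex("8846f7eaee8fb117ad06bdd830b7586c")
blocks = split_into_des_key_material(example)
print(blocks[2].hex())                      # last 2 hash bytes + 5 zero bytes
```

Once the two bytes behind the third block are recovered for a given hash, a candidate whose NTLM hash ends in different bytes can be discarded without running DES on the first two blocks.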

Summary:
---------
hc-st-h5 ?? - 56MH/s
hc-st-h200 ?? - 92MH/s
hc-br-h5 - 12MH/s
hc-br-h200 - 0.4MH/s

ocl15-st-h5 - 32MH/s
ocl15-st-h200 - 0.8MH/s
ocl15-br-h5 - 95MH/s
ocl15-br-h200 - 2.3MH/s

ocl14-st-h5 - 51MH/s

where:
-------
hc = hashcat v0.46
ocl15 = oclhashcat v0.15
ocl14 = oclhashcat v0.14

st = straight with rules
br = brute

h5 = 5 hashes
h200 = 200 hashes

?? = unexpected result


System:
CPU: Core i3 530
GPU: NVIDIA GTS250

hashlists:
in5: 5 hashes (salted - NETNTLMv1)
in200: 200 hashes (salted - NETNTLMv1)

hashcat v0.46
--------------
hashcat-cli64.exe -m5500 -a0 -c 1000 -n3 --remove --pw-min=8 -o out in5 ..\Dic\04 -r append4.rule
Speed/sec.: 56.54M plains, 61 words

hashcat-cli64.exe -m5500 -a0 -c 1000 -n3 --remove --pw-min=8 -o out in200 ..\Dic\04 -r append4.rule
Speed/sec.: 92.16M plains, - words

hashcat-cli64.exe -m5500 -a3 -c 1000 -n3 --remove --pw-min=8 -o out in5 ?u?l?l?l?d?d?d?s
Speed/sec.: - plains, 12.23M words

hashcat-cli64.exe -m5500 -a3 -c 1000 -n3 --remove --pw-min=8 -o out in200 ?u?l?l?l?d?d?d?s
Speed/sec.: - plains, 467.49k words

oclhashcat:
-----------
v0.15
cudaHashcat-plus64.exe -m5500 -a3 --gpu-temp-disable --remove -o out in5 ?u?l?l?l?d?d?d?s
Speed.GPU.#1...: 95837.1 kH/s

cudaHashcat-plus64.exe -m5500 -a3 --gpu-temp-disable --remove -o out in200 ?u?l?l?l?d?d?d?s
Speed.GPU.#1...: 2365.0 kH/s

cudaHashcat-plus64.exe -m5500 -a0 --gpu-temp-disable --remove -o out in5 ..\Dic\04 -r append4.rule
Speed.GPU.#1...: 32813.1 kH/s

cudaHashcat-plus64.exe -m5500 -a0 --gpu-temp-disable --remove -o out in200 ..\Dic\04 -r append4.rule
Speed.GPU.#1...: 816.9 kH/s

v0.14 (faster than v0.15; because of the <15 char password optimization?)
cudaHashcat-plus64.exe -m5500 -a0 --gpu-temp-disable --remove -o out in5 ..\Dic\04 -r append4.rule
Speed.GPU.#1...: 51197.8k/s
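
A small sketch that turns the raw speeds above into ratios, so the 5-hash and 200-hash runs can be compared directly. The figures are copied from the tool output in this post (converted to MH/s); the dictionary layout and the function name are illustrative assumptions.

```python
# Speeds in MH/s, keyed by (tool, attack, number of hashes).
speeds = {
    ("hashcat", "rules", 5): 56.54,
    ("hashcat", "rules", 200): 92.16,
    ("hashcat", "mask", 5): 12.23,
    ("hashcat", "mask", 200): 0.46749,
    ("oclhashcat", "mask", 5): 95.8371,
    ("oclhashcat", "mask", 200): 2.365,
    ("oclhashcat", "rules", 5): 32.8131,
    ("oclhashcat", "rules", 200): 0.8169,
}


def slowdown(tool: str, attack: str) -> float:
    """Speed with 5 hashes divided by speed with 200 hashes."""
    return speeds[(tool, attack, 5)] / speeds[(tool, attack, 200)]


for tool, attack in [("hashcat", "rules"), ("hashcat", "mask"),
                     ("oclhashcat", "mask"), ("oclhashcat", "rules")]:
    print(f"{tool:10s} {attack:5s} 5 -> 200 hashes: factor {slowdown(tool, attack):.2f}")

# Cross-tool comparison for 200 hashes: CPU dic+rules vs GPU mask.
cpu_vs_gpu = speeds[("hashcat", "rules", 200)] / speeds[("oclhashcat", "mask", 200)]
print(f"hashcat rules vs oclhashcat mask at 200 hashes: {cpu_vs_gpu:.1f}x")
```

Both oclhashcat runs slow down by a factor of about 40, which is the same as 200/5. One possible reading is that the GPU speed scales inversely with the number of salted hashes, while the hashcat dic+rules run does not follow that pattern at all.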
#2
No replies yet! Maybe my post was too difficult to read...

So basically my question is: why am I getting much higher hashes per second (92MH/s vs 2.3MH/s) with hashcat (dic+rule) compared to oclhashcat (brute)

...when operating on 200 netntlm hashes, with dic+rules equivalent to brute force, i.e. 8 char passwords with trailing ?d ?s
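
For scale, a rough sketch of the keyspace behind the mask ?u?l?l?l?d?d?d?s and how long it would take to exhaust at the two oclhashcat speeds reported in the first post. The charset sizes are the usual sizes of the built-in ?l, ?u, ?d and ?s sets; the helper names are hypothetical and only these four built-in sets are handled.

```python
# Sizes of the built-in charsets used in the mask.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33}


def mask_keyspace(mask: str) -> int:
    """Number of candidates generated by a mask made of ?l ?u ?d ?s and literals."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            total *= CHARSET_SIZES[mask[i + 1]]   # KeyError on unsupported sets
            i += 2
        else:
            i += 1                                # a literal character adds no choices
    return total


def seconds_to_exhaust(keyspace: int, hashes_per_second: float) -> float:
    """Time to try every candidate at a constant speed."""
    return keyspace / hashes_per_second


keyspace = mask_keyspace("?u?l?l?l?d?d?d?s")
print(f"keyspace: {keyspace:,}")
print(f"at 95837.1 kH/s (5 hashes):  {seconds_to_exhaust(keyspace, 95837.1e3) / 60:.1f} min")
print(f"at 2365.0 kH/s (200 hashes): {seconds_to_exhaust(keyspace, 2365.0e3) / 3600:.2f} h")
```

The mask covers about 15 billion candidates, so the drop from 95MH/s to 2.3MH/s is the difference between a few minutes and well over an hour for the same attack.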
#3
What was the size of the dic and how many rules were there? If either is too small, the GPU won't have enough work and speed will be lower.
#4
(10-06-2013, 04:54 AM)mastercracker Wrote: What was the size of the dic and how many rules were there? If either is too small, the GPU won't have enough work and speed will be lower.

The dictionary is about 100MB ...that's a lot of lines. But you're missing the point here: the CPU (dic+rules) is faster than the GPU (brute)!
#5
"GPU: NVIDIA GTS250"

There is your problem.
#6
(10-07-2013, 12:01 AM)mailmuncher2000 Wrote: The dictionary is about 100MB ...that's a lot of lines.

that's not a very big dictionary.


(10-07-2013, 02:40 AM)hannhimhe Wrote: "GPU: NVIDIA GTS250"
There is your problem.

come on, everyone knows that gpus are faster than cpus! gpus are magic.
#7
(10-07-2013, 04:11 AM)epixoip Wrote:
(10-07-2013, 12:01 AM)mailmuncher2000 Wrote: The dictionary is about 100MB ...that's a lot of lines.

that's not a very big dictionary.

Lol... opinions :) ... it's a 4-character-only dictionary, not GBs in size like we are accustomed to using. But it was only used to make a point in this case: oclhashcat "brute force" for NETNTLM v1 is slower than hashcat with dic+rules. So this is not a question of saturating the GPU.

Again, I am not using a dictionary for oclhashcat; I'm using a mask, i.e. brute force.

(10-07-2013, 04:11 AM)epixoip Wrote:
(10-07-2013, 02:40 AM)hannhimhe Wrote: "GPU: NVIDIA GTS250"
There is your problem.

come on, everyone knows that gpus are faster than cpus! gpus are magic.

You guys! Thanks for trashing my system. Yup, she's pretty old now. But both the CPU and GPU are from the same decade :P ... So coming back to topic, the comparison is still valid in my eyes. Note: the CPU is the old original Core i3 (nobody laugh). My use of these tools is more academic than practical. Besides, even today smart rules and dictionaries still crack 50% of all hashes.

But thanks for your opinions...
#8
the fact is, there are quite a few things that will be faster on cpu than on gpu, even if you have a top-of-the-line gpu.
#9
(10-07-2013, 06:17 AM)epixoip Wrote: the fact is, there are quite a few things that will be faster on cpu than on gpu, even if you have a top-of-the-line gpu.

But not for the problem of password cracking? Correct? And the GPU should certainly not be 40x slower as is the case in this example.

Maybe I am naive but I don't believe the capability of the GPU/CPU is the reason for such a huge difference in speed here.

I guess the point I am trying to push here is what I talked about in my first post on the optimization of NETNTLM, which allows hashcat to get higher speeds on NETNTLM (MD4+DES) than on NTLM (only MD4).

I was wondering if the same optimization has been done for oclhashcat. I was hoping to get the opinion of someone who works on the code. Is that only atom?

I believe that, due to the vulnerabilities in MSCHAP and PEAP, esp. over wifi, NETNTLM is an extremely important algo (dare I say more so than NTLM) to optimize, and I would be putting in a request for this if oclhashcat has indeed not got the same optimizations.
#10
(10-07-2013, 08:06 AM)mailmuncher2000 Wrote: But not for the problem of password cracking? Correct?

i am talking exclusively about password cracking. there are plenty of cases, when cracking passwords, where cpu very well may be faster.


(10-07-2013, 08:06 AM)mailmuncher2000 Wrote: Maybe I am naive but I don't believe the capability of the GPU/CPU is the reason for such a huge difference in speed here.

you very well may be naive. you haven't really given us a whole lot of well-presented data for us to figure out what the precise reasons are, but you need to remember that you actually have to give gpus work to do. if you give them a small amount of work, you will not achieve full acceleration. 100MB of words is not a lot of work for fast algorithms, so you will not achieve full acceleration. if you provide more work for the right side than the left side, then you will not achieve full acceleration. if you are memory-bound, then you will not achieve full acceleration. i can't say it enough: gpus aren't magic. you have to optimize your attacks in order to use gpus efficiently. and in some cases, gpus are simply the wrong choice.

(10-07-2013, 08:06 AM)mailmuncher2000 Wrote: I was wondering if the same optimization has been done for oclhashcat.

of course it has.

EDIT: oh, one other thing you need to realize as well is that brute force with cpu hashcat is always going to underperform due to the way generated candidates are fed to the worker threads. cpu hashcat is best suited for wordlist-based attacks.

i really think that your biggest problem is misunderstanding your tools... what their strengths/weaknesses are, when to use which, how to maximize their efficiency...