Benchmark Variance
#1
Hello, there seems to be a roughly 1600 MH/s disparity between my benchmark numbers and my actual cracking numbers. I think it may be due to a difference in the "Accel", "Loops" and "Thr" settings.


[Benchmark]
Hashmode: 5600 - NetNTLMv2

Speed.#1.........:  2957.0 MH/s (64.98ms) @ Accel:256 Loops:64 Thr:256 Vec:1
Speed.#2.........:  889.8 MH/s (94.06ms) @ Accel:256 Loops:128 Thr:256 Vec:1
Speed.#*.........:  3846.7 MH/s

[The Test]
sudo hashcat -a3 -w4 -O -m 5600 ~/Desktop/backuplinux/scott.txt ?b?b?b?b?b?b

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #2: GeForce GTX 1060 6GB, 1518/6075 MB allocatable, 10MCU

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Optimized-Kernel
* Zero-Byte
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 27

Session..........: hashcat
Status...........: Running
Hash.Type........: NetNTLMv2
Hash.Target......: ADMINISTRATOR:TongueROF:637c7285050ac6a3:5b7d72e6da1309...000000
Time.Started.....: Mon Nov 11 01:31:38 2019 (29 secs)
Time.Estimated...: Tue Nov 12 11:44:41 2019 (1 day, 10 hours)
Guess.Mask.......: ?b?b?b?b?b?b [6]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:  1753.2 MH/s (439.82ms) @ Accel:256 Loops:256 Thr:256 Vec:1
Speed.#2.........:  531.8 MH/s (314.99ms) @ Accel:512 Loops:128 Thr:256 Vec:1
Speed.#*.........:  2284.9 MH/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 64491618304/281474976710656 (0.02%)
Rejected.........: 0/64491618304 (0.00%)
Restore.Point....: 0/4294967296 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:16384-16640 Iteration:0-256
Restore.Sub.#2...: Salt:0 Amplifier:11520-11648 Iteration:0-128
Candidates.#1....: $HEX[734065000000] -> $HEX[ff40ffff2d00]
Candidates.#2....: $HEX[732d65002e00] -> $HEX[c02dffff4100]
Hardware.Mon.#1..: Temp: 53c Fan: 30% Util:100% Core:1980MHz Mem:6800MHz Bus:4
Hardware.Mon.#2..: Temp: 48c Fan:  0% Util:100% Core:1936MHz Mem:3802MHz Bus:8


The benchmark uses 256,64,256 (Accel, Loops, Thr) while the actual run uses 256,256,256 for the RTX 2080. The settings also differ for the GTX 1060.
Any help on figuring this out would be greatly appreciated.
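
For reference, the gap can be quantified per device from the two sets of Speed lines above. This is only a minimal Python sketch: the helper name parse_speeds is made up for illustration, and the regular expression assumes nothing beyond the "Speed.#N...: <value> MH/s" layout shown in the output above.

```python
import re

# Matches status lines such as "Speed.#1.........:  2957.0 MH/s (64.98ms) ..."
# Only per-device lines (numeric device id) are captured; "Speed.#*" is skipped.
SPEED_RE = re.compile(r"^Speed\.#(\d+)\.*:\s+([\d.]+) MH/s", re.MULTILINE)

def parse_speeds(text):
    """Return {device_id: speed in MH/s} for every per-device Speed line."""
    return {int(dev): float(mhs) for dev, mhs in SPEED_RE.findall(text)}

benchmark = """\
Speed.#1.........:  2957.0 MH/s (64.98ms) @ Accel:256 Loops:64 Thr:256 Vec:1
Speed.#2.........:  889.8 MH/s (94.06ms) @ Accel:256 Loops:128 Thr:256 Vec:1
Speed.#*.........:  3846.7 MH/s
"""
attack = """\
Speed.#1.........:  1753.2 MH/s (439.82ms) @ Accel:256 Loops:256 Thr:256 Vec:1
Speed.#2.........:  531.8 MH/s (314.99ms) @ Accel:512 Loops:128 Thr:256 Vec:1
Speed.#*.........:  2284.9 MH/s
"""

bench = parse_speeds(benchmark)
real = parse_speeds(attack)
# Fraction of the benchmark speed that each device reaches in the real run.
ratios = {dev: real[dev] / bench[dev] for dev in bench}
for dev, ratio in sorted(ratios.items()):
    print(f"Device #{dev}: {real[dev]:.1f} of {bench[dev]:.1f} MH/s ({ratio:.1%} of benchmark)")
```

Both cards come out at roughly 59 to 60% of their benchmark speed, so the slowdown looks proportional rather than specific to one GPU.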
#2
For me it has the opposite effect: -w 4 is even slightly faster than the benchmark with -b, because the benchmark doesn't use -w 4 by default.

If you want to troubleshoot, here are the relevant options that hashcat uses in benchmark mode:
https://github.com/hashcat/hashcat/blob/...1600-L1628
https://github.com/hashcat/hashcat/blob/...#L446-L451

I would also recommend testing with the beta version from here (https://hashcat.net/beta/), because the code linked above is what the beta currently uses (so that we have a common reference).

The -w 4 (your run above) vs. -w 3 (the default in benchmark) is what makes the difference in the loops (-u) above. But again, normally -w 4 should be faster than -w 3, so it must be something else.

Did you also try benchmarking with just one device? -d 1 -b -m 5600 and -d 2 -b -m 5600?
The fan speed says 0%, which is also suspicious. Any reason why there is NO fan?

btw: the initial discussion was in https://hashcat.net/forum/thread-8768.html (not sure if we really needed a new forum thread for this)

In theory you could also test the newest beta with CUDA (install the CUDA SDK) just to eliminate some possible OpenCL-related "problems"... You also didn't mention which driver version you are using, which might be relevant here (or maybe the problem lies somewhere else entirely, that's also possible).
#3
I have installed Win10 and loaded the most current NVIDIA drivers (440.12) and the CUDA toolkit (10.2), but the results are still the same as when I run it on Linux. The fan comes on when the GPU hits 60c; I am running a giant house fan on an open case, and it takes a minute or two for the card to warm up.


Benchmark, device #1 (RTX 2080)
hashcat.exe -d 1 -b -m 5600
Speed.#1.........:  2890.6 MH/s (65.77ms) @ Accel:16 Loops:256 Thr:1024 Vec:1

Benchmark, device #2 (GTX 1060)
hashcat.exe -d 2 -b -m 5600
Speed.#2.........:   953.3 MH/s (87.23ms) @ Accel:32 Loops:256 Thr:1024 Vec:1

The actual hash

C:\hashcat-5.1.0>hashcat.exe -a 3 -w 4 -m 5600 -O scott.txt ?b?b?b?b?b?b?b
hashcat (v5.1.0-1425-g6adc217b) starting...
* Device #1: WARNING! Kernel exec timeout is not disabled.
             This may cause "CL_OUT_OF_RESOURCES" or related errors.
             To disable the timeout, see: https://hashcat.net/q/timeoutpatch
* Device #2: WARNING! Kernel exec timeout is not disabled.
             This may cause "CL_OUT_OF_RESOURCES" or related errors.
             To disable the timeout, see: https://hashcat.net/q/timeoutpatch
* Device #3: WARNING! Kernel exec timeout is not disabled.
             This may cause "CL_OUT_OF_RESOURCES" or related errors.
             To disable the timeout, see: https://hashcat.net/q/timeoutpatch
* Device #4: WARNING! Kernel exec timeout is not disabled.
             This may cause "CL_OUT_OF_RESOURCES" or related errors.
             To disable the timeout, see: https://hashcat.net/q/timeoutpatch
CUDA API (CUDA 10.2)
====================
* Device #1: GeForce RTX 2080, 8192 MB, 46MCU
* Device #2: GeForce GTX 1060 6GB, 6144 MB, 10MCU
OpenCL API (OpenCL 1.2 CUDA 10.2.95) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #3: GeForce RTX 2080, skipped
* Device #4: GeForce GTX 1060 6GB, skipped
Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates
Applicable optimizers:
* Optimized-Kernel
* Zero-Byte
* Not-Iterated
* Single-Hash
* Single-Salt
* Brute-Force
Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 27
Watchdog: Temperature abort trigger set to 90c
Host memory required for this attack: 2332 MB

Session..........: hashcat
Status...........: Quit
Hash.Name........: NetNTLMv2
Hash.Target......: ADMINISTRATOR:TongueROF:637c7285050ac6a3:5b7d72e6da1309...000000
Time.Started.....: Tue Nov 12 13:56:10 2019 (33 secs)
Time.Estimated...: Sun Nov 15 09:58:34 2020 (1 year, 3 days)
Guess.Mask.......: ?b?b?b?b?b?b?b [7]
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:  1702.2 MH/s (450.73ms) @ Accel:64 Loops:256 Thr:1024 Vec:1
Speed.#2.........:   559.0 MH/s (299.01ms) @ Accel:64 Loops:256 Thr:1024 Vec:1
Speed.#*.........:  2261.2 MH/s
Recovered........: 0/1 (0.00%) Digests
Progress.........: 73853304832/72057594037927936 (0.00%)
Rejected.........: 0/73853304832 (0.00%)
Restore.Point....: 0/1099511627776 (0.00%)
Restore.Sub.#1...: Salt:0 Amplifier:18432-18688 Iteration:0-256
Restore.Sub.#2...: Salt:0 Amplifier:27904-28160 Iteration:0-256
Candidates.#1....: $HEX[73486500000000] -> $HEX[ff48ffff2d0000]
Candidates.#2....: $HEX[736d65002e0000] -> $HEX[ff6dffff370000]
Hardware.Mon.#1..: Temp: 50c Fan: 30% Util:100% Core:1980MHz Mem:6800MHz Bus:4
Hardware.Mon.#2..: Temp: 63c Fan: 28% Util:100% Core:1898MHz Mem:3802MHz Bus:8

This has me really scratching my head. I am using the beta hashcat package as well.
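
As a sanity check on the numbers above, the time estimates follow directly from the keyspace and the combined speed. A minimal Python sketch (the function names are my own, and it assumes the speed stays constant for the whole run): ?b covers all 256 byte values, so a mask of n ?b positions has 256^n candidates.

```python
def mask_keyspace(positions, charset_size=256):
    """Candidates for a mask of `positions` identical charsets (?b = 256 values)."""
    return charset_size ** positions

def eta_days(keyspace, speed_mhs):
    """Days to exhaust `keyspace` at a constant `speed_mhs` million hashes per second."""
    seconds = keyspace / (speed_mhs * 1_000_000)
    return seconds / 86_400

# Combined Speed.#* values taken from the two status screens in this thread.
six = mask_keyspace(6)    # equals the Progress total 281474976710656
seven = mask_keyspace(7)  # equals the Progress total 72057594037927936
print(f"?b x6 at 2284.9 MH/s: {eta_days(six, 2284.9):.2f} days")
print(f"?b x7 at 2261.2 MH/s: {eta_days(seven, 2261.2):.1f} days")
```

That works out to roughly 1.4 days for six positions and about 369 days for seven, consistent with the Time.Estimated lines, so the estimates are simply the keyspace divided by the reduced speed.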
#4
Did you actually try experimenting with the command line options, as suggested above, to see if you can observe any changes?

btw: it would also make sense to read the warnings and act accordingly (apply the timeout patch), even though in this case it doesn't seem related
#5
The problem was my hash size. I was using the whole hash that I captured, but as soon as I cut it down to the same length as the example, I achieved the same speed as the benchmark.

The hash in the file.

'Administrator:TongueROF:637c7285050ac6a

The proper format
'Administrator:TongueROF:637c7285050ac6a3:5B7D72E6DA1309456B173B05D6F4D93E:0101000000000000C0653150DE09D2019D1E15B28258F0C9000000000200080053004D'

**edit
It appears the hash used in the benchmark is a lot shorter than most of the NTLMv2 hashes I've seen.
Is it possible that the benchmark is using an NTLM hash instead of an NTLMv2 one?
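
One possible explanation for the length effect, sketched in Python: in NetNTLMv2 the response is an HMAC-MD5 over the server challenge followed by the blob (the last field), so a longer blob means more 64-byte MD5 blocks to process per candidate. The function names below are invented for illustration, the block count only models the inner MD5 pass of that HMAC, and the 300-byte blob is a hypothetical stand-in for a full capture, not a real hash.

```python
def blob_bytes(hash_line):
    """Length in bytes of the hex-encoded blob, the last ':'-separated field."""
    _head, _challenge, _proof, blob = hash_line.strip().strip("'").rsplit(":", 3)
    return len(blob) // 2

def inner_md5_blocks(blob_len):
    """64-byte MD5 blocks in the inner HMAC pass over challenge + blob.

    Input is the 64-byte padded key, the 8-byte server challenge and the blob;
    MD5 padding adds at least 9 bytes (0x80 marker plus 8-byte length).
    """
    message_len = 64 + 8 + blob_len
    return (message_len + 9 + 63) // 64

# Shortened hash exactly as posted above.
posted = ("Administrator:TongueROF:637c7285050ac6a3:"
          "5B7D72E6DA1309456B173B05D6F4D93E:"
          "0101000000000000C0653150DE09D2019D1E15B28258F0C9000000000200080053004D")

short_len = blob_bytes(posted)
print(short_len, inner_md5_blocks(short_len))
# Hypothetical 300-byte blob, roughly what an untrimmed capture might carry.
print(300, inner_md5_blocks(300))
```

Under these assumptions the trimmed hash needs 2 inner blocks and the hypothetical long one needs 6, which would be in line with a benchmark hash that is short being faster. Note that the blob is part of the HMAC input, so a trimmed hash is useful for a speed comparison only; it would not be expected to verify against the real password.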