Why hashcat v6.2.6 is slower than v4.1.0 (or 5.1.0)
#1
Why is hashcat v6.2.6 slower than v4.1.0 on the same GPUs? Here are my test results.

hashcat v4.1.0 result:

Code:
 
sudo ./hashcat64.bin -m 500 *********** -a 3 -1 ?l?u?d ?1?1?1?1?1?1 -w 3
hashcat (v4.1.0) starting...

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #2: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #3: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #4: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #5: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #6: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #7: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU
* Device #8: NVIDIA GeForce RTX 2080, 1995/7982 MB allocatable, 46MCU

OpenCL Platform #2: Intel(R) Corporation
========================================
* Device #9: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz, skipped.

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt
* Brute-Force

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 256

ATTENTION! Pure (unoptimized) OpenCL kernels selected.
This enables cracking passwords and salts > length 32 but for the price of drastically reduced performance.
If you want to switch to optimized OpenCL kernels, append -O to your commandline.

Watchdog: Temperature abort trigger set to 90c

[s]tatus [p]ause [b]ypass [c]heckpoint [q]uit => s

Session..........: hashcat
Status...........: Running
Hash.Type........: md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5)
Hash.Target......: ***********
Time.Started.....: Wed Sep 14 14:30:13 2022 (9 secs)
Time.Estimated...: Wed Sep 14 14:43:17 2022 (12 mins, 55 secs)
Guess.Mask.......: ?1?1?1?1?1?1 [6]
Guess.Charset....: -1 ?l?u?d, -2 Undefined, -3 Undefined, -4 Undefined
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:  9123.7 kH/s (77.35ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#2.....:  9200.9 kH/s (76.63ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#3.....:  8775.6 kH/s (80.42ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#4.....:  8776.2 kH/s (80.52ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#5.....:  9176.8 kH/s (76.85ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#6.....:  9090.1 kH/s (77.61ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#7.....:  9137.9 kH/s (77.24ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#8.....:  9101.5 kH/s (77.55ms) @ Accel:1024 Loops:500 Thr:32 Vec:1
Speed.Dev.#*.....: 72355.5 kH/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 649658368/56800235584 (1.14%)
Rejected.........: 0/649658368 (0.00%)
Restore.Point....: 0/916132832 (0.00%)
Candidates.#1....: Warier -> WMPoba
Candidates.#2....: VbEhAN -> VnWGst
Candidates.#3....: IN8My1 -> IApQda
Candidates.#4....: I9WGst -> IVqUan
Candidates.#5....: VSIYMA -> VkbMon
Candidates.#6....: WOYdce -> W0jX69
Candidates.#7....: V5eYKI -> VZV0er
Candidates.#8....: VX5k45 -> VqMb56
HWMon.Dev.#1.....: Temp: 70c Fan: 38% Util: 99% Core:1815MHz Mem:6800MHz Bus:16
HWMon.Dev.#2.....: Temp: 69c Fan: 38% Util: 99% Core:1845MHz Mem:6800MHz Bus:16
HWMon.Dev.#3.....: Temp: 73c Fan:136% Util: 99% Core:1785MHz Mem:6800MHz Bus:16
HWMon.Dev.#4.....: Temp: 74c Fan:132% Util: 99% Core:1815MHz Mem:6800MHz Bus:16
HWMon.Dev.#5.....: Temp: 69c Fan:133% Util: 99% Core:1845MHz Mem:6800MHz Bus:8
HWMon.Dev.#6.....: Temp: 76c Fan: 43% Util: 99% Core:1830MHz Mem:6800MHz Bus:16
HWMon.Dev.#7.....: Temp: 73c Fan: 41% Util:100% Core:1815MHz Mem:6800MHz Bus:16
HWMon.Dev.#8.....: Temp: 68c Fan: 37% Util: 99% Core:1815MHz Mem:6800MHz Bus:4

hashcat v6.2.6 result:

Code:
sudo ./hashcat.bin -m 500 *********** -a 3 -1 ?l?u?d ?1?1?1?1?1?1 -w 3
hashcat (v6.2.6) starting
OpenCL API (OpenCL 3.0 CUDA 11.7.89) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #1: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #2: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #3: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #4: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #5: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #6: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #7: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU
* Device #8: NVIDIA GeForce RTX 2080, 7808/7982 MB (1995 MB allocatable), 46MCU

OpenCL API (OpenCL 2.1 LINUX) - Platform #2 [Intel(R) Corporation]
==================================================================
* Device #9: Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz, skipped

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 256

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Optimizers applied:
* Zero-Byte
* Single-Hash
* Single-Salt
* Brute-Force

ATTENTION! Pure (unoptimized) backend kernels selected.
Pure kernels can crack longer passwords, but drastically reduce performance.
If you want to switch to optimized kernels, append -O to your commandline.
See the above message to find out about the exact limits.

Watchdog: Temperature abort trigger set to 90c

Host memory required for this attack: 11714 MB

Cracking performance lower than expected?               

* Append -O to the commandline.
  This lowers the maximum supported password/salt length (usually down to 32).

* Append -S to the commandline.
  This has a drastic speed impact but can be better for specific attacks.
  Typical scenarios are a small wordlist but a large ruleset.

* Update your backend API runtime / driver the right way:
  https://hashcat.net/faq/wrongdriver

* Create more work items to make use of your parallelization power:
  https://hashcat.net/faq/morework

[s]tatus [p]ause [b]ypass [c]heckpoint [f]inish [q]uit => s

Session..........: hashcat
Status...........: Running
Hash.Mode........: 500 (md5crypt, MD5 (Unix), Cisco-IOS $1$ (MD5))
Hash.Target......: ***********
Time.Started.....: Wed Sep 14 14:32:15 2022 (56 secs)
Time.Estimated...: Wed Sep 14 15:50:06 2022 (1 hour, 16 mins)
Kernel.Feature...: Pure Kernel
Guess.Mask.......: ?1?1?1?1?1?1 [6]
Guess.Charset....: -1 ?l?u?d, -2 Undefined, -3 Undefined, -4 Undefined
Guess.Queue......: 1/1 (100.00%)
Speed.#1.........:  1526.2 kH/s (30.50ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#2.........:  1571.1 kH/s (29.71ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#3.........:  1473.4 kH/s (31.85ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#4.........:  1465.5 kH/s (32.39ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#5.........:  1497.1 kH/s (31.43ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#6.........:  1522.2 kH/s (30.75ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#7.........:  1536.3 kH/s (30.41ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#8.........:  1567.9 kH/s (29.62ms) @ Accel:32 Loops:1000 Thr:32 Vec:1
Speed.#*.........: 12159.7 kH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 676413440/56800235584 (1.19%)
Rejected.........: 0/676413440 (0.00%)
Restore.Point....: 10362880/916132832 (1.13%)
Restore.Sub.#1...: Salt:0 Amplifier:4-5 Iteration:0-1000
Restore.Sub.#2...: Salt:0 Amplifier:57-58 Iteration:0-1000
Restore.Sub.#3...: Salt:0 Amplifier:4-5 Iteration:0-1000
Restore.Sub.#4...: Salt:0 Amplifier:57-58 Iteration:0-1000
Restore.Sub.#5...: Salt:0 Amplifier:32-33 Iteration:0-1000
Restore.Sub.#6...: Salt:0 Amplifier:0-1 Iteration:0-1000
Restore.Sub.#7...: Salt:0 Amplifier:16-17 Iteration:0-1000
Restore.Sub.#8...: Salt:0 Amplifier:54-55 Iteration:0-1000
Candidate.Engine.: Device Generator
Candidates.#1....: bJCAda -> bUnrST
Candidates.#2....: YGLFy1 -> Yx7oKI
Candidates.#3....: bzfnJA -> byDrST
Candidates.#4....: Y8VOve -> YKKTQU
Candidates.#5....: MpE5GI -> MhXuBO
Candidates.#6....: stMGba -> srWrST
Candidates.#7....: g2IfIE -> gWatTA
Candidates.#8....: WRKPXX -> WcAuBO
Hardware.Mon.#1..: Temp: 69c Fan:129% Util:100% Core:1830MHz Mem:6800MHz Bus:16
Hardware.Mon.#2..: Temp: 83c Fan: 52% Util:100% Core:1710MHz Mem:6800MHz Bus:16
Hardware.Mon.#3..: Temp: 73c Fan:136% Util:100% Core: 907MHz Mem:6800MHz Bus:16
Hardware.Mon.#4..: Temp: 74c Fan:132% Util:100% Core: 922MHz Mem:6800MHz Bus:16
Hardware.Mon.#5..: Temp: 69c Fan:132% Util:100% Core:1860MHz Mem:6800MHz Bus:8
Hardware.Mon.#6..: Temp: 78c Fan:137% Util:100% Core:1815MHz Mem:6800MHz Bus:16
Hardware.Mon.#7..: Temp: 72c Fan:137% Util:100% Core:1860MHz Mem:6800MHz Bus:16
Hardware.Mon.#8..: Temp: 80c Fan:135% Util:100% Core:1815MHz Mem:6800MHz Bus:4

Please do not tell me to use the optimized kernel. I don't want to use the -O parameter; I want to know the reason for the slowdown in pure kernel mode.
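
For reference, the gap can be quantified from the two status screens above. This is a minimal Python sketch that uses only figures printed in the logs; the variable and function names are made up for illustration, and it assumes a constant speed over the whole run.

```python
# Figures copied from the two status screens above.
CHARSET_SIZE = 26 + 26 + 10      # -1 ?l?u?d
MASK_LENGTH = 6                  # ?1?1?1?1?1?1
SPEED_V410_HPS = 72355.5e3       # Speed.Dev.#* on v4.1.0, in H/s
SPEED_V626_HPS = 12159.7e3       # Speed.#* on v6.2.6, in H/s

# Total number of candidates generated by the mask.
keyspace = CHARSET_SIZE ** MASK_LENGTH


def full_run_minutes(speed_hps: float) -> float:
    """Minutes needed to exhaust the keyspace at a constant speed."""
    return keyspace / speed_hps / 60


slowdown = SPEED_V410_HPS / SPEED_V626_HPS

print(f"keyspace       : {keyspace}")
print(f"v4.1.0 full run: {full_run_minutes(SPEED_V410_HPS):.1f} min")
print(f"v6.2.6 full run: {full_run_minutes(SPEED_V626_HPS):.1f} min")
print(f"slowdown       : {slowdown:.2f}x")
```

The computed keyspace matches the Progress total in both logs (56800235584), and the ratio works out to roughly a sixfold slowdown, in line with the two Time.Estimated lines.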
#2
Might be related to changes to Autotune. How is the behavior on different workload profiles / without -w param?

Also, it seems strange that hashcat 6.2.6 doesn't recognize the CUDA backend. Maybe an issue with your driver installation?
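
To make the Autotune difference concrete, here is a rough Python sketch comparing the tuned values for Device #1 from the two status screens. It rests on two assumptions that are my reading of the output and are not confirmed in this thread: that the number of candidates per kernel launch is MCU x Accel x Thr, and that md5crypt runs 1000 iterations per candidate (suggested by "Iteration:0-1000" in the v6.2.6 status). The Tuning class and its method names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: taken from "Iteration:0-1000" in the v6.2.6 status screen.
MD5CRYPT_ITERATIONS = 1000


@dataclass
class Tuning:
    """Autotuned values for Device #1, copied from a status screen."""
    version: str
    mcu: int             # multiprocessors ("46MCU")
    accel: int           # Accel:
    loops: int           # Loops:
    threads: int         # Thr:
    exec_ms: float       # kernel runtime shown in parentheses
    reported_khs: float  # Speed value printed for the device

    def candidates_per_launch(self) -> int:
        # Assumed relation, not taken from hashcat documentation.
        return self.mcu * self.accel * self.threads

    def estimated_khs(self) -> float:
        # Each launch advances every candidate by `loops` iterations.
        iterations_per_second = (
            self.candidates_per_launch() * self.loops / (self.exec_ms / 1000)
        )
        return iterations_per_second / MD5CRYPT_ITERATIONS / 1000


old = Tuning("v4.1.0", 46, 1024, 500, 32, 77.35, 9123.7)
new = Tuning("v6.2.6", 46, 32, 1000, 32, 30.50, 1526.2)

for t in (old, new):
    print(f"{t.version}: {t.candidates_per_launch()} candidates/launch, "
          f"estimated {t.estimated_khs():.0f} kH/s, "
          f"reported {t.reported_khs} kH/s")
```

Under these assumptions the estimate lands within about 10% of the reported speed for both versions, and v6.2.6 processes 32 times fewer candidates per launch (Accel:32 versus Accel:1024). That is consistent with the Autotune suspicion but does not by itself prove it.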
#3
(09-14-2022, 11:14 AM)NoReply Wrote: Might be related to changes to Autotune. How is the behavior on different workload profiles / without -w param?

Also, it seems strange that hashcat 6.2.6 doesn't recognize the CUDA backend. Maybe an issue with your driver installation?

I have tried all of those, e.g. without the -w param, using CUDA only or OpenCL only, changing kernel-accel, kernel-loops, kernel-threads...; hashcat v6.2.6 is always slower than v4.1.0.

But other hash modes behave normally, such as NTLM, MD5, SHA512crypt and so on.

BTW, with the -O param, the speed of hashcat v6.2.6 (hash mode 500) is close to that of v4.1.0.
#4
Just a hunch, but maybe there is a bug such that it uses the optimized kernel in Versions 4.1.0 and 5.1.0 although it says pure kernel is selected.

You could figure out by feeding it long candidates and check whether you get any rejected candidates.
#5
(09-14-2022, 07:55 PM)NoReply Wrote: Just a hunch, but maybe there is a bug such that it uses the optimized kernel in Versions 4.1.0 and 5.1.0 although it says pure kernel is selected.

You could figure out by feeding it long candidates and check whether you get any rejected candidates.

I tested your hunch. I created a file with some keywords, some of them longer than 15 characters (because the optimized kernel supports a maximum password length of 15):
Code:
ksdfsdf
asfa
1111111111223456
111111111223456
1111111111223456sdffwefo
sdkfsdf

Launching hashcat v4.1.0 without the -O param, nothing is rejected. Here is the result:
Code:
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 6/6 (100.00%)
Rejected.........: 0/6 (0.00%)
Restore.Point....: 0/6 (0.00%)

Launching hashcat v4.1.0 with the -O param, 2 keywords are rejected:
Code:
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 6/6 (100.00%)
Rejected.........: 2/6 (33.33%)
Restore.Point....: 0/6 (0.00%)

So v4.1.0 really does use the pure kernel without -O, and the slowdown must be caused by a different bug.
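
The expected Rejected count for a test like this can be predicted before running hashcat. This is a minimal Python sketch, assuming the 15-character limit stated above for the optimized kernel of mode 500 and that length is measured in bytes; the function name is made up for illustration.

```python
# Limit stated in the post for the optimized (-O) kernel of mode 500.
OPTIMIZED_MAX_LEN = 15

# The six test keywords from the file above.
wordlist = [
    "ksdfsdf",
    "asfa",
    "1111111111223456",
    "111111111223456",
    "1111111111223456sdffwefo",
    "sdkfsdf",
]


def too_long(words, max_len=OPTIMIZED_MAX_LEN):
    """Return the words whose byte length exceeds max_len."""
    return [w for w in words if len(w.encode("utf-8")) > max_len]


rejected = too_long(wordlist)
print(f"{len(rejected)}/{len(wordlist)} candidates exceed "
      f"{OPTIMIZED_MAX_LEN} bytes: {rejected}")
```

Two of the six keywords exceed the limit, which matches the "Rejected: 2/6" line from the -O run and the "0/6" line from the pure kernel run.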
#6
I have exactly the same situation. At the start it reports 84k per second, and literally after 20 seconds the speed drops tenfold to about 8k per second. I use Windows 10 with OpenGL installed, the NVIDIA 570 driver, and 1060 3G video cards.