23800 RAR3-p (Compressed) performance problem
#1
Hello! I am seeing terrible performance with mode 23800 on 10x RTX 4090.
I am using the official hashcat v7.0.0 binaries from hashcat.net and testing with the default hash from the documentation (https://hashcat.net/wiki/doku.php?id=example_hashes). I used a mask attack for benchmarking purposes.
I've recorded a tmux session with hashcat, htop and nvidia-smi, in which you can see the performance issues.

https://asciinema.org/a/O6tpYodIW9JCB2qUyYrTTTiCz - with default workload profile
https://asciinema.org/a/UzjMUUANhl2EdcMBcRC9BYZn0 - with max possible workload profile (-w 4)

Timestamps for -w 4 demo
0:27 - hashcat launched
0:52 - cracking session started (100% GPU util, low CPU util)
1:01 - GPU util drop, CPU util high. I've noticed a lot of hashcat processes in I/O uninterruptible sleep (D)
13:58 - GPU util increasing, CPU util decreasing, hashcat increases "Progress" stat
14:13 - GPU util drop again.

This pattern repeats itself indefinitely.
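For anyone who wants to quantify the D-state observation rather than eyeball it in htop, here is a minimal sketch that counts threads in uninterruptible sleep by reading /proc on Linux. The function names and the "hashcat" name prefix are assumptions made for illustration; nothing here is part of hashcat itself.

```python
import os

def parse_proc_stat(stat_text):
    """Return (comm, state) from the contents of a /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return comm, state

def count_d_state(name_prefix="hashcat", proc_root="/proc"):
    """Count threads whose name starts with name_prefix and whose state is D."""
    count = 0
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    comm, state = parse_proc_stat(fh.read())
            except (OSError, ValueError):
                continue  # thread exited or the file was unreadable
            if comm.startswith(name_prefix) and state == "D":
                count += 1
    return count
```

Calling count_d_state() once per second during a session and logging the result next to the GPU utilization from nvidia-smi should show whether the stalls line up with the D-state spikes.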

This problem occurs on hashcat v7.0.0. As you can see in the demos, the speed per GPU is ~2500 H/s, which is abysmal for an RTX 4090.
Here is a demo for hashcat v6.2.6. I am using max possible workload profile.
https://asciinema.org/a/nHbAixz1OtkPM8DqaebVYLBjL

In this demo the hashrate per device is ~150 kH/s, which is what I would expect.
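As a rough sanity check, the numbers above can be put side by side: the fraction of time the GPUs are busy in the -w 4 demo versus the ratio of the v7.0.0 and v6.2.6 per-device speeds. This is only a sketch using the timestamps and approximate speeds quoted in this post; the two versions use different kernels, so the comparison is indicative at best.

```python
def mmss_to_seconds(stamp):
    """Convert an 'M:SS' timestamp from the recording into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# One full stall/busy cycle taken from the -w 4 demo timestamps.
stall_start = mmss_to_seconds("1:01")    # GPU utilization drops
busy_start = mmss_to_seconds("13:58")    # GPU utilization rises again
busy_end = mmss_to_seconds("14:13")      # GPU utilization drops again

stall = busy_start - stall_start         # seconds with idle GPUs
busy = busy_end - busy_start             # seconds with busy GPUs
busy_fraction = busy / (stall + busy)

# Approximate per-device speeds quoted in the post, in H/s.
speed_v7 = 2_500
speed_v626 = 150_000
speed_ratio = speed_v7 / speed_v626

print(f"GPU busy fraction:      {busy_fraction:.1%}")
print(f"v7.0.0 / v6.2.6 speed:  {speed_ratio:.1%}")
```

Both figures come out just under 2%, which fits the picture of GPUs that run at a normal rate while busy and spend the rest of each cycle waiting.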

I am happy to provide additional information on request. I hope this gets fixed. Thank you!
#2
Some additional info:
Hashcat v7.0.0 works correctly on my local RTX 3060 (driver version 575.64.05, CUDA version 12.9) and produces ~40 kH/s with the same command as in the -w 4 demo.
#3
I'll look into it as far as I can, since I don't have such a powerful machine. Can you reproduce the same behavior with a single 4090?
#4
(08-06-2025, 02:06 PM)atom Wrote: I'll look into it as far as I can, since I don't have such a powerful machine. Can you reproduce the same behavior with a single 4090?

Thank you for the response! Using the command 
Code:
./hashcat.bin -d 1 -w 4 -m 23800 -a 3 example23800.hash "?a?a?a?a?a?a"
I get ~26 kH/s on a single RTX 4090, which is similar to the cumulative speed of all ten GPUs in the second demo.
The periods of low GPU utilization with high CPU usage in uninterruptible I/O sleep are now shorter (4 periods in 5-6 minutes).

If you need the full demo in asciinema, I can record it.
#5
OK, thanks for reporting. I think I was able to identify the issue. Please retry with the new beta:

https://hashcat.net/beta/hashcat-7.0.0%2B64.7z

Also, I just saw you were not using -O in the command line, but I assume that was intentional for better bug report clarity.
#6
(08-06-2025, 04:42 PM)atom Wrote: OK, thanks for reporting. I think I was able to identify the issue. Please retry with the new beta:

https://hashcat.net/beta/hashcat-7.0.0%2B64.7z

Also, I just saw you were not using -O in the command line, but I assume that was intentional for better bug report clarity.

Thanks for the fix. The new beta solved the root problem.
You are right that I left out -O intentionally. However, I just tried running the command
Code:
./hashcat.bin -O -w 4 -m 23800 -a 3 example23800.hash "?a?a?a?a?a?a"

which resulted in this speed

Code:
Session..........: hashcat
Status...........: Running
Hash.Mode........: 23800 (RAR3-p (Compressed))
Hash.Target......: $RAR3$*1*ad56eb40219c9da2*834064ce*32*13*1*eb47b1ab...8e1*33
Time.Started.....: Wed Aug  6 18:12:13 2025 (1 min, 27 secs)
Time.Estimated...: Wed Aug 13 20:33:54 2025 (7 days, 2 hours)
Kernel.Feature...: Optimized Kernel (password length 0-20 bytes)
Guess.Mask.......: ?a?a?a?a?a?a [6]
Guess.Queue......: 1/1 (100.00%)
Speed.#01........:  119.7 kH/s (779.37ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#02........:  119.6 kH/s (778.48ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#03........:  120.5 kH/s (778.45ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#04........:  119.5 kH/s (778.90ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#05........:  119.9 kH/s (778.27ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#06........:  119.8 kH/s (778.54ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#07........:  119.5 kH/s (778.55ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#08........:  119.7 kH/s (779.11ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#09........:  120.3 kH/s (779.17ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#10........:  119.9 kH/s (778.86ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#*.........:  1198.6 kH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 98304000/735091890625 (0.01%)
Rejected.........: 0/98304000 (0.00%)
Restore.Point....: 0/7737809375 (0.00%)
Restore.Sub.#01..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#02..: Salt:0 Amplifier:5-6 Iteration:98304-114688
Restore.Sub.#03..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#04..: Salt:0 Amplifier:5-6 Iteration:98304-114688
Restore.Sub.#05..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#06..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#07..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#08..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#09..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Restore.Sub.#10..: Salt:0 Amplifier:5-6 Iteration:114688-131072
Candidate.Engine.: Device Generator
Candidates.#01...: aarier -> a@[119
Candidates.#02...: aP=ena -> a5%ETA
Candidates.#03...: amZdie -> a<~WTA
Candidates.#04...: aQy%pi -> ak,sus
Candidates.#05...: aOapst -> a)p#77
Candidates.#06...: a%iSUS -> aNL#!!
Candidates.#07...: aM1ZBO -> auQ1ba
Candidates.#08...: ar`pba -> a*Ut23
Candidates.#09...: a_XHHA -> as#CST
Candidates.#10...: a7W*ba -> a#S",1
Hardware.Mon.#01.: Temp: 58c Fan: 67% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#02.: Temp: 59c Fan: 68% Util:100% Core:2595MHz Mem:10251MHz Bus:8
Hardware.Mon.#03.: Temp: 54c Fan: 62% Util:100% Core:2640MHz Mem:10251MHz Bus:8
Hardware.Mon.#04.: Temp: 54c Fan: 63% Util:100% Core:2565MHz Mem:10251MHz Bus:8
Hardware.Mon.#05.: Temp: 64c Fan: 71% Util:100% Core:2625MHz Mem:10251MHz Bus:8
Hardware.Mon.#06.: Temp: 64c Fan: 73% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#07.: Temp: 59c Fan: 68% Util:100% Core:2640MHz Mem:10251MHz Bus:4
Hardware.Mon.#08.: Temp: 61c Fan: 70% Util:100% Core:2505MHz Mem:10251MHz Bus:8
Hardware.Mon.#09.: Temp: 56c Fan: 64% Util:100% Core:2505MHz Mem:10251MHz Bus:8
Hardware.Mon.#10.: Temp: 60c Fan: 69% Util:100% Core:2595MHz Mem:10251MHz Bus:4

The command without -O, on the other hand, results in this

Code:
Session..........: hashcat
Status...........: Running
Hash.Mode........: 23800 (RAR3-p (Compressed))
Hash.Target......: $RAR3$*1*ad56eb40219c9da2*834064ce*32*13*1*eb47b1ab...8e1*33
Time.Started.....: Wed Aug  6 18:15:21 2025 (1 min, 3 secs)
Time.Estimated...: Mon Aug 11 16:25:43 2025 (4 days, 22 hours)
Kernel.Feature...: Pure Kernel (password length 0-128 bytes)
Guess.Mask.......: ?a?a?a?a?a?a [6]
Guess.Queue......: 1/1 (100.00%)
Speed.#01........:  170.0 kH/s (535.58ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#02........:  172.1 kH/s (526.92ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#03........:  180.9 kH/s (521.82ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#04........:  170.1 kH/s (532.10ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#05........:  181.0 kH/s (523.49ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#06........:  170.7 kH/s (534.46ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#07........:  172.4 kH/s (519.63ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#08........:  170.5 kH/s (534.22ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#09........:  170.0 kH/s (538.47ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#10........:  170.1 kH/s (527.42ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#*.........:  1727.9 kH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 104857600/735091890625 (0.01%)
Rejected.........: 0/104857600 (0.00%)
Restore.Point....: 0/7737809375 (0.00%)
Restore.Sub.#01..: Salt:0 Amplifier:5-6 Iteration:32768-49152
Restore.Sub.#02..: Salt:0 Amplifier:5-6 Iteration:49152-65536
Restore.Sub.#03..: Salt:0 Amplifier:5-6 Iteration:147456-163840
Restore.Sub.#04..: Salt:0 Amplifier:5-6 Iteration:32768-49152
Restore.Sub.#05..: Salt:0 Amplifier:5-6 Iteration:147456-163840
Restore.Sub.#06..: Salt:0 Amplifier:5-6 Iteration:32768-49152
Restore.Sub.#07..: Salt:0 Amplifier:5-6 Iteration:65536-81920
Restore.Sub.#08..: Salt:0 Amplifier:5-6 Iteration:32768-49152
Restore.Sub.#09..: Salt:0 Amplifier:5-6 Iteration:32768-49152
Restore.Sub.#10..: Salt:0 Amplifier:5-6 Iteration:49152-65536
Candidate.Engine.: Device Generator
Candidates.#01...: aarier -> adq,ie
Candidates.#02...: aAb!19 -> aD# $$
Candidates.#03...: a@U $$ -> a[uqta
Candidates.#04...: a^l769 -> a9Cxta
Candidates.#05...: a0T^(1 -> ax?y00
Candidates.#06...: aR 5st -> a/dTZA
Candidates.#07...: aXBzba -> a}MYPI
Candidates.#08...: a X2le -> aw|Don
Candidates.#09...: ad`$30 -> aBPP45
Candidates.#10...: aDrave -> a=>sja
Hardware.Mon.#01.: Temp: 61c Fan: 70% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#02.: Temp: 65c Fan: 72% Util:100% Core:2490MHz Mem:10251MHz Bus:8
Hardware.Mon.#03.: Temp: 59c Fan: 67% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#04.: Temp: 59c Fan: 66% Util:100% Core:2565MHz Mem:10251MHz Bus:8
Hardware.Mon.#05.: Temp: 73c Fan: 79% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#06.: Temp: 71c Fan: 79% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#07.: Temp: 64c Fan: 71% Util:100% Core:2565MHz Mem:10251MHz Bus:4
Hardware.Mon.#08.: Temp: 66c Fan: 75% Util:100% Core:2445MHz Mem:10251MHz Bus:8
Hardware.Mon.#09.: Temp: 59c Fan: 68% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#10.: Temp: 64c Fan: 72% Util:100% Core:2505MHz Mem:10251MHz Bus:4

This is odd: the pure kernel (~1728 kH/s total) is faster than the optimized kernel (~1199 kH/s total).
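To compare runs like these without reading the numbers off by hand, the per-device Speed lines can be parsed from a saved status screen. The sketch below assumes the exact line layout shown in the two outputs above; the regular expression and helper names are illustrative, and only the first three devices of each run are included to keep it short.

```python
import re

# Matches lines such as:
# Speed.#01........:  119.7 kH/s (779.37ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
SPEED_RE = re.compile(
    r"^Speed\.#(\d+)\.*:\s+([\d.]+)\s+([kMGT]?)H/s\s+\(([\d.]+)ms\)"
)
UNIT = {"": 1, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def parse_speeds(status_text):
    """Map device number -> (hashes per second, kernel runtime in ms)."""
    result = {}
    for line in status_text.splitlines():
        match = SPEED_RE.match(line.strip())
        if match:  # the aggregate 'Speed.#*' line is skipped on purpose
            dev, value, prefix, ms = match.groups()
            result[int(dev)] = (float(value) * UNIT[prefix], float(ms))
    return result

optimized = """
Speed.#01........:  119.7 kH/s (779.37ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#02........:  119.6 kH/s (778.48ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#03........:  120.5 kH/s (778.45ms) @ Accel:40 Loops:16384 Thr:384 Vec:1
Speed.#*.........:  1198.6 kH/s
"""
pure = """
Speed.#01........:  170.0 kH/s (535.58ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#02........:  172.1 kH/s (526.92ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#03........:  180.9 kH/s (521.82ms) @ Accel:16 Loops:16384 Thr:1024 Vec:1
Speed.#*.........:  1727.9 kH/s
"""

opt = parse_speeds(optimized)
pur = parse_speeds(pure)
total_opt = sum(speed for speed, _ in opt.values())
total_pure = sum(speed for speed, _ in pur.values())
print(f"pure / optimized over {len(opt)} devices: {total_pure / total_opt:.2f}")
```

Besides the totals, the parsed kernel runtimes make it easy to see that both -w 4 runs sit at several hundred milliseconds per kernel call.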
#7
Maybe you haven't seen it in the v7.0.0 release notes, but using -w 4 is strongly discouraged with the new autotuner. Stick to -w 3 and you should get more natural and better results.
#8
(08-06-2025, 05:28 PM)atom Wrote: Maybe you haven't seen it in the v7.0.0 release notes, but using -w 4 is strongly discouraged with the new autotuner. Stick to -w 3 and you should get more natural and better results.

Thank you for the clarification. It would be great if the user received a warning about using -w 4 (like clang did with -Ofast).
Still, even with -w 3, the pure kernel slightly outperforms the optimized kernel.

Optimized

Code:
Session..........: hashcat
Status...........: Quit
Hash.Mode........: 23800 (RAR3-p (Compressed))
Hash.Target......: $RAR3$*1*ad56eb40219c9da2*834064ce*32*13*1*eb47b1ab...8e1*33
Time.Started.....: Wed Aug  6 18:29:27 2025 (4 mins, 28 secs)
Time.Estimated...: Mon Aug 11 14:32:41 2025 (4 days, 19 hours)
Kernel.Feature...: Optimized Kernel (password length 0-20 bytes)
Guess.Mask.......: ?a?a?a?a?a?a [6]
Guess.Queue......: 1/1 (100.00%)
Speed.#01........:  175.8 kH/s (94.80ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#02........:  176.3 kH/s (94.28ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#03........:  177.6 kH/s (93.44ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#04........:  175.6 kH/s (94.52ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#05........:  176.8 kH/s (93.54ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#06........:  175.6 kH/s (94.76ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#07........:  176.5 kH/s (93.59ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#08........:  175.1 kH/s (95.06ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#09........:  174.9 kH/s (95.17ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#10........:  175.3 kH/s (94.20ms) @ Accel:5 Loops:16384 Thr:512 Vec:1
Speed.#*.........:  1759.5 kH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 469565440/735091890625 (0.06%)
Rejected.........: 0/469565440 (0.00%)
Restore.Point....: 327680/7737809375 (0.00%)
Restore.Sub.#01..: Salt:0 Amplifier:48-49 Iteration:81920-98304
Restore.Sub.#02..: Salt:0 Amplifier:48-49 Iteration:245760-262144
Restore.Sub.#03..: Salt:0 Amplifier:50-51 Iteration:0-16384
Restore.Sub.#04..: Salt:0 Amplifier:48-49 Iteration:81920-98304
Restore.Sub.#05..: Salt:0 Amplifier:49-50 Iteration:81920-98304
Restore.Sub.#06..: Salt:0 Amplifier:48-49 Iteration:81920-98304
Restore.Sub.#07..: Salt:0 Amplifier:49-50 Iteration:32768-49152
Restore.Sub.#08..: Salt:0 Amplifier:48-49 Iteration:0-16384
Restore.Sub.#09..: Salt:0 Amplifier:47-48 Iteration:245760-262144
Restore.Sub.#10..: Salt:0 Amplifier:48-49 Iteration:49152-65536
Candidate.Engine.: Device Generator
Candidates.#01...: G!o856 -> G`.(le
Candidates.#02...: GmZdie -> Gxc|00
Candidates.#03...: N_cFFA -> N)^ama
Candidates.#04...: G7{pon -> GNi$he
Candidates.#05...: H%R188 -> H5%ETA
Candidates.#06...: G>"EY1 -> Gs\330
Candidates.#07...: HRI`da -> HZe977
Candidates.#08...: GMo,ja -> G<~WTA
Candidates.#09...: Ka?ABO -> KpcSON
Candidates.#10...: GQy%pi -> G pj19
Hardware.Mon.#01.: Temp: 59c Fan: 69% Util:  0% Core:2385MHz Mem:10251MHz Bus:8
Hardware.Mon.#02.: Temp: 54c Fan: 71% Util:100% Core:2775MHz Mem:10251MHz Bus:8
Hardware.Mon.#03.: Temp: 45c Fan: 65% Util:  0% Core:2790MHz Mem:10251MHz Bus:8
Hardware.Mon.#04.: Temp: 57c Fan: 65% Util:  3% Core:2415MHz Mem:10251MHz Bus:8
Hardware.Mon.#05.: Temp: 69c Fan: 77% Util:100% Core:2430MHz Mem:10251MHz Bus:8
Hardware.Mon.#06.: Temp: 64c Fan: 79% Util:100% Core:2475MHz Mem:10251MHz Bus:8
Hardware.Mon.#07.: Temp: 61c Fan: 71% Util:100% Core:2550MHz Mem:10251MHz Bus:4
Hardware.Mon.#08.: Temp: 59c Fan: 75% Util:100% Core:2775MHz Mem:10251MHz Bus:8
Hardware.Mon.#09.: Temp: 52c Fan: 66% Util:100% Core:2715MHz Mem:10251MHz Bus:8
Hardware.Mon.#10.: Temp: 63c Fan: 73% Util: 94% Core:2430MHz Mem:10251MHz Bus:4

Pure

Code:
Session..........: hashcat
Status...........: Running
Hash.Mode........: 23800 (RAR3-p (Compressed))
Hash.Target......: $RAR3$*1*ad56eb40219c9da2*834064ce*32*13*1*eb47b1ab...8e1*33
Time.Started.....: Wed Aug  6 18:34:17 2025 (4 mins, 55 secs)
Time.Estimated...: Mon Aug 11 03:04:25 2025 (4 days, 8 hours)
Kernel.Feature...: Pure Kernel (password length 0-128 bytes)
Guess.Mask.......: ?a?a?a?a?a?a [6]
Guess.Queue......: 1/1 (100.00%)
Speed.#01........:  192.8 kH/s (102.06ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#02........:  196.6 kH/s (100.49ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#03........:  197.4 kH/s (99.13ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#04........:  195.3 kH/s (101.28ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#05........:  197.6 kH/s (99.85ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#06........:  193.8 kH/s (101.97ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#07........:  197.1 kH/s (99.02ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#08........:  195.0 kH/s (101.48ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#09........:  192.9 kH/s (102.42ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#10........:  195.4 kH/s (100.41ms) @ Accel:5 Loops:16384 Thr:616 Vec:1
Speed.#*.........:  1954.0 kH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 574801920/735091890625 (0.08%)
Rejected.........: 0/574801920 (0.00%)
Restore.Point....: 394240/7737809375 (0.01%)
Restore.Sub.#01..: Salt:0 Amplifier:49-50 Iteration:81920-98304
Restore.Sub.#02..: Salt:0 Amplifier:52-53 Iteration:16384-32768
Restore.Sub.#03..: Salt:0 Amplifier:52-53 Iteration:229376-245760
Restore.Sub.#04..: Salt:0 Amplifier:51-52 Iteration:32768-49152
Restore.Sub.#05..: Salt:0 Amplifier:52-53 Iteration:245760-262144
Restore.Sub.#06..: Salt:0 Amplifier:50-51 Iteration:0-16384
Restore.Sub.#07..: Salt:0 Amplifier:52-53 Iteration:147456-163840
Restore.Sub.#08..: Salt:0 Amplifier:50-51 Iteration:245760-262144
Restore.Sub.#09..: Salt:0 Amplifier:49-50 Iteration:98304-114688
Restore.Sub.#10..: Salt:0 Amplifier:51-52 Iteration:65536-81920
Candidate.Engine.: Device Generator
Candidates.#01...: Ha eba -> H)>.ta
Candidates.#02...: I_"|23 -> IYk;pi
Candidates.#03...: IQ0RY1 -> I*={ra
Candidates.#04...: FR3:90 -> Fk?rra
Candidates.#05...: I>Ykst -> I=(130
Candidates.#06...: N7@X00 -> N1fw30
Candidates.#07...: I[7556 -> I<Htta
Candidates.#08...: NO'RLE -> Ns+"an
Candidates.#09...: H8wfgi -> H |W30
Candidates.#10...: FTcKON -> FxhZ19
Hardware.Mon.#01.: Temp: 64c Fan: 71% Util:  0% Core:2550MHz Mem:10251MHz Bus:8
Hardware.Mon.#02.: Temp: 56c Fan: 74% Util: 10% Core:2790MHz Mem:10251MHz Bus:8
Hardware.Mon.#03.: Temp: 58c Fan: 67% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#04.: Temp: 45c Fan: 68% Util:100% Core:2730MHz Mem:10251MHz Bus:8
Hardware.Mon.#05.: Temp: 72c Fan: 81% Util:100% Core:2565MHz Mem:10251MHz Bus:8
Hardware.Mon.#06.: Temp: 58c Fan: 82% Util:  0% Core:2745MHz Mem:10251MHz Bus:8
Hardware.Mon.#07.: Temp: 65c Fan: 73% Util:100% Core:2625MHz Mem:10251MHz Bus:4
Hardware.Mon.#08.: Temp: 56c Fan: 78% Util:100% Core:2775MHz Mem:10251MHz Bus:8
Hardware.Mon.#09.: Temp: 61c Fan: 69% Util:100% Core:2535MHz Mem:10251MHz Bus:8
Hardware.Mon.#10.: Temp: 66c Fan: 75% Util:100% Core:2595MHz Mem:10251MHz Bus:4
#9
That is so cool, I hadn't seen that until now. That's because we also improved the crypto library primitives a few weeks ago, and they have now become so fast that they beat the register-heavy optimized kernels. Of course that only applies to this particular mode, because it allocates a huge register range for every thread. I need to double-check this on other GPUs as well, and then simply drop the optimized kernel (hashcat will then fall back to the pure kernel).

Regarding -w 4, we will print a warning or something similar.
#10
Another idea: since you have so many GPUs in that one box, you will end up with tons of threads running on the CPU. With a single GPU it's controllable, but with 10 you might want to play with the --hook-threads parameter. Start with 1 and increase in small steps; it may become beneficial at some point.