Major difference between benchmark values and real values on RTX4090 rig
#1
Hi, I managed to set up a rig using 8x RTX 4090 on an Asus B250 Mining Expert motherboard. I'm running hashcat 6.2.6 on Windows 10, with CUDA and the drivers installed properly (as far as I can tell). My main problem is that when I run the hashcat benchmark I get these values, for example for hash mode 0:

-------------------
* Hash-Mode 0 (MD5)
-------------------

Speed.#1.........:  154.7 GH/s (27.18ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#2.........:  155.1 GH/s (26.93ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#3.........:  151.7 GH/s (27.31ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#4.........:  148.9 GH/s (26.65ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#5.........:  147.2 GH/s (26.87ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#6.........:  151.0 GH/s (26.68ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#7.........:  147.4 GH/s (27.61ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#8.........:  149.9 GH/s (27.00ms) @ Accel:64 Loops:1024 Thr:512 Vec:1
Speed.#9.........:  567.2 MH/s (85.99ms) @ Accel:1024 Loops:256 Thr:8 Vec:4
Speed.#*.........:  1206.4 GH/s

#############################################################################################




But when I run hashcat on a single hash in -a 3 mode I get these results:




#############################################################################################
Guess.Mask.......: ?1?2?2?2?2?2?2?3?3?3 [10]
Guess.Charset....: -1 ?l?d?u, -2 ?l?d, -3 ?l?d*!$@_, -4 Undefined
Guess.Queue......: 10/15 (66.67%)
Speed.#1.........: 52814.0 MH/s (2.04ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#2.........: 47936.9 MH/s (2.03ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#3.........: 53635.6 MH/s (2.05ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#4.........: 55270.9 MH/s (2.00ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#5.........: 55048.9 MH/s (2.01ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#6.........: 50955.8 MH/s (2.00ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#7.........: 49530.3 MH/s (2.06ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#8.........: 51705.7 MH/s (2.02ms) @ Accel:1024 Loops:32 Thr:32 Vec:1
Speed.#9.........:  216.1 MH/s (10.37ms) @ Accel:128 Loops:64 Thr:16 Vec:1
Speed.#*.........:  417.1 GH/s
Recovered........: 0/1 (0.00%) Digests (total), 0/1 (0.00%) Digests (new)
Progress.........: 2389792929087488/9301612953526272 (25.69%)
Rejected.........: 0/2389792929087488 (0.00%)
Restore.Point....: 29400137728/115760814336 (25.40%)
Restore.Sub.#1...: Salt:0 Amplifier:23040-23072 Iteration:0-32
Restore.Sub.#2...: Salt:0 Amplifier:79072-79104 Iteration:0-32
Restore.Sub.#3...: Salt:0 Amplifier:80000-80032 Iteration:0-128
Restore.Sub.#4...: Salt:0 Amplifier:2336-2368 Iteration:0-128
Restore.Sub.#5...: Salt:0 Amplifier:14176-14208 Iteration:0-128
Restore.Sub.#6...: Salt:0 Amplifier:3200-3232 Iteration:0-64
Restore.Sub.#7...: Salt:0 Amplifier:40832-40864 Iteration:0-16
Restore.Sub.#8...: Salt:0 Amplifier:28768-28800 Iteration:0-128
Restore.Sub.#9...: Salt:0 Amplifier:67232-67264 Iteration:0-32
Candidate.Engine.: Device Generator
Candidates.#1....: Jrc3e4xq5t -> pycqxbh$ud
Candidates.#2....: 3a6ofodkh0 -> Ebqbxv5yrr
Candidates.#3....: e4qdvb3p5t -> Fxq646zbnc
Candidates.#4....: Derwjlk1rr -> tilkifvr5t
Candidates.#5....: q28er9b$ud -> jk8m0wze4e
Candidates.#6....: Jday0wze4e -> pwiz8c24y7
Candidates.#7....: Aix5zqln4e -> ac0ns6c!l2
Candidates.#8....: s4sts6c!l2 -> 6vpgc5xq5t
Candidates.#9....: vb8ybhlp6a -> V1vn6l8f6a
Hardware.Mon.#1..: Temp: 54c Fan: 46% Util: 86% Core:2775MHz Mem:10251MHz Bus:1
Hardware.Mon.#2..: Temp: 45c Fan: 43% Util:  0% Core:2790MHz Mem:10251MHz Bus:1
Hardware.Mon.#3..: Temp: 44c Fan: 44% Util:  0% Core:2760MHz Mem:10251MHz Bus:1
Hardware.Mon.#4..: Temp: 44c Fan: 43% Util:  0% Core:2835MHz Mem:10251MHz Bus:1
Hardware.Mon.#5..: Temp: 43c Fan: 43% Util:  0% Core:2820MHz Mem:10251MHz Bus:1
Hardware.Mon.#6..: Temp: 46c Fan: 50% Util:  0% Core:2820MHz Mem:10251MHz Bus:1
Hardware.Mon.#7..: Temp: 44c Fan: 47% Util:  0% Core:2730MHz Mem:10251MHz Bus:1
Hardware.Mon.#8..: Temp: 47c Fan: 51% Util:  0% Core:2790MHz Mem:10251MHz Bus:1
Hardware.Mon.#9..: N/A


#############################################################################################
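To put a number on the gap between the two outputs above, here is a minimal sketch that compares them per card. The speeds are copied from the benchmark and the status screen; device #9 is left out on the assumption that it is not one of the RTX 4090s. With these figures the eight cards together reach roughly 35% of their benchmark speed.

```python
# Speeds copied from the two outputs above, devices #1-#8 only.
# Assumption: device #9 is not one of the RTX 4090 cards, so it is skipped.
benchmark_ghs = [154.7, 155.1, 151.7, 148.9, 147.2, 151.0, 147.4, 149.9]
attack_mhs = [52814.0, 47936.9, 53635.6, 55270.9, 55048.9, 50955.8, 49530.3, 51705.7]

# The status screen reports MH/s, the benchmark GH/s; convert to one unit.
attack_ghs = [mhs / 1000.0 for mhs in attack_mhs]

for dev, (bench, real) in enumerate(zip(benchmark_ghs, attack_ghs), start=1):
    print(f"GPU #{dev}: {real:6.1f} of {bench:6.1f} GH/s ({real / bench:.0%})")

total_bench = sum(benchmark_ghs)
total_real = sum(attack_ghs)
print(f"Total : {total_real:.1f} of {total_bench:.1f} GH/s ({total_real / total_bench:.0%})")
```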
The problem persists, more or less, in other hash modes (some hash modes are almost as fast as the benchmark, like 1500 DES).
The only thing that comes to mind is a lack of RAM. I am a newbie at setting up rigs for hashcat, but I read that I need RAM >= VRAM, and my motherboard does not support that much RAM (each RTX 4090 has 24 GB of VRAM).
So what solutions would you experts suggest?

I should also mention that I get this warning for every GPU when I start hashcat; I don't know whether it's important:

###
Device #4: WARNING! Kernel exec timeout is not disabled.
            This may cause "CL_OUT_OF_RESOURCES" or related errors.
            To disable the timeout, see: https://hashcat.net/q/timeoutpatch
###
#2
The benchmark is your theoretical max speed; how close you get depends on the hash and the input you can provide.

For fast hashes like MD5 this can be a hard task, especially with 8 cards; see the wiki for more info:

https://hashcat.net/wiki/doku.php?id=fre...full_speed

In your case I would assume that giving each card its own unique mask will result in better performance, since right now the password candidates are generated and split across your cards.
OR
Going by the info you posted, I would do a test run with your mask and only 3-4 cards. Maybe that way you will get closer to your maximum per card, and you have the other 4 free for another mask or job.

And for the timeout, yes, please install the patch as mentioned:

https://hashcat.net/q/timeoutpatch
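
One way to read the "unique mask per card" idea, as a rough sketch only: split the first custom charset into one slice per GPU and start one hashcat process per card with `-d`. The hash file name, the session names, and the choice to split the leftmost position are assumptions made for illustration, not something tested on this rig, and whether it actually gets closer to benchmark speed would have to be measured.

```python
import string

# Assumptions (not from the thread): the hash file name, the session names,
# and the decision to split the FIRST custom charset across the cards.
HASH_FILE = "hash.txt"          # hypothetical path
MASK = "?1?2?2?2?2?2?2?3?3?3"   # mask from the status output
NUM_GPUS = 8

# -1 ?l?d?u written out as literal characters so it can be sliced.
charset1 = string.ascii_lowercase + string.digits + string.ascii_uppercase
charset2 = "?l?d"
charset3 = "?l?d*!$@_"


def split_evenly(chars, parts):
    """Split chars into `parts` contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(chars), parts)
    slices, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        slices.append(chars[start:start + size])
        start += size
    return slices


def build_commands():
    """Build one hashcat argument list per GPU, each with its own slice of charset 1."""
    commands = []
    for gpu, chunk in enumerate(split_evenly(charset1, NUM_GPUS), start=1):
        commands.append([
            "hashcat", "-m", "0", "-a", "3",
            "-d", str(gpu),                 # restrict this process to one device
            "--session", f"gpu{gpu}",       # separate session per process
            "-1", chunk, "-2", charset2, "-3", charset3,
            HASH_FILE, MASK,
        ])
    return commands


# Printed for inspection only; charset 3 contains characters a shell may
# interpret, so pass the list form to subprocess instead of pasting the string.
for cmd in build_commands():
    print(" ".join(cmd))
```

Each process then walks a disjoint part of the original keyspace, so no candidate is tried twice.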
#3
(02-05-2024, 05:10 PM)Snoopy Wrote: benchmark is your theoretical maxspeed, depending on hash and input you can provide 

Thanks for the advice. If I do as you said (using 4 GPUs instead of 8), is there an efficient way to cluster 2 or more rigs so they run one cracking job simultaneously? Or some way to split the work so it is done by 2 rigs? My attack type is -a 3 most of the time. Thanks
#4
Your motherboard is going to kill a large percentage of your effective speed in any real attack due to the limitation of just x1 PCIe lanes per card. Mining equipment should NOT be used for hashcat, it will cause severe performance issues and potentially reliability issues.
#5
(02-06-2024, 12:31 PM)Chick3nman Wrote: Your motherboard is going to kill a large percentage of your effective speed in any real attack due to the limitation of just x1 PCIe lanes per card. Mining equipment should NOT be used for hashcat, it will cause severe performance issues and potentially reliability issues.

That's quite a nice bit of info on mining hardware that I wasn't aware of (on top of the fact that I missed it when reading the initial post :D). Ty Chick3nman
#6
(02-06-2024, 12:31 PM)Chick3nman Wrote: Your motherboard is going to kill a large percentage of your effective speed in any real attack due to the limitation of just x1 PCIe lanes per card. Mining equipment should NOT be used for hashcat, it will cause severe performance issues and potentially reliability issues.

It's an honor to receive answers from you :) . I decided to buy these RTX cards based on your Twitter post about RTX 4090 performance. So what motherboard or setup do you suggest for 8x RTX 4090? And what about the part of my question on clustering several rigs for hashcat?
#7
Ideally you want to get something that actually can serve at least x8 3.0 lanes to each card or use multiple rigs. I've been building off AMD Epyc systems recently, 2nd gen and similar, and using SFF-8654 SlimSAS adapters where needed(note: these are NOT like other pcie risers you may have encountered). The adapters are relatively expensive and still a little rare, I've been looking for a cheaper way to get them but I'm still waiting on hardware to arrive so I can test it before I actually suggest using any of it. For now, they remain a bit of a boutique/premium option, much like OcuLink and other PCIe cabling options.
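
For a feel of what x1 versus x8 means, here is a back-of-the-envelope sketch of theoretical PCIe 3.0 bandwidth per direction. It uses the standard 8 GT/s per lane and 128b/130b encoding and ignores protocol overhead. Whether the slots on the mining board even run at 3.0 speed is not stated in the thread, so treat the x1 figure as a best case.

```python
def pcie3_bandwidth_gbs(lanes):
    """Theoretical one-direction PCIe 3.0 bandwidth in GB/s for a given lane count."""
    transfers_per_s = 8e9      # 8 GT/s per lane
    encoding = 128 / 130       # 128b/130b line encoding
    bytes_per_s = transfers_per_s * encoding / 8
    return lanes * bytes_per_s / 1e9


for lanes in (1, 8, 16):
    print(f"x{lanes:<2} PCIe 3.0: {pcie3_bandwidth_gbs(lanes):5.2f} GB/s")
```

An x1 link comes out just under 1 GB/s, an x8 link at eight times that.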

Clustering multiple machines is probably the easiest way to go about this, but it does come with a bit of complexity. You'll want to look into some of the cluster control software like Hashtopolis. It will let you network machines together, at the expense of being a web interface and potentially missing a few more advanced features that using hashcat from the command line might offer.
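
Without any cluster software, a mask attack can also be divided by hand with hashcat's `--skip` and `--limit` options, which is roughly the kind of chunking a tool like Hashtopolis automates. A minimal sketch, assuming the Restore.Point total from the status output in the first post (115760814336) is the same number `--keyspace` would report for that mask, and assuming two rigs:

```python
def split_keyspace(keyspace, workers):
    """Return (skip, limit) pairs that cover [0, keyspace) without gaps or overlap."""
    base, extra = divmod(keyspace, workers)
    ranges, skip = [], 0
    for i in range(workers):
        limit = base + (1 if i < extra else 0)
        ranges.append((skip, limit))
        skip += limit
    return ranges


# Assumption: Restore.Point total from the status output equals `--keyspace`.
KEYSPACE = 115760814336
RIGS = 2  # hypothetical number of machines

# "..." stands for the hash file, custom charsets and mask, which stay the
# same on every rig; only the skip/limit pair changes.
for rig, (skip, limit) in enumerate(split_keyspace(KEYSPACE, RIGS), start=1):
    print(f"rig {rig}: hashcat -m 0 -a 3 ... --skip {skip} --limit {limit}")
```

An even split assumes the rigs are about equally fast; with unequal machines, smaller chunks handed out on demand balance better.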