Multi-GPU utilization
#1
Hi,

I've been testing Office 2013 cracking on an 8-GPU rig (RTX 2080 Ti) and ran into something odd about GPU utilization. I played with --skip and --limit and found that when the limit is fairly small, only some of the GPUs are used (no surprise there), but the computation still took 5-10 minutes, which I don't consider a short run.

Could someone please explain what is going on in the examples below, and why only a few GPUs are used or show low utilization? The cracking speed of a single card is around 21.5 kH/s.

The reason I'm asking is that I usually only get limited time slots on the nodes, so I need to get the most out of them.

Also, which other hash modes are likely to behave the same way? Is this common for slow hash modes?

Thanks

Note: the card showing 10-15% utilization (#8) was being used for RDP.

Code:
$ .\hashcat64.exe -m 9600 --optimized-kernel-enable -a 3 <hash> -s 0 -l 10000 ?a?a?a?a?a?a?a?a
Hardware.Mon.#1..: Temp: 43c Fan: 30% Util: 84% Core:1935MHz Mem:6800MHz Bus:1
Hardware.Mon.#2..: Temp: 38c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#3..: Temp: 35c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#4..: Temp: 35c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#5..: Temp: 37c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#6..: Temp: 39c Fan: 29% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#7..: Temp: 34c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#8..: Temp: 45c Fan: 32% Util: 13% Core:1890MHz Mem:6800MHz Bus:1

Code:
$ .\hashcat64.exe -m 9600 --optimized-kernel-enable -a 3 <hash> -s 0 -l 80000 ?a?a?a?a?a?a?a?a
Hardware.Mon.#1..: Temp: 59c Fan: 51% Util: 85% Core:1800MHz Mem:6800MHz Bus:1
Hardware.Mon.#2..: Temp: 34c Fan: 27% Util:  0% Core: 300MHz Mem: 405MHz Bus:1
Hardware.Mon.#3..: Temp: 32c Fan: 27% Util:  0% Core: 300MHz Mem: 405MHz Bus:1
Hardware.Mon.#4..: Temp: 33c Fan: 27% Util:  0% Core: 300MHz Mem: 405MHz Bus:1
Hardware.Mon.#5..: Temp: 34c Fan: 27% Util:  0% Core: 300MHz Mem: 405MHz Bus:1
Hardware.Mon.#6..: Temp: 46c Fan: 29% Util: 87% Core:1890MHz Mem:6800MHz Bus:1
Hardware.Mon.#7..: Temp: 32c Fan: 27% Util:  0% Core: 300MHz Mem: 405MHz Bus:1
Hardware.Mon.#8..: Temp: 44c Fan: 31% Util: 10% Core:1890MHz Mem:6800MHz Bus:1

Code:
$ .\hashcat64.exe -m 9600 --optimized-kernel-enable -a 3 <hash> -s 0 -l 160000 ?a?a?a?a?a?a?a?a
Hardware.Mon.#1..: Temp: 50c Fan: 34% Util: 84% Core:1815MHz Mem:6800MHz Bus:1
Hardware.Mon.#2..: Temp: 37c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#3..: Temp: 35c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#4..: Temp: 42c Fan: 27% Util: 88% Core:1875MHz Mem:6800MHz Bus:1
Hardware.Mon.#5..: Temp: 35c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#6..: Temp: 35c Fan: 27% Util:  0% Core:1350MHz Mem:6800MHz Bus:1
Hardware.Mon.#7..: Temp: 50c Fan: 34% Util: 88% Core:1785MHz Mem:6800MHz Bus:1
Hardware.Mon.#8..: Temp: 45c Fan: 30% Util: 15% Core:1890MHz Mem:6800MHz Bus:1

Code:
$ .\hashcat64.exe -m 9600 --optimized-kernel-enable -a 3 <hash> -s 0 -l 1000000 ?a?a?a?a?a?a?a?a
Hardware.Mon.#1..: Temp: 60c Fan: 52% Util: 91% Core:1785MHz Mem:6800MHz Bus:1
Hardware.Mon.#2..: Temp: 61c Fan: 55% Util: 90% Core:1785MHz Mem:6800MHz Bus:1
Hardware.Mon.#3..: Temp: 60c Fan: 55% Util: 92% Core:1785MHz Mem:6800MHz Bus:1
Hardware.Mon.#4..: Temp: 59c Fan: 54% Util: 91% Core:1770MHz Mem:6800MHz Bus:1
Hardware.Mon.#5..: Temp: 60c Fan: 55% Util: 93% Core:1800MHz Mem:6800MHz Bus:1
Hardware.Mon.#6..: Temp: 60c Fan: 55% Util: 91% Core:1830MHz Mem:6800MHz Bus:1
Hardware.Mon.#7..: Temp: 60c Fan: 56% Util: 92% Core:1800MHz Mem:6800MHz Bus:1
Hardware.Mon.#8..: Temp: 60c Fan: 55% Util: 94% Core:1815MHz Mem:6800MHz Bus:1
#2
GPUs perform well not because their individual cores are fast but because so many of them run in parallel. If -l is too low, there is not enough work to keep all the shaders busy. Also, try -w 3 once -l is big enough.
#3
I had some free time on the node today, so I tested -w 3 and experimented with --limit. To be honest, it left me with more questions than answers. I expected that increasing --limit would give better and better utilization on all 8 GPUs, but that was not the case. I ran:

$ hashcat -m 9600 --optimized-kernel-enable --machine-readable -a3 '<office13_example_hash>' ?a?a?a?a?a?a?a?a -s 816687 -l 817568 --status-timer=10 --status --machine-readable -w3 

About 4 minutes after the first status line showed up, I got:

... TEMP 67 70 68 66 69 65 63 64 ... UTIL 64 66 62 69 75 61 61 58


which is roughly what I expected and I'm generally fine with. But when I increased --limit to 2025419 and ran it like this:

$ hashcat -m 9600 --optimized-kernel-enable --machine-readable -a3 '<office13_example_hash>' ?a?a?a?a?a?a?a?a -s 1634255 -l 2025419 --status-timer=10 --status --machine-readable -w 3

after the same amount of time I got:

... TEMP 43 45 43 42 42 44 40 44 ... UTIL 9 9 32 8 8 9 9 17

which makes no sense to me. Any thoughts or insights?
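To put a single number on each run, the UTIL fields of the two status lines can be averaged. This is only a minimal sketch: the helper names `field_values` and `mean` are made up for illustration, the input strings are the excerpts quoted above with the elided parts left out, and it assumes each field name is followed by whitespace-separated per-device integers, as in those excerpts.

```python
def field_values(status_line, name):
    """Return the integers that follow `name`, up to the next non-numeric token."""
    tokens = status_line.split()
    if name not in tokens:
        return []
    values = []
    for token in tokens[tokens.index(name) + 1:]:
        if not token.isdigit():
            break  # reached the next field name (or an elision)
        values.append(int(token))
    return values


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Excerpts from the two runs above (elided parts omitted).
first_run = "TEMP 67 70 68 66 69 65 63 64 UTIL 64 66 62 69 75 61 61 58"
second_run = "TEMP 43 45 43 42 42 44 40 44 UTIL 9 9 32 8 8 9 9 17"

print(mean(field_values(first_run, "UTIL")))   # 64.5
print(mean(field_values(second_run, "UTIL")))  # 12.625
```

So the first run averaged about 64% per card and the second about 13%.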
#4
The limit should be fine. Possibly it's a problem with the PCI-E bus.
#5
Well, when I run the same attack without the limit, I get 98-100% utilization on all of the cards. So I would say PCI-E is fine. Or is there some difference between limited and unlimited execution?
#6
The general rule for using -l is to run --progress-only first in order to find the ideal multiple for -l. Also note the use of -S.

Quote:
./hashcat -m 9600 hash.txt -O -a 3 -s 0 ?a?a?a?a?a?a?a?a -w 3 --progress-only -S
...
Progress.#1......: 262144
Runtime.#1.......: 57703.29ms

So that is 100% utilization for about 57 seconds. Of course, once the run goes past that point, hashcat reduces the workload or the keyspace is exhausted. So make -l a multiple of the progress value; for example, -l 2621440 (10 x 262144) gives me 100% for almost 10 minutes. That is for a single GPU. If you have multiple GPUs, first add their progress values together, then multiply by 10 or so. After 4 minutes you should still have 100% utilization, and that is what I see:

Quote:
./hashcat -m 9600 hash.txt -O -a 3 -s 0 ?a?a?a?a?a?a?a?a -w 3 -l 2621440 -S
...
Hardware.Mon.#1..: Temp: 80c Fan: 45% Util:100% Core:1164MHz Mem:3004MHz Bus:16
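The arithmetic behind that rule of thumb can be sketched as follows. This is only an illustration: the function names are made up, the 8-GPU case assumes (hypothetically) that every device reports the same Progress value as my single card, and the runtime estimate assumes the runtime scales linearly with -l.

```python
def suggested_limit(progress_per_device, multiplier=10):
    """Add the per-device Progress values together, then scale by the multiplier."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return sum(progress_per_device) * multiplier


def estimated_runtime_s(runtime_ms, multiplier=10):
    """Rough runtime in seconds, assuming it grows linearly with -l."""
    return runtime_ms * multiplier / 1000.0


# Single GPU, using Progress.#1 and Runtime.#1 from the --progress-only output above.
print(suggested_limit([262144]))                     # 2621440
print(round(estimated_runtime_s(57703.29) / 60, 1))  # 9.6 (minutes)

# Hypothetical 8-GPU rig where every device reports the same Progress value.
print(suggested_limit([262144] * 8))                 # 20971520
```

On a real rig, take the Progress.#N value of each device from your own --progress-only run instead of reusing mine.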