Massive slow performance from 900 GH/s to 37 MH/s - smozy92 - 09-28-2021
Hello everyone, hope you are doing well.
I am getting very slow performance on my crackstation with hashcat.
I am running a crackstation with 8x NVIDIA Tesla A100 GPUs.
Here is what I get when running a benchmark for NTLM hashes:
Code: sudo time hashcat -a0 -m 1000 hashes/a.txt wordlists/finalweak.txt -O --force -w 4
Code: Hashmode: 1000 - NTLM
Speed.#1.........: 116.1 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#2.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#3.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#4.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#5.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#6.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#7.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#8.........: 116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1
Speed.#*.........: 928.2 GH/s
However, when trying to crack a single NTLM hash I don't get this speed; I only get about 37230.6 kH/s (roughly 37 MH/s). Since the benchmark reported 928 GH/s, it's odd to get only 37230.6 kH/s.
Here is the output I got when running this command:
Code: hashcat (v4.0.1) starting...
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
nvmlDeviceGetFanSpeed(): Not Supported
OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #2: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #3: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #4: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #5: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #6: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #7: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
* Device #8: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU
OpenCL Platform #2: The pocl project
====================================
* Device #9: pthread-Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz, skipped.
Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates
Rules: 1
Applicable optimizers:
* Optimized-Kernel
* Zero-Byte
* Precompute-Init
* Precompute-Merkle-Demgard
* Meet-In-The-Middle
* Early-Skip
* Not-Salted
* Not-Iterated
* Single-Hash
* Single-Salt
* Raw-Hash
Password length minimum: 0
Password length maximum: 27
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger disabled.
* Device #1: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #2: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #3: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #4: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #5: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #6: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #7: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
* Device #8: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'
Dictionary cache hit:
* Filename..: wordlists/finalweak.txt
* Passwords.: 15639992272
* Bytes.....: 177992863744
* Keyspace..: 15639992272
- Device #4: autotuned kernel-accel to 256
- Device #4: autotuned kernel-loops to 1
- Device #3: autotuned kernel-accel to 256
- Device #3: autotuned kernel-loops to 1
- Device #5: autotuned kernel-accel to 256
- Device #5: autotuned kernel-loops to 1
- Device #1: autotuned kernel-accel to 256
- Device #1: autotuned kernel-loops to 1
- Device #2: autotuned kernel-accel to 256
- Device #2: autotuned kernel-loops to 1
- Device #8: autotuned kernel-accel to 256
- Device #8: autotuned kernel-loops to 1
- Device #6: autotuned kernel-accel to 256
- Device #6: autotuned kernel-loops to 1
- Device #7: autotuned kernel-accel to 256
- Device #7: autotuned kernel-loops to 1
Session..........: hashcat
Status...........: Running
Hash.Type........: NTLM
Hash.Target......:
Time.Started.....: Tue Sep 28 15:32:45 2021 (30 secs)
Time.Estimated...: Tue Sep 28 15:40:02 2021 (6 mins, 47 secs)
Guess.Base.......: File (wordlists/finalweak.txt)
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....: 4461.3 kH/s (1.92ms)
Speed.Dev.#2.....: 4904.3 kH/s (1.92ms)
Speed.Dev.#3.....: 4222.8 kH/s (1.94ms)
Speed.Dev.#4.....: 4449.5 kH/s (1.95ms)
Speed.Dev.#5.....: 4350.8 kH/s (1.93ms)
Speed.Dev.#6.....: 5675.5 kH/s (1.92ms)
Speed.Dev.#7.....: 4587.4 kH/s (1.92ms)
Speed.Dev.#8.....: 4579.0 kH/s (1.92ms)
Speed.Dev.#*.....: 37230.6 kH/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 486237586/15639992272 (3.11%)
Rejected.........: 4941202/486237586 (1.02%)
Restore.Point....: 457636015/15639992272 (2.93%)
Candidates.#1....: InFusion121 -> ZRMFY
Candidates.#2....: THEBEATL -> aymhmanaza
Candidates.#3....: PopularZebra8632 -> alex_aria
Candidates.#4....: BNDYQ -> ZRMFIFDY1Ng2iuRl
Candidates.#5....: EJHz -> ZRMFP
Candidates.#6....: 9858457 -> THANHCHUONGPRO
Candidates.#7....: 6zl8doon -> TESIE 1996
Candidates.#8....: MOHsin@ -> THEBEATIFUL
HWMon.Dev.#1.....: Temp: 60c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#2.....: Temp: 53c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#3.....: Temp: 65c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#4.....: Temp: 53c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#5.....: Temp: 59c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#6.....: Temp: 52c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#7.....: Temp: 65c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
HWMon.Dev.#8.....: Temp: 55c Util: 0% Core:1410MHz Mem:1215MHz Bus:16
I tried to find a solution by monitoring the GPUs and got this result:
Code: [0] A100-SXM4-40GB | 60'C, 0 % | 10860 / 40537 MB | root(10857M)
[1] A100-SXM4-40GB | 53'C, 0 % | 10860 / 40537 MB | root(10857M)
[2] A100-SXM4-40GB | 65'C, 0 % | 10860 / 40537 MB | root(10857M)
[3] A100-SXM4-40GB | 53'C, 0 % | 10860 / 40537 MB | root(10857M)
[4] A100-SXM4-40GB | 59'C, 0 % | 10860 / 40537 MB | root(10857M)
[5] A100-SXM4-40GB | 52'C, 0 % | 10860 / 40537 MB | root(10857M)
[6] A100-SXM4-40GB | 65'C, 0 % | 10860 / 40537 MB | root(10857M)
[7] A100-SXM4-40GB | 55'C, 41 % | 10860 / 40537 MB | root(10857M)
I am wondering why hashcat only uses 41% of the last GPU instead of using all the GPUs at full power at the same time?
I appreciate any help, thank you
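For scale, the figures in the output above can be put side by side. The following is only a small arithmetic sketch in Python; every number is copied from the post, and the variable names are made up for illustration.

```python
# Figures copied from the hashcat output quoted in the post above.
benchmark_hs = 928.2e9        # Speed.#* from the benchmark, in H/s
observed_hs = 37230.6e3       # Speed.Dev.#* from the real run, in H/s
keyspace = 15_639_992_272     # Keyspace of wordlists/finalweak.txt
progress = 486_237_586        # Progress at the time of the status screen

# How many times slower the real attack is than the benchmark.
slowdown = benchmark_hs / observed_hs

# Time left if the remaining keyspace is processed at the observed speed.
remaining_s = (keyspace - progress) / observed_hs
minutes, seconds = divmod(int(remaining_s), 60)

print(f"slowdown factor: {slowdown:,.0f}x")
print(f"time left at observed speed: {minutes} min {seconds} s")
```

The remaining-time figure agrees with the Time.Estimated line in the status screen (6 mins, 47 secs), so the reported 37 MH/s is consistent with how fast the wordlist is actually being consumed.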
RE: Massive slow performance from 900 GH/s to 37 MH/s - Xanadrel - 09-28-2021
https://hashcat.net/wiki/doku.php?id=frequently_asked_questions#why_is_my_attack_so_slow
Specifically the part about creating more work.
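One way to see why a plain wordlist is "not enough work" is to estimate how much raw candidate data would have to be streamed from the host to keep the GPUs at benchmark speed. This is a rough Python sketch using the wordlist size and speeds from the first post. It assumes, as a simplification, that each candidate costs its average on-disk size (including the newline) to deliver; hashcat's internal transfer format is not modelled.

```python
# Figures from the "Dictionary cache hit" and speed lines in the first post.
wordlist_bytes = 177_992_863_744
wordlist_words = 15_639_992_272
observed_hs = 37230.6e3       # real attack speed, H/s
benchmark_hs = 928.2e9        # benchmark speed, H/s

# Average size of one wordlist entry on disk (assumption: this approximates
# the per-candidate cost of feeding the GPUs from the host).
avg_bytes_per_word = wordlist_bytes / wordlist_words

feed_now = observed_hs * avg_bytes_per_word       # bytes/s at 37 MH/s
feed_needed = benchmark_hs * avg_bytes_per_word   # bytes/s at 928 GH/s

print(f"average entry size: {avg_bytes_per_word:.2f} bytes")
print(f"feed rate at observed speed: {feed_now / 1e6:.0f} MB/s")
print(f"feed rate needed for benchmark speed: {feed_needed / 1e12:.1f} TB/s")
```

Under these assumptions the benchmark speed would need terabytes per second of candidates, which suggests that the host side, not the GPUs, limits a straight wordlist run.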
RE: Massive slow performance from 900 GH/s to 37 MH/s - smozy92 - 09-28-2021
(09-28-2021, 05:42 PM)Xanadrel Wrote: https://hashcat.net/wiki/doku.php?id=frequently_asked_questions#why_is_my_attack_so_slow
Specifically the part about creating more work.
Thanks for the resource.
NTLM must be categorized as a fast hash.
I don't see how mask- or rule-based attacks could increase the speed.
RE: Massive slow performance from 900 GH/s to 37 MH/s - Chick3nman - 09-29-2021
You should probably start by using a version of hashcat that isn't 4 years old. After you update to a more recent version of hashcat and retest, make sure that whatever command you are running doesn't include --force. Once you've got both of those figured out, then you can worry about adding more work or restructuring your attack to achieve a proper workload and better speeds.
RE: Massive slow performance from 900 GH/s to 37 MH/s - smozy92 - 09-30-2021
(09-29-2021, 11:19 AM)Chick3nman Wrote: You should probably start by using a version of hashcat that isn't 4 years old. After you update to a more recent version of hashcat and retest, make sure that whatever command you are running doesn't include --force. Once you've got both of those figured out, then you can worry about adding more work or restructuring your attack to achieve a proper workload and better speeds.
Thanks for the reply, chickenman.
I used a 60GB wordlist combined with the d3ad0ne rules and got 200 GH/s, which is far faster than the 37 MH/s I got with the plain wordlist without rules. Do you have any explanation for why it gets faster with rules? That puzzles me.
However, 200 GH/s is still not the 900 GH/s suggested by the benchmark (though I know the benchmark runs under ideal conditions and optimisations). What can I do to increase the speed further?
RE: Massive slow performance from 900 GH/s to 37 MH/s - Snoopy - 09-30-2021
https://hashcat.net/wiki/doku.php?id=frequently_asked_questions#how_to_create_more_work_for_full_speed
Short answer:
There is a base loop (the words) and a mod loop (the rules). The rules are applied on the GPU, resulting in higher hash rates (the rules act as an amplifier).
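The amplifier idea can be illustrated with a deliberately simplified model: the host delivers base words at some fixed rate, each word is expanded on the GPU by the number of rules, and the result is capped at the GPU's peak speed. The Python sketch below is an assumption-laden illustration, not how hashcat computes anything; `estimated_speed` is a hypothetical helper, and the model ignores the cost of applying rules, rejected candidates, and autotuning.

```python
import math


def estimated_speed(feed_rate_hs, amplifier, gpu_peak_hs):
    """Toy model: base words per second times rules per word, capped at peak."""
    return min(feed_rate_hs * amplifier, gpu_peak_hs)


# Figures from the first post in this thread.
feed = 37230.6e3     # base words/s observed with no rules (amplifier of 1)
peak = 928.2e9       # benchmark speed across all 8 GPUs, H/s

for rules in (1, 100, 10_000, 100_000):
    speed = estimated_speed(feed, rules, peak)
    print(f"{rules:>7} rules per word -> {speed / 1e9:8.2f} GH/s")

# Smallest amplifier that would reach the benchmark speed in this model.
min_rules = math.ceil(peak / feed)
print(f"rules per word needed to hit the cap: {min_rules}")
```

Real runs will land below this idealised curve, as the 200 GH/s reported above with the d3ad0ne rules shows, but the model captures why adding rules raises the hash rate by orders of magnitude while a bare wordlist cannot.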