hashcat
advanced password recovery

smozy92 · (This post was last modified: 09-28-2021, 05:38 PM by smozy92.)

Hello everyone, hope you are doing well.

I am getting very poor performance from hashcat on my crackstation.

The crackstation has 8x NVIDIA A100 GPUs.

Here is the attack command I am running, followed by what I get when running a benchmark for NTLM hashes:

Code:
sudo time hashcat -a0 -m 1000 hashes/a.txt wordlists/finalweak.txt -O --force -w 4

Code:
Hashmode: 1000 - NTLM

Speed.#1.........:  116.1 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#2.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#3.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#4.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#5.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#6.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#7.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#8.........:  116.0 GH/s (7.67ms) @ Accel:32 Loops:1024 Thr:256 Vec:1

Speed.#*.........:  928.2 GH/s

However, when trying to crack a single NTLM hash I don't get anywhere near this speed: I only get about 37230.6 kH/s (roughly 37 MH/s). Since the benchmark reported 928 GH/s, that is a huge gap.

Here is the output I got when running the attack command above:

Code:
hashcat (v4.0.1) starting...

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

nvmlDeviceGetFanSpeed(): Not Supported

OpenCL Platform #1: NVIDIA Corporation

======================================

* Device #1: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #2: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #3: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #4: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #5: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #6: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #7: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

* Device #8: A100-SXM4-40GB, 10134/40537 MB allocatable, 108MCU

OpenCL Platform #2: The pocl project

====================================

* Device #9: pthread-Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz, skipped.

Hashes: 1 digests; 1 unique digests, 1 unique salts

Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Rules: 1

Applicable optimizers:

* Optimized-Kernel

* Zero-Byte

* Precompute-Init

* Precompute-Merkle-Demgard

* Meet-In-The-Middle

* Early-Skip

* Not-Salted

* Not-Iterated

* Single-Hash

* Single-Salt

* Raw-Hash

Password length minimum: 0

Password length maximum: 27

Watchdog: Temperature abort trigger set to 90c

Watchdog: Temperature retain trigger disabled.

* Device #1: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #2: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #3: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #4: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #5: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #6: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #7: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

* Device #8: build_opts '-I /usr/share/hashcat/OpenCL -D VENDOR_ID=32 -D CUDA_ARCH=800 -D AMD_ROCM=0 -D VECT_SIZE=1 -D DEVICE_TYPE=4 -D DGST_R0=0 -D DGST_R1=3 -D DGST_R2=2 -D DGST_R3=1 -D DGST_ELEM=4 -D KERN_TYPE=1000 -D _unroll'

Dictionary cache hit:

* Filename..: wordlists/finalweak.txt

* Passwords.: 15639992272

* Bytes.....: 177992863744

* Keyspace..: 15639992272

- Device #4: autotuned kernel-accel to 256                

- Device #4: autotuned kernel-loops to 1

- Device #3: autotuned kernel-accel to 256                

- Device #3: autotuned kernel-loops to 1

- Device #5: autotuned kernel-accel to 256                

- Device #5: autotuned kernel-loops to 1

- Device #1: autotuned kernel-accel to 256                

- Device #1: autotuned kernel-loops to 1

- Device #2: autotuned kernel-accel to 256                

- Device #2: autotuned kernel-loops to 1

- Device #8: autotuned kernel-accel to 256                

- Device #8: autotuned kernel-loops to 1

- Device #6: autotuned kernel-accel to 256                

- Device #6: autotuned kernel-loops to 1

- Device #7: autotuned kernel-accel to 256                

- Device #7: autotuned kernel-loops to 1

Session..........: hashcat

Status...........: Running

Hash.Type........: NTLM

Hash.Target......: 

Time.Started.....: Tue Sep 28 15:32:45 2021 (30 secs)

Time.Estimated...: Tue Sep 28 15:40:02 2021 (6 mins, 47 secs)

Guess.Base.......: File (wordlists/finalweak.txt)

Guess.Queue......: 1/1 (100.00%)

Speed.Dev.#1.....:  4461.3 kH/s (1.92ms)

Speed.Dev.#2.....:  4904.3 kH/s (1.92ms)

Speed.Dev.#3.....:  4222.8 kH/s (1.94ms)

Speed.Dev.#4.....:  4449.5 kH/s (1.95ms)

Speed.Dev.#5.....:  4350.8 kH/s (1.93ms)

Speed.Dev.#6.....:  5675.5 kH/s (1.92ms)

Speed.Dev.#7.....:  4587.4 kH/s (1.92ms)

Speed.Dev.#8.....:  4579.0 kH/s (1.92ms)

Speed.Dev.#*.....: 37230.6 kH/s

Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts

Progress.........: 486237586/15639992272 (3.11%)

Rejected.........: 4941202/486237586 (1.02%)

Restore.Point....: 457636015/15639992272 (2.93%)

Candidates.#1....: InFusion121 -> ZRMFY

Candidates.#2....: THEBEATL -> aymhmanaza

Candidates.#3....: PopularZebra8632 -> alex_aria

Candidates.#4....: BNDYQ -> ZRMFIFDY1Ng2iuRl

Candidates.#5....: EJHz -> ZRMFP

Candidates.#6....: 9858457 -> THANHCHUONGPRO

Candidates.#7....: 6zl8doon -> TESIE 1996

Candidates.#8....: MOHsin@ -> THEBEATIFUL

HWMon.Dev.#1.....: Temp: 60c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#2.....: Temp: 53c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#3.....: Temp: 65c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#4.....: Temp: 53c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#5.....: Temp: 59c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#6.....: Temp: 52c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#7.....: Temp: 65c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

HWMon.Dev.#8.....: Temp: 55c Util:  0% Core:1410MHz Mem:1215MHz Bus:16

I tried to find a solution by monitoring the GPUs and got this result:

Code:
[0] A100-SXM4-40GB  | 60'C,  0 % | 10860 / 40537 MB | root(10857M)

[1] A100-SXM4-40GB  | 53'C,  0 % | 10860 / 40537 MB | root(10857M)

[2] A100-SXM4-40GB  | 65'C,  0 % | 10860 / 40537 MB | root(10857M)

[3] A100-SXM4-40GB  | 53'C,  0 % | 10860 / 40537 MB | root(10857M)

[4] A100-SXM4-40GB  | 59'C,  0 % | 10860 / 40537 MB | root(10857M)

[5] A100-SXM4-40GB  | 52'C,  0 % | 10860 / 40537 MB | root(10857M)

[6] A100-SXM4-40GB  | 65'C,  0 % | 10860 / 40537 MB | root(10857M)

[7] A100-SXM4-40GB  | 55'C,  41 % | 10860 / 40537 MB | root(10857M)

I am wondering why hashcat only uses 41% of the last GPU instead of using all the GPUs at full power at the same time.

I appreciate any help, thank you.

**Xanadrel** · 09-28-2021, 05:42 PM

https://hashcat.net/wiki/doku.php?id=fre...ck_so_slow

Specifically the part about creating more work.
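To put numbers on the gap, here is a small sketch (not hashcat output, just arithmetic on the figures from the first post). It sums the per-device speeds from the status screen and compares the total with the benchmark. The variable names are only for illustration.

```python
# Per-device speeds in kH/s, copied from the Speed.Dev.#1..#8 lines
# of the status output in the first post.
device_khs = [4461.3, 4904.3, 4222.8, 4449.5, 4350.8, 5675.5, 4587.4, 4579.0]

# Combined benchmark speed (Speed.#*) in GH/s, also from the first post.
benchmark_ghs = 928.2

total_khs = sum(device_khs)          # should match Speed.Dev.#*
total_hs = total_khs * 1e3           # kH/s -> H/s
benchmark_hs = benchmark_ghs * 1e9   # GH/s -> H/s
gap = benchmark_hs / total_hs        # how many times faster the benchmark is

print(f"attack total: {total_khs:.1f} kH/s")
print(f"benchmark is roughly {gap:,.0f}x faster than the attack")
```

A gap of this size, together with the 0% utilization in the monitoring output, suggests the GPUs are mostly waiting for candidates rather than hashing.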

smozy92 · 09-28-2021, 05:52 PM

(09-28-2021, 05:42 PM)Xanadrel Wrote: https://hashcat.net/wiki/doku.php?id=fre...ck_so_slow

Specifically the part about creating more work.

Thanks for the resource.
NTLM must be categorized as a fast hash.

I don't see how mask or rule-based attacks could increase the speed.

**Chick3nman** · 09-29-2021, 11:19 AM

You should probably start by using a version of hashcat that isn't 4 years old. After you update to a more recent version of hashcat and retest, make sure that whatever command you are running doesn't include --force. Once you've got both of those figured out, then you can worry about adding more work or restructuring your attack to achieve a proper workload and better speeds.

smozy92 · 09-30-2021, 12:34 AM

(09-29-2021, 11:19 AM)Chick3nman Wrote: You should probably start by using a version of hashcat that isn't 4 years old. After you update to a more recent version of hashcat and retest, make sure that whatever command you are running doesn't include --force. Once you've got both of those figured out, then you can worry about adding more work or restructuring your attack to achieve a proper workload and better speeds.

Thanks for the reply, Chick3nman.

I used a 60GB wordlist combined with the d3ad0ne rules and got 200 GH/s, which is far faster than the 37 MH/s I got with the simple wordlist without rules. Do you have any explanation for why it gets faster with rules? That puzzles me.

However, 200 GH/s is still not the 900 GH/s suggested by the benchmark (I know the benchmark runs under ideal conditions and optimisation). What can I do to increase the speed further?
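For scale, a quick calculation on the speeds quoted in this thread (the 200 GH/s value is the rounded figure from the post above, so the results are approximate):

```python
# Speeds reported in the thread, converted to H/s.
no_rules_hs = 37230.6e3    # plain wordlist, no rules (37230.6 kH/s)
with_rules_hs = 200e9      # 60GB wordlist + rules (about 200 GH/s, rounded)
benchmark_hs = 928.2e9     # benchmark total (928.2 GH/s)

speedup = with_rules_hs / no_rules_hs     # gain from adding rules
fraction = with_rules_hs / benchmark_hs   # share of the benchmark speed reached

print(f"rules run is about {speedup:,.0f}x faster than the plain wordlist run")
print(f"rules run reaches about {fraction:.0%} of the benchmark speed")
```

Note that the two attack runs used different wordlists, so this is a rough comparison rather than a controlled measurement.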

Snoopy · 09-30-2021, 01:11 PM

https://hashcat.net/wiki/doku.php?id=fre...full_speed
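Short answer:
hashcat has a base loop (the wordlist) and a mod loop (the rules). The rules are applied on the GPU, so every word sent to the GPU produces many candidates there, resulting in higher hash rates (the rules act as an amplifier).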
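The amplifier idea can be sketched with a toy model. This is a simplification and not how hashcat computes anything internally: it assumes the no-rules speed from the first post approximates how fast words can be delivered to the GPUs, and that each word is expanded into `amplifier` candidates on the GPU. The function name and the amplifier values are hypothetical.

```python
def modeled_speed(feed_hs: float, amplifier: int, gpu_peak_hs: float) -> float:
    """Toy bottleneck model: throughput is capped either by the words
    delivered per second times the candidates generated per word, or by
    the raw hashing capacity of the GPUs, whichever is lower."""
    return min(feed_hs * amplifier, gpu_peak_hs)


# Assumption: the plain-wordlist speed is roughly the word delivery rate.
feed_hs = 37230.6e3
# Benchmark total from the first post, taken as the GPU ceiling.
gpu_peak_hs = 928.2e9

# Hypothetical amplifier values (for example, the number of rules applied per word).
for amplifier in (1, 100, 10_000, 100_000):
    speed = modeled_speed(feed_hs, amplifier, gpu_peak_hs)
    print(f"amplifier {amplifier:>7,}: {speed / 1e9:10.3f} GH/s")
```

In this model, a single candidate per word leaves the GPUs starved, and the speed only approaches the benchmark once each word yields tens of thousands of candidates. Real runs will differ, since rule cost, rejected candidates, and wordlist contents all matter.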
