Posts: 18 
	Threads: 6 
	Joined: Jun 2023
	
	 
 
	
		
		
		06-25-2023, 02:39 AM 
(This post was last modified: 06-25-2023, 02:41 AM by Gyfer.)
		
	 
	
		Everywhere I read about hashcat, the advice is to choose CUDA over OpenCL because CUDA is faster. 
But my results show that it is not.
 
What are your CUDA and OpenCL speeds?
 
Running OpenCL:
 Code: user@compute2:$ hashcat -b -m 22000 -d 4 
hashcat (v6.2.5) starting in benchmark mode 
 
CUDA API (CUDA 11.4) 
==================== 
* Device #1: NVIDIA GeForce GTX 1660 SUPER, skipped 
 
OpenCL API (OpenCL 3.0 CUDA 11.4.364) - Platform #3 [NVIDIA Corporation] 
======================================================================== 
* Device #4: NVIDIA GeForce GTX 1660 SUPER, 5824/5944 MB (1486 MB allocatable), 22MCU 
 
Benchmark relevant options: 
=========================== 
* --backend-devices=4 
* --optimized-kernel-enable 
 
------------------------------------------------------------- 
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095] 
------------------------------------------------------------- 
 
Speed.#4...test1......:  303.6 kH/s (73.45ms) @ Accel:64 Loops:256 Thr:256 Vec:1 
Speed.#4...test2......:  303.9 kH/s (73.54ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#4...test3......:  302.7 kH/s (73.85ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
 
Started: Sun Jun 25 07:22:57 2023 
Stopped: Sun Jun 25 07:23:33 2023 
user@compute2:~$
 
Running CUDA:
 Code: user@compute2:$ hashcat -b -m 22000 -d 1 
hashcat (v6.2.5) starting in benchmark mode 
 
CUDA API (CUDA 11.4) 
==================== 
* Device #1: NVIDIA GeForce GTX 1660 SUPER, 5872/5944 MB, 22MCU 
 
OpenCL API (OpenCL 3.0 CUDA 11.4.364) - Platform #3 [NVIDIA Corporation] 
======================================================================== 
* Device #4: NVIDIA GeForce GTX 1660 SUPER, skipped 
 
Benchmark relevant options: 
=========================== 
* --backend-devices=1 
* --optimized-kernel-enable 
------------------------------------------------------------- 
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095] 
------------------------------------------------------------- 
 
Speed.#1..test1..:  300.5 kH/s (73.41ms) @ Accel:32 Loops:256 Thr:512 Vec:1 
Speed.#1..test2..:  301.2 kH/s (73.24ms) @ Accel:32 Loops:256 Thr:512 Vec:1 
Speed.#1..test3..:  300.0 kH/s (73.18ms) @ Accel:16 Loops:512 Thr:512 Vec:1 
 
Started: Sun Jun 25 07:31:32 2023 
Stopped: Sun Jun 25 07:31:37 2023 
user@compute2:~$
 
That is a difference of about 1% in favour of OpenCL, which I think is quite significant.
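To put a number on the gap, here is a minimal sketch that averages the three runs from each backend and computes the relative difference. The speeds are copied from the two benchmark outputs above; the helper name `relative_gap_percent` is made up for illustration.

```python
from statistics import mean

# Speeds in kH/s copied from the two runs above (GTX 1660 SUPER, mode 22000).
opencl_runs = [303.6, 303.9, 302.7]
cuda_runs = [300.5, 301.2, 300.0]

def relative_gap_percent(a, b):
    """Return how much larger mean(a) is than mean(b), as a percentage of mean(b)."""
    return (mean(a) - mean(b)) / mean(b) * 100.0

gap = relative_gap_percent(opencl_runs, cuda_runs)
print(f"OpenCL mean: {mean(opencl_runs):.1f} kH/s")
print(f"CUDA mean:   {mean(cuda_runs):.1f} kH/s")
print(f"OpenCL is ahead by {gap:.2f}%")
```

Note that the spread between runs within a single backend is already about 0.4%, so the gap between the backends is larger than the run-to-run noise, but not by much.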
	  
	
	
	
	
 
 
	
	
	
		
	Posts: 383 
	Threads: 1 
	Joined: Aug 2020
	
	 
 
	
	
		Your test must be performed on a hash, not a benchmark. Speeds also depend on the type of hash.
	 
	
	
	
	
 
 
	
	
	
		
	Posts: 111 
	Threads: 1 
	Joined: Apr 2023
	
	 
 
	
	
		Also, you don't have the latest hashcat and CUDA.
	 
	
	
	
	
 
 
	
	
	
		
	Posts: 111 
	Threads: 1 
	Joined: Apr 2023
	
	 
 
	
	
		 (06-25-2023, 10:46 PM)aikiuslik Wrote:  Also, you don't have the latest hashcat and CUDA. 
 Code: CUDA API (CUDA 12.2) 
==================== 
* Device #1: NVIDIA GeForce RTX 3060, 11261/12287 MB, 28MCU 
 
OpenCL API (OpenCL 3.0 CUDA 12.2.79) - Platform #1 [NVIDIA Corporation] 
======================================================================= 
* Device #2: NVIDIA GeForce RTX 3060, skipped 
 
Benchmark relevant options: 
=========================== 
* --optimized-kernel-enable 
 
------------------------------------------------------------- 
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095] 
------------------------------------------------------------- 
 
Speed.#1.........:  396.8 kH/s (70.27ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
 
Started: Sun Jun 25 23:49:08 2023 
Stopped: Sun Jun 25 23:49:24 2023 
 
CUDA API (CUDA 12.2) 
==================== 
* Device #1: NVIDIA GeForce RTX 3060, skipped 
 
OpenCL API (OpenCL 3.0 CUDA 12.2.79) - Platform #1 [NVIDIA Corporation] 
======================================================================= 
* Device #2: NVIDIA GeForce RTX 3060, 12160/12287 MB (3071 MB allocatable), 28MCU 
 
Benchmark relevant options: 
=========================== 
* --backend-devices=2 
* --optimized-kernel-enable 
 
------------------------------------------------------------- 
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095] 
------------------------------------------------------------- 
 
Speed.#2.........:  396.5 kH/s (71.25ms) @ Accel:32 Loops:512 Thr:256 Vec:1 
 
Started: Sun Jun 25 23:50:04 2023 
Stopped: Sun Jun 25 23:50:20 2023
  
	 
	
	
	
	
 
 
	
	
	
		
	Posts: 18 
	Threads: 6 
	Joined: Jun 2023
	
	 
 
	
	
		 (06-25-2023, 10:51 PM)aikiuslik Wrote:  Also, you don't have the latest hashcat and CUDA. 
According to the changes.txt file in the Hashcat Documentation on GitHub, there don't appear to be any improvements or additional benefits in upgrading to version 6.2.6 if the algorithm I primarily use is 22000. So there seems to be no need to switch to 6.2.6 while version 6.2.5 is working well for me.
 
Regarding the NVIDIA CUDA documentation and the NVIDIA Driver Archive for Unix: since I have an NVIDIA GT 730 graphics card in my PC (for backward-compatibility reasons), the latest NVIDIA driver available for Linux is version 470.xx. For this driver version, the latest compatible CUDA version that can be installed is CUDA 11.4. It's worth noting that newer CUDA versions require newer driver versions.
 
Based on the above, I believe I have the latest "compatible" versions of Hashcat and CUDA for my setup. :-)
 
Now I'm having trouble getting the RX580 to compute in hashcat. I won't be trying ROCm since AMD dropped support for this card in ROCm 4.0. Any pointers for getting it to run under OpenCL on Ubuntu 20.04 Focal Fossa?
 
Over here, an NVIDIA GTX 1060 still goes for around USD 75 (forex rate: 4.66), and an RX580 4GB can be had for around USD 43. 
The GTX 1060 reaches approx. 186 kH/s on mode 22000 running CUDA 11.x (ref). 
For the RX580, I believe I can get around 220 kH/s running OpenCL on Windows. 
Getting two RX580s for less than USD 100 with a combined hash rate above 400 kH/s is not bad, I think. 
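The price/performance reasoning above can be sketched as a quick calculation. This is only an illustration using the figures quoted in the post: the RX580 speed is an estimate rather than a measurement, and the two-card total assumes the hash rate scales linearly with the number of cards.

```python
# Figures quoted in the post; the RX580 speed is the poster's estimate.
cards = {
    "GTX 1060": {"price_usd": 75, "khs": 186},
    "RX580 4GB": {"price_usd": 43, "khs": 220},
}

def khs_per_usd(card):
    """Hash rate (kH/s) obtained per US dollar spent."""
    return card["khs"] / card["price_usd"]

for name, card in cards.items():
    print(f"{name}: {khs_per_usd(card):.2f} kH/s per USD")

# Two RX580 cards, assuming linear scaling across cards.
pair_price = 2 * cards["RX580 4GB"]["price_usd"]
pair_khs = 2 * cards["RX580 4GB"]["khs"]
print(f"2x RX580: {pair_khs} kH/s for USD {pair_price}")
```

On these numbers the RX580 delivers roughly twice the hash rate per dollar of the GTX 1060, provided it can be made to run at all.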
 
AMD drivers tried (or still trying): 
1.  https://github.com/RadeonOpenCompute/ROCm
2.  https://github.com/RadeonOpenCompute/ROC...CL-Runtime
3.  https://github.com/xuhuisheng/rocm-gfx803
4.  https://www.videogames.ai/Install-ROCM-M...ng-AMD-GPU
5.  https://github.com/ptitjes/opencl-amd
I even resorted to CMake builds: 
1.  CMake 3.20-custom
2.  OpenCL in Linux
It has been 4 days of trying to get AMD to hash! *meow*
	  
	
	
	
	
 
 
	
	
	
		
	Posts: 111 
	Threads: 1 
	Joined: Apr 2023
	
	 
 
	
	
		Why don't you use Windows with CUDA? Much easier.
	 
	
	
	
	
 
 
	
	
	
		
	Posts: 18 
	Threads: 6 
	Joined: Jun 2023
	
	 
 
	
		
		
		06-28-2023, 04:28 PM 
(This post was last modified: 06-28-2023, 04:32 PM by Gyfer.)
		
	 
	
		 (06-27-2023, 10:17 AM)aikiuslik Wrote:  Why don't you use Windows with CUDA? Much easier. 
Too much overhead, and it is unfriendly to headless use. At one point I did dump Windows for Ubuntu because of Docker, until I found WSL2. 
 
Anyhow, here is the follow-up on CUDA vs OpenCL: 
The P4000 runs noticeably slower in OpenCL (about 262 kH/s vs 285 kH/s in CUDA); other than that, OpenCL gives a small improvement on the GTX 1660 Super and the GT 730.
 
Here are the benchmarks. I did a few runs with each backend, then picked the best backend for each card. 
 Code: hashcat (v6.2.5) starting in benchmark mode 
* Device #7: This hardware has outdated CUDA compute capability (3.5). 
            For modern OpenCL performance, upgrade to hardware that supports 
            CUDA compute capability version 5.0 (Maxwell) or higher. 
 
CUDA API (CUDA 11.4) 
==================== 
* Device #1: NVIDIA GeForce GTX 1660 SUPER, skipped 
* Device #2: Quadro P4000, 8038/8119 MB, 14MCU 
* Device #3: NVIDIA GeForce GT 730, skipped 
 
OpenCL API (OpenCL 3.0 PoCL 3.0-rc2  Linux, RelWithDebInfo, RELOC, SPIR, LLVM 10.0.0, SLEEF, POCL_DEBUG) - Platform #1 [The pocl project] 
========================================================================================================================================= 
* Device #4: pthread-Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, skipped 
 
OpenCL API (OpenCL 2.0 AMD-APP (3314.0)) - Platform #2 [Advanced Micro Devices, Inc.] 
===================================================================================== 
 
OpenCL API (OpenCL 3.0 CUDA 11.4.402) - Platform #3 [NVIDIA Corporation] 
======================================================================== 
* Device #5: NVIDIA GeForce GTX 1660 SUPER, 5824/5944 MB (1486 MB allocatable), 22MCU 
* Device #6: Quadro P4000, skipped 
* Device #7: NVIDIA GeForce GT 730, 1920/2002 MB (500 MB allocatable), 2MCU 
 
Benchmark relevant options: 
=========================== 
* --backend-devices=2,5,7 
* --optimized-kernel-enable 
 
------------------------------------------------------------- 
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095] 
------------------------------------------------------------- 
 
Speed.#1.........:  300.0 kH/s (73.88ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
Speed.#2.........:  283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1 
Speed.#3.........:    11150 H/s (80.96ms) @ Accel:8 Loops:1024 Thr:256 Vec:1 
Speed.#*.........:  594.4 kH/s 
 
Speed.#1.........:  300.2 kH/s (73.82ms) @ Accel:64 Loops:256 Thr:256 Vec:1 
Speed.#2.........:  280.8 kH/s (50.35ms) @ Accel:32 Loops:256 Thr:512 Vec:1 
Speed.#3.........:    12257 H/s (77.74ms) @ Accel:16 Loops:512 Thr:256 Vec:1 
Speed.#*.........:  593.3 kH/s 
 
Speed.#1.........:  299.2 kH/s (74.07ms) @ Accel:64 Loops:256 Thr:256 Vec:1 
Speed.#2.........:  285.2 kH/s (49.55ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
Speed.#3.........:    11286 H/s (79.98ms) @ Accel:32 Loops:1024 Thr:64 Vec:1 
Speed.#*.........:  595.7 kH/s 
 
 
Speed.#5.........:  302.4 kH/s (74.11ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#6.........:  262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#7.........:    12131 H/s (79.19ms) @ Accel:8 Loops:1024 Thr:256 Vec:1 
Speed.#*.........:  577.0 kH/s 
 
Speed.#5.........:  302.3 kH/s (74.14ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#6.........:  261.4 kH/s (54.47ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#7.........:    13573 H/s (74.37ms) @ Accel:64 Loops:128 Thr:256 Vec:1 
Speed.#*.........:  577.2 kH/s 
 
Speed.#5.........:  302.1 kH/s (74.17ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#6.........:  262.1 kH/s (54.33ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#7.........:    13458 H/s (74.41ms) @ Accel:32 Loops:256 Thr:256 Vec:1 
Speed.#*.........:  577.7 kH/s 
 
 
 
Speed.#5.........:  302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#2.........:  288.2 kH/s (49.22ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
Speed.#7.........:    12125 H/s (79.22ms) @ Accel:8 Loops:1024 Thr:256 Vec:1 
Speed.#*.........:  602.6 kH/s 
 
Speed.#5.........:  302.1 kH/s (74.18ms) @ Accel:64 Loops:256 Thr:256 Vec:1 
Speed.#2.........:  286.6 kH/s (49.50ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
Speed.#7.........:    13458 H/s (74.43ms) @ Accel:32 Loops:256 Thr:256 Vec:1 
Speed.#*.........:  602.2 kH/s 
 
Speed.#5.........:  302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1 
Speed.#2.........:  286.9 kH/s (49.48ms) @ Accel:8 Loops:1024 Thr:512 Vec:1 
Speed.#7.........:    13570 H/s (74.56ms) @ Accel:128 Loops:64 Thr:256 Vec:1 
Speed.#*.........:  602.7 kH/s
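For anyone who wants to compare runs like these without reading them by eye, here is a minimal sketch that pulls the per-device speeds out of pasted benchmark text and keeps the best value per device. The regular expression and the helper name `best_speeds` are assumptions made for this example, based only on the line format shown above; this is not an official hashcat output parser.

```python
import re
from collections import defaultdict

# Matches lines such as "Speed.#2.........:  283.2 kH/s (49.96ms) @ ...".
# The aggregate "Speed.#*" line is skipped because "*" is not a digit.
SPEED_RE = re.compile(r"Speed\.#(\d+)\.+:\s+([\d.]+)\s+(k|M|G)?H/s")
UNIT = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}

def best_speeds(text):
    """Return {device_id: best speed in H/s} found in the given text."""
    best = defaultdict(float)
    for dev, value, prefix in SPEED_RE.findall(text):
        speed = float(value) * UNIT[prefix]
        best[int(dev)] = max(best[int(dev)], speed)
    return dict(best)

# A few Quadro P4000 lines from above: device #2 is CUDA, device #6 is OpenCL.
sample = """
Speed.#2.........:  283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Speed.#2.........:  285.2 kH/s (49.55ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#6.........:  262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........:  261.4 kH/s (54.47ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#*.........:  577.0 kH/s
"""

best = best_speeds(sample)
slowdown = (best[2] - best[6]) / best[2] * 100.0
print(f"P4000 CUDA best:   {best[2] / 1e3:.1f} kH/s")
print(f"P4000 OpenCL best: {best[6] / 1e3:.1f} kH/s")
print(f"OpenCL is slower by {slowdown:.1f}%")
```

Using the best value per device mirrors the "pick the best speed" approach described in the post; averaging the runs would be an equally reasonable choice.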
  
	 
	
	
	
	
 
 
	 