Posts: 18
Threads: 6
Joined: Jun 2023
06-25-2023, 02:39 AM
(This post was last modified: 06-25-2023, 02:41 AM by Gyfer.)
Everywhere I read about hashcat, the advice is to choose CUDA over OpenCL because CUDA is faster.
But my results show that it is not.
What are your CUDA and OpenCL speeds?
Running OpenCL:
Code: user@compute2:$ hashcat -b -m 22000 -d 4
hashcat (v6.2.5) starting in benchmark mode
CUDA API (CUDA 11.4)
====================
* Device #1: NVIDIA GeForce GTX 1660 SUPER, skipped
OpenCL API (OpenCL 3.0 CUDA 11.4.364) - Platform #3 [NVIDIA Corporation]
========================================================================
* Device #4: NVIDIA GeForce GTX 1660 SUPER, 5824/5944 MB (1486 MB allocatable), 22MCU
Benchmark relevant options:
===========================
* --backend-devices=4
* --optimized-kernel-enable
-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------
Speed.#4...test1......: 303.6 kH/s (73.45ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#4...test2......: 303.9 kH/s (73.54ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#4...test3......: 302.7 kH/s (73.85ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Started: Sun Jun 25 07:22:57 2023
Stopped: Sun Jun 25 07:23:33 2023
user@compute2:~$
Running CUDA:
Code: user@compute2:$ hashcat -b -m 22000 -d 1
hashcat (v6.2.5) starting in benchmark mode
CUDA API (CUDA 11.4)
====================
* Device #1: NVIDIA GeForce GTX 1660 SUPER, 5872/5944 MB, 22MCU
OpenCL API (OpenCL 3.0 CUDA 11.4.364) - Platform #3 [NVIDIA Corporation]
========================================================================
* Device #4: NVIDIA GeForce GTX 1660 SUPER, skipped
Benchmark relevant options:
===========================
* --backend-devices=1
* --optimized-kernel-enable
-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------
Speed.#1..test1..: 300.5 kH/s (73.41ms) @ Accel:32 Loops:256 Thr:512 Vec:1
Speed.#1..test2..: 301.2 kH/s (73.24ms) @ Accel:32 Loops:256 Thr:512 Vec:1
Speed.#1..test3..: 300.0 kH/s (73.18ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Started: Sun Jun 25 07:31:32 2023
Stopped: Sun Jun 25 07:31:37 2023
user@compute2:~$
That's a difference of about 1%, which I think is quite significant.
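For reference, the gap can be worked out from the three runs in each output above. This is a minimal sketch in Python that uses only the Speed values already posted; the variable names are made up for illustration.

```python
from statistics import mean

# Speed values (kH/s) copied from the two benchmark outputs above.
opencl_khs = [303.6, 303.9, 302.7]  # -d 4, OpenCL backend
cuda_khs = [300.5, 301.2, 300.0]    # -d 1, CUDA backend

opencl_avg = mean(opencl_khs)
cuda_avg = mean(cuda_khs)

# Relative advantage of OpenCL over CUDA, in percent.
diff_pct = (opencl_avg - cuda_avg) / cuda_avg * 100

print(f"OpenCL avg: {opencl_avg:.1f} kH/s")
print(f"CUDA avg:   {cuda_avg:.1f} kH/s")
print(f"OpenCL is {diff_pct:.2f}% faster")
```

With these numbers the difference comes out slightly under 1%.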
Posts: 385
Threads: 1
Joined: Aug 2020
Your test must be performed on a hash, not a benchmark. Speeds also depend on the type of hash.
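A rough sketch of what such a test could look like: the same mask attack launched once per backend device, so the speeds in the status output can be compared. The helper name `build_attack_cmd`, the file name `capture.hc22000`, and the 8-digit mask are assumptions for illustration only; the device IDs 1 and 4 are the ones from the listing in the first post.

```python
import shlex

def build_attack_cmd(device_id, hash_file, mask="?d?d?d?d?d?d?d?d"):
    """Build a hashcat mask-attack command line for one backend device."""
    return [
        "hashcat",
        "-m", "22000",         # same hash mode as the benchmark
        "-a", "3",             # mask (brute-force) attack
        "-d", str(device_id),  # backend device to use
        "-O",                  # optimized kernels, as the benchmark enables
        "--potfile-disable",   # so a second run does not skip cracked hashes
        hash_file,             # hypothetical capture file
        mask,                  # hypothetical mask: eight digits
    ]

# Device 1 was the CUDA device and device 4 the OpenCL device in the first post.
for dev in (1, 4):
    print(shlex.join(build_attack_cmd(dev, "capture.hc22000")))
```

The script only prints the two command lines; run them one after the other and compare the Speed lines hashcat reports.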
Posts: 97
Threads: 1
Joined: Apr 2023
Also, you don't have the latest hashcat and CUDA.
Posts: 97
Threads: 1
Joined: Apr 2023
(06-25-2023, 10:46 PM)aikiuslik Wrote: Also, you don't have the latest hashcat and CUDA.
Code: CUDA API (CUDA 12.2)
====================
* Device #1: NVIDIA GeForce RTX 3060, 11261/12287 MB, 28MCU
OpenCL API (OpenCL 3.0 CUDA 12.2.79) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #2: NVIDIA GeForce RTX 3060, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------
Speed.#1.........: 396.8 kH/s (70.27ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Started: Sun Jun 25 23:49:08 2023
Stopped: Sun Jun 25 23:49:24 2023
CUDA API (CUDA 12.2)
====================
* Device #1: NVIDIA GeForce RTX 3060, skipped
OpenCL API (OpenCL 3.0 CUDA 12.2.79) - Platform #1 [NVIDIA Corporation]
=======================================================================
* Device #2: NVIDIA GeForce RTX 3060, 12160/12287 MB (3071 MB allocatable), 28MCU
Benchmark relevant options:
===========================
* --backend-devices=2
* --optimized-kernel-enable
-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------
Speed.#2.........: 396.5 kH/s (71.25ms) @ Accel:32 Loops:512 Thr:256 Vec:1
Started: Sun Jun 25 23:50:04 2023
Stopped: Sun Jun 25 23:50:20 2023
Posts: 18
Threads: 6
Joined: Jun 2023
(06-25-2023, 10:51 PM)aikiuslik Wrote: Also, you don't have the latest hashcat and CUDA.
According to the changes.txt file in the hashcat documentation on GitHub, there don't appear to be any improvements or additional benefits in upgrading to version 6.2.6 if the algorithm I primarily use is 22000. So there seems to be no reason to switch to 6.2.6 while version 6.2.5 is working well for me.
According to the NVIDIA CUDA documentation and the NVIDIA Driver Archive for Unix, and given that I have an NVIDIA GT 730 graphics card in my PC (for backward compatibility reasons), the latest NVIDIA driver available to me for Linux is version 470.xx. For this driver version (470.xx), the latest compatible CUDA version that can be installed is CUDA 11.4. Newer CUDA versions require newer driver versions.
Based on the above, I believe I have the latest "compatible" versions of Hashcat and CUDA for my setup. :-)
Now, I am having trouble getting the RX580 to compute in hashcat. I won't be trying ROCm, since AMD dropped support for it in ROCm 4.0. Any pointers for getting it to run with OpenCL on Ubuntu 20.04 Focal Fossa?
Over here, an NVIDIA GTX 1060 still goes for around USD 75 (forex rate: 4.66), and an RX580 4GB can be had for around USD 43.
The GTX 1060 has a mode 22000 speed of approx. 186 kH/s running on CUDA 11.x (ref).
For the RX580, I believe I can get around 220 kH/s running OpenCL on Windows.
Getting two RX580s for less than USD 100 with a combined hashrate above 400 kH/s is not bad, I think.
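The price comparison can be put into numbers. A small sketch using only the prices and approximate speeds quoted in this post; the two-card figure assumes the hashrate scales linearly with the number of cards, which is an assumption and not a measurement.

```python
# Figures quoted in the post: (price in USD, approx. mode 22000 speed in kH/s).
cards = {
    "GTX 1060": (75, 186),
    "RX580 4GB": (43, 220),
}

def khs_per_usd(price_usd, speed_khs):
    """Hashrate bought per dollar spent."""
    return speed_khs / price_usd

for name, (price, speed) in cards.items():
    print(f"{name}: {khs_per_usd(price, speed):.2f} kH/s per USD")

# Two RX580 cards, assuming the speed simply doubles.
rx_price, rx_speed = cards["RX580 4GB"]
pair_price = 2 * rx_price
pair_speed = 2 * rx_speed
print(f"2x RX580: USD {pair_price}, about {pair_speed} kH/s")
```

On these figures the RX580 gives roughly twice the hashrate per dollar of the GTX 1060.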
AMD drivers tried (or still trying):
1. https://github.com/RadeonOpenCompute/ROCm
2. https://github.com/RadeonOpenCompute/ROC...CL-Runtime
3. https://github.com/xuhuisheng/rocm-gfx803
4. https://www.videogames.ai/Install-ROCM-M...ng-AMD-GPU
5. https://github.com/ptitjes/opencl-amd
It even resulted in CMake builds:
1. CMake 3.20-custom
2. OpenCL in Linux
It has been 4 days of trying to get AMD to hash! *meow*
Posts: 97
Threads: 1
Joined: Apr 2023
Why don't you use Windows with CUDA? Much easier.
Posts: 18
Threads: 6
Joined: Jun 2023
06-28-2023, 04:28 PM
(This post was last modified: 06-28-2023, 04:32 PM by Gyfer.)
(06-27-2023, 10:17 AM)aikiuslik Wrote: Why don't you use Windows with CUDA? Much easier.
Too much overhead, and not headless-friendly. At one point I did dump Windows for Ubuntu because of Docker, until I found WSL2.
Anyhow, here is the follow-up on CUDA vs OpenCL:
The P4000 seems to run a lot slower in OpenCL; other than that, there are improvements on the GTX 1660 Super and the GT 730.
Here is the benchmark. I did a few runs and picked the best speeds among them.
Code: hashcat (v6.2.5) starting in benchmark mode
* Device #7: This hardware has outdated CUDA compute capability (3.5).
For modern OpenCL performance, upgrade to hardware that supports
CUDA compute capability version 5.0 (Maxwell) or higher.
CUDA API (CUDA 11.4)
====================
* Device #1: NVIDIA GeForce GTX 1660 SUPER, skipped
* Device #2: Quadro P4000, 8038/8119 MB, 14MCU
* Device #3: NVIDIA GeForce GT 730, skipped
OpenCL API (OpenCL 3.0 PoCL 3.0-rc2 Linux, RelWithDebInfo, RELOC, SPIR, LLVM 10.0.0, SLEEF, POCL_DEBUG) - Platform #1 [The pocl project]
=========================================================================================================================================
* Device #4: pthread-Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, skipped
OpenCL API (OpenCL 2.0 AMD-APP (3314.0)) - Platform #2 [Advanced Micro Devices, Inc.]
=====================================================================================
OpenCL API (OpenCL 3.0 CUDA 11.4.402) - Platform #3 [NVIDIA Corporation]
========================================================================
* Device #5: NVIDIA GeForce GTX 1660 SUPER, 5824/5944 MB (1486 MB allocatable), 22MCU
* Device #6: Quadro P4000, skipped
* Device #7: NVIDIA GeForce GT 730, 1920/2002 MB (500 MB allocatable), 2MCU
Benchmark relevant options:
===========================
* --backend-devices=2,5,7
* --optimized-kernel-enable
-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------
Speed.#1.........: 300.0 kH/s (73.88ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#2.........: 283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Speed.#3.........: 11150 H/s (80.96ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........: 594.4 kH/s
Speed.#1.........: 300.2 kH/s (73.82ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........: 280.8 kH/s (50.35ms) @ Accel:32 Loops:256 Thr:512 Vec:1
Speed.#3.........: 12257 H/s (77.74ms) @ Accel:16 Loops:512 Thr:256 Vec:1
Speed.#*.........: 593.3 kH/s
Speed.#1.........: 299.2 kH/s (74.07ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........: 285.2 kH/s (49.55ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#3.........: 11286 H/s (79.98ms) @ Accel:32 Loops:1024 Thr:64 Vec:1
Speed.#*.........: 595.7 kH/s
Speed.#5.........: 302.4 kH/s (74.11ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........: 262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........: 12131 H/s (79.19ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........: 577.0 kH/s
Speed.#5.........: 302.3 kH/s (74.14ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........: 261.4 kH/s (54.47ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........: 13573 H/s (74.37ms) @ Accel:64 Loops:128 Thr:256 Vec:1
Speed.#*.........: 577.2 kH/s
Speed.#5.........: 302.1 kH/s (74.17ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........: 262.1 kH/s (54.33ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........: 13458 H/s (74.41ms) @ Accel:32 Loops:256 Thr:256 Vec:1
Speed.#*.........: 577.7 kH/s
Speed.#5.........: 302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#2.........: 288.2 kH/s (49.22ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........: 12125 H/s (79.22ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........: 602.6 kH/s
Speed.#5.........: 302.1 kH/s (74.18ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........: 286.6 kH/s (49.50ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........: 13458 H/s (74.43ms) @ Accel:32 Loops:256 Thr:256 Vec:1
Speed.#*.........: 602.2 kH/s
Speed.#5.........: 302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#2.........: 286.9 kH/s (49.48ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........: 13570 H/s (74.56ms) @ Accel:128 Loops:64 Thr:256 Vec:1
Speed.#*.........: 602.7 kH/s
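To compare the backends per card, the Speed lines can be parsed and the best value per device picked out. A minimal sketch, assuming the line format shown above; the function name `best_speeds` and the trimmed `sample` are illustrative, and the device pairing (1/5, 2/6, 3/7) follows the device listing in the output.

```python
import re

# Matches per-device lines such as "Speed.#2.........: 288.2 kH/s (...)".
# The summary line "Speed.#*" is skipped because "*" is not a digit.
SPEED_RE = re.compile(r"Speed\.#(\d+)\.+:\s+([\d.]+)\s+(k?H)/s")

def best_speeds(text):
    """Return {device_id: best speed in H/s} from hashcat Speed lines."""
    best = {}
    for dev, value, unit in SPEED_RE.findall(text):
        hs = float(value) * (1000 if unit == "kH" else 1)
        best[int(dev)] = max(best.get(int(dev), 0.0), hs)
    return best

# A few lines copied from the output above (best and one slower run each).
sample = """\
Speed.#1.........: 300.2 kH/s (73.82ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........: 283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Speed.#2.........: 288.2 kH/s (49.22ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#3.........: 12257 H/s (77.74ms) @ Accel:16 Loops:512 Thr:256 Vec:1
Speed.#5.........: 302.4 kH/s (74.11ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........: 262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........: 13573 H/s (74.37ms) @ Accel:64 Loops:128 Thr:256 Vec:1
Speed.#*.........: 577.0 kH/s
"""

# (CUDA device id, OpenCL device id) for each card, per the device listing.
pairs = {"GTX 1660 SUPER": (1, 5), "Quadro P4000": (2, 6), "GT 730": (3, 7)}

best = best_speeds(sample)
for card, (cuda_id, opencl_id) in pairs.items():
    change = (best[opencl_id] - best[cuda_id]) / best[cuda_id] * 100
    print(f"{card}: OpenCL vs CUDA {change:+.1f}%")
```

Feeding the full output instead of `sample` gives the same best values, since the lines above already include the fastest run for each device.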