Is using the CUDA runtime always preferred over OpenCL?
#7
(06-27-2023, 10:17 AM)aikiuslik Wrote: Why don't you use Windows with CUDA? Much easier.

Too much overhead, and it's not headless-friendly. At one point I did dump Windows in favor of Ubuntu because of Docker, until I found WSL2.

Anyhow, here is the follow-up on CUDA vs OpenCL:
The P4000 runs noticeably slower under OpenCL (about 262 kH/s versus 281-288 kH/s under CUDA). Other than that, there are small improvements under OpenCL on the GTX 1660 Super and the GT 730.

Here is the benchmark. I did a few runs per configuration; take the best speed among them.

Code:
hashcat (v6.2.5) starting in benchmark mode
* Device #7: This hardware has outdated CUDA compute capability (3.5).
            For modern OpenCL performance, upgrade to hardware that supports
            CUDA compute capability version 5.0 (Maxwell) or higher.

CUDA API (CUDA 11.4)
====================
* Device #1: NVIDIA GeForce GTX 1660 SUPER, skipped
* Device #2: Quadro P4000, 8038/8119 MB, 14MCU
* Device #3: NVIDIA GeForce GT 730, skipped

OpenCL API (OpenCL 3.0 PoCL 3.0-rc2  Linux, RelWithDebInfo, RELOC, SPIR, LLVM 10.0.0, SLEEF, POCL_DEBUG) - Platform #1 [The pocl project]
=========================================================================================================================================
* Device #4: pthread-Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, skipped

OpenCL API (OpenCL 2.0 AMD-APP (3314.0)) - Platform #2 [Advanced Micro Devices, Inc.]
=====================================================================================

OpenCL API (OpenCL 3.0 CUDA 11.4.402) - Platform #3 [NVIDIA Corporation]
========================================================================
* Device #5: NVIDIA GeForce GTX 1660 SUPER, 5824/5944 MB (1486 MB allocatable), 22MCU
* Device #6: Quadro P4000, skipped
* Device #7: NVIDIA GeForce GT 730, 1920/2002 MB (500 MB allocatable), 2MCU

Benchmark relevant options:
===========================
* --backend-devices=2,5,7
* --optimized-kernel-enable

-------------------------------------------------------------
* Hash-Mode 22000 (WPA-PBKDF2-PMKID+EAPOL) [Iterations: 4095]
-------------------------------------------------------------

Speed.#1.........:  300.0 kH/s (73.88ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#2.........:  283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Speed.#3.........:    11150 H/s (80.96ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........:  594.4 kH/s

Speed.#1.........:  300.2 kH/s (73.82ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........:  280.8 kH/s (50.35ms) @ Accel:32 Loops:256 Thr:512 Vec:1
Speed.#3.........:    12257 H/s (77.74ms) @ Accel:16 Loops:512 Thr:256 Vec:1
Speed.#*.........:  593.3 kH/s

Speed.#1.........:  299.2 kH/s (74.07ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........:  285.2 kH/s (49.55ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#3.........:    11286 H/s (79.98ms) @ Accel:32 Loops:1024 Thr:64 Vec:1
Speed.#*.........:  595.7 kH/s


Speed.#5.........:  302.4 kH/s (74.11ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........:  262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........:    12131 H/s (79.19ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........:  577.0 kH/s

Speed.#5.........:  302.3 kH/s (74.14ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........:  261.4 kH/s (54.47ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........:    13573 H/s (74.37ms) @ Accel:64 Loops:128 Thr:256 Vec:1
Speed.#*.........:  577.2 kH/s

Speed.#5.........:  302.1 kH/s (74.17ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........:  262.1 kH/s (54.33ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#7.........:    13458 H/s (74.41ms) @ Accel:32 Loops:256 Thr:256 Vec:1
Speed.#*.........:  577.7 kH/s



Speed.#5.........:  302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#2.........:  288.2 kH/s (49.22ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........:    12125 H/s (79.22ms) @ Accel:8 Loops:1024 Thr:256 Vec:1
Speed.#*.........:  602.6 kH/s

Speed.#5.........:  302.1 kH/s (74.18ms) @ Accel:64 Loops:256 Thr:256 Vec:1
Speed.#2.........:  286.6 kH/s (49.50ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........:    13458 H/s (74.43ms) @ Accel:32 Loops:256 Thr:256 Vec:1
Speed.#*.........:  602.2 kH/s

Speed.#5.........:  302.3 kH/s (74.13ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#2.........:  286.9 kH/s (49.48ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#7.........:    13570 H/s (74.56ms) @ Accel:128 Loops:64 Thr:256 Vec:1
Speed.#*.........:  602.7 kH/s
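
If you'd rather not eyeball the numbers, here is a rough Python sketch of the "take the best run" comparison. It's only an illustration, not anything hashcat ships: the function names (best_speeds, percent_change) are made up for this post, and the regex assumes the Speed.#N line format shown in the output above. The sample only uses a few of the P4000 lines (device #2 is the P4000 under CUDA, device #6 is the same card under OpenCL).

```python
import re
from collections import defaultdict

# Matches per-device benchmark lines such as
# "Speed.#2.........:  283.2 kH/s (49.96ms) @ Accel:16 ..."
# The "Speed.#*" total lines are skipped because "*" is not a digit.
SPEED_RE = re.compile(r"Speed\.#(\d+)\.+:\s+([\d.]+)\s+([kMGT]?)H/s")
UNIT = {"": 1, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}

def best_speeds(log_text):
    """Return {device_number: best speed in H/s} over all runs in log_text."""
    best = defaultdict(float)
    for dev, value, prefix in SPEED_RE.findall(log_text):
        best[int(dev)] = max(best[int(dev)], float(value) * UNIT[prefix])
    return dict(best)

def percent_change(cuda_hs, opencl_hs):
    """Relative change of the OpenCL speed versus the CUDA speed, in percent."""
    return (opencl_hs - cuda_hs) / cuda_hs * 100.0

# A few P4000 lines copied from the all-CUDA and all-OpenCL runs above.
sample = """
Speed.#2.........:  283.2 kH/s (49.96ms) @ Accel:16 Loops:512 Thr:512 Vec:1
Speed.#2.........:  285.2 kH/s (49.55ms) @ Accel:8 Loops:1024 Thr:512 Vec:1
Speed.#6.........:  262.5 kH/s (54.25ms) @ Accel:128 Loops:128 Thr:256 Vec:1
Speed.#6.........:  261.4 kH/s (54.47ms) @ Accel:128 Loops:128 Thr:256 Vec:1
"""

best = best_speeds(sample)
print(best)
print(f"P4000 OpenCL vs CUDA: {percent_change(best[2], best[6]):+.1f}%")
```

Paste the full log into sample to get the best speed for every device number. Keep in mind the device numbers differ per backend, so you still have to pair them up by hand (#1/#5 for the 1660 Super, #2/#6 for the P4000, #3/#7 for the GT 730).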


Messages In This Thread
RE: Is using the CUDA runtime always preferred over OpenCL? - by Gyfer - 06-28-2023, 04:28 PM