Posts: 6
Threads: 1
Joined: Apr 2021
Hello! I am trying to add a new kernel module to hashcat. I have finished development of the module and the algorithm works, but I am seeing serious performance issues when I build from source, even in the standard benchmarks that are unrelated to my module.
When running the source-built hashcat in benchmark mode with this command:
./hashcat -m 8900 -b -w4
I get about 1000 H/s. However, when I run the same benchmark with the production executable, I get about 1100 kH/s on the same device.
Is there a build flag or something I am missing to optimize performance on the build? I took a look at the Travis config but it looks like it is also just calling make.
Any help would be appreciated!
Posts: 5,185
Threads: 230
Joined: Apr 2010
There were some major improvements to scrypt in the latest development version. Make sure to clone the latest GitHub master, add your scrypt-based algorithm to hashcat.hctune, and read the text about scrypt inside that file.
Posts: 6
Threads: 1
Joined: Apr 2021
04-20-2021, 09:10 PM
(This post was last modified: 04-20-2021, 09:13 PM by thaas.)
I just rebased onto master and rebuilt, and I am still getting about 1100 H/s with scrypt versus 1100 kH/s from the executable from the website. I made no changes to the scrypt implementation; I am only using it as a benchmark for my own algorithm. So it is marginally faster, but the build is still very slow compared to the production version.
Posts: 414
Threads: 2
Joined: Dec 2015
What scrypt settings are you running and what hardware are you running it on?
Posts: 6
Threads: 1
Joined: Apr 2021
No custom scrypt settings. I am just pulling straight from master, building, running in benchmark mode with -m 8900, and then comparing to the executable from the site with the same args. Tried on a 2080ti Super.
Posts: 414
Threads: 2
Joined: Dec 2015
Try not setting -w 4 and see what happens.
Posts: 6
Threads: 1
Joined: Apr 2021
Unfortunately, same result. Is there really no difference in how the executable is built for deployment?
Posts: 414
Threads: 2
Joined: Dec 2015
Can you post the output from both versions? Nothing that has changed in the git version should have caused such a significant slowdown for you. If you want a version built from the latest github you can grab binaries from hashcat.net/beta/ and test with those instead of building yourself.
Posts: 6
Threads: 1
Joined: Apr 2021
04-25-2021, 11:58 PM
(This post was last modified: 04-25-2021, 11:59 PM by thaas.)
Apologies for the delay, but here are the two outputs:
First, from the website:
Code:
hashcat (v6.1.1) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
Kernel /usr/local/share/hashcat/OpenCL/m08900-optimized.cl:
Optimized kernel requested but not needed - falling back to pure kernel
CUDA API (CUDA 11.2)
====================
* Device #1: GeForce RTX 2080 Ti, 10862/11019 MB, 68MCU
OpenCL API (OpenCL 1.2 CUDA 11.2.109) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce RTX 2080 Ti, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 8900 - scrypt (Iterations: 1)
Speed.#1.........: 709.5 kH/s (24.24ms) @ Accel:16 Loops:1 Thr:16 Vec:1
Started: Sun Apr 25 21:56:35 2021
Stopped: Sun Apr 25 21:56:42 2021
Second, from building from source:
Code:
hashcat (v6.1.1-248-g62fc3601b) starting in benchmark mode...
Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.
CUDA API (CUDA 11.2)
====================
* Device #1: GeForce RTX 2080 Ti, 10862/11019 MB, 68MCU
OpenCL API (OpenCL 1.2 CUDA 11.2.109) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce RTX 2080 Ti, skipped
Benchmark relevant options:
===========================
* --optimized-kernel-enable
Hashmode: 8900 - scrypt (Iterations: 1)
Speed.#1.........: 1344 H/s (1618.40ms) @ Accel:68 Loops:1 Thr:32 Vec:1
Started: Sun Apr 25 21:57:38 2021
Stopped: Sun Apr 25 21:57:49 2021
Both were run with "hashcat -b -m 8900"
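To put a number on the gap between the two runs, the Speed lines can be parsed and compared. The following is only a rough sketch: the regular expression is an assumption based on the two lines pasted above, not on any documented hashcat output format, and the helper name parse_speed is made up for illustration.

```python
import re

# Multipliers for the unit prefixes seen in the Speed lines above.
UNITS = {"H/s": 1.0, "kH/s": 1e3, "MH/s": 1e6, "GH/s": 1e9}

# Pattern inferred from the two pasted lines (an assumption, not a spec).
SPEED_RE = re.compile(
    r"Speed\.#(?P<dev>\d+)\.*:\s+(?P<value>[\d.]+)\s+(?P<unit>[kMG]?H/s)"
    r"\s+\((?P<ms>[\d.]+)ms\)\s+@\s+Accel:(?P<accel>\d+)"
    r"\s+Loops:(?P<loops>\d+)\s+Thr:(?P<thr>\d+)"
)

def parse_speed(line):
    """Extract speed, kernel runtime and tuning values from one Speed line."""
    m = SPEED_RE.search(line)
    if m is None:
        raise ValueError(f"not a Speed line: {line!r}")
    return {
        "hashes_per_sec": float(m["value"]) * UNITS[m["unit"]],
        "kernel_ms": float(m["ms"]),
        "accel": int(m["accel"]),
        "loops": int(m["loops"]),
        "threads": int(m["thr"]),
    }

# The two lines are copied from the outputs above.
release = parse_speed("Speed.#1.........: 709.5 kH/s (24.24ms) @ Accel:16 Loops:1 Thr:16 Vec:1")
source = parse_speed("Speed.#1.........: 1344 H/s (1618.40ms) @ Accel:68 Loops:1 Thr:32 Vec:1")

ratio = release["hashes_per_sec"] / source["hashes_per_sec"]
print(f"release build is about {ratio:.0f}x faster")
print(f"accel/threads: release {release['accel']}/{release['threads']}, "
      f"source {source['accel']}/{source['threads']}")
```

Besides the raw speed, the parsed values show that the two runs also differ in Accel and Thr and in kernel runtime, so they are not running the same workload with the same tuning.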
Posts: 6
Threads: 1
Joined: Apr 2021
It appears that the speed difference may be caused by the scrypt N value: the production version seems to benchmark with an N of something like 1, while the source version benchmarks with N = 16384. I am still digging into this, but do you know whether the benchmark test values in the production version come from somewhere else?
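To see why a different N alone could explain a large gap, here is a small sketch of how scrypt's cost scales with its parameters, following the scrypt definition (RFC 7914): each ROMix instance keeps N blocks of 128 * r bytes and performs 2 * N BlockMix calls of 2 * r Salsa20/8 invocations each. Only N = 16384 comes from this thread; the r and p values and the smaller parameter set are placeholders I picked for illustration, not the values either hashcat build actually uses.

```python
from collections import namedtuple

ScryptCost = namedtuple("ScryptCost", "memory_bytes salsa_calls")

def scrypt_cost(n, r, p):
    """Rough per-hash cost of scrypt for parameters N, r, p."""
    # Scratch array V of one ROMix instance: N blocks of 128 * r bytes.
    memory_bytes = 128 * n * r
    # 2 * N BlockMix calls per ROMix, 2 * r Salsa20/8 calls each, p instances.
    salsa_calls = 4 * n * r * p
    return ScryptCost(memory_bytes, salsa_calls)

# N = 16384 is the value mentioned above; r = 8 and p = 1 are assumptions.
high = scrypt_cost(16384, 8, 1)
# Hypothetical smaller parameter set, chosen only to show the scaling.
low = scrypt_cost(1024, 1, 1)

print(f"high: {high.memory_bytes // 1024} KiB per hash, {high.salsa_calls} Salsa20/8 calls")
print(f"low:  {low.memory_bytes // 1024} KiB per hash, {low.salsa_calls} Salsa20/8 calls")
print(f"work ratio: {high.salsa_calls // low.salsa_calls}x")
```

Because both memory and work grow linearly in N and r, two builds that benchmark with different built-in test parameters would report very different H/s even if the kernel code were identical.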