Build from source 100x slower than prod
#1
Hello! I am trying to add a new kernel module to hashcat. I have finished development of the module and the algorithm works, but I am seeing some serious performance issues when I build from source, even with the normal benchmarks not related to my module.

When running the source-built hashcat in benchmark mode with this command:
./hashcat -m 8900 -b -w4
I get about 1000 H/s. However, when I run the production executable on the same device, I get 1100 kH/s.

Is there a build flag or something I am missing to optimize performance on the build? I took a look at the Travis config but it looks like it is also just calling make.

Any help would be appreciated!
#2
There were some major improvements to scrypt in the latest development version. Make sure to clone the latest GitHub master, add your scrypt-based algorithm to hashcat.hctune, and read the text about scrypt inside that file.
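A rough illustration of why scrypt in particular needs per-device tuning (not taken from hashcat.hctune or hashcat's source): scrypt is memory-hard, and each candidate in flight needs a scratch buffer (the V array) of 128·r·N bytes per RFC 7914. The sketch below estimates how many full buffers fit in GPU memory, using hypothetical values for free memory and for (N, r). When the parallelism hashcat wants to launch exceeds that count, it has to run fewer candidates at once or trade memory for recomputation, which is presumably what the tuning entries address.

```python
# Rough estimate of how many full scrypt scratch buffers fit in GPU memory.
# Assumptions (illustrative, not from hashcat's source): every candidate in flight
# needs its own full V array of 128 * r * N bytes, with no time-memory trade-off.

def scrypt_v_bytes(n: int, r: int) -> int:
    """scrypt's V array holds N blocks of 128 * r bytes each (RFC 7914)."""
    return 128 * r * n

def instances_that_fit(free_mib: int, n: int, r: int) -> int:
    """Number of whole V arrays that fit into free_mib mebibytes of device memory."""
    return (free_mib * 1024 * 1024) // scrypt_v_bytes(n, r)

if __name__ == "__main__":
    free_mib = 10 * 1024                     # hypothetical: ~10 GiB free on the GPU
    for n, r in ((1024, 1), (16384, 8)):     # hypothetical (N, r) parameter sets
        mib_each = scrypt_v_bytes(n, r) / (1024 * 1024)
        print(f"N={n}, r={r}: {mib_each:g} MiB each, "
              f"{instances_that_fit(free_mib, n, r)} fit at once")
```

With these example numbers, moving from (N=1024, r=1) to (N=16384, r=8) shrinks the number of buffers that fit by a factor of 128, so a launch configuration tuned for cheap parameters can badly oversubscribe memory with expensive ones.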
#3
I just rebased onto master and rebuilt, and I'm still getting about 1100 H/s with scrypt versus 1100 kH/s from the executable on the website (I made no changes to the scrypt implementation; I'm only using it as a benchmark for my own algorithm). So it's marginally faster, but the build is still far slower than the production version, with zero changes to the algorithm.
#4
What scrypt settings are you running and what hardware are you running it on?
#5
No custom scrypt settings: I'm just pulling straight from master, building, running in benchmark mode with -m 8900, and then comparing against the executable from the site with the same arguments. Tested on an RTX 2080 Ti.
#6
Try not setting -w 4 and see what happens.
#7
Unfortunately, same result. Is the executable built any differently when it is built for deployment?
#8
Can you post the output from both versions? Nothing that has changed in the git version should have caused such a significant slowdown for you. If you want a version built from the latest github you can grab binaries from hashcat.net/beta/ and test with those instead of building yourself.
#9
Apologies for the delay, but here are the two outputs:

First, from the website:
Code:
hashcat (v6.1.1) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

Kernel /usr/local/share/hashcat/OpenCL/m08900-optimized.cl:
Optimized kernel requested but not needed - falling back to pure kernel

CUDA API (CUDA 11.2)
====================
* Device #1: GeForce RTX 2080 Ti, 10862/11019 MB, 68MCU

OpenCL API (OpenCL 1.2 CUDA 11.2.109) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce RTX 2080 Ti, skipped

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 8900 - scrypt (Iterations: 1)

Speed.#1.........:  709.5 kH/s (24.24ms) @ Accel:16 Loops:1 Thr:16 Vec:1

Started: Sun Apr 25 21:56:35 2021
Stopped: Sun Apr 25 21:56:42 2021

Second, from the source build:
Code:
hashcat (v6.1.1-248-g62fc3601b) starting in benchmark mode...

Benchmarking uses hand-optimized kernel code by default.
You can use it in your cracking session by setting the -O option.
Note: Using optimized kernel code limits the maximum supported password length.
To disable the optimized kernel code in benchmark mode, use the -w option.

CUDA API (CUDA 11.2)
====================
* Device #1: GeForce RTX 2080 Ti, 10862/11019 MB, 68MCU

OpenCL API (OpenCL 1.2 CUDA 11.2.109) - Platform #1 [NVIDIA Corporation]
========================================================================
* Device #2: GeForce RTX 2080 Ti, skipped

Benchmark relevant options:
===========================
* --optimized-kernel-enable

Hashmode: 8900 - scrypt (Iterations: 1)

Speed.#1.........:    1344 H/s (1618.40ms) @ Accel:68 Loops:1 Thr:32 Vec:1

Started: Sun Apr 25 21:57:38 2021
Stopped: Sun Apr 25 21:57:49 2021

Both were run with "hashcat -b -m 8900"
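The numbers in the two outputs can be sanity-checked with a little arithmetic. The Python sketch below assumes (my assumption, not confirmed from hashcat's source) that one kernel launch processes MCU × Accel × Thr candidates and that the millisecond figure in parentheses is the runtime of one launch. Under that assumption the website build's estimate lands within about 2% of its reported 709.5 kH/s, while the source build's estimate lands far above its reported 1344 H/s. That gap would be consistent with each candidate needing many launches in the source build, for example because of a much larger scrypt cost parameter, but that is a guess.

```python
# Back-of-the-envelope check of the two benchmark lines.
# Assumption: one kernel launch processes MCU * Accel * Thr candidates and the
# figure in parentheses is the runtime of one launch. For scrypt, hashcat may
# split one hash across several launches, so the estimate need not always hold.

def launch_candidates(mcu: int, accel: int, thr: int) -> int:
    """Candidates processed per kernel launch under the stated assumption."""
    return mcu * accel * thr

def estimated_speed(mcu: int, accel: int, thr: int, ms_per_launch: float) -> float:
    """Estimated hashes/second if each launch finishes a full batch of candidates."""
    return launch_candidates(mcu, accel, thr) / (ms_per_launch / 1000.0)

if __name__ == "__main__":
    # Website build: 709.5 kH/s (24.24ms) @ Accel:16 Loops:1 Thr:16, 68 MCU
    web = estimated_speed(68, 16, 16, 24.24)
    # Source build: 1344 H/s (1618.40ms) @ Accel:68 Loops:1 Thr:32, 68 MCU
    src = estimated_speed(68, 68, 32, 1618.40)
    print(f"website estimate: {web / 1000:.1f} kH/s (reported 709.5 kH/s)")
    print(f"source estimate:  {src:.0f} H/s (reported 1344 H/s)")
    print(f"reported speed ratio: {709.5e3 / 1344:.0f}x")
```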
#10
It appears that the speed difference may be caused by the scrypt N value: it seems to be something like 1 in the production version's benchmark, whereas the source version's benchmark uses N = 16384. I'm still digging into this, but do you know whether the production version's benchmark takes its test values from somewhere else?
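If the two builds really benchmark different scrypt parameters, the compute cost alone differs a lot. Per RFC 7914, ROMix performs 2·N BlockMix calls and each BlockMix runs 2·r Salsa20/8 cores, so one scrypt(N, r, p) hash costs roughly 4·N·r·p Salsa20/8 cores. The Python sketch below compares N = 16384 against two small N values; r = p = 1 and the small N values are hypothetical, since the thread only states N = 16384 for the source build. Compute scales linearly with N, and memory pressure (see the 128·r·N buffer size) would add further slowdown on top of this.

```python
# Compare scrypt's core compute cost at different cost parameters.
# Per RFC 7914, ROMix runs 2*N BlockMix calls and each BlockMix runs 2*r
# Salsa20/8 cores, so one scrypt hash costs about 4*N*r*p Salsa20/8 cores.
# The PBKDF2-HMAC-SHA256 steps are ignored as comparatively small.

def salsa_cores(n: int, r: int, p: int) -> int:
    """Approximate Salsa20/8 core invocations for one scrypt(N, r, p) hash."""
    if n < 2 or n & (n - 1):
        raise ValueError("N must be a power of two greater than 1")
    return 4 * n * r * p

def relative_cost(slow: tuple, fast: tuple) -> float:
    """How many times more core work the 'slow' (N, r, p) needs than 'fast'."""
    return salsa_cores(*slow) / salsa_cores(*fast)

if __name__ == "__main__":
    big = (16384, 1, 1)            # N = 16384 from the thread; r = p = 1 assumed
    for small_n in (16, 1024):     # hypothetical small N values for comparison
        ratio = relative_cost(big, (small_n, 1, 1))
        print(f"N=16384 vs N={small_n}: {ratio:.0f}x more Salsa20/8 work")
```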