Low hashrate with GTX 1650 when using CUDA

So I finally did it, I robbed a bank and I purchased a brand new GPU for use with Hashcat. It's not a monster RTX model, but a modest GTX 1650. It's good enough for me, and it's much better than the GTX 560 Ti I was struggling with for the last two days.

You would think that buying a new GPU with up-to-date drivers would make your problems go away. Not in my case. I was still unable to run Hashcat 6.2.5.

hashcat (v6.2.5) starting

Successfully initialized NVIDIA CUDA library.

Failed to initialize NVIDIA RTC library.

* Device #1: CUDA SDK Toolkit not installed or incorrectly installed.
            CUDA SDK Toolkit required for proper device support and utilization.
            Falling back to OpenCL runtime.

* Device #1: WARNING! Kernel exec timeout is not disabled.
            This may cause "CL_OUT_OF_RESOURCES" or related errors.
            To disable the timeout, see: https://hashcat.net/q/timeoutpatch
OpenCL API (OpenCL 3.0 CUDA 11.5.121) - Platform #1 [NVIDIA Corporation]
* Device #1: NVIDIA GeForce GTX 1650, 3520/4095 MB (1023 MB allocatable), 14MCU

OpenCL API (OpenCL 2.0 AMD-APP (1800.11)) - Platform #2 [Advanced Micro Devices, Inc.]
* Device #2: , skipped

Minimum password length supported by kernel: 0
Maximum password length supported by kernel: 256

Hashes: 7 digests; 7 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Optimizers applied:
* Zero-Byte
* Early-Skip
* Not-Salted
* Not-Iterated
* Single-Salt
* Brute-Force
* Raw-Hash

ATTENTION! Pure (unoptimized) backend kernels selected.
Pure kernels can crack longer passwords, but drastically reduce performance.
If you want to switch to optimized kernels, append -O to your commandline.
See the above message to find out about the exact limits.

Watchdog: Temperature abort trigger set to 90c

Initializing backend runtime for device #1. Please be patient...

I think I was patient enough. I waited 10 minutes before I aborted.

Before trying to run 6.2.5, I was running 3.0 with flying colors. I was getting up to 6500 MH/s, relying only on OpenCL, no CUDA runtime. This is a huge improvement for my limited resources. An MD5 job that took 1 hour, 23 minutes and 29 seconds on the GTX 560 Ti was now taking only 12 minutes and 22 seconds on the GTX 1650. Compare that with 31 minutes and 16 seconds on the Radeon HD 6870, and 3 hours, 39 minutes and 15 seconds on the Intel UHD 630. As you can see, this is a big win for me.
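For anyone curious, the speedups those timings imply are easy to sanity-check. A throwaway sketch (the times are the ones quoted above; nothing else here is from my actual setup):

```shell
# Convert the quoted run times to seconds, then compare against the
# GTX 560 Ti baseline.
baseline=$((1*3600 + 23*60 + 29))   # GTX 560 Ti: 5009 s
gtx1650=$((12*60 + 22))             # GTX 1650:    742 s
hd6870=$((31*60 + 16))              # HD 6870:    1876 s
uhd630=$((3*3600 + 39*60 + 15))     # UHD 630:   13155 s

awk -v b="$baseline" -v t="$gtx1650" 'BEGIN { printf "GTX 1650: %.2fx faster\n", b/t }'
awk -v b="$baseline" -v t="$hd6870"  'BEGIN { printf "HD 6870:  %.2fx faster\n", b/t }'
awk -v b="$baseline" -v t="$uhd630"  'BEGIN { printf "UHD 630:  %.2fx (slower)\n", b/t }'
```

So the GTX 1650 works out to roughly a 6.75x speedup over the 560 Ti on that job.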

I decided to start with 3.0 because I was having great success with that version on old GPUs. Then I moved on to version 4.0 and my hashrate declined quite significantly; I was getting 470 MH/s at most. With version 5.0 I saw a good improvement, maxing out at 1835 MH/s.

But version 6.2.5 was still beyond my reach. It doesn't work with the GTX 560 Ti at all, and it almost works with the GTX 1650. For some reason it was failing to load, and 6.2.5 requires CUDA 11. So I downloaded and installed CUDA 11, but on top of CUDA 8. These can be installed side by side, right? I restarted the command shell, restarted Hashcat 6.2.5, and it worked. But with one noticeable difference: it was running at half the speed I was getting with OpenCL. It is currently at work, hashing at about 3700 MH/s.
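In case it helps anyone hitting the same wall, these are the checks I would run after installing CUDA 11 over CUDA 8 (assuming a default Windows install; the exact DLL name varies with the CUDA version):

```shell
# Confirm which CUDA toolkit is first on PATH; it should report
# "release 11.x", not 8.x, or hashcat will pick up the wrong one.
nvcc --version

# Hashcat's "Failed to initialize NVIDIA RTC library" message points at
# NVRTC. On Windows the NVRTC DLL must be findable on PATH:
where nvrtc64*
```

If `nvcc` still reports 8.x, the CUDA 8 directory is ahead of CUDA 11 on PATH and needs to be reordered or removed.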

Maybe I am missing something, but isn't CUDA mode supposed to be faster than OpenCL on NVIDIA GPUs? That's the whole point of installing it, I presume.

Again, this is an MD5 job and it should plow through it in mere minutes. I am using brute-force mode with a mask. I am also specifying an output file for convenience. That's it, no other options are enabled, just -a, -m, -o, the hash file, and a mask.
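For completeness, the command shape is just this (file names and the mask here are placeholders, not my actual ones):

```shell
# -a 3 = mask (brute-force) attack, -m 0 = raw MD5, -o = output file
hashcat -a 3 -m 0 -o cracked.txt hashes.txt ?a?a?a?a?a?a?a?a
```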

CUDA API (CUDA 11.5)
* Device #1: NVIDIA GeForce GTX 1650, 3327/4095 MB, 14MCU

OpenCL API (OpenCL 3.0 CUDA 11.5.121) - Platform #1 [NVIDIA Corporation]
* Device #2: NVIDIA GeForce GTX 1650, skipped

OpenCL API (OpenCL 2.0 AMD-APP (1800.11)) - Platform #2 [Advanced Micro Devices, Inc.]
* Device #3: , skipped

Can someone explain to me the significance of these three info segments? Why is CUDA listed twice? How is the "OpenCL 3.0 CUDA 11.5.121" CUDA different from the "CUDA 11.5" CUDA? And why is the AMD CPU being skipped? Is it because it lacks an iGPU? Can it still be put to work?

If the OpenCL API can be used to utilize CUDA, why do I have to install the CUDA SDK Toolkit? And if OpenCL is the fallback runtime, like the warning message says, why does it not do what it says and fall back to the OpenCL runtime instead of stalling while waiting for the CUDA runtime to load? Logically, if the CUDA runtime is not installed, or the wrong version is installed, the program should respond in some way or time out rather than just sit there waiting for better times.

Please forgive me for the many questions, I don't mean to bother you, you probably have better things to do than to answer my silly noob questions. But I would greatly appreciate it if you found time to write me a line or two. At least to say hello.

Playfully yours,
I also observed overly long initialization with Hashcat v6.2.5.
I waited about 16 minutes.
Please see https://hashcat.net/forum/thread-10548.html
how about reading the error messages?

Failed to initialize NVIDIA RTC library.

* Device #1: CUDA SDK Toolkit not installed or incorrectly installed.
            CUDA SDK Toolkit required for proper device support and utilization.
            Falling back to OpenCL runtime.

your nvidia/cuda installation is not working as expected, so hashcat is falling back to opencl. depending on which opencl version is found, yeah, this could take some time

CPU is skipped by default. a CPU (AMD/Intel) needs the Intel OpenCL runtime from https://www.intel.com/content/www/us/en/...ivers.html to run properly
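you can check what hashcat actually sees with the backend info switch (device numbers below are just examples, yours may differ):

```shell
# list all backends/devices hashcat detects, including skipped ones
hashcat -I

# once the cpu shows up, opt in to it explicitly, e.g. if it is device #3
hashcat -d 3 -a 3 -m 0 hashes.txt ?a?a?a?a?a?a?a?a
```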

fix your cuda installation first

cuda is listed twice because it has, let's call it, a translation/compat layer from cuda to opencl. when cuda fails, this is what happens: cuda fails and the cuda instructions are translated to opencl (and this takes time and reduces speed)
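if you want to rule out the broken cuda path while you sort out the toolkit, you can tell hashcat 6.x to skip the cuda backend entirely (file names and mask are placeholders):

```shell
# ignore the cuda backend and go straight to the opencl runtime
hashcat --backend-ignore-cuda -a 3 -m 0 hashes.txt ?a?a?a?a?a?a?a?a
```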
Thank you both for stopping by. I'm sorry for this late reply. But yeah, having the proper CUDA SDK Toolkit version is essential for proper operation of Hashcat. I had first-hand experience with this not once but twice, or even three times by now (I lost count).

In my early posts to the forum I was still inexperienced with Hashcat and I made a lot of assumptions about things. For instance, you see me talk about CUDA and the CUDA SDK Toolkit as two separate things, thinking that OpenCL alone can leverage the chip-level CUDA hardware, unaware that CUDA at the hardware level and the CUDA SDK Toolkit are inseparable, and that the latter is mandatory to leverage that CUDA power. It's a proprietary technology NVIDIA holds dear, as far as I know (making assumptions again). As you use and work with a given technology or product, you grow with it, and with that comes experience and knowledge. That's where I am, still. (Aren't we all really just playing and learning?)

Meanwhile, I purchased an RTX 3050, and then returned it because it was bad value for my money. Especially after I learned about something called LHR (Low Hash Rate) that NVIDIA has been imposing on all its more recent GPUs, effectively crippling them for any use other than playing stupid games. What's worse, the manufacturers did a crappy job at being transparent and letting you know whether your particular model is LHR-crippled or not. So the best thing to do seemed to be to just return the damn thing. I also had fan noise issues with it, so there's that too.

I purchased two used, cheap, yet still very powerful GPUs, a GTX 1070 and a GTX 1070 Ti, without the LHR nonsense. I have been using those since, with a much better hashrate at a much lower investment cost and about the same running cost as the RTX 3050. I still have the GTX 1650, but I use it only as a video card in one of my other computers.

But yeah, CUDA + CUDA SDK Toolkit = True. You have to match the version numbers according to Hashcat's requirements, and make sure they are installed cleanly.