Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot?
#2
The short answer is: No, you cannot effectively utilize a 4090 in ALL attacks or algorithms on PCI-e x1 2.0.

The longer answer is: Yes, you can effectively utilize a 4090 in SOME attacks or with SOME algorithms at PCI-e x1 2.0. Whether you can fully utilize the card in hashcat depends mostly on how your attack is structured, how the work is created and processed, and which algorithms you are running. You could easily come up with a use case, attack setup, or algorithm choice that sees zero impact from PCI-e bandwidth, which makes a definitive yes or no answer difficult. There will also be attacks and algorithms where the PCI-e bottleneck eats potentially significant performance, especially on fast hashes with no amplifiers. Even then the attack will still work, and some attacks on fast algorithms will be less impacted than others, or not impacted at all. This is where a lot of the confusion comes from.

I have actually tested this for hashcat, and retested it for this post. In my own testing, the "breakpoint" where PCI-e bandwidth stops being the primary limitation on speed is lower than many might assume. At 2.0 x4 the bandwidth limitation will only sometimes cause issues, and the performance hit may not be significant in most cases. At 2.0 x8 it should not usually be noticeable. At 2.0 x16 you should be able to happily push enough work to the card to not notice the bandwidth limitation, as other things become the limiting factor. My previous advice has always been to not drop below 3.0 x4 to avoid noticeable speed loss from the bandwidth itself, and that is mirrored in my testing at 2.0 x8, since the two have approximately the same effective bandwidth. Based on that, the "recommended bandwidth" seems to sit around 32Gbit/s, though I find that 64Gbit/s and higher "feels" more consistent in testing.

There can be other issues with using the older generation bus, though. Some cards seem to behave a little less happily at 2.0 vs 3.0 vs 4.0, and I don't know what the cause might be. My 4090 did not feel as stable while running at 2.0 x16, and the performance variance (speed +/- during the attack) in the bandwidth-limited attack setups felt higher than it was at 4.0 x16.
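To make the 32Gbit/s and 64Gbit/s figures above a bit more concrete, here is a minimal sketch of the arithmetic using the published per-lane signalling rates and line encodings for each PCI-e generation. The function and table names are just for illustration, and the result is link-level bandwidth after encoding only; real transfers lose a bit more to protocol overhead.

```python
# Per-lane raw signalling rate (GT/s) and line-encoding efficiency
# for each PCIe generation discussed in this post.
PCIE_GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
    "4.0": (16.0, 128 / 130),  # 128b/130b encoding
}


def effective_gbit_per_s(generation: str, lanes: int) -> float:
    """Link bandwidth in Gbit/s after line encoding, per direction."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


# Print the link widths mentioned in the post for comparison.
for gen, lanes in [("2.0", 1), ("2.0", 4), ("2.0", 8),
                   ("2.0", 16), ("3.0", 4), ("4.0", 16)]:
    print(f"PCIe {gen} x{lanes}: {effective_gbit_per_s(gen, lanes):.1f} Gbit/s")
```

By this math 2.0 x8 works out to 32 Gbit/s and 3.0 x4 to roughly 31.5 Gbit/s, which is why they behave about the same, while 2.0 x1 is only 4 Gbit/s.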

>Pure brute-force attack like ?b?b?b?b
These will be limited by workload and mask setup. Bandwidth SHOULDN'T impact these, but there are some rare cases where it might.
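As a rough way to judge whether a mask offers enough work, you can count the total candidates it describes. This is only a sketch: the charset sizes are hashcat's built-in ones, custom charsets (?1 to ?4) are not handled, and the number is the total candidate count, not necessarily what hashcat reports for its own internal keyspace.

```python
# Sizes of hashcat's built-in mask charsets.
BUILTIN_CHARSET_SIZES = {
    "l": 26, "u": 26, "d": 10, "h": 16,
    "H": 16, "s": 33, "a": 95, "b": 256,
}


def mask_candidates(mask: str) -> int:
    """Total number of candidates described by a mask of built-in charsets."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key != "?":  # "??" is a literal question mark (1 choice)
                total *= BUILTIN_CHARSET_SIZES[key]
            i += 2
        else:
            i += 1  # literal character, contributes exactly one choice
    return total


print(mask_candidates("?b?b?b?b"))
```

For ?b?b?b?b that is 256^4, a little under 4.3 billion candidates.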

>Giant wordlist with no rules
Bandwidth will heavily impact this below the "breakpoint", but above that you shouldn't see much, if any, difference.

>Small wordlist with lots of rules
If the wordlist is small enough, you will run into workload issues here, but not bandwidth issues. You may also run into performance issues with the rules you select, and it gets very hard to estimate speeds as the behavior can be somewhat inconsistent. You're better off measuring "GPU util" rather than "speed" to make sure you're seeing full card utilization.
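One way to watch GPU utilization outside of hashcat's own status output is to poll nvidia-smi while the attack runs. The sketch below assumes an NVIDIA card with nvidia-smi on the PATH; the helper names are made up for illustration, and the sampling interval is arbitrary.

```python
import subprocess
import time


def parse_utilization(csv_text: str) -> list[int]:
    """Parse one utilization percentage per GPU from nvidia-smi CSV output."""
    # Skip blank lines and non-numeric values such as "[N/A]".
    return [int(line.strip()) for line in csv_text.splitlines()
            if line.strip().isdigit()]


def sample_gpu_utilization() -> list[int]:
    """Ask nvidia-smi for the current utilization of every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_utilization(result.stdout)


def monitor(samples: int = 10, interval_s: float = 1.0) -> None:
    """Print utilization a few times; run this alongside the attack."""
    for _ in range(samples):
        print(sample_gpu_utilization())
        time.sleep(interval_s)
```

Call monitor() in a second terminal while hashcat is running. If utilization sits well below 100% for most samples, the card is probably being starved by workload or bandwidth rather than being limited by the algorithm.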

>Giant wordlist with lots of rules
Same thing as the above, but most likely you have enough basewords for the base workload to be OK, and you will mostly be impacted by the rules again. Wordlists above a certain size can also bog down other parts of the process, such as dict caching, initialization, and the initial read, and can sometimes cause performance issues across the system for various reasons, so it is best not to go "too big" with any one file purely from a performance standpoint. There are plenty of other reasons not to go too big on wordlists, but they are out of scope for this question.
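The wordlist cases above can be sketched with a deliberately simplified model: every baseword has to cross the bus once, and each rule applied on the card turns that one baseword into another candidate without extra host traffic. This ignores hashcat's actual transfer format, padding, protocol overhead, and disk read speed, and the 10-byte average word length and rule counts are made-up example values, so treat the result as an upper bound for intuition only.

```python
def max_candidates_per_s(link_gbit_per_s: float,
                         avg_word_bytes: float,
                         amplifier: int = 1) -> float:
    """Rough ceiling on candidates/s imposed by host-to-GPU bandwidth.

    amplifier is the number of candidates produced on the GPU per
    transferred baseword (1 for no rules, N for N rules).
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8        # Gbit/s -> bytes/s
    basewords_per_s = bytes_per_s / avg_word_bytes  # words the bus can feed
    return basewords_per_s * amplifier


# Hypothetical example: PCIe 2.0 x1 (4 Gbit/s), 10-byte average words.
no_rules = max_candidates_per_s(4.0, 10)
with_rules = max_candidates_per_s(4.0, 10, amplifier=1000)
print(f"no rules:   {no_rules:,.0f} candidates/s ceiling")
print(f"1000 rules: {with_rules:,.0f} candidates/s ceiling")
```

With no rules the bus ceiling in this model is 50 million candidates per second on 2.0 x1, which is why a fast hash gets starved. With 1000 rules the same link could in principle feed 50 billion per second, so something other than bandwidth becomes the limit.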

To address some other stuff and elaborate a bit:

>Hashcat primarily loads data into GPU memory
This is a bit complicated, and I'd lean toward saying it is not really the case: we don't cache wordlist data in GPU memory or preload data like you might think, though GPU memory does get used plenty despite that.

>so in theory, PCIe bandwidth shouldn't be a huge bottleneck, especially for brute-force or compute-heavy work
Yes, in theory workloads that exist purely on the GPU and don't consume data from the host will see little to no downside from lacking host bandwidth (ignoring issues with signaling or returns). They may still suffer from other speed problems, like poor workload, but that's beside the point. This is why you see mining rigs built around x1 risers: they do almost no host communication while processing. Hashcat and other heavy GPU workloads are NOT like mining, though, and we don't suggest mining-rig-style builds for this and many other reasons (risers, limited CPU and RAM, cut corners galore). We often need more host resources (CPU/RAM/bandwidth/etc.) to operate effectively, and there are plenty of other tasks that are even heavier on this than hashcat, so YMMV for anything else you end up doing on the GPUs in this system. If you're building for HPC, 2.0 x1 is not the move, especially if it's via a riser.


Messages In This Thread
RE: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - by Chick3nman - 06-24-2025, 09:32 PM