hashcat Forum
Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - Printable Version

+- hashcat Forum (https://hashcat.net/forum)
+-- Forum: Misc (https://hashcat.net/forum/forum-15.html)
+--- Forum: Hardware (https://hashcat.net/forum/forum-13.html)
+--- Thread: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? (/thread-13297.html)



Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - thealexandra - 06-22-2025

Hi!

I know this might sound like a dumb question, but I’ve done a lot of searching and even read through Hashcat's official documentation on PCIe bandwidth. Still, I keep running into conflicting opinions on various forum posts, so I wanted to get some real clarification.

I'm building a system on an older motherboard for specific reasons, and while it has limited PCIe lanes, I’d like to use it for serious GPU compute tasks, including Hashcat, among other things. My main question is whether a high-performance GPU like the RTX 4090 can still be effectively utilized through a PCIe x1 slot for Hashcat workloads. I understand that Hashcat primarily loads data into GPU memory, so in theory, PCIe bandwidth shouldn't be a huge bottleneck, especially for brute-force or compute-heavy work. But I've seen users mention that certain operations (like large wordlists or heavy rule-based attacks) could suffer, while others say the difference is negligible.

To cut to the chase: has anyone actually tested this? I would really love to see benchmark comparisons or real-world examples. Here's a breakdown of the kinds of workloads I'm curious about (assume hash mode = MD5 for consistency):
  • Pure brute-force attack like ?b?b?b?b
  • Giant wordlist with no rules
  • Small wordlist with lots of rules
  • Giant wordlist with lots of rules
I’m not looking to stir debate; I’d just love to know how much of a hit (if any) I should expect in these scenarios when running a high-end GPU on a bottlenecked slot. Actual benchmarks or experience-based insight would be incredibly helpful in planning out my build. To make it concrete, I’ve sketched the rough commands I have in mind just below.
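For reference, something along these lines (hashes.txt, big.txt and small.txt are just placeholder names, not files I’m committed to):

  hashcat -a 3 -m 0 hashes.txt ?b?b?b?b                    # 1. pure brute-force
  hashcat -a 0 -m 0 hashes.txt big.txt                     # 2. giant wordlist, no rules
  hashcat -a 0 -m 0 hashes.txt small.txt -r rules/dive.rule  # 3. small wordlist, lots of rules
  hashcat -a 0 -m 0 hashes.txt big.txt -r rules/dive.rule    # 4. giant wordlist, lots of rules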

Thanks for reading!


RE: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - Chick3nman - 06-24-2025

The short answer is: No, you cannot effectively utilize a 4090 in ALL attacks or algorithms on PCI-e x1 2.0.

The longer answer is: Yes, you can effectively utilize a 4090 in SOME attacks or with SOME algorithms at PCI-e x1 2.0. The problem is that whether or not you can fully utilize the card in hashcat depends on how your attack is structured, how the work is created and processed, and which algorithms are being worked on. You could easily come up with a use case, attack setup, algorithm choice, etc. that sees zero impact from PCI-e bandwidth, which makes giving a definitive yes or no answer difficult. There will also be attacks and algorithms where the PCI-e bottleneck eats potentially significant performance, especially on fast hashes with no amplifiers. It will still work, though, and some attacks even on fast algorithms will be less impacted than others, or potentially not impacted at all. This is where a lot of the confusion comes from.

I have actually tested this for hashcat, and retested it for this post. In my own testing, the "breakpoint" where PCI-e bandwidth stops being the primary limitation on speed is actually lower than many might assume. At 2.0 x4 the bandwidth limitation will only sometimes cause issues, and the performance hit may not be significant in most cases. At 2.0 x8 it should not usually be noticeable. At 2.0 x16, you should be able to happily push enough work to the card that bandwidth is no longer the limit, as other things become the limiting factor. My previous advice has always been to not drop below 3.0 x4 to avoid noticeable speed loss from the bandwidth itself, and that's mirrored in my testing at 2.0 x8, since they have approximately the same effective bandwidth. Based on that, the "recommended bandwidth" seems to sit around 32 Gbit/s, though I find that 64 Gbit/s and higher "feels" more consistent in testing. There can be other issues with using the older generation bus, though, as some cards seem to behave a little less happily at 2.0 vs 3.0 vs 4.0, but I don't know what the cause or reason for this might be. My 4090 just did not feel as stable while running at 2.0 x16, and the performance variance (speed swinging up and down during the attack) in the bandwidth-limited attack setups felt higher than it was at 4.0 x16.
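For reference, rough effective per-direction bandwidth figures (after encoding overhead; real-world throughput will be a bit lower than this):

  PCIe 2.0 x1  : ~0.5 GB/s   (~4 Gbit/s)
  PCIe 2.0 x4  : ~2 GB/s     (~16 Gbit/s)
  PCIe 3.0 x4  : ~3.9 GB/s   (~31.5 Gbit/s)
  PCIe 2.0 x8  : ~4 GB/s     (~32 Gbit/s)
  PCIe 2.0 x16 : ~8 GB/s     (~64 Gbit/s)
  PCIe 4.0 x16 : ~31.5 GB/s  (~252 Gbit/s)

So a 2.0 x1 slot sits at roughly 1/8th of that ~32 Gbit/s mark, which is why the bandwidth-heavy attack types suffer there.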

>Pure brute-force attack like ?b?b?b?b
These will be limited by workload and mask setup; bandwidth SHOULDN'T impact these, but there are some rare cases where it might.

>Giant wordlist with no rules
Bandwidth will heavily impact this below the "breakpoint", but above it you shouldn't see much, if any, major difference.

>Small wordlist with lots of rules
If the wordlist is small enough, you will run into workload issues here, but not bandwidth issues. You may also run into performance issues with the rules you select, and it gets very hard to "estimate" speeds as the behavior can be somewhat inconsistent. You're better off measuring "GPU util" to make sure you're seeing full card utilization, rather than going by the reported speed alone.
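Hashcat's status screen reports utilization per device, and on NVIDIA you can also watch it from the driver side with something like:

  nvidia-smi --query-gpu=utilization.gpu,power.draw --format=csv --loop=1

If utilization keeps dipping well below 100%, the card is starving for work rather than being limited by the hash math itself.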

>Giant wordlist with lots of rules
Same as the above, but most likely you have enough base words for the base workload to be OK, so you will mostly be impacted by the various rules again. Wordlists above a certain size can also bog down other parts of the process, such as dict caching, initialization, and the initial read, and can sometimes cause performance issues across the system for various reasons, so it's best not to go "too big" with any one file purely from a performance standpoint. There are plenty of other reasons not to go too big on wordlists, but they are out of scope for this question.

To address some other stuff and elaborate a bit:

>Hashcat primarily loads data into GPU memory
This is a bit complicated, and I'd lean toward calling it not really the case, in that we don't cache wordlist data to GPU memory or preload data like you might think, though GPU memory does get used plenty despite that.

>so in theory, PCIe bandwidth shouldn't be a huge bottleneck, especially for brute-force or compute-heavy work
Yes, in theory workloads that exist purely on the GPU and don't consume data from the host will see little to no downside from lacking host bandwidth (ignoring issues with signaling or returns). They may still suffer from other speed problems, like a poor workload, but that's beside the point. This is why you see mining rigs built around x1 risers: they do almost no host communication while processing. Hashcat and other heavy GPU workloads are NOT like mining, though, and we don't suggest mining-rig-style builds for this and many other reasons (risers/limited CPU & RAM/cut corners galore). We often need more host resources (CPU/RAM/bandwidth/etc.) to operate effectively, and there are plenty of other tasks that are even heavier on this than hashcat, so YMMV for anything else you end up doing on the GPUs in this system. If you're building for HPC, 2.0 x1 is not the move, especially if it's via a riser.
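As a side note, if you do end up on a limited slot, it's worth confirming what link the card actually negotiated, since risers and older boards sometimes train below what the slot is rated for. On NVIDIA you can check with something like:

  nvidia-smi --query-gpu=pcie.link.gen.current,pcie.link.width.current --format=csv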


RE: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - thealexandra - 06-26-2025

Thank you for getting back to me so quickly!

I had a follow-up question regarding benchmarks. I came across a site claiming that the 2080 Ti outperforms the 4090 when cracking SHA1, and even surpasses the 5090 in MD5 performance. That doesn’t sound right to me; please correct me if I’m mistaken:
https://openbenchmarking.org/test/pts/hashcat&eval=8b64c180eac0ce4c97f0d73d774dbb6161bedb5f#metrics

Is there any scenario where certain hash algorithms perform better on older or specific GPUs? Or are the 4090 and 5090 truly unmatched across the board? What about hash rate per watt?


RE: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - Chick3nman - 06-28-2025

I have no idea why they would think that. The 5090 is easily the fastest GPU, and the 4090 is not awfully far behind it. The 2080Ti certainly cannot come close to either, so something is seemingly wrong with their benchmarks.
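If you want to sanity-check numbers yourself, the simplest apples-to-apples comparison is hashcat's built-in benchmark for those specific modes, run on each card:

  hashcat -b -m 0      # MD5
  hashcat -b -m 100    # SHA1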

Hashrate per watt can be a bit more difficult to measure and estimate; however, the 4090 has been king of hashrate per watt _as well_, with the 5090 being similar, though I've got less time and testing on the 5090 to say for certain. Everyone tends to look at "peak" power, especially using the rated TDP figures, but this doesn't tell the full story. You can shave a significant portion of the power off of a 4090 and lose little to no speed.

I've just run a benchmark at the unlocked rated power limit (133%) of ~600W and I come up with ~160GH/s for MD5. Cutting the power limit to 75%, or ~350W, I am still seeing 145GH/s. This puts the 4090, and very likely the 5090, in a position of significant performance per watt, even against lower-end and commonly more efficient GPUs. This won't hold perfectly true, as some algorithms are more power hungry than others and may see greater variance, but in most common algorithms we see very little fall-off even with significant power restrictions. The high-end cards have become increasingly difficult to beat in any metric, with complete dominance in terms of total performance and performance density, and even great performance per watt when power limited. Performance per dollar can be really variable, as it depends on what price you end up paying for such a card and prices have been pretty unstable, but at least at MSRP they even look like a pretty good value all things considered.
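If you want to reproduce that kind of test on your own card, the rough recipe is just capping the power from the driver and re-running the benchmark, something along these lines (wattages here are from my 4090; your card's stock and allowed limits may differ, and nvidia-smi will refuse values outside its range):

  sudo nvidia-smi -pl 350    # cap the board power limit to ~350W (75% on my card)
  hashcat -b -m 0            # MD5 benchmark at the capped limit
  sudo nvidia-smi -pl 450    # set it back to the stock limit when done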


RE: Can Hashcat Effectively Use a 4090 on a PCIe x1 2.0 Slot? - thealexandra - 06-28-2025

Awesome, thanks again! I’m also curious about the other supported OpenCL device types:

- [ OpenCL Device Types ] -
  # | Device Type
===+=============
  1 | CPU
  2 | GPU
  3 | FPGA, DSP, Co-Processor

Are these just listed for completeness, or do people actually use them? I haven’t seen much public cracking done with FPGAs, aside from some custom setups using low-end hardware. If any of these alternatives are actually viable (especially the Co-Processor type), how do they compare performance-wise to top-tier GPUs like the 4090 or 5090?
I realize this might sound a bit redundant, but given how beefy my rig is shaping up to be, I could see myself putting any capable device to work.
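For what it’s worth, I’m assuming selecting them is just a matter of the -I / -D switches, something like:

  hashcat -I                                      # list every device hashcat detects
  hashcat -a 0 -m 0 -D 1 hashes.txt words.txt     # restrict a run to CPU-type devices only

(hashes.txt / words.txt are just placeholders.) Please correct me if that’s the wrong way to go about it.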