parameter & speed assistance
#1
Hello, 
I have two rigs that are having the exact same speed issue, so I am sure that I am doing something wrong. I've read through other posts on this forum to try to troubleshoot. I apologize if I missed a fix in a different thread. 

Hardware: 
  • Each GPU has at least x8 PCIe lanes. 
  • Each GPU has at least 2 CPU cores. (CPU utilization is at around 10%)
  • RAM is equal to GPU memory + 8GB for system @ 3200
  • SATA SSD (not NVMe)
  • Delta fans running at full tilt for intake & exhaust 
  • rig #1 three 2080Ti (blower)
  • rig #2 three 2080 super (blower)
  • Gigabit networking
  • 1300-watt PSU running on 220v. Each GPU has a dedicated 8-pin connection; no pigtail connections.

Software: 
  • rig #1 Fedora desktop 
  • rig #2 Ubuntu desktop
  • Both are pulling down tasks from a Hashtopolis server. 
  • hashcat v6.2.6
  • Nvidia driver 550.54.14
  • Nvidia cuda 12.4

Attack:
  • -a 3 using PathWell masks (e.g., ?u?l?l?l?l?l?l?l?l?l?d?d, ?u?l?l?l?l?l?s?d?d?d?d, ...)
  • extra parameters "-O -w 4 --hwmon-temp-abort 90"
  • usually cracking faster hash types: MD5 or NTLM

Issue: 
Each of my GPUs is running at ~20GH/s for both MD5 and NTLM. There is no difference in speed between the 2080 Ti and the 2080 Super. The GPU temps are about 75°C. Using nvidia-smi, I've noticed that each GPU's utilization is 99% and its memory usage is around 3200 MiB. 

Based on the above, what do you think could be wrong? The masks I use are large, and I think that would be enough work for faster hash types. When I do a benchmark on the local machine, I get the same speeds. So, I don't think it is an issue with the Hashtopolis server; maybe I need to increase the chunk size. 


Thank you in advance
#2
What exactly is the issue? Is 20GH/s not the speed you were expecting?
#3
(03-24-2024, 12:08 AM)Chick3nman Wrote: What exactly is the issue? Is 20GH/s not the speed you were expecting?

Apologies for not being clear. Yes, precisely. 
Based on other benchmarks that I've seen, I was expecting the 2080 Tis to be around 70GH/s for NTLM, not 20GH/s. Also about 50GH/s, not the 15-20GH/s that I've been getting.
#4
This is likely a combination of factors that may not be so obvious. When hashcat starts up, it prints a list of optimizers that can be applied in the current attack. As you change things about the attack, some of those may become unavailable, and the speed differences can be severe. For both MD5 and NTLM there is a major optimizer related to single-target attacks. Are you attacking multiple hashes? If so, that will reduce the apparent speed significantly (sometimes by as much as 60%), though this isn't to say that attacking multiple hashes is bad or that the speed you are seeing is necessarily bad.
#5
(03-24-2024, 05:56 PM)Chick3nman Wrote: This is likely a combination of factors that may not be so obvious. When hashcat starts up, it prints a list of optimizers that can be applied in the current attack. As you change things about the attack, some of those may become unavailable, and the speed differences can be severe. For both MD5 and NTLM there is a major optimizer related to single-target attacks. Are you attacking multiple hashes? If so, that will reduce the apparent speed significantly (sometimes by as much as 60%), though this isn't to say that attacking multiple hashes is bad or that the speed you are seeing is necessarily bad.

Ah, thank you very much. I just tried a single hash, and the speed increased from 20GH/s to 60GH/s per GPU, which is closer to the 70GH/s for NTLM that I was expecting. I didn't realize that multiple hashes had that big of an impact on speed. Out of curiosity, is there a hash type that isn't impacted by processing multiple hashes at once?

Could you kindly elaborate on the other factors you mentioned that might influence performance? Furthermore, I am curious about any additional adjustments I might consider for my parameters, such as the potential deactivation of hwmon.
Once again, I appreciate the help.
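
To put the two speeds in perspective, here is a rough estimate of how long one GPU would need to exhaust the example masks from the first post at 20GH/s versus 60GH/s. This is only a minimal sketch: it assumes hashcat's built-in charset sizes (?l=26, ?u=26, ?d=10, ?s=33, ?a=95, ?b=256), a constant hash rate, and a full search of the keyspace. The helper names `mask_keyspace` and `hours_to_exhaust` are made up for illustration and are not part of hashcat.

```python
# Sizes of hashcat's built-in charsets used in this thread.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "b": 256}


def mask_keyspace(mask: str) -> int:
    """Count the candidates a mask generates (built-in charsets only)."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            if i + 1 >= len(mask):
                raise ValueError("mask ends with a dangling '?'")
            token = mask[i + 1]
            if token != "?":  # "??" is a literal question mark (1 choice)
                if token not in CHARSET_SIZES:
                    raise ValueError(f"unsupported charset ?{token}")
                total *= CHARSET_SIZES[token]
            i += 2
        else:
            i += 1  # a literal character contributes exactly 1 choice
    return total


def hours_to_exhaust(keyspace: int, hashes_per_second: float) -> float:
    """Hours needed to try every candidate at a constant rate."""
    return keyspace / hashes_per_second / 3600


if __name__ == "__main__":
    masks = ["?u?l?l?l?l?l?l?l?l?l?d?d", "?u?l?l?l?l?l?s?d?d?d?d"]
    for mask in masks:
        ks = mask_keyspace(mask)
        for rate in (20e9, 60e9):  # speeds observed in this thread, per GPU
            hours = hours_to_exhaust(ks, rate)
            print(f"{mask}: {ks} candidates, {hours:.2f} h at {rate / 1e9:.0f} GH/s")
```

The numbers are per GPU and ignore any overhead from Hashtopolis chunking, so treat them as an upper bound on what a single card could achieve.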
#6
This speed difference is specific to a handful of modes/optimized kernels. You've unfortunately picked 2 of the few where it's impactful. What's going on from a technical standpoint is that if there is only 1 hash, we can cheat a bit in doing the math and skip incorrect candidates before we finish processing the full hash by checking specific blocks. This doesn't work for multiple targets because we would need to precompute the early skip stuff for each additional hash, and that wouldn't work with how we do our matching currently and likely wouldn't give us the same benefit.

The other factors that may play into your speed difference are the mask itself and how much work is available. The mask may be large _overall_ but the minimum divisions may still not be perfectly ideal. The benchmark runs a "real" mask attack under the hood, but usually with the mask ?b?b?b?b?b or a similar ideal workload. Anything less than that, even ?a?a?a?a?a?a?a or similar, will be ever so slightly less ideal and will incur small penalties in speed. This stacks up with other factors, like heat and/or power related clock speed throttling, small differences in the autotuner from run to run, other stuff going on on the computer at the same time, small hiccups or delays in communication with the cards, differences in silicon quality between individual GPUs, differences in GPU models and factory clock speeds, and for some modes even the length of the plaintexts being processed. There are a lot of little factors that can each take a few percent off the top, and after you stack them all together you may start to notice it, but I wouldn't really worry about it too much. If you're within 10% or so of someone else running the same or a very similar attack, you're doing fine.
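
The early-skip idea described above can be illustrated with a toy example. This is not hashcat's kernel code and not real MD5. It is a made-up 16-round mixing function, assumed only to share one property with MD5: each round rewrites a single state word, so the first word of the final digest is already known three rounds before the end. With a single target, the final IV addition can be undone once up front, and most wrong candidates can be rejected after 13 of the 16 rounds. All names (`toy_hash`, `search_early_skip`, and so on) are hypothetical, and the toy says nothing about the size of the real speedup.

```python
MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
ROUNDS = 16  # toy value, chosen only for illustration


def rotl(x: int, s: int) -> int:
    """Rotate a 32-bit value left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK


def step(state, i, word):
    """One toy round: rewrite a single state word, then rotate the roles."""
    a, b, c, d = state
    f = (b & c) | (~b & d & MASK)
    a = (b + rotl((a + f + word + i * 0x9E3779B9) & MASK, 7)) & MASK
    return (d, a, b, c)


def run_rounds(state, candidate, start, stop):
    """Apply rounds start..stop-1 to the state."""
    for i in range(start, stop):
        state = step(state, i, candidate)
    return state


def toy_hash(candidate: int):
    """Full toy hash: all rounds, then add the IV back in."""
    state = run_rounds(IV, candidate, 0, ROUNDS)
    return tuple((s + iv) & MASK for s, iv in zip(state, IV))


def search_full(target, candidates):
    """Baseline: compute the complete hash for every candidate."""
    rounds = 0
    for c in candidates:
        rounds += ROUNDS
        if toy_hash(c) == target:
            return c, rounds
    return None, rounds


def search_early_skip(target, candidates):
    """Single-target search that bails out three rounds early on a mismatch."""
    # Undo the final IV addition once, instead of once per candidate.
    pre = tuple((t - iv) & MASK for t, iv in zip(target, IV))
    rounds = 0
    for c in candidates:
        state = run_rounds(IV, c, 0, ROUNDS - 3)
        rounds += ROUNDS - 3
        # The word written by the fourth-from-last round ends up, unchanged,
        # as word 0 of the final state, so it can be compared right now.
        if state[1] != pre[0]:
            continue
        state = run_rounds(state, c, ROUNDS - 3, ROUNDS)
        rounds += 3
        if state == pre:
            return c, rounds
    return None, rounds


if __name__ == "__main__":
    target = toy_hash(12345)
    print("full search:", search_full(target, range(20000)))
    print("early skip: ", search_early_skip(target, range(20000)))
```

Both searches return the same candidate, but the early-skip version executes fewer rounds in total. With many targets, the value in `pre` would have to be prepared and checked for every target, which is the complication described in the post above.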