970 underperforming?
#1
Hi all...

I've recently built a multi-purpose hash rig with an ASUS Rampage V motherboard and four NVIDIA GTX 970 cards (https://www.asus.com/nz/Graphics-Cards/T...970OC4GD5/).  I am running the cards at stock speed, not overclocked.

The benchmarks I've seen for the GTX 970, including one from epixoip, indicated I should see around 20000MH/s per card for NTLM.  I'm seeing nothing even close to that performance.  On straight brute force, the best I'm seeing is around 8200MH/s per card. This is at 99-100% Util, so the cards have work to do.  Dictionary and rules bring the performance down to between 1000 and 2000MH/s, again at 99-100% Util.

What are the most likely reasons I would see such a performance differential?  What, if anything, can I do to improve the performance?

Thanks in advance for your advice!

Betawave
#2
1. You're comparing single hash speeds to multi-hash speeds, and dict + rules to brute force, etc. These are not apples-to-apples comparisons.

2. My benchmarks are at +250 MHz. You said you're running at stock clocks.

There's absolutely no reason not to overclock Maxwell cards. They are clocked very low and have tons of overclocking headroom. You should be able to run up to ~1515 MHz without overvolting. You will also need to do further tuning to set PowerMizer mode, performance level, etc. to ensure the most consistent performance.
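For anyone wanting a starting point for the PowerMizer and clock-offset tuning mentioned above, here is a minimal dry-run sketch. It only prints nvidia-settings commands, it does not run them. The function name `oc_commands` is made up for illustration, and the performance level index (3) is an assumption that may differ per card and driver; check `nvidia-settings -q all` on your own system first. Coolbits must already be enabled in xorg.conf for the clock offset to be accepted.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one nvidia-settings command per GPU.
# GPUPowerMizerMode=1 selects "Prefer Maximum Performance".
# ASSUMPTION: performance level 3 is the highest level on these cards.

oc_commands() {
    gpu_count=$1    # number of GPUs in the rig
    offset_mhz=$2   # core clock offset in MHz
    perf_level=3    # assumed highest performance level
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        printf "nvidia-settings -a '[gpu:%d]/GPUPowerMizerMode=1' -a '[gpu:%d]/GPUGraphicsClockOffset[%d]=%d'\n" \
            "$i" "$i" "$perf_level" "$offset_mhz"
        i=$((i + 1))
    done
}

# Four cards at +250 MHz; review the output before running any of it.
oc_commands 4 250
```

The printed commands need a running X server (for example `DISPLAY=:0`) to take effect.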
#3
(01-18-2016, 01:37 AM)epixoip Wrote: 1. You're comparing single hash speeds to multi-hash speeds, and dict + rules to brute force, etc. These are not apples-to-apples comparisons.

2. My benchmarks are at +250 MHz. You said you're running at stock clocks.

There's absolutely no reason not to overclock Maxwell cards. They are clocked very low and have tons of overclocking headroom. You should be able to run up to ~1515 MHz without overvolting. You will also need to do further tuning to set PowerMizer mode, performance level, etc. to ensure the most consistent performance.

Thanks for the reply epixoip.  I really value the input from someone with your experience.

I didn't mean to imply I'm comparing across brute force and dict... I only meant to say that no matter what I'm trying, performance is lower than what I believe the benchmarks would suggest (using the highly scientific method of extrapolating in my head for various approaches!).

I've done the basic tuning with PowerMizer, etc... the next step is OC to see what happens.  I wasn't sure how much I could safely push the cards, so your advice definitely helps.  I just need to figure out how to manually set the fan speed on all of the cards before I go too far down the overclocking road.  I'm using the nvidia-xconfig util to add the Coolbits option to my xorg.conf and then using the nvidia-settings tool to manually adjust the fan.  Unfortunately, I've only been able to get this approach to work with Device0.  Anyway, it's a work in progress!  Thanks for your advice and input.
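The manual fan adjustment described above can be sketched as a dry run that prints one nvidia-settings command per card. The helper name `fan_commands` is hypothetical. It assumes one fan per GPU with matching indices, and that the driver exposes the `GPUTargetFanSpeed` attribute (older drivers used `GPUCurrentFanSpeed` instead), so treat it as a sketch rather than a known-good recipe.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) fan control commands for each GPU.
# ASSUMPTIONS: fan index N belongs to GPU N, and the driver supports
# GPUTargetFanSpeed. Coolbits must enable manual fan control in xorg.conf.

fan_commands() {
    gpu_count=$1   # number of GPUs
    speed_pct=$2   # target fan speed in percent
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        printf "nvidia-settings -a '[gpu:%d]/GPUFanControlState=1' -a '[fan:%d]/GPUTargetFanSpeed=%d'\n" \
            "$i" "$i" "$speed_pct"
        i=$((i + 1))
    done
}

# Four cards at 80% fan speed.
fan_commands 4 80
```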
#4
Well, when I say you're comparing apples and oranges: you're saying 8100 MH/s for brute force, but that must be multi-hash, not single hash. Multi-hash loses about 50% performance compared to single hash, so I bet your single hash speeds are around 16200 MH/s, which is a lot closer to the 20000 MH/s you're expecting.
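The estimate above is just arithmetic: if multi-hash runs at roughly half the single-hash rate, the single-hash rate is roughly double the multi-hash figure. A small sketch using the numbers from this post (the function names are made up, and the 50% loss is the rough figure quoted here, not a measured constant):

```shell
#!/bin/sh
# Estimate single-hash speed from a multi-hash measurement, assuming
# multi-hash runs at about 50% of single-hash speed (rough figure).

estimate_single_hash() {
    multi_mhs=$1                 # measured multi-hash speed in MH/s
    echo $((multi_mhs * 2))      # 50% loss means single is about 2x multi
}

percent_of() {
    # Integer percentage of $1 relative to $2.
    echo $(($1 * 100 / $2))
}

single=$(estimate_single_hash 8100)
echo "estimated single-hash speed: ${single} MH/s"
echo "share of the 20000 MH/s benchmark: $(percent_of "$single" 20000)%"
```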
#5
(01-18-2016, 03:15 AM)epixoip Wrote: Well, when I say you're comparing apples and oranges: you're saying 8100 MH/s for brute force, but that must be multi-hash, not single hash. Multi-hash loses about 50% performance compared to single hash, so I bet your single hash speeds are around 16200 MH/s, which is a lot closer to the 20000 MH/s you're expecting.

Would it be a good idea to include a GFLOPS figure in the benchmark output? Maybe in the form:
card 0 cuda cores clock xxx g-tflop yyyy
card 1 cuda cores clock zzz g-tflop www
#6
No. GFLOPS has literally nothing to do with hash cracking, as hash algorithms use integer math, not floating point math. Just because a device is good at floating point math doesn't necessarily mean it will be good at integer math, and just because a device is good at integer math doesn't necessarily mean it will be good at hash cracking.

What matters is the hash cracking performance, which is precisely what the benchmark command measures.
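To make the integer-math point concrete: NTLM is MD4 over the UTF-16LE password, and MD4's inner loop is nothing but 32-bit AND/OR/NOT, addition and bit rotation. The sketch below shows a single round-1 step in shell arithmetic. It is an illustration of the kind of operations involved, not a full MD4 and nothing like how a GPU kernel is written; the function names are made up.

```shell
#!/bin/sh
# One MD4 round-1 step using only integer operations:
#   a = rotl32(a + F(b, c, d) + x, s)   with   F(b, c, d) = (b & c) | (~b & d)
# Assumes the shell uses 64-bit arithmetic and 1 <= s <= 31.

MASK32=4294967295   # 0xFFFFFFFF, keeps results in 32 bits

rotl32() {
    x=$(( $1 & MASK32 ))
    echo $(( ((x << $2) | (x >> (32 - $2))) & MASK32 ))
}

md4_f() {
    echo $(( (($1 & $2) | (~$1 & $3)) & MASK32 ))
}

md4_round1_step() {
    # Arguments: a b c d x s
    f=$(md4_f "$2" "$3" "$4")
    sum=$(( ($1 + f + $5) & MASK32 ))
    rotl32 "$sum" "$6"
}

# With b = 0xFFFFFFFF, F selects c (here 1); 1 rotated left by 3 is 8.
md4_round1_step 0 4294967295 1 0 0 3
```

No floating point appears anywhere in it, which is why a GFLOPS figure says little about cracking speed.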
#7
Update for those interested:

I was able to set fan speed manually on all four cards by attaching a monitor to each one, restarting X (gdm3), saving the new xorg.conf and then restarting X again.  What a pain!  

After spinning up the fans to make sure I wouldn't cook anything, I tested +250MHz on the cards.  Performance was initially good, but it wasn't stable.  I dropped to +200MHz and tested again.  This time the cards were stable and I was able to do additional benchmarking.  Multi-hash is still slower than I expected, but single hash is very close to the other benchmarks I've seen: hovering around 19500MH/s.

Thank you again epixoip for your commentary and advice.  I'm learning a great deal from you and the other experts on the forums and, for that, I'm very grateful.  Thank you.
#8
Wait, you do not need to physically connect a monitor to each card to set the fan speeds! You can simply tell the driver a monitor is connected even when one isn't. This is most easily accomplished with the --connected-monitor switch for nvidia-xconfig. There are also some other xconfig settings that are needed that you likely aren't setting, so double-check your xconfig against these commands:

Code:
nvidia-xconfig -s -a --force-generate --allow-empty-initial-configuration \
    --cool-bits=12 --registry-dwords="PerfLevelSrc=0x2222" \
    --no-sli --connected-monitor="DFP-0" -o /etc/X11/xorg.conf

sed -i '/Driver/a     Option         "Interactive" "False"' /etc/X11/xorg.conf
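If you want to see what the sed line does before pointing it at the real /etc/X11/xorg.conf, here is a small sketch that applies the same expression to a throwaway file. The sample Device section and the wrapper name `add_interactive_false` are made up for the demo, and `sed -i` with a one-line `a` command assumes GNU sed.

```shell
#!/bin/sh
# Apply the same sed expression to a temporary file instead of the real
# xorg.conf. ASSUMPTION: GNU sed (for -i and the one-line "a" form).

add_interactive_false() {
    # Append the Interactive option after every line containing "Driver".
    sed -i '/Driver/a     Option         "Interactive" "False"' "$1"
}

demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
EndSection
EOF

add_interactive_false "$demo_conf"
cat "$demo_conf"   # inspect the result
rm -f "$demo_conf"
```

Since the pattern matches every Driver line, each Device section in a multi-GPU xorg.conf gets the option.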