Curious about video driver crash requiring long system off-time
#1
This isn't a problem I can't work around, but it made me curious whether anyone knows the answer.

Got a Windows box (hey, I need video games) with an NVIDIA 980 Ti SC in it. When the video driver pukes out and resets, my hash rate drops by about two-thirds. Normal speed is about 277 kH/s; after a video driver reset, it drops to about 80 kH/s. If I shut down the computer, wait a little while and restart it, I'm still stuck at 80k. If I shut it down overnight, it tends to come back up at 270k again. This happens on both Win7 and Win10, both with the most recent drivers.
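For scale, the figures above work out to slightly worse than a two-thirds drop. A quick Python check, using only the 277 kH/s and 80 kH/s numbers quoted in the post (the `rate_change` helper name is just for illustration):

```python
def rate_change(normal_khs: float, degraded_khs: float) -> tuple[float, float]:
    """Return (fraction of normal speed retained, fraction lost)."""
    retained = degraded_khs / normal_khs
    return retained, 1.0 - retained


# Numbers from the post: normal vs. after a driver reset, in kH/s.
retained, lost = rate_change(277, 80)
print(f"retained {retained:.1%}, lost {lost:.1%}")  # retained 28.9%, lost 71.1%
```

So the card keeps a bit under a third of its normal speed, i.e. roughly a 70% drop.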

It's almost like the video card gets pissed and complains. It's not running overly hot (79°C) and the fans aren't even running at more than 50% or so. I'm curious what could happen that would make the video card need such a long rest. Anyone know? I can't imagine it still has residual heat after being off for 2 hours, but it still comes back slow. After 10 hours it is happy. Perhaps some breaker gets tripped or something?

As a random fact, I can't run a benchmark at all without it crashing.  As soon as it gets to the third or fourth test it crashes.

Weird
#2
(11-18-2015, 02:33 PM) hashhampster wrote: As a random fact, I can't run a benchmark at all without it crashing.  As soon as it gets to the third or fourth test it crashes.

Weird

Not weird. It's not "random" if it happens every time... It sounds like an inadequate PSU and insufficient heat dissipation are the two dragons you'll have to slay. Does your SC GPU have an ACX cooler or a sleeve design? Either way, all the "SC" 980 Tis that EVGA makes are "gaming" cards, so not the reference design favoured here.

As to why it needs so long before it works at full speed again, I don't know. Probably the PSU is also overheating (which would happen if it's working too hard, as I suspect it is) and maybe self-limiting its output? I don't know much about PSUs...
#3
I said the fact was random, not the behavior, which is completely repeatable, though I don't understand it.  

RE: The card: you are right, it's probably not reference-built (EVGA - http://www.amazon.com/gp/product/B00YDAY...ge_o08_s00), but it normally runs very cool. As far as I can tell, the behavior occurs regardless of the GPU's clock speed and power draw. I can back it down to sub-reference speeds and voltage and it still happens whenever the GPU driver resets. (The reset mainly happens if I accidentally run something that hits 3D while an OCL crack is running, like a game, or sometimes random software.) If I leave the box alone, it's fine for what I am currently doing (WPA), and I can easily avoid driver crashes by using a 'niced' -w 1 flag.

I didn't buy it for cracking; I bought it for good FPS at 4K. The cracking is just something to fill downtime, so I'm not worried if it's not perfect, but it does make me curious about what is happening.

RE: The power supply should be adequate, at least in overall wattage, though I couldn't say how much juice a single rail can provide. I do know that I can run an OCL GPU crack (brute force) simultaneously with a regular CPU hashcat crack (dictionary) while overclocking the CPU without any problems; both run at full speed at the same time. If the PSU were the problem, I would expect kicking off a high-CPU-draw process to trigger the reset/lock behavior, but it doesn't.

Temperatures aren't too bad overall, well within tolerances even under load. IMO it's something else, like an interrupt or DMA problem. The only thing I'm wondering about is what it is that "pops" and needs to chill out for so long before it resets. Everything can be at room temperature and it will still power back up in slow mode. I have only ever seen the driver reset while running oclHashcat64, never at any other time.
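One way to see whether the card is actually stuck in a lower performance state after a reset, rather than guessing at interrupts or DMA, is to log what the driver itself reports before and after. A minimal Python sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-gpu` fields used here (`pstate`, `clocks.sm`, `clocks.max.sm`, `temperature.gpu`). The 50% threshold in the hypothetical `looks_degraded` helper is an arbitrary cutoff, not something NVIDIA defines, and a field reported as `[N/A]` would make the integer parsing fail:

```python
import shutil
import subprocess

# Fields to query; assumes a driver whose nvidia-smi supports these names.
QUERY = "pstate,clocks.sm,clocks.max.sm,temperature.gpu"


def parse_line(line: str) -> dict:
    """Parse one CSV line from nvidia-smi (noheader,nounits) into a dict."""
    pstate, sm, sm_max, temp = (field.strip() for field in line.split(","))
    return {
        "pstate": pstate,
        "sm_mhz": int(sm),
        "sm_max_mhz": int(sm_max),
        "temp_c": int(temp),
    }


def looks_degraded(sample: dict, ratio: float = 0.5) -> bool:
    """Hypothetical heuristic: SM clock below `ratio` of its max while cracking."""
    return sample["sm_mhz"] < ratio * sample["sm_max_mhz"]


def query_gpus() -> list[dict]:
    """Run nvidia-smi once and return one parsed sample per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]


# Only query the hardware when nvidia-smi is actually installed.
if __name__ == "__main__" and shutil.which("nvidia-smi"):
    for index, sample in enumerate(query_gpus()):
        flag = "LOW CLOCKS?" if looks_degraded(sample) else "ok"
        print(index, sample, flag)
```

Run it once while a crack is going at the normal ~277 kH/s and again after a driver reset. If the pstate or SM clock differs between the two runs, that would suggest the card or driver is holding a reduced clock state, rather than the PSU limiting power.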