False temp warnings, but Mhz dips?
#8
(11-06-2022, 04:25 PM)marc1n Wrote: YES

I tried that. Using --hwmon-disable makes the "Watchdog" line at startup switch from showing a temperature to saying "Temperature abort trigger disabled".
I even tried adding "--hwmon-temp-abort=80", but that did not change anything.

(11-07-2022, 08:42 PM)Chick3nman Wrote:
(11-05-2022, 01:30 PM)cybhashcat Wrote: However, we're getting what appear to be false "Driver temperature threshold met on GPU #X. Expect reduced performance" warnings every few minutes (not always on the same GPU).


This may just be a bit of a misunderstanding. These are not "false" warnings, they are mostly* real. They are just not as serious as they may seem. The warning pops due to a driver-reported value, which on many modern GPUs is reported as ~65C. This value is where the GPU starts to lower clock speeds from the maximum boost clock/bin that it has achieved for its power budget. This clock speed reduction is done in small steps as temperatures increase and is generally not very noticeable until very high temperatures, so it's nothing to really worry about. With the way modern GPUs boost, you may still be running at speeds over the rated spec even after this threshold. If you are running at a relatively cool temperature, the warning usually appears more often, because the GPU temperature bounces around right at the threshold value and makes the warning pop repeatedly.

Now, I put a * on "mostly real" because I've seen quite a few users reporting this behavior at temperatures well below/above expected. This behavior is sometimes inconsistent and I've not tracked down _why_ it happens but it generally appears as though the driver has reported a temperature threshold value that doesn't make sense. Hashcat's warning logic just runs off that reported value so if the value we get is junk or in the wrong format or something, it can cause the warning to pop at incorrect temperatures. Unless it's causing you serious annoyance due to the number of warnings you are receiving, this is a purely visual issue and will not affect hashcat in any meaningful way, other than producing warnings at unexpected times.
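
To illustrate why a card sitting right at the threshold produces repeated warnings while a hotter card may warn only once, here is a minimal Python sketch. It is not hashcat's actual code: the function name, the 65C threshold and the sample readings are all assumptions made up for illustration, and the real logic may differ.

```python
# Hypothetical sketch, NOT hashcat's implementation.
# Counts how often a temperature series rises to or above a
# driver-reported threshold after having been below it.

def count_threshold_warnings(readings_c, threshold_c=65):
    """Return the number of times the readings cross up to the threshold."""
    warnings = 0
    was_at_or_above = False  # assume we start below the threshold
    for temp in readings_c:
        at_or_above = temp >= threshold_c
        if at_or_above and not was_at_or_above:
            warnings += 1  # a fresh crossing would trigger a new warning
        was_at_or_above = at_or_above
    return warnings


# Made-up readings: one GPU hovering around 65C, one steadily above it.
hovering = [63, 65, 64, 66, 64, 65]
steady_hot = [70, 71, 72]

print(count_threshold_warnings(hovering))    # 3
print(count_threshold_warnings(steady_hot))  # 1
```

Under these assumptions, the cooler GPU that bounces around the threshold triggers more warnings than the one that stays well above it.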

First of all, Chick3nman, thank you for taking the time to answer.
The very quick dips to ~300 MHz are, like you said, an annoyance, but I was worried that in the long run they may leave performance on the table (a minute here, a minute there), when they really shouldn't be happening if we never go over 70C.

If I understand correctly, this is due to a value the Nvidia driver reports to hashcat. Is there any way to raise it from 65 to, let's say, 75 in the driver? Alternatively, is there a way to achieve what marc1n was talking about: disable these warnings and dips but keep the abort temp at 90?
Thanks again.


Messages In This Thread
RE: False temp warnings, but Mhz dips? - by cybhashcat - 11-09-2022, 11:48 AM