Temperature control for multiple sessions
#1
I have run into some scenarios with hashcat that led to the creation of this script.  It is a bash script for use with an Nvidia GPU on Linux.  I hope you guys get some use out of it.

*Edit*

I am troubleshooting an issue that might be exclusive to hashcat 3.30.  I will test with hashcat 3.2 to see if that version has the same issue.  I am running a single GTX 1080 FE with two different hashcat sessions.  The expected behavior is that the remaining session continues to run when the first one stops.  Instead, when one session stops on a checkpoint, I get the following error in the other session:

X Error of failed request:  BadValue (integer parameter out of range for operation)
 Major opcode of failed request:  157 (NV-CONTROL)
 Minor opcode of failed request:  3 ()
 Value in failed request:  0x17
 Serial number of failed request:  32
 Current serial number in output stream:  33

Then it crashes.

Perhaps some of you can test a similar scenario and see if you get a similar error.

So no joy on this script just yet.

*Another edit*

I am using hashcat 3.30 and the nvidia-370 driver and the above problem has gone away.  It seems intermittent.  Querying the GPUCoreTemp every 10 seconds may have been too frequent.  It has been adjusted to 15 seconds.  Testing is ongoing.  I will continue testing on or after January 23rd, 2017.

Please report any issues with this script to this thread and I will investigate them.

Until next time.

Code:
#!/bin/bash

#This is for use with an Nvidia GPU in Linux.
#This script will keep the fan running on a single gpu in the event that you have more than one hashcat process running and one hashcat process is terminated.
#This script assumes that you already have the fan speed running to your liking during an existing hashcat session.
#KEEP THIS IN MIND BEFORE YOU RUN IT.  It will not set your fan speeds prior to you running hashcat for the first time.
#This script assumes that GPUFanControlState=1 and GPUPowerMizerMode=1 when executed. Fan control must already be in a manually controllable state.

#In cases where the fan speed has been set manually and --gpu-temp-retain has not been used to maintain the fan speeds, this has utility.

#Some scenarios where this might be useful are:
#1. --gpu-temp-retain has not been used.
#2. There was more than one hashcat session running, but one has terminated for some reason and has taken the fan speed down with it.
#3. You have old sessions (.restore files) that were not using --gpu-temp-retain and want to run more than one of these sessions at a time.  This could be from converting from water cooling to air cooling.  Manual fan control would be necessary as a result.

#This script will keep the GPU below 61 C by resetting the fan speed to 80 percent.
#This script was designed for use with fast hashes using the GTX 1080 FE.  80 percent fan is usually enough to keep the temperature below 61 C.
#This script uses nvidia-settings to get the temperature reading on the GPU (gpu:0).
#There is a 200 MHz overclock in the keep_fan_on function.  This was put in for the GTX 1080 FE, but can be adjusted or removed as you see fit.
#You can adjust the other values as you see fit.

#An alternate way to get the value for gpu0temp is:
#gpu0temp=$(nvidia-settings -q GPUCoreTemp --ctrl-display=:0 | grep 'Attribute' | grep 'gpu' | awk -F':' '{print $4}' | awk -F'.' '{print $1}' | awk -F ' ' '{print $1}')

#debug on
set -x

#Declare GPU temp integer
declare -i gpu0temp

#Declare functions for fan control
function keep_fan_on {
nvidia-settings -a GPUFanControlState=1 --ctrl-display=:0
nvidia-settings -a GPUTargetFanSpeed=80 --ctrl-display=:0
nvidia-settings -a GPUPowerMizerMode=1 --ctrl-display=:0
nvidia-settings -a "GPUGraphicsClockOffset[3]=200" --ctrl-display=:0
}

function turn_fan_off {

#Reset fans when process ends
nvidia-settings -a "GPUGraphicsClockOffset[3]=0" --ctrl-display=:0
nvidia-settings -a GPUPowerMizerMode=0 --ctrl-display=:0
nvidia-settings -a GPUFanControlState=0 --ctrl-display=:0
}

while :
do
        #Watch for running hashcat process
        pidcheck=$(ps -e | grep 'hashcat')
        #Get temp from single gpu
        gpu0temp=$(nvidia-settings --query "[screen:0]/GPUCoreTemp" --ctrl-display=:0 | grep 'Attribute' | awk -F':' '{print $3}' | awk -F'.' '{print $1}' | awk -F ' ' '{print $1}')
        date
        echo "Watching hashcat process(es)..."
        echo "Press Ctrl-C to stop this monitoring script."

        #As long as one hashcat process is detected, then keep the gpu fan running to keep the temperature down
        if [[ $pidcheck != "" ]]
        then
                if (( $gpu0temp > 60 ))
                then
                        #Keep fan at 80 percent if the conditions are met
                        #This may execute more than once at an interval of 15 seconds until the temperature is below 61 C.
                        echo "Temperature has risen above 60 C"
                        echo "Restoring fan speed to 80 percent."
                        #Set GraphicsClockOffset to 0 to avoid excessive overclocking if this executes more than once
                        nvidia-settings -a "GPUGraphicsClockOffset[3]=0" --ctrl-display=:0
                        sleep 1
                        keep_fan_on
                fi
        else
                echo "No hashcat process(es) are detected."
                echo "Resetting fans to normal speed."
                turn_fan_off
                #All done, breaking out of while loop
                break
        fi
sleep 15
clear
done
echo "This monitoring script has terminated because hashcat no longer has any processes/sessions running."
echo "Fan speeds are back to normal."
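
The temperature pipeline in the script depends on how many colons appear in the nvidia-settings output, which is why the alternate command in the comments reads field 4 instead of field 3.  Below is a minimal sketch of a parser that does not depend on the field count.  The function name parse_core_temp is hypothetical, and the two sample lines are illustrations rather than captured output; their format is assumed from the awk field numbers used in the script.

```shell
#!/bin/bash
# parse_core_temp is a hypothetical helper: on the first "Attribute" line it
# takes the text after the last colon and keeps only the digits.
parse_core_temp() {
    awk '/Attribute/ { n = split($0, parts, ":"); gsub(/[^0-9]/, "", parts[n]); print parts[n]; exit }'
}

# Assumed sample lines (the hostname "myhost" is made up).
screen_line="  Attribute 'GPUCoreTemp' (myhost:0.0): 58."
gpu_line="  Attribute 'GPUCoreTemp' (myhost:0[gpu:0]): 63."

printf '%s\n' "$screen_line" | parse_core_temp
printf '%s\n' "$gpu_line" | parse_core_temp
```

In the monitoring loop, the nvidia-settings query could be piped into parse_core_temp in place of the grep and awk chain.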
#2
Do not run two instances with GPU temperature control enabled.  Use --gpu-temp-disable on the second and every later instance.
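
A minimal sketch of that advice, assuming the sessions are started from a wrapper script: only the first instance keeps temperature control and every later instance gets --gpu-temp-disable.  The helper name temp_flag_for_instance and the session names are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helper: instance 1 keeps hashcat's temperature control,
# every later instance is told to leave the fans alone.
temp_flag_for_instance() {
    local instance=$1
    if (( instance > 1 )); then
        echo "--gpu-temp-disable"
    fi
}

# Made-up session names; replace with your own.
sessions=(session_a session_b session_c)

instance=0
for name in "${sessions[@]}"; do
    instance=$((instance + 1))
    flag=$(temp_flag_for_instance "$instance")
    # Dry run: print the command line instead of starting hashcat.
    echo "hashcat --session ${name}${flag:+ $flag} [your usual options]"
done
```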
#3
Here is a dual GPU version of the previous script.

I am still troubleshooting one problem with this though.  The nvidia-settings command that runs to set the fans seems to execute successfully, but has no effect on the fan speed.  If I run the commands outside the script, then the command works as expected and sets the fans (on 2 GPUs).  If I attempt to run the same command within the script, then it fails.  Would anyone know why it doesn't work while running in the script, but the same command in a terminal window works without any problem?

When I find a solution, I will post it here.

*Edit on August 6th, 2017*

The actual problem was that I was trying to set the fan speeds while a hashcat session that controls the fan speeds was already running.  After the nvidia-settings commands changed the fan speed to 90 percent, the fans sped up only for a brief moment and then dropped back to their default speeds.  I could hear the fans spin up briefly, and the output from nvidia-smi showed 90 percent for maybe 1 second before returning to its default value.  When hashcat runs without --gpu-temp-disable, it has "control" of the fans even if --gpu-temp-retain is not invoked, so any fan speed set while such a session is running has only this very temporary effect.

The problem had nothing to do with whether the nvidia-settings commands ran inside or outside of the script.  I should have set the fan speeds before I ever resumed the hashcat session that was created without --gpu-temp-disable.

This script is dependent upon at least two separate hashcat sessions running at the same time.  Only one session would be running normally (used without --gpu-temp-disable) and the other session(s) would be running with --gpu-temp-disable used when hashcat was executed.

The problem that I was having has been resolved.  There is nothing wrong with the script or the nvidia-settings commands that it executes, but while hashcat is running it keeps control of the fans regardless of any nvidia-settings commands executed in the meantime.  Once the hashcat session that was executed without --gpu-temp-disable terminates, the script works as intended.  The sessions that remain at that point should all have been executed with --gpu-temp-disable, so none of them care about fan control.


Note: Executing "nvidia-settings -a GPUFanControlState=0 --ctrl-display=:0" while a hashcat session that isn't using --gpu-temp-disable is running will crash that hashcat session.

Read the instructions.  You should have set the fans manually to your liking before you execute the hashcat session that doesn't use --gpu-temp-disable and before you run this script.  Alternatively, you could use --gpu-temp-retain, but this script was made for running old hashcat sessions that didn't use either --gpu-temp-disable or --gpu-temp-retain.
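
The ordering can be sketched as follows: set the fans first, then resume the session that keeps temperature control, then start the monitoring script.  The helper name set_fan, the RUN variable, and the 90 percent value are assumptions for illustration, and the sketch assumes fan N belongs to GPU N.  With RUN left at its default of echo, the commands are only printed.

```shell
#!/bin/bash
# RUN=echo prints each command instead of executing it (dry run).
# Set RUN="" in the environment to execute the commands for real.
RUN=${RUN-echo}

# Hypothetical helper: put one GPU's fan under manual control at a given speed.
set_fan() {
    local gpu=$1 speed=$2
    $RUN nvidia-settings \
        -a "[gpu:${gpu}]/GPUFanControlState=1" \
        -a "[fan:${gpu}]/GPUTargetFanSpeed=${speed}" \
        --ctrl-display=:0
}

# Step 1: set the fans BEFORE any hashcat session takes control of them.
set_fan 0 90
set_fan 1 90

# Step 2 (not run here): resume the old session that lacks --gpu-temp-disable.
# Step 3 (not run here): start the monitoring script.
```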

*Edit on August 6th, 2017*

Here is the dual GPU version of the previous script.

Notes:
I commented out all overclocking and resetting of overclocking.
The temperature thresholds and fan percentages were tweaked.

Adjust the script as you see fit.

Cheers.

Code:
#!/bin/bash

#This is for use with Nvidia GPUs in Linux.
#This script will keep the fan running on multiple gpus in the event that you have more than one hashcat process running and one hashcat process is terminated.
#This script assumes that you already have the fan speed running to your liking during an existing hashcat session.
#If the fan was not set prior to execution of this script, then the fan speeds will be set when at least one GPU reaches 71 C.
#KEEP THIS IN MIND BEFORE YOU RUN IT.  It will not set your fan speeds prior to you running hashcat for the first time.

#This script assumes that GPUFanControlState=1 and GPUPowerMizerMode=1 when executed. Fan control can already be in a manually controllable state with fan speeds set manually, but is not mandatory.
#Alternatively, --gpu-temp-retain can be used with hashcat, but that hashcat session will be in control of the fans until it terminates and this script won't have an effect on the fans until that session ends.

#In cases where the fan speed has been set manually and --gpu-temp-retain has not been used to maintain the fan speeds, this has utility.
#This script is dependent upon at least two separate hashcat sessions running at the same time.  Only one session should be running normally (used without --gpu-temp-disable) and the other session(s) should be running with --gpu-temp-disable used when hashcat was executed.
#This scenario is a prerequisite for this script to function correctly.  Only one session of hashcat should be running without --gpu-temp-disable (i.e. fan control is being done by hashcat).
#If you have an old hashcat session that is not using --gpu-temp-disable and is going to finish to completion soon and you have other hashcat sessions running that were executed with --gpu-temp-disable, then this script can be implemented in order to maintain lower temperatures on your GPUs when the old session terminates and takes the fans to their default speeds.

#Some scenarios where this might be useful are:
#1. --gpu-temp-retain has not been used.
#2. There was more than one hashcat session running, but one has terminated for some reason and has taken the fan speed down with it.
#3. You have old sessions (.restore files) that were not using --gpu-temp-retain and want to run that session along with other sessions invoked with --gpu-temp-disable at the same time.  This could be from converting from water cooling to air cooling.  Manual fan control would be necessary as a result.

#This script will keep the GPUs below 71 C by resetting the fan speed to 90 percent.
#This script was designed for use with fast hashes using the GTX 1080 FE.  90 percent fan is usually enough to keep the temperature below 71 C.
#This script uses nvidia-settings to get the temperature reading on each GPU (screen:0 and screen:1).
#The 200 MHz overclock from the single GPU version is commented out in this version.  It was put in for the GTX 1080 FE and can be re-enabled or adjusted as you see fit.
#You can adjust the other values as you see fit.

#An alternate way to get the value for gpu0temp is:
#gpu0temp=$(nvidia-settings -q GPUCoreTemp --ctrl-display=:0 | grep 'Attribute' | grep 'gpu' | awk -F':' '{print $4}' | awk -F'.' '{print $1}' | awk -F ' ' '{print $1}')

#debug on
set -x

# export DISPLAY to :0
# this should fix the "Failed to connect to Mir: Failed to connect to server socket" error message
export DISPLAY=:0

#Declare GPU temp integer
declare -i gpu0temp
declare -i gpu1temp

#Declare functions for fan control

# control gpu 0
# nvidia-settings -a "GPUFanControlState=1" --ctrl-display=:0
# nvidia-settings -a "GPUTargetFanSpeed=100" --ctrl-display=:0
# nvidia-settings -a "GPUPowerMizerMode=1" --ctrl-display=:0
# nvidia-settings -a "GPUGraphicsClockOffset[3]=200" --ctrl-display=:0
#nvidia-settings -a "[fan:0]/GPUTargetFanSpeed=100" --ctrl-display=:0
#nvidia-settings -a "[gpu:0]/GPUPowerMizerMode=1" --ctrl-display=:0
#nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=200" --ctrl-display=:0
function keep_fan_on {
#nvidia-settings -a "GPUFanControlState=1" -a "GPUTargetFanSpeed=90" --ctrl-display=:0
#nvidia-settings -a "GPUFanControlState=1" -a "GPUTargetFanSpeed=90" --ctrl-display=:0
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=90" --ctrl-display=:0
nvidia-settings -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=90" --ctrl-display=:0
}

#Reset fans when process ends
function turn_fan_off {
#nvidia-settings -a GPUGraphicsClockOffset[3]=0 --ctrl-display=:0
#nvidia-settings -a GPUPowerMizerMode=0 --ctrl-display=:0
nvidia-settings -a GPUFanControlState=0 --ctrl-display=:0
}

while :
do
        #Watch for running hashcat process
        pidcheck=$(ps -e | grep 'hashcat')
        #Get temp from each gpu
        gpu0temp=$(nvidia-settings --query "[screen:0]/GPUCoreTemp" --ctrl-display=:0 | grep 'Attribute' | awk -F':' '{print $3}' | awk -F'.' '{print $1}' | awk -F ' ' '{print $1}')
        gpu1temp=$(nvidia-settings --query "[screen:1]/GPUCoreTemp" --ctrl-display=:0 | grep 'Attribute' | awk -F':' '{print $3}' | awk -F'.' '{print $1}' | awk -F ' ' '{print $1}')
        date
        echo "Watching hashcat process(es)..."
        echo "Press Ctrl-C to stop this monitoring script."

        #As long as one hashcat process is detected, then keep the gpu fans running to keep the temperature down
        if [[ $pidcheck != "" ]]
        then
                if (( $gpu0temp > 70 )) || (( $gpu1temp > 70 ))
                then
                        #Keep fans at 90 percent if the conditions are met
                        #This may execute more than once at an interval of 15 seconds until the temperature is below 71 C.
                        echo "Temperature has risen above 70 C"
                        echo "Restoring fan speed to 90 percent."
                        #Set GraphicsClockOffset to 0 to avoid excessive overclocking if this executes more than once
                        #This will be commented out if no overclock is being used
                        #nvidia-settings -a GPUGraphicsClockOffset[3]=0 --ctrl-display=:0
                        sleep 1
                        keep_fan_on
                fi
        else
                echo "No hashcat process(es) are detected."
                echo "Resetting fans to normal speed."
                turn_fan_off
                #All done, breaking out of while loop
                break
        fi
        sleep 15
        clear
done

echo "This monitoring script has terminated because hashcat no longer has any processes/sessions running."
echo "Fan speeds are back to normal."
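
The two-GPU comparison in the loop can be generalized to any number of GPUs.  This is a minimal sketch, assuming the temperatures have already been read into integers; the helper name any_gpu_over and the sample readings are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: any_gpu_over THRESHOLD TEMP...
# Returns 0 (true) if at least one TEMP is greater than THRESHOLD.
any_gpu_over() {
    local threshold=$1 temp
    shift
    for temp in "$@"; do
        if (( temp > threshold )); then
            return 0
        fi
    done
    return 1
}

# Made-up readings for three GPUs.
temps=(64 72 69)

if any_gpu_over 70 "${temps[@]}"; then
    echo "At least one GPU is above 70 C"
else
    echo "All GPUs are at or below 70 C"
fi
```

With a helper like this, the condition in the loop would stay a single line no matter how many GPUs are being watched.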


Attached Files
.txt   restore_fan_after_hashcat_terminates_multi_gpu.txt (Size: 6.07 KB / Downloads: 2)