290x Stops Cracking and Hangs OclHashcat
#1
I recently switched from Windows to Ubuntu Server and have run into a bit of an issue: after a number of hours, one of my R9 290Xs just stops responding. I'm trying to figure out whether this is a card issue or something I'm doing. I don't recall ever having this issue in Windows.

System details: regular tower case, Intel 4770K, MSI Z87 motherboard, 3x R9 290X reference, 1x FirePro V4900 (I needed VGA output, so I added this one I had lying around).

Basic NTLM brute force using -w 3 and --powertune-enable. I had paused this particular run for a while, which is why the elapsed time is so long.

Code:
[s]tatus [p]ause [r]esume [b]ypass [q]uit => s

Session.Name...: x
Status.........: Running
Input.Mode.....: Mask (?a?a?a?a?a?a?a?a) [8]
Hash.Target....: File (ntlm)
Hash.Type......: NTLM
Time.Started...: Mon Jun 22 13:16:39 2015 (2 days, 22 hours)
Time.Estimated.: Fri Jun 26 03:00:09 2015 (14 hours, 46 mins)
Speed.GPU.#1...:        0 H/s
Speed.GPU.#2...: 12191.8 MH/s
Speed.GPU.#3...: 12180.8 MH/s
Speed.GPU.#4...:   777.7 MH/s
Speed.GPU.#*...: 25150.4 MH/s
Recovered......: 356/491 (72.51%) Digests, 0/1 (0.00%) Salts
Progress.......: 4911588188422144/6634204312890625 (74.03%)
Rejected.......: 0/4911588188422144 (0.00%)
Restore.Point..: 374246998016/735091890625 (50.91%)
HWMon.GPU.#1...: 100% Util, 79c Temp, 50% Fan
HWMon.GPU.#2...: 100% Util, 84c Temp, 100% Fan
HWMon.GPU.#3...: 100% Util, 81c Temp, 100% Fan
HWMon.GPU.#4...: 99% Util, 80c Temp, 28% Fan

Once this happens and the card's hash rate drops to 0, I cannot quit; pressing q just hangs. kill -9 on the PID doesn't do anything except leave the process defunct, so I have to reboot to do anything.

Stock clocks, nothing else changed, any suggestions? I may stress test that card individually to see if anything happens with it.
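
For anyone wanting to catch this state without watching the screen, here is a minimal sketch that scans saved status output for GPUs reporting 0 H/s. It assumes the status text has been captured to a file or pipe in the format shown above; `stalled_gpus` is a hypothetical helper name, not part of oclHashcat.

```shell
# stalled_gpus (hypothetical helper): reads status text on stdin and prints
# the number of every GPU whose Speed line reports exactly "0 H/s".
stalled_gpus() {
    awk '/^Speed\.GPU\.#[0-9]+/ && $2 == "0" && $3 == "H/s" {
        gsub(/[^0-9]/, "", $1)   # keep only the device number from the label
        print $1
    }'
}

# Example with two lines copied from the status output above.
printf '%s\n' \
    'Speed.GPU.#1...:        0 H/s' \
    'Speed.GPU.#2...: 12191.8 MH/s' | stalled_gpus
```

The aggregate `Speed.GPU.#*` line is skipped on purpose, since it only matches numbered devices.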
#2
If you're getting an ASIC hang during NTLM brute force, then it is very likely a power issue. How big is your PSU?
#3
Corsair HX1050, definitely starting to push it...

It might just come down to the powertune settings pushing it over the edge, I'll report back with a larger PSU once I get things swapped around. Thanks!
#4
Those 290Xs will draw around 320W each on NTLM brute force, so your GPUs alone are likely pulling around 960W. That doesn't leave much power available for everything else in your system. Plus that's a pretty low-end power supply, so I'm not sure how well it handles cards that violate the PCI-e spec (as most high-end AMD GPUs do). It may not like pushing more amps than it's rated for.
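
To make that arithmetic explicit, a rough sketch follows. The 320W per card is the estimate above and 1050W is the HX1050's rated output; `gpu_headroom` is a hypothetical helper name used only for illustration, and it ignores CPU, drives, and the fourth card.

```shell
# gpu_headroom (hypothetical helper): rough PSU budget from estimated figures.
gpu_headroom() {
    # $1 = number of GPUs, $2 = estimated watts per GPU, $3 = PSU rated watts
    gpu_total=$(( $1 * $2 ))
    echo "GPU draw: ${gpu_total} W, headroom: $(( $3 - gpu_total )) W"
}

# Three 290Xs at roughly 320 W each on a 1050 W supply.
gpu_headroom 3 320 1050
```

Whatever is left over has to cover the rest of the system, which is why the margin looks thin.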

I would start by removing the --powertune-enable switch from your command line, and use od6config to underclock the GPUs and de-tune PowerTune.

Code:
od6config --set core=975,power=0

That should help until you can buy a proper PSU for your rig.
#5
The performance is way too slow for NTLM. I'd say wrong driver / bad driver installation. See here for a solution:

https://hashcat.net/wiki/doku.php?id=fre...hould_i_do
#6
I don't think so. I rebuilt the OS less than a week ago and used the same driver referenced in the wiki.

I had it crash twice doing a benchmark, so I think power might be it. It's just interesting that I never had this problem with this setup for a couple of weeks in Windows (before this I had 280X cards, which were a little less power hungry but had awful cooling).

I think the lower hash rate is probably because there are still 100+ hashes pending on that run; the benchmark seems to be where it should be.
Code:
cat /var/log/Xorg.0.log | grep fglrx
...
[    17.731] (II) fglrx(1): Kernel Module Version Information:
[    17.731] (II) fglrx(1):     Name: fglrx
[    17.731] (II) fglrx(1):     Version: 14.30.4
[    17.731] (II) fglrx(1):     Date: Sep 15 2014
[    17.731] (II) fglrx(1):     Desc: AMD FireGL DRM kernel module
...

$ dmesg | grep fglrx | grep module
[    2.865906] fglrx: module license 'Proprietary. (C) 2002 - ATI Technologies, Starnberg, GERMANY' taints kernel.
[    2.869865] fglrx: module verification failed: signature and/or  required key missing - tainting kernel
[    2.912698] <6>[fglrx] module loaded - fglrx 14.30.4 [Sep 15 2014] with 4 minors

~/atidrivers$ ls
amd-catalyst-14-9-linux-x86-x86-64.zip  amd-driver-installer-14.301.1001-x86.x86_64.run  check.sh  doc

$ ./oclHashcat64.bin -b -m 1000 --powertune-enable
oclHashcat v1.36 starting in benchmark-mode...

Device #1: Hawaii, 3072MB, 1000Mhz, 44MCU
Device #2: Hawaii, 3072MB, 1000Mhz, 44MCU
Device #3: Hawaii, 3072MB, 1000Mhz, 44MCU
Device #4: Turks, 512MB, 650Mhz, 6MCU

Hashtype: NTLM
Workload: 1024 loops, 256 accel

Speed.GPU.#1.: 23223.8 MH/s
Speed.GPU.#2.: 23215.9 MH/s
Speed.GPU.#3.: 23185.1 MH/s
Speed.GPU.#4.:  2298.7 MH/s
Speed.GPU.#*.: 71923.5 MH/s

Started: Fri Jun 26 09:49:40 2015
Stopped: Fri Jun 26 09:49:57 2015
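
To put a number on the gap being discussed, here is a quick sketch comparing the per-card speed from the running job (12191.8 MH/s on GPU #2) with the benchmark figure above (23223.8 MH/s). `speed_ratio` is a hypothetical helper name; the two inputs are simply the values quoted in this thread.

```shell
# speed_ratio (hypothetical helper): observed speed as a percentage of benchmark speed.
speed_ratio() {
    # $1 = observed MH/s, $2 = benchmark MH/s
    awk -v run="$1" -v bench="$2" 'BEGIN { printf "%.1f%%\n", 100 * run / bench }'
}

# Running job vs. benchmark, one 290X.
speed_ratio 12191.8 23223.8
```

So the running job is at roughly half the benchmark rate per card, which is the difference being attributed here to the hashes still pending on that run.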
#7
The speeds look fine to me, atom. Both the single hash speeds and the multi-hash speeds. Are you perhaps looking at the speed of the Firepro card?
#8
I was just expecting a -b benchmark. Speeds are fine for multi-hash.
#9
Well, my new PSU came... Just got it in. EVGA 1300W. If I was on the edge before, an extra 250W should be sufficient. This thing supports 110A on the 12V rail.

Once it was installed this morning, every test I kicked off crashed and hung the system, either at "Finding weak hashes" in a regular run or when kicking off the benchmark.

Narrowed it down to the same GPU #1 I was having issues with before. If I run without it, it works. So I think I'm going to try to RMA card #1... I'm leaning toward card issue over power issue still, but at this point I don't think I can prove it. (Or one could have caused the other I guess).

Code:
:~/oclHashcat-1.36$ ./oclHashcat64.bin -m 1000 -b --gpu-devices 1
oclHashcat v1.36 starting in benchmark-mode...

Device #1: Hawaii, 3072MB, 1000Mhz, 44MCU

Hashtype: NTLM
Workload: 1024 loops, 256 accel

*HANG/REBOOT*

:~/oclHashcat-1.36$  ./oclHashcat64.bin -m 1000 -b --gpu-devices 2
oclHashcat v1.36 starting in benchmark-mode...

Device #2: Hawaii, 3072MB, 1000Mhz, 44MCU

Hashtype: NTLM
Workload: 1024 loops, 256 accel

Speed.GPU.#1.: 21894.5 MH/s

Started: Tue Jun 30 11:55:45 2015
Stopped: Tue Jun 30 11:56:01 2015
:~/oclHashcat-1.36$ ./oclHashcat64.bin -m 1000 -b --gpu-devices 3
oclHashcat v1.36 starting in benchmark-mode...

Device #3: Hawaii, 3072MB, 1000Mhz, 44MCU

Hashtype: NTLM
Workload: 1024 loops, 256 accel

Speed.GPU.#1.: 21939.5 MH/s

Started: Tue Jun 30 11:56:18 2015
Stopped: Tue Jun 30 11:56:34 2015
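
The same per-device isolation can be scripted. This is only a sketch under assumptions: `bench_each` is a hypothetical helper, `HASHCAT_BIN` is a made-up override variable, and the flags are exactly the ones used in the runs above. A hard hang like the one on device 1 would take the loop down with the machine, so in practice it makes sense to test the suspect card last.

```shell
# bench_each (hypothetical helper): run the NTLM benchmark once per listed device.
# HASHCAT_BIN is an assumed override; it defaults to the binary used in this thread.
bench_each() {
    bin=${HASHCAT_BIN:-./oclHashcat64.bin}
    for dev in "$@"; do
        echo "=== device $dev ==="
        # Same flags as the manual runs above; report a non-zero exit if one occurs.
        "$bin" -m 1000 -b --gpu-devices "$dev" || echo "device $dev: exited with status $?"
    done
}

# Usage (run from the oclHashcat directory): bench_each 2 3 1
```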
#10
Well, if you can run -d 2,3 without problems and have trouble with -d 1, I think you have proven what the problem is. You can always try swapping the faulty card with another to see if it still causes problems, but chances are that it will, unless it was used with a PCI-e extender (which could be faulty too).