06-28-2015, 09:33 AM
(06-27-2015, 11:51 PM)epixoip Wrote: That's a sufficiently large PSU, but I'm still kind of leaning towards power. Here's why:
With two cards that violate the PCI-e spec, you receive errors. With a card that adheres to the PCI-e spec, you get no errors. So I think the motherboard is intelligent enough to know that > 75W are being pulled through the PCI-e slot, and disables the slot accordingly. Disabling the slot means killing communications between the driver and the hardware, simulating an ASIC hang, leaving the driver stuck in IOWAIT and the kernel in a weird state.
Well, it turns out that the NVIDIA card was only more stable, not fully stable. The cudaHashcat process hung at some point during the past 12 hours. There are no more PCI SERR messages and no kernel errors, but it is still stuck: speed is 0, temperature is low, and there is no progress. See below:
Session.Name...: cudaHashcat
Status.........: Running
Input.Mode.....: Mask (?1?2?2?2?2?2?2?3?3?3) [10]
Hash.Target....: 0123456789abcdeffedcba9876543210
Hash.Type......: NTLM
Time.Started...: Thu Jun 25 17:46:00 2015 (2 days, 13 hours)
Time.Estimated.: Sun Jun 28 22:34:50 2015 (13 hours, 12 mins)
Speed.GPU.#1...: 0 H/s
Recovered......: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.......: 7662076434579456/9301612953526272 (82.37%)
Rejected.......: 0/7662076434579456 (0.00%)
Restore.Point..: 3432829943808/4167389316096 (82.37%)
HWMon.GPU.#1...: 100% Util, 36c Temp, 100% Fan
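A hang like this could be spotted automatically, since the status screen still reports Running while the speed reads 0 H/s. Here is a minimal Python sketch, assuming the status text has been captured into a string in the format shown above. The function names (parse_status, looks_stalled, progress_fraction) are made up for illustration and are not part of cudaHashcat:

```python
def parse_status(text):
    """Turn 'Key.....: value' lines into a dict keyed by the dotted name."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; timestamps in the value contain more colons.
        key, sep, value = line.partition(":")
        if sep:
            # Drop the dot padding that aligns the colons in the status screen.
            fields[key.strip().rstrip(".")] = value.strip()
    return fields


def looks_stalled(fields):
    """True if the session claims to be running but every GPU reports 0 H/s."""
    running = fields.get("Status") == "Running"
    speeds = [v for k, v in fields.items() if k.startswith("Speed.GPU")]
    all_zero = bool(speeds) and all(v.split()[:1] == ["0"] for v in speeds)
    return running and all_zero


def progress_fraction(fields):
    """Return done/total from a 'Progress' value like '123/456 (27.0%)'."""
    done, total = fields["Progress"].split()[0].split("/")
    return int(done) / int(total)
```

Running parse_status on the output above and passing the result to looks_stalled flags this session as stalled. A single 0 H/s reading can also occur briefly at startup, so a real watchdog would probably want to see the same Progress value on two readings some minutes apart before acting.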
Quitting cudaHashcat does not really work, but a reboot does.
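One possible explanation for quit not working, in line with the IOWAIT theory quoted above but not confirmed here, is that the process is in uninterruptible sleep (state D on Linux). A process in that state ignores signals until the pending I/O returns. Here is a rough Python sketch for checking this, assuming a Linux system with /proc mounted. The names state_from_stat and process_state are illustrative only:

```python
def state_from_stat(stat_text):
    """Extract the one-letter process state from /proc/<pid>/stat contents."""
    # The second field (the command name) is wrapped in parentheses and may
    # itself contain spaces or ')', so split on the LAST ')' in the line.
    after_comm = stat_text.rsplit(")", 1)[1]
    # The state letter is the first field after the command name.
    return after_comm.split()[0]


def process_state(pid):
    """Read the state of a live process; 'D' means uninterruptible sleep."""
    with open(f"/proc/{pid}/stat") as fh:
        return state_from_stat(fh.read())
```

If process_state returns "D" for the cudaHashcat PID, neither quitting nor kill -9 will take effect until the driver call returns, which would fit with a reboot being the only thing that helps.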
PS. Here are my NVIDIA settings; I hope they are in an acceptable range:
Attribute 'GPUPowerMizerMode' (ep77:0[gpu:0]) assigned value 1.
Attribute 'GPUFanControlState' (ep77:0[gpu:0]) assigned value 1.
Attribute 'GPUGraphicsClockOffset' (ep77:0[gpu:0]) assigned value 225.
Attribute 'GPUTargetFanSpeed' (ep77:0[fan:0]) assigned value 100.