06-27-2015, 04:39 PM
(06-26-2015, 10:24 PM)epixoip Wrote: Sounds power-related to me. How big are the power supplies in these systems, and how many GPUs per system?
Most of the boards produced by Supermicro and Tyan have a clause buried deep in the documentation that states not to use cards that draw more than 300W of power. The 6990 is an unholy power hog (well over 400W), and the 290X is still > 300W (around 325W for single-hash NTLM brute force). The Titan X, on the other hand, only draws 235-250W.
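To make the comparison concrete, here is a rough sketch that checks the per-card figures quoted above against the 300W board limit. The numbers are only the approximate ones from this post (400W is used as a lower bound for the 6990, and 250W as the upper end of the Titan X range), and the names `BOARD_LIMIT_W`, `CARD_DRAW_W`, and `over_limit` are made up for illustration.

```python
# Board limit from the motherboard documentation clause mentioned above.
BOARD_LIMIT_W = 300

# Approximate per-card draw in watts, taken from the figures in this post.
# Assumptions: 400 is a lower bound for the 6990 ("well over 400W"),
# and 250 is the upper end of the quoted 235-250W range for the Titan X.
CARD_DRAW_W = {
    "6990": 400,
    "290X": 325,
    "Titan X": 250,
}


def over_limit(draw_w, limit_w=BOARD_LIMIT_W):
    """Return True if a card draws more than the board's stated limit."""
    return draw_w > limit_w


for card, draw in CARD_DRAW_W.items():
    status = "exceeds" if over_limit(draw) else "is within"
    print(f"{card}: ~{draw}W {status} the {BOARD_LIMIT_W}W limit")
```

Actual draw depends on the workload, so measured values would be a better input than these ballpark numbers.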
The PSU is 1400W, with one GPU per system (well, one "card", as the 6990 can be seen as two GPUs). Anyway, I did replace the PSU, with the same results. I also monitored the temperatures of the GPU, MB, and RAM: nothing unusual, just the sudden PCI error, IPMI logs that crap, and the Linux kernel hangs.