PCI SERR with AMD but not NVIDIA
#11
(07-04-2015, 10:56 PM)epixoip Wrote: Not weird at all... Re-read what I wrote in this thread, it all makes perfect sense :)

This is actually something we're fighting with right now, as we've learned that the 290X will kill the motherboard in the systems we use after about a year of continuous use. When Tyan and Supermicro say not to use GPUs that draw more than 300W, they rather mean it. So we will no longer be using or recommending GPUs that violate the PCI-e spec, as all high-end AMD GPUs do.

Man, this is horrible :( I suppose your assessment of the 290X applies to the 6990 as well. You just told me that I will be upgrading about 40 systems spread all around the world for free ... sad. Have you had many such cases, meaning dead motherboards? I was wondering whether it is safe to just swap the GPU for a Titan X in the failed systems and keep the rest, or whether I should replace the whole machine. The system that has now been running for more than a week with a Titan X is one that crashes with AMD within an hour ... seems stable enough. Could save a shitload of money if I can reuse the motherboards :)
#12
The 6990 draws like 125-150W more than the 290X, so yeah.

Yes, we have had several such cases, which is a big reason why we've moved away from AMD and are no longer going to sell any GPU that has an actual measured power draw of > 290W.

It's probably sufficient to just change the cards if the motherboard works with a replacement card.
#13
(07-06-2015, 12:25 PM)epixoip Wrote: The 6990 draws like 125-150W more than the 290X, so yeah.

Yes, we have had several such cases, which is a big reason why we've moved away from AMD and are no longer going to sell any GPU that has an actual measured power draw of > 290W.

It's probably sufficient to just change the cards if the motherboard works with a replacement card.

Many thanks for your tips. Just one more thing - have you actually identified what component(s) have failed on the motherboard? Not that it matters, but I'm curious.

Cheers,
Costin
#14
The PCI-e slots get burned up from continually pulling more power than they can handle. Slots start dropping off one-by-one.
#15
I really doubt that any card which violates the PCI-e standard will draw more than 75W from the PCI-e slot itself. For reference, a few HD 6990s were running on E35M1-M PRO boards (2 per motherboard) for about 2 years. On each motherboard, one card was in an x1 slot.
#16
Then I'm afraid you do not know your history. I have a stack of burned motherboards and power supplies that were once running HD 6990s that says your doubts are seriously misplaced.

There's a reason why EVGA invented the "powerboost" PCIe power adapter mod, which prompted motherboard manufacturers to start adding supplemental power connectors to the PCI-e bus on high-end boards, and also why powered risers suddenly became very popular among bitcoin miners.

This was one of the first threads I read on this issue after experiencing a few meltdowns:

http://forums.evga.com/24pin-ATX-power-p...49507.aspx

So yes, AMD cards absolutely DO draw more than 75W from the PCI-e slots. And this is why Tyan and Supermicro say not to use GPUs that draw > 300W on their boards, because they do not have such supplemental power going to the PCI-e slots. 75+75+150 = 300W. If you use GPUs that draw > 300W, then you'll start burning up the board.
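To make the 75+75+150 arithmetic concrete, here is a minimal Python sketch. The three wattages come from the sum above; mapping them to the slot, a 6-pin plug, and an 8-pin plug is my reading of it, and the function names and the example draw of 375W are purely hypothetical.

```python
# Nominal limits in watts. The numbers are the ones from the sum above;
# the slot / 6-pin / 8-pin labels are an assumption for illustration.
SLOT_W = 75
SIX_PIN_W = 75
EIGHT_PIN_W = 150


def board_budget_w(six_pins=1, eight_pins=1):
    """Total nominal power available to one card, in watts."""
    return SLOT_W + six_pins * SIX_PIN_W + eight_pins * EIGHT_PIN_W


def excess_w(measured_w, six_pins=1, eight_pins=1):
    """Watts drawn beyond the nominal budget (0 if within budget)."""
    return max(0, measured_w - board_budget_w(six_pins, eight_pins))


if __name__ == "__main__":
    print(board_budget_w())   # nominal budget for slot + 6-pin + 8-pin
    print(excess_w(375))      # hypothetical card measured at 375W
```

Whatever comes out of `excess_w` has to be supplied by something that is being run past its rating, whether that is the slot or the plugs.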

The exception to this is apparently the R9 295X2, which instead draws an absurd amount of power through the 8-pin PCI-e power connectors:

http://www.computerbase.de/2014-04/amd-r...rk-test/2/

They measured 245-275W being pulled out of a 150W plug.
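As a rough sketch of how far over the rating that is, using only the two measured figures and the 150W rating quoted above (the function name is hypothetical):

```python
PLUG_RATING_W = 150  # plug rating quoted above


def overdraw_pct(measured_w, rating_w=PLUG_RATING_W):
    """Percentage by which a measured draw exceeds the rating."""
    return (measured_w - rating_w) / rating_w * 100


if __name__ == "__main__":
    for watts in (245, 275):  # measured range quoted above
        print(f"{watts}W is {overdraw_pct(watts):.0f}% over the rating")
```

That works out to roughly 63% to 83% over what the plug is rated for.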

I'm honestly not sure which approach is worse, but the outcome is the same: we all need to start moving away from these cards!
#17
Quick update: I managed to get a stable system with a lower PowerTune value (-8, works with both the 6990 and the 290X), at the cost of some 5-7% performance, always keeping an eye on the total power draw reported by the PSU. Although no motherboard was visibly burned, once one becomes unstable it stays unstable and only works with that power setting. And once a GPU has run in the "unstable" configuration for too long, it never becomes stable again, even with a lower PowerTune value; it goes into the trash bin. Switching all customer units to NVIDIA over the next few months ... sad.

Many thanks for the feedback on this topic.