Three hd7970 = OS hang
#11
Hmm, one more idea.
Please try connecting the 8+6 pin power cables from different lines. I don't know how to explain it well in English ;/ The idea is to connect each GPU to 2 PSUs: the 8-pin from one and the 6-pin from the second. Maybe the PSU doesn't have enough power on one line to work fine with the 7970.
#12
(08-06-2014, 09:26 AM)Szulik Wrote: Hmm, one more idea.
Please try connecting the 8+6 pin power cables from different lines. I don't know how to explain it well in English ;/ The idea is to connect each GPU to 2 PSUs: the 8-pin from one and the 6-pin from the second. Maybe the PSU doesn't have enough power on one line to work fine with the 7970.

Tried. Didn't help :(

My current idea is that it is somehow related to heat. As long as it stays < 50c it seems to run stable. It's hard to verify, as it goes up to 60c pretty quickly.
#13
It's possible that the VRM is overheating if the waterblock is seated wrong. I hope you left Windows installed so you can run GPU-Z and check, as I don't know how to get info from these sensors on Linux. Why do I think so? Because some time ago I bought EK waterblocks for an HD6990 and saw that the gap on some parts was bigger than stated in the manual. Because of that I had to put in my own thermal pads. I wrote to EK, but got the answer that they make no mistakes :D
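For anyone wanting to check temperatures from Linux, here is a minimal sketch that reads the kernel's hwmon sysfs files, which report temperatures in millidegrees Celsius. Whether a given GPU driver exposes its sensors there is an assumption, and VRM sensors in particular may not show up at all. The function name `read_hwmon_temps` is made up for this example.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {sensor_path: degrees_c} for every temp*_input file under root."""
    root_path = Path(root)
    temps = {}
    if not root_path.is_dir():
        return temps  # no hwmon tree here (e.g. not Linux)
    for sensor in sorted(root_path.glob("hwmon*/temp*_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor unreadable or not numeric; skip it
        temps[str(sensor)] = millidegrees / 1000.0
    return temps


if __name__ == "__main__":
    for path, temp_c in read_hwmon_temps().items():
        print(f"{path}: {temp_c:.1f} C")
```

Running it in a loop while the GPUs are under load would show whether any exposed sensor climbs much faster than the core temperature.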
#14
I finally solved the problem. The source was simple: heat

I want to explain the problem a bit, maybe it helps someone else in the future.

My old watercooling circuit had around 1kw of cooling power and around 750ml of water. It cooled the GPUs pretty well, but not well enough. In idle mode, all GPUs were around 40c.

The problems began when I added load to the GPUs. When I did that, the temps rose to around 55c and then the OS began to hang. But 55c, come on, that's nothing. Who would expect this could create problems?

When I put only a single card under load it always worked. The system was stable. Using only a single GPU, the cooling circuit was able to hold it below 50c.

So the idea was: could it be that something strange happens if the cards go > 55c?

As I was out of ideas (I really tried everything, as you can see in the first post), I took the risk and added some more watercooling components. I added another radiator, another pump, three more fans and another reservoir. In total the watercooling circuit now has a cooling power of 2kw and the water volume increased from 750ml to 2500ml.
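To get a feel for what the extra water does, here is a rough back-of-the-envelope sketch. It uses the approximate specific heat of water (about 4186 J per kg per degree) and a hypothetical 100 W of heat that the radiators fail to remove; that 100 W figure is an assumption for illustration, not a measurement from this build, and the thermal mass of blocks and radiators is ignored.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water
WATER_DENSITY_KG_PER_L = 1.0             # approximate


def seconds_to_heat(volume_ml, delta_c, excess_watts):
    """Time for the coolant to warm by delta_c degrees when excess_watts
    more heat enters the loop than the radiators remove."""
    mass_kg = volume_ml / 1000.0 * WATER_DENSITY_KG_PER_L
    energy_j = mass_kg * WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_c
    return energy_j / excess_watts


ASSUMED_EXCESS_WATTS = 100.0  # hypothetical imbalance, not measured

for volume_ml in (750, 2500):
    t = seconds_to_heat(volume_ml, 15.0, ASSUMED_EXCESS_WATTS)
    print(f"{volume_ml} ml: about {t / 60:.1f} min to rise 15 degrees")
```

Under these assumptions the larger volume only slows the temperature rise (by the ratio of the volumes). It is the added radiator capacity that lowers the temperature the loop finally settles at.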

When I now run the system, even with crazy settings like -m 900 -n 800 and -u 1024 in -a 3 mode, it is able to keep all 3 GPUs at around 45c. It has been running for a day now and it's rock stable. All three GPUs run at ~ 17BH/s, and all run at 99% GPU utilization.

So what exactly is the problem? Well, I don't know. All I know is that it's related to heat. My speculation is the following: there's a bug in the GIGABYTE GV-7970C-3GD GPU BIOS. This card forces me to use their customized GPU BIOS. I tried to flash it with different 7970 BIOSes, including the reference one, but whenever I did that the card stayed black on boot. It's possible that this bug somehow creates a false-positive alarm signal; for example, it adds up all temps but then forgets to divide by the number of GPUs. So if you have two or more cards, the GPU BIOS thinks the temp is higher than 110c and calls some emergency shutdown. Well, it's wild speculation.
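The guessed bug can be written down as a tiny sketch. Everything here is hypothetical: the function names are invented, and nothing is known about how the real BIOS computes or checks temperatures. The 110c threshold is simply the figure from the speculation above.

```python
SHUTDOWN_THRESHOLD_C = 110  # threshold taken from the speculation, not from any spec


def buggy_reading(temps_c):
    """Hypothetical bug: temperatures are summed but never divided."""
    return sum(temps_c)


def correct_reading(temps_c):
    """What an averaged reading would look like."""
    return sum(temps_c) / len(temps_c)


def would_trip(reading_c):
    """True if the reading exceeds the assumed shutdown threshold."""
    return reading_c > SHUTDOWN_THRESHOLD_C


for temps in ([55], [55, 55, 55], [45, 45, 45]):
    print(temps,
          "buggy:", buggy_reading(temps), would_trip(buggy_reading(temps)),
          "correct:", correct_reading(temps), would_trip(correct_reading(temps)))
```

Plugging in the numbers from this thread, a literal sum does not quite fit: three cards at 45c would already add up to 135c, yet the system is stable there. So if a BIOS bug is involved at all, it is probably something other than this exact mistake.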
#15
It happened to me with one of my three watercooled 7990s (all the same model).
One of the two cores (always the same one) was failing in exactly the same way, and I solved it by changing the card.
Maybe I will remove the water block and mount it again very carefully; maybe there is some critical spot where the thermal paste is missing ...
I have two radiators and a pump with 1.6 liters of cooling liquid. After working for some hours at 1100 MHz the cards all reach 70-80 degrees but work fine.
r.



