Ubuntu won't boot with more than 4 GPUs
#1
I'm having an issue with a rig I'm building. I have followed the hashcat wiki to build Ubuntu server 14.04.4 and install the ATI 14.301.1001 driver. The driver installs fine and I can get up to 4 cards working with hashcat without issue. Interestingly, amdconfig --adapters=all --odgt returns temperatures for 2 cards but fails on the other two. Other than this oddity I have tested the rig with 4 cards cracking hashes for a few hours without any issues. As soon as I add another card, the system no longer boots. It looks like it hangs while starting X - so I'm thinking it's failing while loading the driver.

Specs:
- 6x 7970 (4x MSI Twin Frozr, 2x reference boards)
- Sempron 145
- Gigabyte GA 990FX UD3
- 1300W risewell PSU
- 750W EVGA PSU
- 4GB RAM
- 6x x1-to-x16 non-powered risers

Any thoughts on what the issue might be? Resources for this seem scarce and I've been beating my head against this for a day now.

Any help is much appreciated.

To add:
- tried mixing the cards around and testing each card individually and they all work fine. So it doesn't appear to be a hardware issue.
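
A quick way to confirm that every card is actually enumerated on the PCIe bus, independent of the Catalyst driver, is to count the display-class devices that lspci reports. Below is a minimal sketch, assuming lspci (from pciutils) is installed; the helper names list_gpus and read_lspci are made up for illustration.

```python
import subprocess

# Device classes that lspci prints for graphics hardware.
GPU_CLASSES = ("VGA compatible controller", "Display controller", "3D controller")


def list_gpus(lspci_text):
    """Return the lspci lines whose device class looks like a GPU."""
    gpus = []
    for line in lspci_text.splitlines():
        line = line.strip()
        # Each line has the shape "<slot> <class>: <vendor and device>".
        _slot, _, rest = line.partition(" ")
        device_class = rest.split(":", 1)[0]
        if device_class in GPU_CLASSES:
            gpus.append(line)
    return gpus


def read_lspci():
    """Run lspci and return its text output (assumes pciutils is installed)."""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling len(list_gpus(read_lspci())) with 4 cards installed should give 4 (plus any onboard graphics). If a card on a riser is missing from the list, the problem is below the driver level.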
#2
hanging while loading X means the driver is stuck in iowait trying to communicate with a gpu that was there, but is now not there. the issue is most likely with your risers. if it's not the riser, then you may have a bad gpu or two.
#3
(10-29-2014, 01:56 AM)epixoip Wrote: hanging while loading X means the driver is stuck in iowait trying to communicate with a gpu that was there, but is now not there. the issue is most likely with your risers. if it's not the riser, then you may have a bad gpu or two.

Thanks for the info. What if the risers have been rotated as well, though? For example, I've tried booting with the first 4 and then adding the 5th, or booting with the last 4 and adding the 2nd card, and I still have the same issue.

Could the risers just be creating an issue when that many are being used? Because it doesn't seem that any one of them is particularly bad. I will retest though.
#4
risers are notorious for causing intermittent problems and are almost always the source of any issues you are encountering.
#5
(10-29-2014, 02:05 AM)epixoip Wrote: risers are notorious for causing intermittent problems and are almost always the source of any issues you are encountering.

Sweet. Any opinion on the hash rate store 30cm powered risers? I'm kinda new to the many GPU game so if you can recommend a place that sells decent quality risers I'd appreciate it.

Cheers!
#6
I managed to find some usb risers here locally tonight.

I've swapped out all 6 risers. Still no luck.

I can run any combination of 4 out of the 6 cards through oclHashcat's benchmark without an issue.

As soon as I plug a 5th card in (any arbitrary one) the system won't boot. It stalls on the Ubuntu 12.04 booting screen and then it just hangs.

:(

Any other suggestions?
#7
Why do I get the feeling this has something to do with the PSU(s) ?
#8
that was my next guess, yeah.
#9
Hmmm okay.

I suppose that makes sense, the cards would draw considerably more power with the proper driver installed.
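
To put rough numbers on the PSU theory, here is a back-of-the-envelope sketch. The wattages are assumptions rather than measurements from this rig: 250 W is a nominal board power figure for a 7970 under load, and 100 W is a guessed allowance for the CPU, RAM, drives and fans.

```python
# Assumed nominal figures, NOT measurements from this rig.
GPU_BOARD_POWER_W = 250      # assumption: nominal 7970 board power under load
CPU_AND_BOARD_W = 100        # assumption: rough allowance for CPU, RAM, drives, fans
PSU_CAPACITY_W = 1300 + 750  # the two PSUs listed in the specs


def estimated_draw(num_gpus):
    """Estimated full-load draw in watts for the given number of GPUs."""
    return num_gpus * GPU_BOARD_POWER_W + CPU_AND_BOARD_W


for n in (4, 5, 6):
    draw = estimated_draw(n)
    print(f"{n} GPUs: ~{draw} W load, ~{PSU_CAPACITY_W - draw} W combined headroom")
```

On these assumed figures the two PSUs together cover 6 cards on paper, but 5 cards alone would already be above what the 1300W unit is rated for, so how the cards are split between the supplies matters. Cards that are idle during boot draw far less than this, so it is only a sanity check, not a diagnosis.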

Last night I also went from 32-bit to 64-bit Ubuntu.

Interestingly, now when I connect a 5th or 6th card I get a kernel panic with AMD-Vi: IOTLB_INV_TIMEOUT and sync errors. This happens pretty much right after grub boots the kernel. I know that's vague; I'm on my way to the office and I don't remember the specific error at the moment. I'll post more details later.

I'll try and have only the 1 7970 connected to the 750W PSU as well to see if that changes things. I've had it not connected to the mobo, but I haven't actually tried disconnecting power from the 6th card when I try and boot just the 5th.

Thanks for the continued advice.
#10
Okay.

So with 64-bit Ubuntu I get a kernel panic whilst booting even with 2 cards plugged in, if one of them is plugged into a short PCIe x1 slot.

It seems that the system will never boot if any of the short PCIe slots are used.

Additionally, I have tried booting 2 of the cards on the 750W power supply and 2 others on my 1300W power supply, and provided I don't use any of the short PCIe slots it boots and benchmarks with oclHashcat without an issue.

Also, I've tried connecting only one single GPU to the 750W power supply, which should power it just fine, and if it's the 5th card I add the system still won't boot. If I change nothing else but just unplug one of my cards from the mobo (making the total 4 running cards), it boots without issue and benchmarks fine.

I can't see why the short PCIe slots seem to be doing this. I'm honestly running out of ideas short of trying another motherboard.