Three hd7970 = OS hang - hashcat Forum (https://hashcat.net/forum), Forum: Hardware
Three hd7970 = OS hang - atom - 07-27-2014

Hey guys,

recently three of my older cards bricked (1 x hd5970, 1 x hd6990 and 1 x hd7970), so I bought three hd7970s. The new cards are all GIGABYTE GV-7970C-3GD, factory-overclocked by the vendor to 1000 MHz.

The problem is that when I run them in parallel, the OS hangs after 1-2 minutes. I think the GPUs themselves are OK, because if I run each one on its own with -d 1, -d 2 and -d 3, the OS does not hang. It only hangs when at least two cards run in parallel, and it doesn't matter whether they run in a single oclHashcat instance or in multiple instances.

To narrow down the problem I tried a lot of different scenarios, but now I am out of ideas.

First I tried them on two different systems that I have been using for some time with other GPUs:

1st:
- Intel i7 4770k
- ASUS Z87 Expert
- Ubuntu 14.04 LTS, 64 bit

2nd:
- Intel i7 4770k
- ASUS Z87 A
- Ubuntu 12.04 LTS, 64 bit

On both systems the behavior is exactly the same, and since I have used these systems with other cards, my feeling tells me there is no hardware defect in the boards, CPUs or RAM.

More information about the hardware:
- The original coolers have been removed and replaced with EK water-cooling blocks. They are connected in series with a water-cooling bridge. The coolant flow works fine.
- There are no extender cables/risers involved. All cards sit directly on the board.
- All cards run headless; none of them is connected to a monitor.

Heat: The GPUs run at ~40°C idle and rise to ~55°C under load before the OS hangs. There is no specific temperature threshold that triggers the hang; it happens somewhere above 50°C.
Power: each GPU has a dedicated 700W power supply.

Things I tried to change:
- Tried Catalyst 14.4 and 14.6 beta on both systems, always ran amdconfig --initial -f --adapter=all afterwards and rebooted
- Updated the mainboard BIOS to the latest version (1803)
- Updated the GPU BIOS to the latest version (F72)
- Manually switched the PCI-E setting in the BIOS from x16 to x1
- Manually disabled ASPM in the BIOS
- Manually disabled all other power-management-related settings in the BIOS
- Underclocked the cards to stock hd7970 settings (925/1375)
- Attached the original fan to the fan plug on the cards
- Rotated the GPU positions (1 to 2, 2 to 3, 3 to 1, etc.)
- Disabled IOMMU on the kernel command line
- Blacklisted the mei and mei_me modules
- Tried with only 2 cards
- Tried both ALU-intensive and memory-intensive algorithms
- Bought a new mainboard (AMD AM3+ with 990FX chipset) to make sure it's not the Z87
- Set the onboard BIOS switch to "2" to use the F70 BIOS that ships by default
- Switched back to "1" and tried to flash the reference hd7970 BIOS. This nearly bricked the card, as it caused instant kernel reboots, so I flashed it back to F72, which is the latest version
- Installed a fresh Windows 7 (64 bit) and tried on Windows
- Attached the CrossFire bridges
- Removed the CrossFire bridges
- dmesg didn't show anything useful
- The X11 log didn't show anything useful
- Replaced the PSUs with other ones

One thing to note: when I disabled X11 (so that ADL can't work and oclHashcat cannot read temperatures etc.), it looks like this:

Quote:Speed.GPU.#1...: 15891.9 MH/s

... and when I then continuously press "s", it seems #1 continues to work ... But when X11 is enabled and temperatures are read, it always looks like this:

Quote:[s]tatus [p]ause [r]esume [b]ypass [q]uit =>

The system is completely frozen at this point.
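Because the machine freezes completely, anything the kernel prints right before the hang is lost: after the forced reboot, dmesg only shows the ring buffer of the new boot. One way to make undeath's later suggestion (watching /dev/kmsg) more useful is to stream every kernel record into a file and fsync after each line. The following is a minimal sketch under stated assumptions: a Linux kernel that provides /dev/kmsg (3.5 or newer), root privileges, and a hypothetical script name `kmsg_tee.py`; it is not an oclHashcat or distribution tool. A hard enough lockup can still keep the last writes from reaching disk, in which case netconsole or a serial console would be the more robust route.

```python
"""Append every /dev/kmsg record to a file, fsync'ing after each line.

Hypothetical helper (kmsg_tee.py); assumes Linux >= 3.5 and root privileges.
"""
import os
import sys


def parse_record(raw: bytes) -> tuple:
    """Split one /dev/kmsg record into (level, sequence, timestamp_us, text)."""
    header, _, body = raw.decode("utf-8", errors="replace").partition(";")
    prio, seq, ts_us = (int(f) for f in header.split(",")[:3])
    text = body.split("\n", 1)[0]  # drop " KEY=value" continuation lines
    return prio & 7, seq, ts_us, text  # low 3 bits of prio are the log level


def format_record(ts_us: int, text: str) -> str:
    """Render a record roughly the way dmesg does: [seconds.micros] text."""
    return f"[{ts_us // 1_000_000:5d}.{ts_us % 1_000_000:06d}] {text}"


def tee_kmsg(out_path: str) -> None:
    """Block on /dev/kmsg forever, appending each record to out_path."""
    fd = os.open("/dev/kmsg", os.O_RDONLY)  # starts at the oldest buffered record
    try:
        with open(out_path, "a", encoding="utf-8") as out:
            while True:
                try:
                    raw = os.read(fd, 8192)  # each read() returns one whole record
                except BrokenPipeError:
                    continue  # EPIPE: records were overwritten; next read resumes
                if not raw:
                    break
                _, _, ts_us, text = parse_record(raw)
                out.write(format_record(ts_us, text) + "\n")
                out.flush()
                os.fsync(out.fileno())  # force the line to disk before a freeze
    finally:
        os.close(fd)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        tee_kmsg(sys.argv[1])
    else:
        print(f"usage: sudo python3 {sys.argv[0]} OUTPUT_LOG", file=sys.stderr)
```

Usage would be something like `sudo python3 kmsg_tee.py /var/log/kmsg-persist.log &` before starting the multi-GPU run, then reading the tail of that file after the reboot. Syncing after every record is slow, but the kernel log is low-volume and durability is the whole point here.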
Another interesting thing: looking at the lspci output, the cards run at a different PCI-E speed and ignore my manual x1 setting from the BIOS:

Code:
root@et:~# lspci -vv | grep -e "VGA " -e Width

This is from Ubuntu 14.04.

--- Updated with latest tests to have them complete

RE: Three hd7970 = OS hang - undeath - 07-27-2014

Have you tried observing dmesg (cat /dev/kmsg)? Sometimes it prints interesting stuff right before a crash (like driver problems).

RE: Three hd7970 = OS hang - KT819GM - 07-27-2014

To me it looks like a power problem, as if the cards draw too much power from the PCI-E slots and the intelligent power management on these motherboards disables them.

RE: Three hd7970 = OS hang - atom - 07-29-2014

(07-27-2014, 06:53 PM)undeath Wrote: Have you tried observing dmesg (cat /dev/kmsg)? Sometimes it prints interesting stuff right before a crash (like driver problems).

Nothing unusual, I'd say. The last entry after the crash is:

Quote:[ 4.438533] r8169 0000:04:00.0 eth0: link up

But I think that means the link came up 4s after kernel boot, so it's not related...

RE: Three hd7970 = OS hang - atom - 07-29-2014

(07-27-2014, 09:38 PM)KT819GM Wrote: To me it looks like a power problem, as if the cards draw too much power from the PCI-E slots and the intelligent power management on these motherboards disables them.

Yeah, I think there is some relation. I disabled *everything* in the MB BIOS I could find that was related to power management. But it didn't help; after a few minutes it crashed again. Nice try anyway!

RE: Three hd7970 = OS hang - proinside - 07-29-2014

(07-27-2014, 06:26 PM)atom Wrote: Power: each GPU has a dedicated 700W power supply

Did you try a different power supply, other than those 3? Did you try using one of those 7970s with anything other than the other two 7970s?
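The PCI-E observation in atom's first post can be checked more precisely by comparing, for each GPU, LnkCap (what the link supports) with LnkSta (what was actually negotiated) in the full `lspci -vv` output. The grep in that post matches every "Width" line, so it also picks up bridges and the cards' HDMI audio functions. The following is a minimal sketch, assuming the pciutils `lspci -vv` text format (run it as root, otherwise the capability blocks show up as access denied) and a hypothetical script name `pcie_links.py`. Treating "Display controller" as a GPU class is also an assumption. Cards may lower their link speed when idle, so a capture taken while a job is running is the more meaningful one.

```python
"""Compare each GPU's PCIe link capability (LnkCap) with the negotiated link (LnkSta).

Hypothetical helper (pcie_links.py); parses saved `lspci -vv` output.
"""
import re
import sys

# Device header, e.g. "01:00.0 VGA compatible controller: Advanced Micro Devices ..."
DEVICE_RE = re.compile(r"^(\S+) (?:VGA compatible controller|Display controller): ")
# Link lines, e.g. "\tLnkSta:\tSpeed 5GT/s (downgraded), Width x8 (downgraded)"
LINK_RE = re.compile(r"^\s+Lnk(Cap|Sta):\s.*?Speed ([^,\s]+)[^,]*, Width (x\d+)")


def gpu_links(lspci_text: str) -> dict:
    """Map each GPU's PCI address to {"Cap": (speed, width), "Sta": (speed, width)}."""
    links = {}
    current = None  # address of the GPU block we are in, or None for other devices
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():  # an unindented line starts a new device
            m = DEVICE_RE.match(line)
            current = m.group(1) if m else None
            if current:
                links[current] = {}
            continue
        m = LINK_RE.match(line)
        if current and m:
            links[current][m.group(1)] = (m.group(2), m.group(3))
    return links


def describe(addr: str, link: dict) -> str:
    """One summary line per GPU, flagged when the negotiated link differs."""
    cap = link.get("Cap", ("?", "?"))
    sta = link.get("Sta", ("?", "?"))
    flag = "" if cap == sta else "  (differs from capability)"
    return f"{addr}: capable {cap[1]} @ {cap[0]}, negotiated {sta[1]} @ {sta[0]}{flag}"


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for addr, link in gpu_links(fh.read()).items():
                print(describe(addr, link))
    else:
        print(f"usage: sudo lspci -vv > lspci.txt; python3 {sys.argv[0]} lspci.txt",
              file=sys.stderr)
```

Parsing a saved file rather than calling lspci directly keeps the script usable on a capture taken just before a run, and lets captures from the different boards be compared side by side.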
RE: Three hd7970 = OS hang - Szulik - 07-30-2014

I could be wrong, but I think you should try running it with a monitor connected (in my case, a PC without a monitor behaves "strangely"). Also, I assume all PSUs are connected to the mobo; they all must/should start at the same time.

RE: Three hd7970 = OS hang - atom - 08-04-2014

(07-30-2014, 12:18 PM)Szulik Wrote: I could be wrong, but I think you should try running it with a monitor connected (in my case, a PC without a monitor behaves "strangely")

Good idea. I tried, but it did not help.

RE: Three hd7970 = OS hang - atom - 08-04-2014

(07-29-2014, 11:19 PM)proinside Wrote: (07-27-2014, 06:26 PM)atom Wrote: Power: each GPU has a dedicated 700W power supply

I tried with another 1200W and another 1300W PSU; it didn't help.

RE: Three hd7970 = OS hang - atom - 08-04-2014

Just an update, I've added some more tests:
- Bought a new mainboard (AMD AM3+ with 990FX chipset) to make sure it's not the Z87
- Set the onboard BIOS switch to "2" to use the F70 BIOS that ships by default
- Switched back to "1" and tried to flash the reference hd7970 BIOS. This nearly bricked the card, as it caused instant kernel reboots, so I flashed it back to F72, which is the latest version
- Installed a fresh Windows 7 (64 bit) and tried on Windows
- Attached the CrossFire bridges
- Removed the CrossFire bridges

None of the above helped.