fglrx will not load: KCL_AtomicTestBit
#1
Hey guys, I'm having a terrible time trying to get any fglrx driver to work on my Tyan FT72-B7015/S7015 system. Hopefully somebody else has seen this or can pinpoint what's up. The host OS is Ubuntu 12.04.2 LTS Server and I've tried the 13.1 and 13.3 beta AMD drivers.

Quote:
[ 282.731805] BUG: unable to handle kernel NULL pointer dereference at 0000000000000268
[ 282.732072] IP: [<ffffffffa1174ba9>] KCL_AtomicTestBit+0x9/0x10 [fglrx]
[ 282.732452] PGD 423a8e067 PUD 4187a9067 PMD 0
[ 282.733029] Oops: 0000 [#1] SMP
[ 282.733361] CPU 0
[ 282.733498] Modules linked in: snd_hda_codec_hdmi fglrx(P) snd_hda_intel snd_hda_codec snd_hwdep snd_pcm psmouse snd_timer serio_raw snd i7core_edac soundcore snd_page_alloc edac_core joydev mac_hid lp parport usbhid hid e1000e
[ 282.736842]
[ 282.736953] Pid: 4665, comm: oclHashcat-lite Tainted: P O 3.2.0-29-generic #46-Ubuntu Tyan FT72-B7015/S7015
[ 282.737533] RIP: 0010:[<ffffffffa1174ba9>] [<ffffffffa1174ba9>] KCL_AtomicTestBit+0x9/0x10 [fglrx]
[ 282.737906] RSP: 0018:ffff880420cc7c98 EFLAGS: 00010202
[ 282.738041] RAX: 0000000000000006 RBX: 0000000000000210 RCX: 0000000000000001
[ 282.738156] RDX: 0000000000000020 RSI: 0000000000000268 RDI: 0000000000000000
[ 282.738271] RBP: ffff880420cc7c98 R08: 0000000000000000 R09: ffffffffa1337fff
[ 282.744789] R10: ffff88041bd40000 R11: ffff88041bd40000 R12: 0000000000000240
[ 282.750925] R13: ffff880420cc7cf8 R14: 0000000000000000 R15: ffff880421993c90
[ 282.757047] FS: 00007f7f9bd63740(0000) GS:ffff88042f200000(0000) knlGS:0000000000000000
[ 282.768714] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 282.774785] CR2: 0000000000000268 CR3: 0000000423ccb000 CR4: 00000000000006f0
[ 282.780891] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 282.787347] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 282.793826] Process oclHashcat-lite (pid: 4665, threadinfo ffff880420cc6000, task ffff880423f68000)
[ 282.806809] Stack:
[ 282.813459] 0000000000000001 ffffffffa119ad59 0000000000000000 ffffffffa11a9262
[ 282.826397] ffff88041bd40000 ffff880421993c90 ffff880421993c90 0000000000000080
[ 282.839298] 0000000000000000 ffffffffa118a306 0000000000000000 ffffffffa11a9262
[ 282.852323] Call Trace:
[ 282.858409] [<ffffffffa119ad59>] mc_heap_get_cmm_zone_info+0x49/0x160 [fglrx]
[ 282.870139] [<ffffffffa11a9262>] ? firegl_trace+0x72/0x1e0 [fglrx]
[ 282.876150] [<ffffffffa118a306>] ? firegl_ci_get_asic_id_ext+0x46/0x320 [fglrx]
[ 282.887730] [<ffffffffa11a9262>] ? firegl_trace+0x72/0x1e0 [fglrx]
[ 282.893510] [<ffffffffa118a0e8>] ? firegl_cwddeci+0xd8/0x170 [fglrx]
[ 282.899111] [<ffffffffa1189fd0>] ? firegl_cwddeci_adl_handler+0x70/0xb0 [fglrx]
[ 282.909435] [<ffffffffa1189b26>] ? Dispatch+0x1d6/0x240 [fglrx]
[ 282.914416] [<ffffffffa11739e5>] ? KCL_CopyFromUserSpace+0x35/0x50 [fglrx]
[ 282.919477] [<ffffffffa118965e>] ? firegl_adl_escape+0xde/0x190 [fglrx]
[ 282.924398] [<ffffffffa1189580>] ? _r6x_init_hw_ctx+0xe0/0xe0 [fglrx]
[ 282.929530] [<ffffffffa1181bed>] ? firegl_ioctl+0x1ed/0x250 [fglrx]
[ 282.934603] [<ffffffffa11719be>] ? ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]
[ 282.939832] [<ffffffff81189c5a>] ? do_vfs_ioctl+0x8a/0x340
[ 282.945222] [<ffffffff81142793>] ? do_munmap+0x1f3/0x2f0
[ 282.950365] [<ffffffff81189fa1>] ? sys_ioctl+0x91/0xa0
[ 282.955608] [<ffffffff81661ec2>] ? system_call_fastpath+0x16/0x1b
[ 282.960808] Code: c3 90 55 48 89 e5 66 66 66 66 90 3e 0f b3 3e 5d c3 90 55 48 89 e5 66 66 66 66 90 3e 0f bb 3e 5d c3 90 55 48 89 e5 66 66 66 66 90 <0f> a3 3e 19 c0 5d c3 55 48 89 e5 66 66 66 66 90 3e 0f ab 3e 19
[ 282.986869] RIP [<ffffffffa1174ba9>] KCL_AtomicTestBit+0x9/0x10 [fglrx]
[ 282.993325] RSP <ffff880420cc7c98>
[ 282.999758] CR2: 0000000000000268
[ 283.006165] ---[ end trace 909b70b35e54f99c ]---
[ 283.017426] [fglrx:firegl_release] *ERROR* device busy: 1 0
[ 283.024040] [fglrx] release failed with code -EBUSY

Hopefully it's something I've just missed or messed up during the installation. Pretty bare bones / followed the wiki / not much going on here!
#2
of course the glaring null ptr deref there is a pretty big indication of a bug in the driver, but i can't say any of us have experienced that.
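
for anyone reading the trace in post #1: the faulting address in CR2 is 0x268, RSI holds the same value and RDI is 0, which looks like a small offset being applied to a NULL base pointer before the bit test. here is a minimal Python sketch that pulls the registers out of an oops and reports which ones match CR2. the function names are made up for illustration, and the excerpt only contains the three lines the sketch needs.

```python
import re

# Excerpt of the oops from post #1 (only the lines this sketch needs).
OOPS = """\
[ 282.731805] BUG: unable to handle kernel NULL pointer dereference at 0000000000000268
[ 282.738156] RDX: 0000000000000020 RSI: 0000000000000268 RDI: 0000000000000000
[ 282.774785] CR2: 0000000000000268 CR3: 0000000423ccb000 CR4: 00000000000006f0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in the text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def registers_matching_fault(text):
    """Return (fault_address, [registers holding that same address])."""
    regs = parse_registers(text)
    fault = regs["CR2"]  # CR2 holds the address that caused the page fault
    matches = sorted(n for n, v in regs.items() if v == fault and n != "CR2")
    return fault, matches


if __name__ == "__main__":
    fault, matches = registers_matching_fault(OOPS)
    print(f"fault address {fault:#x} also found in: {', '.join(matches)}")
```

this only tells you which register carried the bad pointer, not why the driver passed it in.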

have you tried testing each gpu individually, or two at a time, three at a time, etc?
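
when swapping cards in and out it helps to confirm which GPUs the kernel actually sees on the PCI bus before loading fglrx. a rough Python sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices (the `vendor` and `class` files); `list_amd_gpus` is a hypothetical helper name, not something that ships with the driver.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"        # PCI vendor ID used by AMD/ATI
DISPLAY_CLASS_PREFIX = "0x03"   # PCI base class 0x03 = display controller


def list_amd_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of AMD display controllers found in sysfs."""
    gpus = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return gpus
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
        except OSError:
            continue  # entry without readable vendor/class files: skip it
        if vendor == AMD_VENDOR_ID and pci_class.startswith(DISPLAY_CLASS_PREFIX):
            gpus.append(dev.name)
    return gpus


if __name__ == "__main__":
    for address in list_amd_gpus():
        print(address)
```

if the count printed here doesn't match the number of cards physically installed, that points at a card, riser or slot problem rather than the driver.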
#3
what kernel are you using, by the way? all of my systems are on 3.2.0-39-generic. it could be that a more recent backported patch was introduced that has broken compatibility with 13.1
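
to compare kernels across boxes quickly, here is a small Python sketch that splits an Ubuntu-style release string such as 3.2.0-39-generic into its parts and reports how far apart two ABI numbers are. it assumes the `X.Y.Z-ABI-flavour` naming seen in this thread; `parse_release` and `abi_gap` are hypothetical names.

```python
import platform
import re


def parse_release(release):
    """Split e.g. '3.2.0-29-generic' into ((3, 2, 0), 29, 'generic')."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-(\d+)-(\w+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(m.group(i)) for i in range(1, 5))
    return (major, minor, patch), abi, m.group(5)


def abi_gap(older, newer):
    """Return how many ABI bumps separate two releases of the same base version."""
    base_a, abi_a, _ = parse_release(older)
    base_b, abi_b, _ = parse_release(newer)
    if base_a != base_b:
        raise ValueError("releases are based on different upstream versions")
    return abi_b - abi_a


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    print("running kernel:", platform.release())
    print("ABI gap:", abi_gap("3.2.0-29-generic", "3.2.0-39-generic"))
```

a gap in ABI numbers doesn't prove a compatibility break by itself, it just tells you how many package revisions sit between the two systems.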
#4
Yeah, I haven't started swapping out cards yet but I have a suspicion one may be bad. The server is at a remote location so I'll schedule some time next week to swap things out when I'm on-site.

Kernel is 3.2.0-29-generic #46-Ubuntu SMP and apt-get was 'holding back' linux-headers-server linux-image-server linux-server. I just did an 'apt-get dist-upgrade' to get to 3.2.0-40 but the system didn't come back from a reboot. Will report back later. ;)
#5
haha ok. we've had some growing pains with one of our FT72B7015's as well, and what we ended up doing was pulling it out of the rack and leaving it at the office for a few days until we could work out all of the kinks. i would suggest doing the same, and don't rack it back up until you're 100% sure everything is working correctly and you've done thorough testing.