Review - AMD Radeon R9 290X
#1
UPDATE Dec 21 2013

I take back everything negative I said about this card. Catalyst 13.12 and od6config resolve all the issues identified below. The performance increase over the HD 7970 ranges from 23% to 60%, with 64-bit hashes seeing the least benefit and DES-based hashes seeing the biggest gains. Overall, the 290X is 36.7% faster than the HD 7970 on average, which beats the projected 35% performance increase.



With every generation, AMD finds some new and exciting way to piss us off. They've really outdone themselves this time with the release of the R9 290X. I was so excited to try this new card out. It looked so promising. With a 35% increase in stream processors over the HD 7970 and 5.6 TFLOPS of raw compute power, on paper it looked like a single-GPU card that was faster than the dual-GPU HD 6990. So you can imagine how stoked I was when I finally got my paws on eight of these shiny new bastards.

[Image: box.jpg]



The excitement increased as I pulled one of the boxes out, and was greeted by this badass alien robot demon cyborg guy:

[Image: sapphire_290X_box.jpg]



I really dig the new shroud design as well. Not at all professional, but it appeals to my inner child:

[Image: 290x.jpg]



I was too excited to wait on building a new system just for these cards. I wanted to test them RANNOW. So I cannibalized a system that I had already built out with four HD 7990s and stuffed it full of 290Xs.

[Image: 4x7990.jpg]

[Image: 8x290X_front.jpg]

[Image: 8x290X.jpg]

[Image: 8x290X_pluggedin.jpg]

[Image: 8x290X_rear.jpg]



Damn, these cards look fantastic stacked on top of each other!

All right, let's power up the system and check them out.

Code:
opencl@sagitta:~$ lspci | grep VGA
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
85:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
86:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
89:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]
8a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon HD 8970]

Hah, awesome. The device identifies itself as an HD 8970. That will likely cause some confusion with the other Tahiti-based HD 8970 that is out there. The driver gets it right though.

Code:
opencl@sagitta:~$ amdconfig --lsa
* 0. 04:00.0 AMD Radeon R9 290 Series
  1. 05:00.0 AMD Radeon R9 290 Series
  2. 08:00.0 AMD Radeon R9 290 Series
  3. 09:00.0 AMD Radeon R9 290 Series
  4. 85:00.0 AMD Radeon R9 290 Series
  5. 86:00.0 AMD Radeon R9 290 Series
  6. 89:00.0 AMD Radeon R9 290 Series
  7. 8a:00.0 AMD Radeon R9 290 Series

And here's the truncated clinfo output as well while we're at it:
Code:
opencl@sagitta:~$ clinfo
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Device ID:                                     4098
  Board name:                                    AMD Radeon R9 290 Series
  Device Topology:                               PCI[ B#-118, D#0, F#0 ]
  Max compute units:                             44
  Max work items dimensions:                     3
    Max work items[0]:                           256
    Max work items[1]:                           256
    Max work items[2]:                           256
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           1000Mhz
  Address bits:                                  32
  Max memory allocation:                         1073741824
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            3221225472
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Platform ID:                                   0x00007f64ee7b04c0
  Name:                                          Hawaii
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                1348.4 (VM)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (1348.4)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer
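
For reference, the headline numbers can be rebuilt from this clinfo output. The sketch below is plain arithmetic resting on two assumptions that the output itself does not state: each GCN compute unit holds 64 stream processors, and each stream processor retires one fused multiply-add (2 FLOPs) per clock. The HD 7970 figures (32 compute units at 925 MHz) are the reference Tahiti XT specs, not anything measured here.

```python
# Peak single-precision throughput from clinfo's "Max compute units" and
# "Max clock frequency". Assumptions: 64 stream processors (SPs) per GCN
# compute unit, one FMA (2 FLOPs) per SP per clock.
SP_PER_CU = 64
FLOPS_PER_SP_PER_CLOCK = 2


def stream_processors(compute_units: int) -> int:
    return compute_units * SP_PER_CU


def peak_tflops(compute_units: int, clock_mhz: float) -> float:
    flops = stream_processors(compute_units) * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


r9_290x = {"compute_units": 44, "clock_mhz": 1000}  # values from the clinfo output above
hd_7970 = {"compute_units": 32, "clock_mhz": 925}   # reference HD 7970 (assumed spec)

for name, card in (("R9 290X", r9_290x), ("HD 7970", hd_7970)):
    print(f"{name}: {stream_processors(card['compute_units'])} SPs, "
          f"{peak_tflops(**card):.2f} TFLOPS peak")

sp_ratio = stream_processors(44) / stream_processors(32)
print(f"stream processor ratio: {sp_ratio:.3f}")
```

That gives 2816 stream processors and about 5.63 TFLOPS peak for the 290X against 2048 and about 3.79 TFLOPS for the HD 7970, a raw stream processor ratio of 1.375.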

X failed to start, of course, because this system has Catalyst 13.9 on it, and 13.9 doesn't support the R9 line. Not a problem: a quick upgrade to 13.11 beta 6 and a reboot.

Great, X came up this time. Now we're ready to launch Hashcat and start having fun! Erm, wait... Something is missing... I didn't hear the fans spin up. We set the fan speed in Xsetup so that the fans spin up when X starts. But there is no fan noise at all, and these new GPUs are supposed to be insanely loud... let's investigate.

Code:
opencl@sagitta:~$ DISPLAY=:0.0 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!
opencl@sagitta:~$ DISPLAY=:0.1 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!
opencl@sagitta:~$ DISPLAY=:0.2 amdconfig --pplib-cmd "get fanspeed 0"
PPLIB command execution has failed!
ati_pplib_cmd: execute "get" failed!

That's awesome. Apparently we can't get/set the fan speed on this new card. Oh right, because this GPU uses the latest version of PowerTune, which manages the fan speeds for you in two different modes: "Quiet" and "Uber." In "Quiet" mode, the maximum fan speed is 40%; in "Uber" mode, it's 55%. There's no way to manually set the fans to 100%. That couldn't possibly be problematic.

Let's see how the temps are doing at idle:

Code:
opencl@sagitta:~$ amdconfig --adapter=all --odgt

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Default Adapter - AMD Radeon R9 290 Series
    Sensor: Temperature - 31.00 C

Hm, they're idling kind of high. Ambient temp in the office is about 21 C, and we usually see GPUs in here idling under 25 C. I wonder what the clocks are set at?
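
With eight cards, reading the `--odgt` output by eye gets old quickly. Here is a minimal parsing sketch that assumes the exact line format shown above ("Sensor: Temperature - 31.00 C"). Every entry is labeled "Default Adapter", so adapters are numbered by their order in the output. The function names are hypothetical helpers, not part of any AMD tool.

```python
import re

# Matches lines like "    Sensor: Temperature - 31.00 C" printed by
# `amdconfig --adapter=all --odgt`. The pattern is an assumption based on
# the output shown above.
TEMP_RE = re.compile(r"Sensor: Temperature - (\d+(?:\.\d+)?) C")


def parse_odgt(text: str) -> list[float]:
    """Return one temperature in degrees C per adapter, in output order."""
    return [float(m.group(1)) for m in TEMP_RE.finditer(text)]


def idle_report(temps: list[float], ambient_c: float, usual_idle_max_c: float) -> list[str]:
    """Format one line per adapter, flagging anything above the usual idle ceiling."""
    lines = []
    for idx, temp in enumerate(temps):
        flag = "  <-- warmer than usual" if temp > usual_idle_max_c else ""
        lines.append(f"adapter {idx}: {temp:.1f} C ({temp - ambient_c:+.1f} over ambient){flag}")
    return lines
```

To use it, pass in the captured stdout of the command, for example from `subprocess.run(["amdconfig", "--adapter=all", "--odgt"], capture_output=True, text=True).stdout`, with the 21 C ambient and 25 C idle ceiling mentioned above.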

Code:
opencl@sagitta:~$ amdconfig --adapter=all --odgc

Adapter 0 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 1 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 2 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 3 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 4 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 5 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 6 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Adapter 7 - AMD Radeon R9 290 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
        Performance Level :    0
        Current Bus Speed :    8000
         Current Bus Lane :    16
                 GPU load :    0%

Well, that's different... Seems the output has changed a tad. It doesn't show the peak clocks or the configurable range for some reason, but it now displays the bus speed and bus lane information. Great to know that all eight are running at PCIe x16, I guess? Not sure why I'd give two shits about that. And seriously, why isn't it showing me the peak clocks and the configurable range?

I'm assuming that, like the HD 7990, these cards run at the boost clock of 1000 MHz out of the box instead of the rather odd base clock of 727 MHz. Let's try to manually set the clock to a value in the middle, like 850 MHz, and see how it does there:

Code:
opencl@sagitta:~/oclHashcat-1.00$ amdconfig --adapter=all --od-enable --odsc=850,1275
AMD Overdrive(TM) enabled
invalid input. Please use the following format
"--od-setclocks=<NewCoreClock>,<NewMemoryClock>,<PowerState>,<Performance Level>"

WTF? Oh, right, the R9 line uses Overdrive6 instead of Overdrive5. Goes hand-in-hand with the latest PowerTune. So, we have to set different clocks for different power states and performance levels? I'm not really sure what the valid values are here for these parameters. Can't seem to find anything in the help, or on Google. Fuck it, let's just start messing with it and see what happens:

Code:
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,0,3
ERROR - invalid performance level! Level should be less than 2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,1,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,0
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,1
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,2,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,3,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,4,2
opencl@sagitta:~$ amdconfig --adapter=all --odsc=1000,1250,5,2

So, nothing happens. There's no confirmation that the clocks were set, like we'd normally get, and no errors either, except the one we get when the performance level is greater than 2 (which makes sense). Since amdconfig no longer shows the peak clock, I have no idea what the clocks are set to...
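
Poking at the `--odsc=<core>,<memory>,<power state>,<performance level>` parameter space by hand is error-prone. Here is a small dry-run sketch that generates the same kind of sweep. The only validation it does is the one observed above: levels 0 to 2 were accepted and 3 was rejected. The valid power-state range is unknown, so it is left to the caller. The helper names are hypothetical, and nothing is executed unless `dry_run=False`.

```python
import shlex
import subprocess


def odsc_args(core_mhz: int, mem_mhz: int, power_state: int, perf_level: int) -> list[str]:
    """Build one `amdconfig --adapter=all --odsc=...` command line.

    Performance levels 0-2 were accepted in the session above and 3 was
    rejected, so anything outside 0-2 is refused before calling amdconfig.
    """
    if not 0 <= perf_level <= 2:
        raise ValueError("performance level must be 0, 1 or 2")
    return ["amdconfig", "--adapter=all",
            f"--odsc={core_mhz},{mem_mhz},{power_state},{perf_level}"]


def sweep(core_mhz: int, mem_mhz: int, power_states, perf_levels) -> list[list[str]]:
    """Every (power state, performance level) combination, state-major order."""
    return [odsc_args(core_mhz, mem_mhz, s, lvl) for s in power_states for lvl in perf_levels]


def run_sweep(cmds: list[list[str]], dry_run: bool = True) -> None:
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if amdconfig exits non-zero
```

For example, `run_sweep(sweep(1000, 1250, range(3), range(3)))` prints the nine commands without touching the cards.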

Ok, let's shelve that for now and just check out oclHashcat already, so that we can start having fun again. atom was kind enough to push b49 out this morning, which has R9 support. This card can still redeem itself if it performs well.

What the actual fuck. It's taking FOREVER to enqueue the kernels. Like 30-mississippi seconds for each kernel. Four minutes to initialize Hashcat with eight GPUs. I'm not talking about the kernel compile time. This happens even with cached kernels. This is the time it takes to actually enqueue a kernel on each device for execution. WTF. This makes benchmark mode absolutely unusable, because it takes 5 minutes to benchmark each algorithm, and since it suppresses the 'loading kernel' messages, you're left wondering if Hashcat has hung.

Once you do get oclHashcat running, performance is lackluster at best. At "stock clocks," it's 4.8% slower than an HD 7970 at 925 MHz. I put that in quotes because, thanks to PowerTune, the clocks are all over the place. And even though I (attempted to) set the maximum clock to 1000 MHz, the clocks aren't going over 855 MHz.


[Image: performance_thumb.png]



One nice thing about this new version of PowerTune, however, is that it automatically downclocks the memory clock for ALU-bound workloads. This is something we used to do manually on the HD 5000 series, then lost the ability to do on the HD 6000 and HD 7000 series. I really like this feature. This is the only positive change I am able to identify so far, and I pray it sticks around.

I still don't understand why this isn't going above 855 MHz under 100% load on an ALU-bound algorithm. The stock boost clock is 1000 MHz, and we manually set the clock to 1000 MHz, yet it won't go anywhere near that. And it's not a heat issue, because the temps look really good right now: we're 30 minutes in, and the temps are all below 65 C.
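
A back-of-the-envelope check shows how far off the measured result is. This sketch assumes that ALU-bound hash throughput scales linearly with stream processors × core clock, which is only a rough model. It also assumes 2816 SPs for the 290X and 2048 for the HD 7970 (64 per compute unit). The 855 MHz ceiling and the 4.8% deficit are the figures reported above.

```python
def relative_throughput(sp_a: int, mhz_a: float, sp_b: int, mhz_b: float) -> float:
    """Card A relative to card B, assuming throughput scales with SPs x clock."""
    return (sp_a * mhz_a) / (sp_b * mhz_b)


# R9 290X: 2816 SPs; HD 7970: 2048 SPs at 925 MHz (assumed reference specs).
at_ceiling = relative_throughput(2816, 855, 2048, 925)  # 290X capped at the 855 MHz seen above
at_boost = relative_throughput(2816, 1000, 2048, 925)   # 290X at its 1000 MHz boost clock
observed = 1 - 0.048                                     # measured: 4.8% slower than the HD 7970

print(f"expected at 855 MHz:  {at_ceiling:.2f}x")
print(f"expected at 1000 MHz: {at_boost:.2f}x")
print(f"observed:             {observed:.2f}x")
```

Under this model, the card should be around 1.27x an HD 7970 even when pinned at 855 MHz, and around 1.49x at 1000 MHz, versus roughly 0.95x observed. Because 855 MHz is a ceiling rather than an average, this fits with the clocks spending a lot of time well below it, though a linear model is only a rough guide.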

Ok, something weird just happened. The clocks all spiked up to 940 MHz, then immediately dropped to 525 MHz. They stayed at 525 MHz for about 10 seconds, long enough for the temps to drop to around 54 C, and then went back up to 848 MHz. How strange is that??
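
Dips like this are easy to miss when watching by hand. Here is a sketch for flagging them automatically. It assumes core clock samples are collected at a fixed interval, for example by repeatedly running `amdconfig --adapter=all --odgc` and reading one adapter's "Current Clocks" value. The 75% threshold is an arbitrary choice, not a PowerTune parameter.

```python
def throttle_events(samples_mhz: list[float], target_mhz: float,
                    threshold: float = 0.75) -> list[tuple[int, int]]:
    """Return (start, end) sample-index ranges, end exclusive, where the
    clock sat below threshold * target_mhz."""
    limit = target_mhz * threshold
    events: list[tuple[int, int]] = []
    start = None
    for i, mhz in enumerate(samples_mhz):
        if mhz < limit and start is None:
            start = i                      # clock just dropped below the limit
        elif mhz >= limit and start is not None:
            events.append((start, i))      # clock recovered; close the event
            start = None
    if start is not None:                  # still throttled at the last sample
        events.append((start, len(samples_mhz)))
    return events
```

Multiplying each event's length by the polling interval gives how long the card spent throttled, which is more useful than a single snapshot of the clocks.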

I can't seem to disable this PowerTune crap either. On the HD 7950, it wasn't too bad, because we could use ADL to set PowerTune to "+20%", whatever that means, and it would behave properly. Using the same method on this card yields fuckall. Same thing with using ADL to manually set the power state or performance level. This card just ignores all requests to let the user manage it.

I'm really hoping some of these are driver issues, even though I'm using the latest beta driver, but I have a feeling they're here to stay. Not being able to manually manage the device (get/set the fan speed, disable PowerTune, or set the clocks) is an absolute deal-breaker. And seriously, what the actual fuck is up with it taking 30 full fucking seconds to queue a kernel!?

I'll keep playing with this, and I'll keep an eye out to see if others discover any fixes/workarounds for these issues. But until we get this shit sorted out, I'd highly recommend you steer clear of this new R9 line. Stick to the HD 7970 for now. Update: rush out and buy one immediately!


UPDATE Nov 17 2013

I've confirmed that the management issues are due to this being an Overdrive6 card, as opposed to Overdrive5. The tools we currently use to manage our GPUs still rely on Overdrive5. Using ADL SDK 6.0, I wrote a small program called od6config to manage the GPUs using Overdrive6. It works as expected, so the fan speed and clock rate issues are resolved.

The only outstanding issues are the PowerTune crap and the kernel load time. Using Overdrive6 I can set the power control threshold to +50 (the new PowerTune accepts values from -50 to +50, versus -20 to +20 for the old version), but Hashcat performance is still erratic. And it still takes 30 seconds to enqueue a kernel.

UPDATE Dec 21 2013

Catalyst 13.12 resolves all remaining issues!!
#2
Hopefully a new driver will fix it. Guys on Windows have had success setting the fan speeds, so I don't see why Linux wouldn't be able to.
#3
It would be awesome if a new driver resolved all of these problems. But, it's AMD, so who knows.

It looks to me like they failed to deliver on GCN2, rushed to push out a new GCN-based card, added too many stream processors for the current cooling solution, and then lowered the clocks and implemented aggressive power management in an attempt to keep the card running at acceptable temperatures.

That's what it looks like from where I'm sitting.
#4
Sweet rig, man. Do you mind sharing the specs for this?

Thanks
#5
I have 4x HD 7970 in my rig (on Catalyst 13.4) and kernels also take a very long time to load. I don't know what it depends on, but a single GPU takes 5-30 sec to load its kernels.
#6
Can you test one (or more) 290X on Windows with the latest drivers? Many reviewers are reporting problems related to the temperature-controlled GPU frequency, and the latest Beta 9.2 driver has some fixes for it.
#7
I believe his setup is running on Linux, as Windows has always had issues addressing more than 6 GPUs. Catalyst 13.11 Beta 9.2 for Windows makes the Radeon R9 290X a little louder, as it adds about 100 RPM to "Quiet" mode; not sure how much more RPM is added in Uber mode. The extra noise is definitely a tradeoff, but it lets the card reach higher clock rates, though still not the 1000 MHz rated for this GPU. We will need to wait for the next Linux driver update and see what epixoip posts from there.
#8
windows works just fine with 8 GPUs, don't know what you're talking about regarding issues addressing more than 6 GPUs.

i don't know if i'll have time to test the windows driver to see if it behaves better. but i do plan on spending some time with the overdrive6 API, as i have a feeling that is the key here.
#9
(11-15-2013, 03:08 AM) epixoip Wrote: windows works just fine with 8 GPUs, don't know what you're talking about regarding issues addressing more than 6 GPUs.

i don't know if i'll have time to test the windows driver to see if it behaves better. but i do plan on spending some time with the overdrive6 API, as i have a feeling that is the key here.

Unfortunately, I had direct and negative experience with this configuration on the 6XXX series and the 12.X drivers that existed at the time. I have since reverted to Linux and never looked back. I have the system configuration well documented and can share it with you if interested. Have you yourself ever attempted to configure eight individual single-core/single-ASIC GPUs on Windows 7 x64? If you can tell me which generation of ATI GPUs, chassis model, and driver set you got working under Windows, it would be nice to know in case I ever consider the Windows route again.

Also see this post: even the newer Tyan series exhibits the same problem with detection and the number of usable devices in a Windows 7 environment.

http://hashcat.net/forum/thread-2776.html
#10
What motherboard is it?