This article primarily documents known bugs in the software that oclHashcat* depends on:
AMD
Catalyst Driver
Accelerated Parallel Processing “APP” SDK, formerly ATI Stream
ATI Display Library “ADL” SDK
NVidia
All of these software packages have bugs or limitations that affect oclHashcat*.
This topic is huge, which is why I will start with the current status. Over time, as drivers and SDKs change and those changes affect the behavior of oclHashcat*, I will update this article.
Catalyst Driver, tested versions: 12.4
On Linux, the OpenCL library requires a running X11 server, although this should be unnecessary. Additionally, the user running the OpenCL application must have a valid X session.
With OpenCL, it is not possible to access the fast (shared, constant) memory on HD 4xxx series cards. These cards reach full performance only when using CAL.
The drivers are limited to a maximum of 8 GPUs (= 4 dual-GPU cards) per computer.
This Catalyst version is not backward compatible with older oclHashcat* kernels. Make sure you use the latest versions.
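Because of the X11 requirement above, it can help to check the environment before starting an OpenCL job on Linux. The following is only a minimal sketch: it assumes that a set DISPLAY variable is a reasonable first indicator of a reachable X server, and the function name is hypothetical. It does not prove that the X session is actually valid.

```python
import os


def x_session_problems(env=None):
    """Return a list of problems found; an empty list means the basic check passed.

    Assumption: DISPLAY must be set for the OpenCL library to reach an X11 server.
    This does not verify that the server is running or that access is authorized.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set, so no X11 server can be reached")
    return problems


if __name__ == "__main__":
    for problem in x_session_problems():
        print("warning:", problem)
```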
APP SDK, version: 2.6
Querying the global memory size using CL_DEVICE_GLOBAL_MEM_SIZE returns incorrect results. It returns either 128 MB, 256 MB or 512 MB, but not the real global memory size. This can be overridden by setting the GPU_MAX_HEAP_SIZE environment variable, but modifying it is not officially supported by AMD.
Querying the core clock frequency using CL_DEVICE_MAX_CLOCK_FREQUENCY does not work at all. It returns random or zero results.
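The GPU_MAX_HEAP_SIZE override mentioned above has to be present in the environment of the process that loads the OpenCL runtime. As a rough sketch, the variable can be set for a single child process only, which leaves the rest of the system untouched. The helper name, the value 100 and the program path are placeholders chosen for illustration; the article does not say which values the runtime accepts, and the override remains unsupported by AMD.

```python
import os
import subprocess


def env_with_heap_override(value, base_env=None):
    """Return a copy of the environment with GPU_MAX_HEAP_SIZE added.

    The original environment is not modified, so the override only
    applies to processes started with the returned mapping.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GPU_MAX_HEAP_SIZE"] = str(value)
    return env


def run_with_heap_override(command, value):
    """Start `command` (a list of arguments) with the override in place."""
    return subprocess.run(command, env=env_with_heap_override(value))


# Hypothetical usage; both the path and the value are placeholders:
# run_with_heap_override(["./your-opencl-application"], 100)
```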
ADL SDK, version: 3.0
There is no unique identifier that can be used to map between CAL/OpenCL devices and ADL devices. oclHashcat* is forced to use fuzzy matching on the device names!
This library is not thread-safe. In a world of multi-core CPUs and multi-core GPUs this is a joke. I am trying to work around this bug in future versions of oclHashcat*.
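To give an idea of what matching devices by name can look like, here is a minimal sketch using string similarity from Python's standard library. It is not the logic oclHashcat* actually uses, which this article does not describe. The function names, the threshold of 0.6 and the device names in the usage comment are all assumptions, and real OpenCL and ADL names may differ more than plain string similarity can bridge.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so cosmetic differences do not matter."""
    return " ".join(name.lower().split())


def match_devices(opencl_names, adl_names, threshold=0.6):
    """Map each OpenCL device index to the most similar unused ADL adapter index.

    Returns a dict {opencl_index: adl_index}. Devices without a candidate
    at or above `threshold` are left out. On equal scores the lowest
    ADL index wins, so identical cards keep their relative order.
    """
    mapping = {}
    unused = set(range(len(adl_names)))
    for i, cl_name in enumerate(opencl_names):
        best, best_score = None, 0.0
        for j in sorted(unused):
            score = SequenceMatcher(
                None, normalize(cl_name), normalize(adl_names[j])
            ).ratio()
            if score > best_score:
                best, best_score = j, score
        if best is not None and best_score >= threshold:
            mapping[i] = best
            unused.discard(best)  # each ADL adapter is assigned at most once
    return mapping


# Hypothetical names, for illustration only:
# match_devices(["Device B", "device  a"], ["Device A", "Device B"])
```

Each ADL adapter is consumed once, so a system with several identical cards still gets a one-to-one mapping instead of every OpenCL device pointing at the first adapter.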
Many people ask why oclHashcat uses CUDA for NVidia even though NVidia has its own OpenCL runtime. So why not use it?
There are three reasons:
With CUDA 4.0, NVidia introduced an awesome feature called “PTX inline assembly”. With it, very specific optimizations become possible, and oclHashcat* actually makes use of this feature. It is not supported in their OpenCL runtime.
The first oclHashcat version used OpenCL, even on NVidia. But then, with the release of the ForceWare 260.x driver, something strange happened. For some unknown reason, the performance of all OpenCL kernels dropped by nearly 15% after upgrading to ForceWare 260.x, while the performance of the native CUDA kernels did not change. OK, that sounds a bit like a conspiracy theory: “NVidia does not want OpenCL to be faster than CUDA”. But if you want, you can verify it yourself. It was a lot of work to redo everything for CUDA.
The OpenCL runtime cannot cross-compile binary kernels the way the native nvcc compiler can for CUDA kernels. Since oclHashcat* is closed-source, this is a very important feature. That is why there are so many *.kernel files in your oclHashcat* installation directories: it ships with a specific kernel for each GPU type that supports GPGPU.
NVAPI, version: R270-developer
The NVAPI is used for fetching GPU temperature and utilization. Problem: it is for Windows only, and there is no alternative library for Linux.
For Linux there is only a binary called “nvidia-smi” (part of the ForceWare driver). It can be used from the command line to fetch the GPU temperature, but the utilization is available for Tesla cards only. That is why you always see 0% GPU utilization.
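Reading these values from nvidia-smi in a script could look roughly like the sketch below. It assumes a driver whose nvidia-smi supports the `--query-gpu` and `--format=csv` options, which come from later releases than the one tested here. The function names are hypothetical. Any field that is not a plain number, as can happen when utilization is unavailable, is returned as None instead of being reported as 0%.

```python
import subprocess

# Assumption: these options exist in the installed nvidia-smi (later drivers).
QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def to_int(field):
    """Convert a CSV field to int, or None if it is not a plain number."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_smi_csv(text):
    """Parse one 'temperature, utilization' line per GPU into a list of dicts."""
    stats = []
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        temperature, utilization = fields
        stats.append({
            "temperature_c": to_int(temperature),
            "utilization_pct": to_int(utilization),
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return the parsed per-GPU values."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_smi_csv(result.stdout)
```

Keeping the parser separate from the subprocess call makes it possible to test the parsing on a machine without an NVidia GPU.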
AMD
Catalyst 12.4 or later
NOTE: Do not install the AMD APP SDK if you just want to use oclHashcat*. You only need this if you are developing GPGPU applications.
NVidia
ForceWare 290.40 or later