cudaHashcat on Tesla
#1
Hi all,
Recently I have been desperately trying to get cudaHashcat 0.15 working on a Tesla 2075, to no avail, so I decided to open a thread here to see if anyone can help me.
I have installed and fully updated Ubuntu Server 12.04.3 LTS on the machine. At first I tried installing the official 319.60 driver from NVIDIA. After a lot of trouble getting it to work, as soon as I ran cudaExample0.sh, some info about temperature triggers and the like was printed on the screen, and before any computation began, the OS went into some kind of kernel panic and everything stopped. The screen still accepted my keystrokes but didn't actually act on them, and the only way out was a manual reset. I could reproduce exactly the same behavior every time.
Then I tried the driver from Ubuntu's repository (319.xx), even though I had read somewhere that it doesn't support CUDA because it is reverse-engineered. After installing it, I ran the same example and, to my surprise, the script exited without printing any info, doing any computation, or generating any errors. The same happened when I ran the main binary (cudaHashcat-plus64.bin). Assuming this was because the repository driver doesn't actually support CUDA, I gave up on it.
Later I discovered that NVIDIA also has a newer driver (325.15), but after installing it, cudaHashcat behaved exactly as it had with the repository driver: the example scripts and the main binary just started and exited without producing any output. I even tried the official 319.60 driver on CentOS, but again got the same behavior as in the last two cases: cudaHashcat simply starts and exits without producing any output.
So, if anyone has experience running cudaHashcat on Tesla GPUs, I'd really appreciate it if you could help me resolve the issue.

P.S. Yes, I know that NVIDIA's GPUs aren't at all suited to this type of application; I'm just running a test at the moment.

P.P.S. I just tried cudaHashcat 0.14 on the 325.15 driver; this time the first scenario occurred and the whole machine went down...
#2
Can you put the output of the following on Pastebin and link it here (full output):

nvidia-smi -a -q

Copy the command line out of the example script, run it by hand, and post all the output you get:

ls -la /dev/nvidia*

lsmod
#3
Here is the output of the commands:

http://pastebin.com/C1X1ACga

This is from another, identical machine that is still running the older 319.60 driver (official NVIDIA driver) but exhibits exactly the same behavior.
If I run cudaHashcat-plus64.bin directly from the shell, with or without any switches, the result is the same as with the test scripts: either the OS crashes or the application simply starts and exits without any output.
Also, I have noticed that running nvidia-smi on either machine while the official NVIDIA driver is installed causes the display on the monitor connected to the VGA port to become completely corrupted, which cannot be fixed without a reboot; the OS doesn't crash in this case, though. This doesn't happen with the driver from Ubuntu's repository.

P.S. Can you confirm or deny whether the driver from Ubuntu's repository is good enough for ocl/cudaHashcat?
#4
No, never use the OS-provided drivers; always use the ones from NVIDIA.

The screen corruption is interesting. I was expecting to see nouveau loaded, since that caused this problem for me, but it's not there. Try:

apt-get purge xserver-xorg-video-nouveau

Then reboot and see what happens. And for anyone else reading: yes, I'm grasping at straws.
#5
On the other machine I had blacklisted nouveau in both blacklist.conf and GRUB's config file, but on this one it is only blacklisted in blacklist.conf. I'll try removing it altogether when I get back home and report back.

P.S. I checked, and it seems the motherboards already have the latest BIOS. Does anybody have any idea where I might be able to get firmware updates for the Teslas? Google wasn't much help.
#6
So, I'm back home now. It seems xserver-xorg-video-nouveau is not even installed on the machine; there is just libdrm-nouveau1a among the installed packages, which I can't remove because other packages depend on it. It looks like something has gone wrong in the package management system: when I try to remove libdrm-nouveau1a, I get errors about unmet dependencies in other packages, even though those packages and their dependencies are actually already installed! Any other suggestions?
#7
Another update (sorry for triple-posting): I installed VMware ESXi on the first machine, installed Kubuntu 13.10 in a VM, and used PCI passthrough to dedicate the Tesla card to the VM. I then installed the official 325.15 driver (which needed some manual modifications to compile properly against the 3.11 kernel), and now both the driver and ocl/cudaHashcat work fine on the machine. It makes me wonder why they don't work properly on Ubuntu 12.04...
#8
It seems that it works with Kubuntu 13.10 simply because it is a *clean* install...
Maybe you can try the same with 12.04 (set up a new VM and install everything from scratch there)... It could be that the main problem is that you still have some libs etc. from other installs (older driver installations, some stray libcuda copies) on your host OS.

You could do the VM test, and if it works, you should try to uninstall every possible NVIDIA-related package and library (libcuda etc.; sudo updatedb; locate libcuda). Also, sudo dpkg --get-selections | grep -i '\(nvidia\|nouveau\)' might reveal some remaining packages.
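The two leftover checks suggested above (stray libcuda copies and NVIDIA/nouveau packages) can also be scripted. This is a minimal Python sketch, assuming the input is the plain text printed by dpkg --get-selections (one package name followed by its selection state per line). The helper names leftover_packages and find_libcuda and the default search directories are hypothetical choices for illustration, not part of any NVIDIA or Debian tool, and the file search is only a rough stand-in for locate.

```python
import os
import re
import shutil
import subprocess

# Same pattern as grep -i '\(nvidia\|nouveau\)', case-insensitive.
PATTERN = re.compile(r"nvidia|nouveau", re.IGNORECASE)


def leftover_packages(selections_text):
    """Return package names from `dpkg --get-selections` output whose
    name mentions nvidia or nouveau (hypothetical helper)."""
    found = []
    for line in selections_text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        if PATTERN.search(name):
            found.append(name)
    return found


def find_libcuda(roots=("/usr/lib", "/usr/lib32", "/usr/lib64", "/usr/local")):
    """Walk the given directories (an assumed default list) and return
    sorted paths of files or symlinks whose name starts with 'libcuda'.
    Nonexistent roots are silently skipped by os.walk."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fname in filenames:
                if fname.startswith("libcuda"):
                    hits.append(os.path.join(dirpath, fname))
    return sorted(hits)


if __name__ == "__main__":
    # Only query dpkg on systems that actually have it.
    if shutil.which("dpkg"):
        out = subprocess.run(["dpkg", "--get-selections"],
                             capture_output=True, text=True,
                             check=False).stdout
        for pkg in leftover_packages(out):
            print("package:", pkg)
    for path in find_libcuda():
        print("libcuda:", path)
```

Running it as a script prints both kinds of leftovers. An empty result is consistent with a clean system but does not prove it, since libraries installed outside the searched directories would be missed.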
#9
Well, the Ubuntu 12.04 installations were pretty much new; they were installed from an old image but updated to 12.04.3 right away. I just installed Kubuntu 13.10 on the second machine too, this time without VMware, and it behaves properly, just like the first machine. I can't explain this in any way other than that, for whatever reason, NVIDIA's driver doesn't behave properly on the 3.2 kernel with this particular card installed.