Slow VCL performance
#1
Good morning all!

First time poster, been reading for a while...

I set up a small proof-of-concept VCL cluster, currently with the following:

Broker - workstation (Ubuntu 12.04 LTS x64, VCL 1.22), 8GB RAM
Compute node - Radeon 7990 (Ubuntu 12.04 server x64, Catalyst 13.11 beta1, oclHashcat-plus 0.15)
Network - dedicated Gb switch, only broker and node on it

This setup follows the wiki articles, by the way.

I started with --gpu-loops 1024 and figured I'd ramp up --gpu-accel to see how much I could do with only 8GB RAM. I can get up to --gpu-accel 225; anything higher and it just goes to "cracked" right away and returns the hashes with blank results. Not sure if this is expected due to lack of memory, or if I'm doing something wrong?

With --gpu-loops 1024 and --gpu-accel 225, I get ~3000K/sec on md5crypt (-m 500); the problem is, if I run locally on the compute node I get ~7500K/sec.
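
For reference, this is roughly how I'm launching it through VCL (hash file name is a placeholder, and I've left out the attack-specific arguments):

    vclrun ./oclHashcat-plus64.bin -m 500 --gpu-loops 1024 --gpu-accel 225 hashes.txt <attack args>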

I would have thought that with only 2 GPUs I could get away with gigabit and 8GB RAM; is that just not the case? It's one of those things where I kind of need a working setup to justify spending on InfiniBand and a bunch of memory. When I run top during a crack, memory usage goes to about ~30% with --gpu-accel 225.

One more thing: the first GPU is reported at 1000MHz, but the second GPU is reported at 555MHz? Both, however, give the same cracking speed. If I run locally, both show 1000MHz.

I also have two 6970s on hand; would it be worth trying a single GPU to see if the network/memory handle it better?

Appreciate the help, not sure how to troubleshoot this.
#2
VCL 1.22 has some bugs, hence the note on the VCL webpage.

the number of gpus has nothing to do with getting away with gigabit. ethernet latency is ethernet latency, regardless of how many gpus and how many nodes you have, and ethernet latency sucks.
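
rough numbers to illustrate (ballpark figures, not measured on your gear):

    gigabit ethernet round trip: ~100+ microseconds
    infiniband round trip:       ~1-2 microseconds

vcl forwards every opencl api call over the network, so each kernel launch pays that round trip. at thousands of calls per second, the latency dominates no matter how much bandwidth you have.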

with only 8 GB of ram on the broker you likely don't want to use a -n value of more than 80.

if the clocks are being reported incorrectly, restart the broker daemon first, then restart opencld. should fix that.
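
something like this; the broker init script name below is a guess, substitute whatever your install actually registered (check /etc/init.d/):

    # on the broker machine first (script name is a guess -- use your actual one)
    sudo service vcl-broker restart

    # then on each compute node
    sudo service opencld restart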
#3
Thanks for the reply- I will compare performance with a lower -n value. Would there be any value in going to VCL 1.21?

I noticed that I didn't mention this is a brute-force attack, by the way.

I guess what I had read referred to the amount of bandwidth per GPU. Given the latency of a standard gigabit switch, is it simply unreasonable to expect decent performance? Can I get performance comparable to local over gigabit, or is it just not going to happen (assuming a low -n value for now; I'll add RAM later so I can raise it)?

***EDIT***

Just tried with -n 80: getting ~1700K/sec through VCL and ~4000K/sec locally. I also tried restarting the daemons, and the clock is still misreported.

Am I doing something wrong, or is this to be expected?

Thanks for your time...
#4
1.21 is more broken than 1.22 ;)

bandwidth is not an issue, see my passwords12 talk. latency is the issue with ethernet. you're never going to get full performance with ethernet, it is not possible.

something appears to be wrong if the clocks are still being reported incorrectly.
#5
Interesting results:

- I've reconfigured to run the broker on a compute node with two 7990s
- This connects over gigabit to another compute node with two 6970s

This is a -a 3 -m 500 session (brute force, md5crypt); all six GPUs are detected properly.
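
The launch looks something like this (mask shown is just an example, not my actual one):

    vclrun ./oclHashcat-plus64.bin -m 500 -a 3 hashes.txt ?a?a?a?a?a?a?a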

Running in VCL, the four 7990 GPUs are slow as hell (~150K/sec), even though they are on the same physical box as the broker, i.e. "local".

The two 6970s on the compute node, however, are running at normal speed (~2200K/sec), even though they are going through the gigabit network!

To me, this says VCL isn't working right, since the networked node is reaching normal speeds; I had thought the network was the bottleneck, but apparently not?
#6
Just as an update to my setup and results-

Broker - now a 1U server with 12GB RAM (getting more soon)
Node 1 - 7990, 6970
Node 2 - 7990, 6970

Network - still dedicated Gb switch
Software - oclHashcat-plus 0.15, VCL 1.22

Guess the setup didn't like having 2 7990s in one box, because this way I'm getting much better results. Individually, the boxes hit ~10 000K/sec in md5crypt, and now, in VCL, I'm getting ~12 000K/sec total out of the six GPUs; still a lot of loss, but at least the sum is greater than the parts!
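
Back-of-envelope on the loss, using my own numbers from above:

    2 boxes x ~10 000K/sec local = ~20 000K/sec ideal
    ~12 000K/sec actual / ~20 000K/sec ideal = ~60% scaling efficiency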

Gonna try testing with some InfiniBand hardware next, and hopefully get a normal, fully running cluster.