Short Wordlists with 8 GPUs
#1
I have an 8-GPU cluster running GTX 980s. When I use a small wordlist it only uses a single GPU. I'm running large lists of hashes (100K or more), which takes about 36 minutes, and I'd like to use multiple GPUs to speed this up. I've read this:

https://hashcat.net/forum/thread-4161.html

I've tried the suggestion to pipe in the wordlist, but it makes no difference: hashing speed is the same and it still uses a single GPU.

Is there a way to get cudaHashcat to use all GPUs when using a small (10,000 word) dictionary?

The only option I can think of is to break the large set of hashes into smaller blocks, run 8 cudaHashcat processes, and use the -d option to pin each process to a specific GPU to balance the load.

I'm using v1.35. Here's an example command line:

Code:
cat 10kpasswds.txt | cudaHashcat64.bin -m 400 -w 3 hashes.txt --session mysess1

Thanks very much for any help and thanks for an amazing piece of software.
#2
Atom shows in his thread that you should expand your candidates with rules so your shaders get filled.

Code:
./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap

That is different from your cat, since your cat does not multiply the candidates by best64.

But in the end your problem might not be fixed by that; 10k initial candidates are just not enough.
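
As a rough sanity check (filenames taken from the commands above; the exact multiplier is an assumption, since the rule count in best64.rule has varied between releases), you can count candidates before and after rule expansion:

Code:
wc -l rockyou1k.txt
./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | wc -l

The second count should be dozens of times larger; that expanded stream is what actually gives the GPUs enough work.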
#3
(03-26-2015, 08:29 AM)kartan Wrote: Atom shows in his thread that you should expand your candidates with rules so your shaders get filled.

./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap

That is different from your cat, since your cat does not multiply the candidates by best64.

But in the end your problem might not be fixed by that; 10k initial candidates are just not enough.

Looking at his post, he has this:

Code:
./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap rockyou1k.txt -r rules/best64.rule

vs

Code:
./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap

Both generate identical dictionaries unless I'm missing something; the only difference is that the second one gets the data via STDIN. They both use the same txt file and both use best64.rule to munge the wordlist.

So my impression is that he's saying that if you just feed the same dictionary in via STDIN you'll get better parallelism.
#4
pHaze, aha.. I have the same problem with 4 GPUs (970/980 mixed). If the pipe does not help you, you can try one of "-w 2 --gpu-accel=2" ... "-w 2 --gpu-accel=10".
In my case that uses all GPUs, but not always at full speed. Then you can append --gpu-loops=1024 (or similar).
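
Applied to the command from post #1, that tuning might look like this (an untested sketch; the filenames come from that post, and the right accel/loops values are something you have to find by experiment):

Code:
cat 10kpasswds.txt | ./cudaHashcat64.bin -m 400 -w 2 --gpu-accel=2 --gpu-loops=1024 hashes.txt --session mysess1

Roughly, a lower --gpu-accel means each kernel launch needs fewer candidates, so a small stream can be spread across more devices, at some cost in per-GPU efficiency.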

This is voodoo magic :)
#5
Code:
./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap rockyou1k.txt -r rules/best64.rule

vs

Code:
./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap

Those are different. Rules are run on the GPU; when you run the rules in front of the pipe, hashcat's initial candidates are already multiplied by the rules before they hit the GPU.
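
For the original -m 400 job, the pre-pipe variant would look something like this (a sketch reusing the filenames from post #1; any sufficiently large rule file would do):

Code:
./hashcat-cliAVX2.bin 10kpasswds.txt -r rules/best64.rule --stdout | ./cudaHashcat64.bin -m 400 -w 3 hashes.txt --session mysess1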
#6
(03-26-2015, 08:41 AM)shodan Wrote: pHaze, aha.. I have the same problem with 4 GPUs (970/980 mixed). If the pipe does not help you, you can try one of "-w 2 --gpu-accel=2" ... "-w 2 --gpu-accel=10".
In my case that uses all GPUs, but not always at full speed. Then you can append --gpu-loops=1024 (or similar).

This is voodoo magic :)

Thanks Shodan. I tried that and it's still using 1 GPU. How big is your dictionary?
#7
(03-26-2015, 08:44 AM)kartan Wrote: Code:
./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap rockyou1k.txt -r rules/best64.rule

vs

Code:
./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap

Those are different. Rules are run on the GPU; when you run the rules in front of the pipe, hashcat's initial candidates are already multiplied by the rules before they hit the GPU.

OK thanks Kartan, that makes sense. So it looks like with a small 10,000-word list the only way to get multiple GPUs working on it is to split the hashes across 8 files, launch 8 cudaHashcat processes, and have each one specify which GPU to use.
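
A rough sketch of that approach (untested; it assumes GNU split, that -d device numbering starts at 1, and reuses the filenames from post #1):

Code:
# split the hashes into 8 chunks: hashes.split.00 .. hashes.split.07
split -n l/8 -d hashes.txt hashes.split.
# launch one cudaHashcat process per GPU
for i in $(seq 0 7); do
  ./cudaHashcat64.bin -m 400 -w 3 -d $((i+1)) --session "split$i" hashes.split.0$i 10kpasswds.txt &
done
wait

Each process gets its own --session name so the restore files don't collide.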
#8
Hmm...
I use combinator mode with a "20k dict" + "0000-zzzz mask".
I also have this problem in brute-force mode with a small mask.
In my case these options help.
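
For reference, a dict + mask run like that is oclHashcat's hybrid attack (-a 6) rather than the combinator attack (-a 1, which takes two dictionaries). A sketch, with filenames assumed and this thread's -m 400 used for concreteness (-1 ?l?d defines a custom charset covering 0000-zzzz):

Code:
./cudaHashcat64.bin -a 6 -m 400 -w 3 -1 ?l?d hashes.txt 20k_dict.txt ?1?1?1?1

The mask multiplies each of the 20k words by 36^4 (about 1.7 million) suffixes, which is more than enough work to keep every GPU busy.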

Maybe a 10k dict is just too small for this?!

Splitting the hashes is a good idea! I'll try it.
#9
(03-26-2015, 08:54 AM)shodan Wrote: Hmm...
I use combinator mode with a "20k dict" + "0000-zzzz mask".
I also have this problem in brute-force mode with a small mask.
In my case these options help.

Maybe a 10k dict is just too small for this?!

Yes, you're expanding that 20k dictionary with the combinator attack.

Yes, 10K is quite small. I want to be able to process 100,000 WordPress hashes in under 30 minutes, and to do that I have to keep the dictionary very small.
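
Back-of-the-envelope, using only the numbers from this thread (100K salted phpass hashes, a 10K-word dictionary, the 36-minute runtime from post #1), each candidate has to be hashed once per salt:

Code:
# 100,000 salts x 10,000 words = 1e9 phpass computations
# 1e9 / (36 min * 60 s) ~= 463,000 effective hashes per second
echo $(( 100000 * 10000 / (36 * 60) ))

Which is why anything that multiplies the candidate count (rules, masks) blows straight past the 30-minute budget unless all 8 GPUs are actually working.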