03-05-2015, 09:53 PM
With oclHashcat v1.32 we added the AMP kernel, which computes the password candidates on the GPU. That had one advantage and one disadvantage.
The advantage is that you can now get full cracking speed even for the faster of the slow hashes, like md5crypt, because the candidate generator engine is no longer a bottleneck.
The disadvantage is that the candidates are now generated in parallel, because they are generated on the GPU. When you amplify your input wordlist with rules, for example, the words are now copied raw over the bus and mangled on the GPU, in parallel. But if your wordlist has only a few words, say 1000, and your GPU has 20000 shaders, you can only make use of 1000 of them. The good thing is that you can work around this behavior and let 1.33 work like 1.31. I'll show you how to do this.
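To put a number on the 1000-words-versus-20000-shaders example, here is a rough sketch. It is only a toy model of that illustration, assuming each base word keeps at most one shader busy; it is not how oclHashcat actually schedules work, and the function name `busy_fraction` is made up for this post.

```python
def busy_fraction(base_words: int, shaders: int) -> float:
    """Toy model (assumption): one base word occupies at most one shader.

    Returns the share of shaders that have something to do.
    """
    if base_words < 0 or shaders <= 0:
        raise ValueError("base_words must be >= 0 and shaders must be > 0")
    return min(base_words, shaders) / shaders


# The numbers from the example above: 1000 words, 20000 shaders.
print(f"{busy_fraction(1000, 20000):.0%} of the shaders are busy")

# A wordlist larger than the shader count saturates the GPU in this model.
print(f"{busy_fraction(50000, 20000):.0%} of the shaders are busy")
```

In this simplified picture the rules do not help, because they multiply the work per word, not the number of words that can be worked on at the same time.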
I'll now show how speeds change on my old but still working hd5770 cracking WPA:
Benchmark results:
Quote:
root@sf:~/oclHashcat-1.34# ./oclHashcat64.bin -b -m 2500 -d 2
oclHashcat v1.34 starting in benchmark-mode...
...
Speed.GPU.#1.: 56509 H/s
Benchmark is basically the same as brute-force with -w 3:
Quote:
root@sf:~/oclHashcat-1.34# ./oclHashcat64.bin -a 3 -m 2500 -d 2 -w 3 hashcat.hccap ?a?a?a?a?a?a?a?a
oclHashcat v1.34 starting...
...
Speed.GPU.#1...: 56431 H/s
And even with a wordlist you can achieve this speed; it just needs to be big enough and have a little amplifier:
Quote:
root@sf:~/oclHashcat-1.34# ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap /root/dict/untouched/rockyou.txt -r rules/best64.rule
oclHashcat v1.34 starting...
...
Speed.GPU.#1...: 56384 H/s
The problem you might have seen since then shows up when your wordlist is too small:
Quote:
root@sf:~/oclHashcat-1.34# ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap rockyou1k.txt -r rules/best64.rule
oclHashcat v1.34 starting...
Speed.GPU.#1...: 11024 H/s
But you can work around this simply by using a pipe (also on Windows). The following example uses the same input:
Quote:
root@sf:~/oclHashcat-1.34# ./hashcat-cliAVX2.bin rockyou1k.txt -r rules/best64.rule --stdout | ./oclHashcat64.bin -a 0 -m 2500 -d 2 -w 3 hashcat.hccap
oclHashcat v1.34 starting...
...
Starting attack in stdin mode...
Speed.GPU.#1...: 56720 H/s
Voilà, we're back in business! Just to finish this: with v1.32+ we can choose for ourselves whether we want "vertical" or "horizontal" parallelism. That's why it is how it is.
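If it helps to picture why the pipe makes a difference, here is a minimal sketch of what happens conceptually on the CPU side: every word is combined with every rule before anything reaches the GPU, so the GPU receives words times rules independent candidates instead of just the base words. The four rules below are made-up stand-ins, not hashcat's rule engine and not the contents of best64.rule.

```python
from typing import Callable, Iterable, Iterator, List

Rule = Callable[[str], str]

# Hypothetical toy rules, for illustration only.
TOY_RULES: List[Rule] = [
    lambda w: w,          # leave the word unchanged
    lambda w: w.upper(),  # uppercase everything
    lambda w: w[::-1],    # reverse the word
    lambda w: w + "1",    # append a digit
]


def expand(words: Iterable[str], rules: List[Rule]) -> Iterator[str]:
    """Apply every rule to every word and yield one flat candidate stream."""
    for word in words:
        for rule in rules:
            yield rule(word)


words = ["password", "letmein", "dragon"]
candidates = list(expand(words, TOY_RULES))

# Each element of `candidates` is now an independent unit of work.
print(len(words), "base words ->", len(candidates), "candidates")
```

The price is that the mangling now runs on the CPU and every candidate has to travel through the pipe, which is fine for slow hashes like WPA but is the kind of bottleneck the AMP kernel was added to avoid.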