02-21-2011, 03:50 PM
Hello hashcat community,
As some of you already know, I started playing around with the BFI_INT instruction (which required me to hack the binary kernels of oclHashcat). It was a lot of fun once it worked, and it also produced nice results. But I found out that it is only efficient on single hashes, basically the same situation as with bitalign. That motivated me to write my first single-hash-optimized, reversing MD5 kernel and, wow, that also gave some nice results.
After two more days of optimizing, I can claim a new -world record-.
Just to quickly throw in a number: 9637 M/s on a single hd5970 at stock clocks.
This is about 12.6% faster than ighashgpu v0.92.17.2 (8561 M/s) and about 16.5% faster than whitepixel-2 (8275 M/s).
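For anyone who wants to check the percentages: the result depends on which rate is taken as the baseline. Here is a minimal sketch of that arithmetic using awk from a POSIX shell. The function names pct_faster and pct_slower are made up for this example and are not part of oclHashcat.

```shell
#!/bin/sh
# Throughput figures quoted in the post, in M/s.
ocl=9637   # oclHashcat, single hd5970
igh=8561   # ighashgpu v0.92.17.2
wp2=8275   # whitepixel-2

# pct_faster A B: how much faster A is than B, taking B as the baseline.
pct_faster() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# pct_slower A B: how much slower B is than A, taking A as the baseline.
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (1 - b / a) * 100 }'
}

echo "oclHashcat vs ighashgpu:    $(pct_faster "$ocl" "$igh")% faster"
echo "oclHashcat vs whitepixel-2: $(pct_faster "$ocl" "$wp2")% faster"
echo "ighashgpu vs oclHashcat:    $(pct_slower "$ocl" "$igh")% slower"
echo "whitepixel-2 vs oclHashcat: $(pct_slower "$ocl" "$wp2")% slower"
```

With the numbers above this gives 12.6% and 16.5% when the competitor is the baseline, and 11.2% and 14.1% when oclHashcat is the baseline.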
Here is a list of all the hash algorithms that I ported and optimized for single-hash cracking:
- MD4
- NTLM
- MD5
- md5(md5())
- SHA1
- MySQL
- MySQL 4.1+
- SHA256
An additional bonus is that you can still use oclHashcat's mask generator (or per-position charsets). You are not limited to brute force! Since this is not wordlist-based, there is no left and right side. Just write down the mask and it will start.
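To get a feeling for what a mask costs, here is a rough sketch that computes the keyspace of an 8-position mask over a custom charset built from ?l?d?s?u. The charset sizes are an assumption (26 lowercase, 26 uppercase, 10 digits, 33 specials including space, as in current hashcat), and the keyspace helper is a hypothetical name used only in this example.

```shell
#!/bin/sh
# Assumed sizes of the built-in charsets (?s counted with the space character).
size_l=26
size_u=26
size_d=10
size_s=33

# Custom charset 1, i.e. "-1 ?l?d?s?u" on the command line.
charset1=$((size_l + size_d + size_s + size_u))

# keyspace BASE LEN: BASE to the power of LEN, using plain integer arithmetic.
keyspace() {
    result=1
    i=0
    while [ "$i" -lt "$2" ]; do
        result=$((result * $1))
        i=$((i + 1))
    done
    echo "$result"
}

# Mask ?1?1?1?1?1?1?1?1 has 8 positions, each drawn from charset 1.
total=$(keyspace "$charset1" 8)

# 9637 M/s expressed in hashes per second.
rate=9637000000
seconds=$((total / rate))

echo "charset size: $charset1"
echo "keyspace:     $total"
echo "full sweep:   $seconds s (about $((seconds / 3600)) h) at 9637 M/s"
```

Under these assumptions the mask covers 95^8 = 6634204312890625 candidates, which is roughly 191 hours for a full sweep at the quoted speed. Using a smaller charset on some positions shrinks that number quickly, which is the point of per-position charsets.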
But that's not all. I can claim a new world record on -all- of the listed algorithms. Here is a benchmark showing M/s:
And for those who are a bit more interested in the technical details, here is the ALU utilization from my hd5770 on an 8-element vector:
If you want to reproduce these numbers, here is how I generated them (each run self-aborts after 60 seconds):
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 900 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 1000 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 0 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 3 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 100 7777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 200 7777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 300 7777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 1400 7777777777777777777777777777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
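The eight commands differ only in the -m value and the length of the dummy hash, so they can be generated from a small table. This is just a sketch: it prints the command lines instead of running them, and the helper names repeat7 and bench_cmd are invented for this example. The mode and hash-length pairs are taken directly from the commands above.

```shell
#!/bin/sh
# repeat7 N: print a string of N '7' characters (dummy hash of the right hex length).
repeat7() {
    out=""
    n=$1
    while [ "$n" -gt 0 ]; do
        out="${out}7"
        n=$((n - 1))
    done
    echo "$out"
}

# bench_cmd MODE HEXLEN: print one benchmark command line for the given -m mode.
bench_cmd() {
    echo "./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m $1 $(repeat7 "$2") ?1?1?1?1?1?1?1?1"
}

# mode:hexlen pairs, in the same order as the commands in the post.
for pair in 900:32 1000:32 0:32 3:32 100:40 200:16 300:40 1400:64; do
    bench_cmd "${pair%%:*}" "${pair##*:}"
done
```

The output should match the list above line for line, so it can be reviewed before actually running anything.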
To be fair, here are some disadvantages:
- ATI 4xxx cards lack support for constant memory in OpenCL, so you will not get full performance on them.
- NVIDIA sm_21 cards require special handling to get full performance. I may add this in a later version.
- It is not possible to restore a session. However, you can pause and resume it on the fly.
- Multi-GPU is supported, but not in "mixed" configurations, for example a 9600gt together with a gtx480.
Keep in mind, this is just a hack. I am not sure if I will add these specially optimized kernels to the oclHashcat distribution.
But the results were cool, so I wanted to share them with you guys. Have fun with it.
Here it is: http://hashcat.net/files/oclHashcat-0.26b.7z