oclHashcat v0.26b
#1
Hello hashcat community,

As some of you already know, I started to play around with the BFI_INT instruction (which required me to hack the binary kernels of oclHashcat). It was a lot of fun when it worked, and it also produces nice results. But I found out that it is only efficient on single hashes, basically the same situation as with bitalign. That motivated me to write my first single-hash-optimized reversing MD5 kernel and, wow, it also got some nice results.
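For those wondering why BFI_INT and bitalign matter for MD5, here is a minimal Python sketch. It assumes the commonly described semantics of the two AMD instructions: BFI_INT as a bitwise select, and BITALIGN_INT as a 64-bit funnel shift that turns into a rotate when both inputs are the same. It writes the MD5 round functions and rotates with those two primitives. It then shows one plausible reading of "reversing": subtract the IV from the target digest and run steps 63 down to 49 backwards. Those steps never read message word 0, so if only word 0 changes between candidates (an assumption about how such a kernel is organized), the reversed target can be computed once and compared early. This is not the actual oclHashcat kernel code. The helpers `bfi`, `bitalign`, `reverse_step` and `reverse_from_digest` are hypothetical names.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def bfi(mask, a, b):
    # Modelled BFI_INT: bits of a where mask is 1, bits of b where mask is 0.
    return (mask & a) | (~mask & MASK & b)


def bitalign(hi, lo, shift):
    # Modelled BITALIGN_INT: low 32 bits of the 64-bit value (hi:lo) >> shift.
    return (((hi << 32) | lo) >> (shift & 31)) & MASK


def rotl(x, s):
    return bitalign(x, x, 32 - s)  # rotate left == funnel shift of x with itself


def rotr(x, s):
    return bitalign(x, x, s)


def round_func(i, b, c, d):
    # Returns (boolean function value, index of the message word used by step i).
    if i < 16:
        return bfi(b, c, d), i                   # F = (b & c) | (~b & d): one BFI
    if i < 32:
        return bfi(d, b, c), (5 * i + 1) % 16    # G = (b & d) | (c & ~d): one BFI
    if i < 48:
        return b ^ c ^ d, (3 * i + 5) % 16       # H
    return c ^ (b | (~d & MASK)), (7 * i) % 16   # I


def step(i, state, words):
    a, b, c, d = state
    f, g = round_func(i, b, c, d)
    t = (a + f + K[i] + words[g]) & MASK
    return (d, (b + rotl(t, S[i])) & MASK, b, c)


def reverse_step(i, state, words):
    # The previous b, c, d are still visible in the new state, so step i can be undone.
    a2, b2, c2, d2 = state
    b, c, d = c2, d2, a2
    f, g = round_func(i, b, c, d)
    t = rotr((b2 - b) & MASK, S[i])
    return ((t - f - K[i] - words[g]) & MASK, b, c, d)


def message_words(password):
    # Single-block MD5 padding, valid for passwords of up to 55 bytes.
    assert len(password) <= 55
    block = password + b"\x80" + b"\x00" * (55 - len(password))
    block += struct.pack("<Q", 8 * len(password))
    return struct.unpack("<16I", block)


def forward_states(words):
    states = [IV]  # states[k] is the state before step k
    for i in range(64):
        states.append(step(i, states[-1], words))
    return states


def digest_from_states(states):
    final = tuple((x + y) & MASK for x, y in zip(states[-1], IV))
    return struct.pack("<4I", *final)


def reverse_from_digest(digest, words, stop=49):
    # Subtract the IV, then undo steps 63..stop; none of them read words[0].
    state = tuple((x - y) & MASK for x, y in zip(struct.unpack("<4I", digest), IV))
    for i in range(63, stop - 1, -1):
        state = reverse_step(i, state, words)
    return state


words = message_words(b"hashcat")
states = forward_states(words)
target = hashlib.md5(b"hashcat").digest()
print(digest_from_states(states) == target)               # True
print(reverse_from_digest(target, words) == states[49])   # True
```

A kernel built this way would only need to run steps 0 to 48 per candidate and compare against the precomputed state, instead of running all 64 steps.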

After two more days of optimizing, I can claim a new -world record-

Just to quickly throw in a number: 9637 M/s on a single HD5970 at stock clocks.

This is about 12.6% faster than ighashgpu v0.92.17.2 (8561 M/s) and about 16.5% faster than whitepixel-2 (8275 M/s).
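Speed comparisons like this depend on which number is the baseline. Dividing the gap by the old speed tells you how much faster the new kernel is, and dividing it by the new speed tells you how much slower the other tools are. Here is a small Python sketch that computes both from the quoted numbers. The function names are just illustrative.

```python
def percent_faster(new, old):
    """How much faster `new` is than `old`, relative to `old`, in percent."""
    return (new / old - 1) * 100


def percent_slower(new, old):
    """How much lower `old` is than `new`, relative to `new`, in percent."""
    return (new - old) / new * 100


new = 9637  # M/s, single HD5970, from the post
for name, old in [("ighashgpu v0.92.17.2", 8561), ("whitepixel-2", 8275)]:
    print(f"{name}: {percent_faster(new, old):.1f}% faster, "
          f"{percent_slower(new, old):.1f}% slower")
```

The 11% and 14% figures come from the second convention. The first convention gives about 12.6% and 16.5%.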

Here is a list of all the hash algorithms that I ported and optimized for single-hash cracking:
  • MD4
  • NTLM
  • MD5
  • md5(md5())
  • SHA1
  • MySQL
  • MySQL 4.1+
  • SHA256

An additional bonus is that you can still use oclHashcat's mask generator (or per-position charsets). You are not limited to plain brute force! Since this is not wordlist-based, there is no left and right side. Just write down the mask and it will start.
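To get a feel for what a mask such as `-1 ?l?d?s?u ?1?1?1?1?1?1?1?1` means, here is a minimal Python sketch of a mask expander. It assumes the usual hashcat charsets (?l, ?u, ?d, and ?s as space plus the 32 ASCII punctuation characters) and treats ?1 as the user-defined charset. It does not handle `??` or hex charsets. It computes the keyspace and maps an index to a candidate, varying the last position fastest. That ordering is a choice made for this sketch, and the real generator's order may differ. `expand`, `parse_mask` and `candidate` are hypothetical names.

```python
import string
from math import prod

# Built-in charsets, assuming the usual hashcat definitions.
BUILTIN = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,  # 33 symbols including space
}


def expand(spec, custom=None):
    """Expand a charset spec such as '?l?d?s?u' into a de-duplicated string."""
    custom = custom or {}
    out, i = [], 0
    while i < len(spec):
        if spec[i] == "?" and i + 1 < len(spec):
            key = spec[i + 1]
            out.append(custom[key] if key in custom else BUILTIN[key])
            i += 2
        else:
            out.append(spec[i])  # literal character
            i += 1
    return "".join(dict.fromkeys("".join(out)))


def parse_mask(mask, custom=None):
    """Split a mask into one charset per password position."""
    positions, i = [], 0
    while i < len(mask):
        if mask[i] == "?":
            positions.append(expand(mask[i:i + 2], custom))
            i += 2
        else:
            positions.append(mask[i])
            i += 1
    return positions


def candidate(positions, index):
    """Map an index in [0, keyspace) to a password (last position varies fastest)."""
    chars = []
    for charset in reversed(positions):
        index, r = divmod(index, len(charset))
        chars.append(charset[r])
    return "".join(reversed(chars))


custom = {"1": expand("?l?d?s?u")}
positions = parse_mask("?1?1?1?1?1?1?1?1", custom)
keyspace = prod(len(p) for p in positions)  # 95 ** 8
days = keyspace / 9637e6 / 86400            # at the quoted single-HD5970 MD5 speed
print(keyspace, f"{days:.1f} days")
print(candidate(positions, 0), candidate(positions, keyspace - 1))
```

At the quoted MD5 speed, the full 8-character printable keyspace takes roughly eight days on one card. This is also why the 60-second benchmark runs below never finish the keyspace.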

But that's not all. I can claim a new world record on -all- the listed algorithms. Here is a benchmark showing the speeds in M/s:

[Image: oclHashcat-0.26b-bench.png]

And for those who are a bit more interested in the technical details, here is the ALU utilization on my HD5770 with an 8-element vector:

[Image: oclHashcat-0.26b-alu.png]

If you want to reproduce these numbers, here is how I generated them (each run aborts itself after 60 seconds):


./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 900 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 1000 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 0 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 3 77777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 100 7777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 200 7777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 300 7777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
./oclHashcat64.bin --runtime 60 --gpu-accel 800 --gpu-loops 1024 -1 ?l?d?s?u -m 1400 7777777777777777777777777777777777777777777777777777777777777777 ?1?1?1?1?1?1?1?1
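
The eight command lines above differ only in the mode number and the length of the dummy all-7 target. The target has twice as many hex digits as the digest has bytes: 8 bytes for MySQL, 16 for the MD4/MD5 family, 20 for SHA1 and MySQL 4.1+, and 32 for SHA256. Here is a small Python sketch that rebuilds the lines from such a table. The mode numbers and flags are copied from the commands in this post, and nothing else about the oclHashcat command line is assumed. It uses `shlex.join`, which quotes the masks so a shell cannot treat `?` as a filename glob. `BENCH` and `bench_command` are hypothetical names.

```python
import shlex

# (label, -m mode, digest size in bytes), as used in the commands above.
BENCH = [
    ("MD4", 900, 16),
    ("NTLM", 1000, 16),
    ("MD5", 0, 16),
    ("md5(md5())", 3, 16),
    ("SHA1", 100, 20),
    ("MySQL", 200, 8),
    ("MySQL 4.1+", 300, 20),
    ("SHA256", 1400, 32),
]


def bench_command(mode, digest_bytes, binary="./oclHashcat64.bin"):
    """Build one benchmark command line with a dummy all-'7' hex digest."""
    args = [binary, "--runtime", "60", "--gpu-accel", "800", "--gpu-loops", "1024",
            "-1", "?l?d?s?u", "-m", str(mode), "7" * (2 * digest_bytes), "?1" * 8]
    return shlex.join(args)  # quotes the ?-masks for the shell


for name, mode, size in BENCH:
    print(f"# {name}")
    print(bench_command(mode, size))
```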


To be fair, here are some disadvantages:
  • ATI 4xxx cards lack support for constant memory in OpenCL, so you will not get full performance on them.
  • NVidia sm_21 cards require special handling to get full performance. I may add this in a later version.
  • It is not possible to restore a session. However, you can pause/resume it on the fly.
  • Multi-GPU is supported, but not in "mixed" configurations, for example a 9600GT together with a GTX480.

Keep in mind that this is just a hack. I am not sure if I will add these specially optimized kernels to the oclHashcat distribution.

But the results were cool, so I wanted to share them with you guys. Have fun with it.

Here it is: http://hashcat.net/files/oclHashcat-0.26b.7z
#2
This is awesome :) All I gotta say is hashcat rules!!
#3
Does it now support HEX-SALT and HEX-CHARS?

Thanks
#4
Yes :)
#5
Hello Atom,

I just completed the download. I will post some results within the hour. Pretty big and exciting news that you delivered today.

SHA-256 is new as well; 1.2 G/s on an HD5970 is not bad at all. I will post results for a few algos and get back to you.

Thank you


UPDATE: (running 2x ATI HD5970 at stock clocks)

oclHashcat v0.26 *** BETA *** starting...

Digests: 1 entries, 1 unique
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
NOTE: brute-force detected, switching to warp-mode
Platform: ATI compatible platform found
Device #1: Cypress, 512MB, 0Mhz, 20MCU
Device #2: Cypress, 512MB, 0Mhz, 20MCU
Device #3: Cypress, 512MB, 0Mhz, 20MCU
Device #4: Cypress, 512MB, 0Mhz, 20MCU
Device #1: Kernel ./kernels/4098/m0100q_warp.Cypress.64.kernel
Device #2: Kernel ./kernels/4098/m0100q_warp.Cypress.64.kernel
Device #3: Kernel ./kernels/4098/m0100q_warp.Cypress.64.kernel
Device #4: Kernel ./kernels/4098/m0100q_warp.Cypress.64.kernel

MD4             Speed.GPU*: 30.5G/s
NTLM            Speed.GPU*: 29.5G/s
MD5             Speed.GPU*: 19.2G/s
md5(md5($pass)) Speed.GPU*: 4663.7M/s
SHA1            Speed.GPU*: 5775.7M/s
MySQL           Speed.GPU*: 91.7G/s
MySQL4.1/5      Speed.GPU*: 2659.6M/s
SHA256          Speed.GPU*: 2492.5M/s
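To compare these totals with the single-card numbers in the first post, divide by the number of cards. Each HD5970 shows up as two Cypress devices, so four devices means two cards. Here is a quick Python sketch. It only assumes that 1 G/s equals 1000 M/s and that both cards contribute equally.

```python
# Aggregate speeds from the 2x HD5970 run above, converted to M/s.
TOTAL = {
    "MD4": 30500, "NTLM": 29500, "MD5": 19200, "md5(md5())": 4663.7,
    "SHA1": 5775.7, "MySQL": 91700, "MySQL4.1/5": 2659.6, "SHA256": 2492.5,
}
CARDS = 2  # four Cypress devices = two dual-GPU HD5970 cards

for algo, total in TOTAL.items():
    print(f"{algo:12} {total / CARDS:9.1f} M/s per card")
```

For MD5 this gives 9600 M/s per card, within 1% of the 9637 M/s single-card figure.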
#6
Single 580:

MD4 Speed.GPU*: 3342.4M/s
NTLM Speed.GPU*: 3273.8M/s
MD5 Speed.GPU*: 2245.3M/s
MD5(MD5) Speed.GPU*: 663.7M/s
SHA1 Speed.GPU*: 748.1M/s
MySQL Speed.GPU*: 11.8G/s
MySQL 4.1 Speed.GPU*: 354.9M/s
SHA256 Speed.GPU*: 314.0M/s
#7
I get about 8250 M/s on two HD6870s for single MD5, versus 4100 M/s before this version. Slight overclock to 920 MHz from 900 MHz.
#8
Awesome work, atom :)

2x HD5970: 16 G/s
#9
Wow, you are just so awesome atom!
#10
Hi, first of all great work, a long-awaited update for my needs ;)
Can someone help me with the correct command line for SL3 (salted SHA1)? I can't seem to find the right one, because there is little info about the new features in oclHashcat regarding salts and charsets in hex!!!


Thanks in advance for helping me

PS: 2x HD5850 @ 975 MHz core
SHA1 speed with the example from the first post is 3470 M/s (about 25% faster than Golubev's ighashgpu, simply impressive ;))

EDIT: Huh, I got it, thanks to atom, but mode 101 is not supported in this beta..... waiting for atom again ;) And another one: how do I use the password mask to mask HEX characters?