oclHashcat-plus v0.09
#1
We are proud to present oclHashcat-plus v0.09!

Download it here: http://hashcat.net/oclhashcat-plus/

Lots of new features and algorithms have been added, and many bugs have been fixed.

The major changes are:
  • Support for cracking the bcrypt and sha512crypt ($6$) algorithms.
  • Support for GPU clustering across multiple LAN hosts via VCL, and an increase to support 128 GPUs.
  • Added what we call a Brute-Force++ attack (see details for description).
  • Increased cracking performance, especially on multi-hash, thanks to partial hash reversal as you know it from single-hash cracking.



Let's start with the algorithms added; in this case, the generic types:
  • added -m 10 = md5(pass.salt)
  • added -m 20 = md5(salt.pass)
  • added -m 30 = md5(unicode(pass).salt)
  • added -m 40 = md5(salt.unicode(pass))
  • added -m 110 = sha1(pass.salt)
  • added -m 120 = sha1(salt.pass)
  • added -m 130 = sha1(unicode(pass).salt)
  • added -m 140 = sha1(salt.unicode(pass))
  • added -m 1410 = sha256(pass.salt)
  • added -m 1420 = sha256(salt.pass)
  • added -m 1710 = sha512(pass.salt)
  • added -m 1720 = sha512(salt.pass)

They have been added for two reasons.

1. Many users requested them, for example here:

2. By adding another feature -- that is, setting the minimum length for a salt to 0 -- you can construct your own hashing modes by using the salt field to inject fixed data into the calculation. Since we have support in oclHashcat-plus for --hex-salt, this will make your lives even easier.
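To make the salt trick more concrete, here is a small Python sketch of what two of the generic modes compute and how a made-up scheme can be mapped onto them. The helper names, the example "pepper" value and the reading of unicode(pass) as UTF-16LE are assumptions for illustration; the hash:salt line layout is the usual hashcat convention, with the salt written in hex when --hex-salt is used.

```python
import hashlib


def md5_pass_salt(password: bytes, salt: bytes = b"") -> str:
    """-m 10 style: md5(pass.salt). An empty salt gives plain md5(pass)."""
    return hashlib.md5(password + salt).hexdigest()


def md5_salt_unicode_pass(password: str, salt: bytes = b"") -> str:
    """-m 40 style: md5(salt.unicode(pass)), assuming unicode() is UTF-16LE."""
    return hashlib.md5(salt + password.encode("utf-16-le")).hexdigest()


def hash_line(digest: str, salt: bytes, hex_salt: bool = False) -> str:
    """Format a hash:salt line; with hex_salt the salt is hex-encoded."""
    shown = salt.hex() if hex_salt else salt.decode("latin-1")
    return f"{digest}:{shown}"


# Hypothetical scheme md5(pass + "\x00pepper"): put the fixed data in the salt.
pepper = b"\x00pepper"
line = hash_line(md5_pass_salt(b"hashcat", pepper), pepper, hex_salt=True)
```

Because the pepper contains a non-printable byte, the hex form of the salt is the only practical way to write it on a hash line, which is where --hex-salt helps.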



Next one is the bcrypt algorithm.

Guys, there is not much to say. Just one thing: do not expect too much! This algorithm was designed to run extremely slowly on GPUs. It is highly dependent on memory lookups, and is both salted and iterated. On our hd6990, we can reach 4085/s. This isn't much, but it's still multiple times faster than on CPU.

Details here:



Another algorithm we added was the EPIserver algorithm. These are the hashes stored by the ASP.NET membership provider. For more detailed information about this, have a look here: http://hashcat.net/forum/thread-987.html

There are plans to rename this algorithm from EPIserver to something like "asp.net membership provider." For now we will stick to EPIserver, but we will certainly rename this in a later version.

There was already an interesting blog post about all this here, definitely a good read: http://www.troyhunt.com/2012/06/our-pass...othes.html




Last but not least, the most impressive addition is the sha512crypt algorithm, aka $6$, which is used in nearly all Linux distributions by default.

Like all crypt(3) algorithms, this is another algorithm which is designed to run slowly; plus, it is based on sha512, which uses 64-bit integers. Today's AMD GPUs do not have native support for 64-bit bitwise arithmetic (except shifts), so this is another reason why this algorithm is slow.
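To give a feeling for why missing native 64-bit support hurts, here is a rough Python sketch of how a 64-bit addition and a 64-bit rotate can be emulated with two 32-bit halves. This is only an illustration of the general technique, not the code used in the oclHashcat-plus kernels; all function names are made up.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_split(a_hi, a_lo, b_hi, b_lo):
    """64-bit addition on (hi, lo) pairs: two 32-bit adds plus a carry."""
    lo = a_lo + b_lo
    carry = lo >> 32
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo & MASK32


def rotr64_split(hi, lo, n):
    """64-bit rotate right on a (hi, lo) pair using only 32-bit operations."""
    n %= 64
    if n >= 32:              # rotating by 32 simply swaps the halves
        hi, lo = lo, hi
        n -= 32
    if n == 0:
        return hi, lo
    new_hi = ((hi >> n) | (lo << (32 - n))) & MASK32
    new_lo = ((lo >> n) | (hi << (32 - n))) & MASK32
    return new_hi, new_lo


def rotr64_native(x, n):
    """Reference: the same rotate on a single 64-bit value."""
    n %= 64
    return ((x >> n) | (x << (64 - n))) & MASK64
```

Every 64-bit operation turns into several 32-bit ones, and sha512 performs many rotates and additions per step, so the cost adds up quickly.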

Still, the speedup cracking sha512crypt on GPU versus CPU is much higher compared to bcrypt. My hd6990 gives an impressive 32519/s, which we are very proud of!
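To put the two hd6990 figures into perspective, here is a quick back-of-the-envelope calculation in Python. The keyspace (all 6-character lowercase passwords) is an arbitrary example, and the rates are simply the numbers quoted in this post for a single hash; real speeds depend on the cost and rounds settings of the hash.

```python
def hours_to_exhaust(keyspace: int, rate_per_second: float) -> float:
    """Hours needed to try every candidate once at a constant rate."""
    return keyspace / rate_per_second / 3600


keyspace = 26 ** 6                                     # 6 lowercase letters
bcrypt_hours = hours_to_exhaust(keyspace, 4085)        # bcrypt rate from above
sha512crypt_hours = hours_to_exhaust(keyspace, 32519)  # sha512crypt rate
```

With these assumptions the bcrypt run takes roughly 21 hours and the sha512crypt run roughly 2.6 hours, for a keyspace that a fast raw hash would finish almost instantly.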

This algorithm was requested here:



The partial reversing of hashes for multi-hash lists differs a bit from classic single-hash reversal, which you are already familiar with if you use oclHashcat-lite. For several reasons, it is not efficient to reverse all hashes as many steps back as in single-hash cracking, and thus we cannot reach oclHashcat-lite speed. But it can still be more efficient than just traditional early checks.

To visualize this, we made some graphs:

[Image: plus89_mh18.png]

[Image: plus89_mh915.png]

You can see that the fewer hashes you have, the more efficient it is. The curves on Nvidia are a bit sharper.

Whenever you run brute force on multiple MD4, NTLM or MD5 hashes, oclHashcat-plus will use this partial reversal technique. In theory we can port this to salted hashes as well, but multi-hash on a salted hash is a bad idea. So for now, we stick to raw and reversible algorithms.
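For readers who have not seen hash reversal before, here is a minimal Python sketch of the idea, limited to the simplest possible step: undoing the final addition of the MD5 initial values on each target hash once, so the per-candidate work can skip it. The real kernels reverse further than this, and the function names here are made up for the example.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
S = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]


def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK


def md5_steps(password: bytes):
    """Run the 64 MD5 steps on one padded block, WITHOUT the final IV addition."""
    assert len(password) < 56, "single-block sketch only"
    block = (password + b"\x80" + b"\x00" * (55 - len(password))
             + struct.pack("<Q", 8 * len(password)))
    m = struct.unpack("<16I", block)
    a, b, c, d = IV
    for i in range(64):
        if i < 16:
            f, g = (b & c) | (~b & d), i
        elif i < 32:
            f, g = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, g = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, g = c ^ (b | (~d & MASK)), (7 * i) % 16
        f = (f + a + K[i] + m[g]) & MASK
        a, d, c, b = d, c, b, (b + rotl(f, S[i])) & MASK
    return a, b, c, d


def reverse_target(digest_hex: str):
    """Done once per target hash: subtract the IV from the digest words."""
    words = struct.unpack("<4I", bytes.fromhex(digest_hex))
    return tuple((w - iv) & MASK for w, iv in zip(words, IV))


def crack(hashes, candidates):
    """Compare unfinished candidate states against the reversed targets."""
    targets = {reverse_target(h): h for h in hashes}
    return {targets[s]: c for c in candidates if (s := md5_steps(c)) in targets}
```

The work moved into reverse_target() is paid once per hash instead of once per candidate, which is why the gain shrinks as the hash list grows.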



Another nice thing that came up lately is the Virtual OpenCL Cluster Platform (VCL) project. When thorsheim and epixoip informed us about this project in this post http://hashcat.net/forum/thread-1473.html it did not work at all with oclHashcat-*, nor with any other OpenCL-based password cracker. But we got in contact with the developers at MOSIX, and after some debugging and trace sessions, we were able to pinpoint the problems. MOSIX then released VCL version 1.15, which addressed these issues.

The overhead produced by the network agents is very low. This is one of the most important factors for a distributed solution. I made some stats on this here:

[Image: vcltable.png]


VCL is intended to be used on dedicated LANs or with High Speed Interconnects. I would not recommend clustering nodes over the Internet, as both latency and bandwidth would be an issue.

Development for VCL support is still in its infancy, but I've tested it with 22 GPUs and it worked well. Installing and configuring VCL is outside the scope of these release notes, but I plan to write a forum post on this topic soon. However, there is no magic required to get VCL running on your own.

To better support VCL, we have increased the maximum number of GPUs from 16 to 128. We do not know for a fact if VCL can handle 128 GPUs, but it works with at least 22 GPUs.

Another nice thing about this is that it works around the 8-GPU limitation in AMD's drivers and Xorg. Since VCL does not require X to run, you can build giant GPU clusters this way.



Something that was already included in the newer versions of oclHashcat-lite is the support for markov-chains.

It does not matter if you do a simple Brute-Force attack using -a 3 or a dictionary-based Hybrid-Attack using either -a 6 or -a 7. This enhancement is automatically used EVERY time you use a mask.

A little background on this, in case you do not use oclHashcat-lite and might not know:

The markov-attack is a statistically based brute-force like attack, but instead of specifying a charset or a mask, we specify a file that was generated once in a previous step. It contains statistical information which is made out of an automated analysis of a given dictionary.

It can fully replace Brute-Force since it covers the full keyspace.

In a Brute-Force Attack (or in a Mask Attack) we can limit the keyspace by setting a smaller charset in order to reduce the attack time. In a Markov Attack we have something similar, the "threshold". All you do is specify a number. The higher the number, the higher the threshold to add a new link between two characters in the two-level table on which the markov-attack is based.

The background is not so important -- just remember that the lower the value, the smaller the keyspace, and thus the faster the attack is.
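For those who like to see it in code, here is a small Python sketch of the general idea: count, per position, which character tends to follow which in a wordlist, then walk those links in order of frequency. It treats the threshold as "keep at most this many successors per link", which is an assumption for this example, and unlike the real attack it only visits links that occur in the wordlist. None of the names correspond to actual oclHashcat-plus code or file formats.

```python
from collections import Counter, defaultdict


def build_stats(words):
    """Collect first-character counts and per-position successor counts."""
    first = Counter()
    links = defaultdict(Counter)   # (position, previous char) -> next chars
    for word in words:
        if not word:
            continue
        first[word[0]] += 1
        for pos in range(1, len(word)):
            links[(pos, word[pos - 1])][word[pos]] += 1
    return first, links


def candidates(first, links, length, threshold):
    """Yield candidates of the given length, most likely links first."""
    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        pos = len(prefix)
        table = first if pos == 0 else links[(pos, prefix[-1])]
        # Only the `threshold` most frequent successors are followed.
        for ch, _count in table.most_common(threshold):
            yield from extend(prefix + ch)
    yield from extend("")


first, links = build_stats(["pass", "past", "pace", "mass"])
narrow = list(candidates(first, links, 4, threshold=1))
wider = list(candidates(first, links, 4, threshold=2))
```

With this tiny wordlist a threshold of 1 yields only "pass", while a threshold of 2 yields six candidates, still starting with "pass". That is the keyspace and speed trade-off in miniature.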

But if you take a close look at it, the technically correct description would be: "Brute-Force attack enhanced by per-position markov-chains built out of wordlists for statistics with the ability to use filters using a mask". OK? That required some special naming, and since it's 100% replacing Brute Force, we made it simple for ourselves and called it Brute-Force++.

Here is a nice chart that visualizes the efficiency of Brute-Force++:

[Image: bfpp.png]

The original description of how this works can be found here:



Use .ptx and .llvmir intermediate kernels - from oclHashcat-lite

The kernels are distributed in an "intermediate" format (aka IL). This format cannot be reversed to its original C code, but is still not a binary format that can be used for execution.

The JIT (just-in-time) compilers from both OpenCL and CUDA, which ship with the driver, compile the final bytecode out of the IL. This takes a few seconds per kernel, but this is a one-time operation as the bytecode is cached (CUDA does it automatically, OpenCL does not, but we added a function that emulates CUDA's behavior).

This has some nice advantages:
  • Not 32/64 bit specific
  • Less HDD space
  • Smaller .7z
  • Fewer driver-specific problems, as we often see with Catalyst
  • There is no more need to release a new oclHashcat-* in case a new driver optimization has been added. Cached oclHashcat-* kernels are driver specific. If it recognizes a driver change, it will rebuild the bytecode from the IL, but using the new JIT from the new driver, resulting in driver-specific optimized bytecode.
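To illustrate the caching idea, here is a rough Python sketch of a driver-aware kernel cache. It is not the actual oclHashcat-* implementation: the key layout, the file naming and the jit_compile callback are all made up for the example. The point is only that the driver version is part of the cache key, so a driver change automatically triggers a rebuild from the IL.

```python
import hashlib
from pathlib import Path


def cache_key(il_bytes: bytes, driver_version: str, device_name: str) -> str:
    """Derive a cache key from the IL, the driver version and the device."""
    h = hashlib.sha1()
    for part in (il_bytes, driver_version.encode(), device_name.encode()):
        h.update(len(part).to_bytes(8, "little"))  # length prefix avoids ambiguity
        h.update(part)
    return h.hexdigest()


def load_kernel(cache_dir: Path, il_bytes: bytes, driver_version: str,
                device_name: str, jit_compile) -> bytes:
    """Return cached bytecode, compiling from the IL only on a cache miss.

    jit_compile is a hypothetical callback standing in for the driver's JIT.
    """
    path = cache_dir / (cache_key(il_bytes, driver_version, device_name) + ".bin")
    if path.exists():
        return path.read_bytes()
    binary = jit_compile(il_bytes)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(binary)
    return binary
```

A second call with the same IL, driver and device reads the file back without compiling; changing only the driver version string produces a different key and therefore a fresh compile.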



Added Retaining GPU temperature - from oclHashcat-lite

When I started with oclHashcat-* hardware management support, some people asked me to add support for fan-speed control. For a long time I was not interested in adding fan-speed code to oclHashcat-*, since this is a job for the driver or some specialized controlling software.

I did not change my mind completely on this, but we have still added some fan-speed controlling code. The new parameters are:

Code:
--gpu-temp-disable            Disable temperature and fanspeed readings and triggers
--gpu-temp-abort=NUM          Abort session if GPU temperature reaches NUM degrees celsius
--gpu-temp-retain=NUM         Try to retain GPU temperature at NUM degrees celsius (AMD only)

So what this does is: once the temperature configured with the new --gpu-temp-retain parameter is reached, it starts to increase the fan speed by 1 percent each second. That's all. In practice, this enables you to enforce a very specific operating temperature for your GPUs.

Some notes:
  • With --gpu-temp-disable you can completely disable all the temperature stuff.
  • --gpu-temp-retain currently only works for AMD.
  • --gpu-temp-abort parameter is just the renamed version of the old --gpu-watchdog.
  • Both parameters accept the 0 value which disables only this specific feature. This means you can step back to the old behavior by specifying --gpu-temp-retain 0.
  • The default for --gpu-temp-abort is still 90c.
  • The default for --gpu-temp-retain is 80c.
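The behavior described above can be sketched in a few lines of Python. This is only a model of the rules stated in this post (1 percent per second once the retain temperature is reached, abort at the abort temperature, 0 disables a feature); the function names are made up, and what happens to the fan speed below the retain temperature is not covered here, so the sketch simply leaves it unchanged.

```python
def next_fan_speed(temp_c: int, fan_percent: int, retain_c: int = 80) -> int:
    """One tick (one second) of the --gpu-temp-retain logic."""
    if retain_c and temp_c >= retain_c:
        return min(100, fan_percent + 1)   # +1 percent per second, capped
    return fan_percent                      # assumption: otherwise unchanged


def should_abort(temp_c: int, abort_c: int = 90) -> bool:
    """--gpu-temp-abort check; a value of 0 disables it."""
    return bool(abort_c) and temp_c >= abort_c


def simulate(temps, fan_percent=40, retain_c=80, abort_c=90):
    """Feed one temperature reading per second and track the fan speed."""
    history = []
    for temp in temps:
        if should_abort(temp, abort_c):
            history.append("abort")
            break
        fan_percent = next_fan_speed(temp, fan_percent, retain_c)
        history.append(fan_percent)
    return history
```

For example, simulate([78, 80, 82, 91]) leaves the fan at 40 for the first reading, raises it to 41 and 42 for the next two, and then aborts.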



More implemented feature requests on forum:

More implemented feature requests on PM / IRC / Email:
  • Default-mask for -a 3 mode from oclHashcat-lite v0.10
  • Commandline switch --disable-potfile feature from hashcat v0.40



This new version has been tested by many beta testers on a wide variety of hardware and operating systems.

All new features were available to beta testers for several weeks. All we did for the last few weeks was perform both automated and manual tests of all features and algorithms, until all issues were 100% fixed.

We want to say a special thank-you to the following beta-testers for their massive support during development:

This is great proof of how the cracking community is working together, regardless of what team they are on.

Of course we also want to say thanks to all the beta testers who helped find bugs and suggested things -- Thanks!

--
atom and matrix




Full changelog:

Code:
type: feature
file: kernels
desc: added -m 10 = md5(pass.salt)

type: feature
file: kernels
desc: added -m 20 = md5(salt.pass)

type: feature
file: kernels
desc: added -m 30 = md5(unicode(pass).salt)

type: feature
file: kernels
desc: added -m 40 = md5(salt.unicode(pass))

type: feature
file: kernels
desc: added -m 110 = sha1(pass.salt)

type: feature
file: kernels
desc: added -m 120 = sha1(salt.pass)

type: feature
file: kernels
desc: added -m 130 = sha1(unicode(pass).salt)

type: feature
file: kernels
desc: added -m 140 = sha1(salt.unicode(pass))

type: feature
file: kernels
desc: added -m 141 = EPiServer 6.x
cred: thorsheim

type: feature
file: kernels
desc: added -m 1410 = sha256(pass.salt)

type: feature
file: kernels
desc: added -m 1420 = sha256(salt.pass)

type: feature
file: kernels
desc: added -m 1710 = sha512(pass.salt)

type: feature
file: kernels
desc: added -m 1720 = sha512(salt.pass)

type: feature
file: kernels
desc: added -m 1800 = sha512crypt, SHA512(Unix)

type: feature
file: kernels
desc: added -m 3200 = bcrypt

type: feature
file: kernels
desc: removed -a 4 permutation attack (use rules and combinator-attack instead)

type: feature
file: kernels
desc: added reversing kernel for multihash MD5 if running in -a 3 mode and mask < length 9

type: feature
file: kernels
desc: added reversing kernel for multihash MD4 if running in -a 3 mode and mask < length 13

type: feature
file: kernels
desc: added reversing kernel for multihash NTLM if running in -a 3 mode and mask < length 9

type: feature
file: kernels
desc: on AMD, switched from .kernel to .llvmir to reduce diskspace

type: feature
file: kernels
desc: on NV, switched from .cubin to .ptx to reduce diskspace

type: feature
file: kernels
desc: added kernel cache to avoid unnecessary recompilation
cred: m4tr1x

type: feature
file: kernels
desc: brought back support for AMD hd4xxx GPUS due to .llvmir integration

type: feature
file: kernels
desc: optimized 0x80 handling; +3.6% speed in combinator- and hybrid-attack

type: feature
file: host programs
desc: added support for Virtual OpenCL (VCL) Cluster Platform VCL 1.15
cred: epixoip

type: feature
file: host programs
desc: added support for up to 128 GPUS

type: feature
file: host programs
desc: ported markov-attack from oclHashcat-lite v0.10

type: feature
file: host programs
desc: ported increment-mode from oclHashcat-lite v0.10

type: feature
file: host programs
desc: ported default-mask from oclHashcat-lite v0.10

type: feature
file: host programs
desc: ported -j and -k single rules from oclHashcat v0.27

type: feature
file: host programs
desc: allowed zero-length salts in the generic algorithms makes it more easy to exploit them

type: feature
file: host programs
desc: added next-dictionary-in-line feature to skip inefficient dictionaries on keypress

type: feature
file: host programs
desc: implemented base64 parser that would allow for dynamic salt lengths in nsldaps

type: feature
file: host programs
desc: worked around memory allocation limit, you can load twice as much hashes in multihash

type: driver
file: kernels
desc: added support for NVidia CUDA 5.0

type: driver
file: kernels
desc: added support for AMD APP SDK v2.7

type: driver
file: host programs
desc: added support for NVidia NVML library and got rid of nvidia-smi command

type: feature
file: host programs
desc: splitted --gpu-watchdog to --gpu-temp-disable and --gpu-temp-abort

type: feature
file: host programs
desc: added --gpu-temp-retain to try retain temperature at NUM degrees celsius
cred: m4tr1x

type: feature
file: host programs
desc: worked around AMD bug in clGetDeviceInfo() CL_DEVICE_MAX_CLOCK_FREQUENCY
cred: m4tr1x

type: change
file: host program
desc: updated exit status code, see status_codes.txt for details
cred: m4tr1x

type: feature
file: host programs
desc: backported --disable-potfile feature from hashcat v0.41
cred: m4tr1x

type: feature
file: host programs
desc: add ?a to built-in charsets as ?l?u?d?s
cred: m4tr1x

type: feature
file: host programs
desc: added fan-speeds to status display

type: bug
file: host programs
desc: fixed a bug in host program for WPA/WPA2 in -a 1, -a 6 and -a 7 mode
cred: bjorn

type: bug
file: kernels
desc: fixed a bug in kernel for WPA/WPA2 on AMD VLIW architecture leading to code not found
cred: DrGeek

type: change
file: contact.txt
desc: updated contact information (moved to freenode IRC)
#2
Great work! Good job all!
#3
fucking amazing! this would qualify for a major release!
sch0.org
#4
fantastic work, everyone!
#5
As said in the release notes, here is the howto:

Building GPU-Clusters for oclHashcat with VCL v1.15: https://hashcat.net/wiki/doku.php?id=vcl_cluster_howto
#6
(09-08-2012, 04:15 PM)atom Wrote: As said in the release notes, here is the howto:

Building GPU-Clusters for oclHashcat with VCL v1.15: https://hashcat.net/wiki/doku.php?id=vcl_cluster_howto
Good wiki. It's not mentioned but I guess that you are bound with the same limitation as the OCL version which is that you need the same cards on each machine or at least the cards using the same kernel, right?
#7
Yes, right. My top priority for v0.10 is to enable mixed GPU types :)
#8
Thanks ! great release as always !
#9
atom, just wanted to clarify. is the master node required to be on the same highspeed LAN or can it be on wireless?
#10
Wireless LAN is a highspeed LAN, somewhat :)

Should work, yes!