oclHashcat-plus v0.15
#1
This version is the result of over 6 months of work, having modified 618,473 total lines of source code.

Before we go into the details of the changes, here's a quick summary of the major changes:
  • Added support for cracking passwords longer than 15 characters
  • Added support for mask-files, which enables password policy-specific candidate generation using PACK
  • Added support for multiple dictionaries in attack modes other than straight mode
  • Rewrote workload dispatcher from scratch
  • Rewrote restore support from scratch
  • Rewrote kernel scheduler to reduce screen lags
  • Better handling of smaller workloads/dictionaries
  • Language-specific charset presets for use with masks

New supported algorithms:
  • TrueCrypt 5.0+
  • 1Password
  • Lastpass
  • OpenLDAP {SSHA512}
  • AIX {SMD5} and {SSHA*}
  • SHA256(Unix) aka sha256crypt
  • MacOSX v10.8
  • Microsoft SQL Server 2012
  • Microsoft EPi Server v4+
  • Samsung Android Password/PIN
  • GRUB2
  • RipeMD160, Whirlpool, sha256-unicode, sha512-unicode, ...

New supported GPUs:

NVidia:
  • All sm_35-based GPUs
  • GTX Titan
  • GTX 7xx
  • Tesla K20

AMD:
  • All Caicos, Oland, Bonaire, Kalindi and Hainan -based GPU/APU
  • hd77xx
  • hd8xxx

And last but not least, lots of bugs have been fixed. For a full list, see the changelog below.



Passwords longer than 15 characters

This was by far one of the most requested features. We resisted adding this "feature", as it would force us to remove several optimizations, resulting in a decrease in performance for the fast hashes. The actual performance loss depends on several factors (GPU, attack mode, etc.), but typically averages around 15%.

Adding support for passwords longer than 15 characters required removing the following optimizations for fast hashes:

  1. Zero-based optimizations: Many algorithms can be optimized based on the fact that zero values in arithmetic or logic operations do not change the input value. With a password limit of less than 16 characters, it was guaranteed that values for positions 16-63 were zero, allowing us to omit dozens of operations from each step. These optimizations can no longer be used. I explain this in a bit more detail in my Passwords13 presentation: http://hashcat.net/p13/js-ocohaaaa.pdf

  2. Register pressure: If a password is 15 characters or less, we only need four 32-bit registers to hold it. But as we increase the length, more registers are required. This can slow down the entire process if more registers are required than accessible, as the compiler has to swap out data to global memory. Supporting passwords longer than 15 characters increases register pressure.

  3. Wordlist caching: oclHashcat-plus handles wordlist data in a unique way. The words are not simply pushed to the GPU from start to finish; they are first sorted by length. This is done because many algorithms can be further optimized when all of the input data is the same length. oclHashcat-plus v0.14 attempted to cache all words of a specific length until it hit a threshold, and then flushed the cache. This required a lot of host memory. Depending on the number of GPUs we have and the specified -n value, oclHashcat-plus easily allocated 16GB of host memory, and even more for large VCL clusters. This buffer would have been increased 4x since we wanted to increase from a maximum length of 16 to a maximum length of 64. In other words, our host system would request 64GB of RAM!
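
As a rough illustration of points 2 and 3, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not hashcat code: the function names are made up for this example, the register count assumes MD5-style padding (one 0x80 byte appended to the password), and the cache estimate assumes the buffer grows linearly with the maximum length.

```python
import math

def registers_needed(password_len: int) -> int:
    """32-bit registers needed to hold the password plus one padding byte.

    Assumption: MD5-style padding, where a single 0x80 byte follows the password.
    """
    return math.ceil((password_len + 1) / 4)

def scaled_cache_gb(old_cache_gb: float, old_max_len: int, new_max_len: int) -> float:
    """Host cache size if the buffer grows linearly with the maximum word length."""
    return old_cache_gb * new_max_len / old_max_len

print(registers_needed(15))         # 4
print(registers_needed(55))         # 14
print(scaled_cache_gb(16, 16, 64))  # 64.0
```

Under these assumptions a 15-character password fits in four registers, a 55-character one needs fourteen, and a 16GB cache would grow to 64GB.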

We've spent a lot of time focusing on these issues, trying to minimize the impact of each. The good news is that we were able to solve #2 and #3. But, because there is no solution for #1, there will still be some performance loss in some cases.

The precise decrease depends on GPU type, algorithm, number of hashes, well... just about everything! But we also managed to increase performance for some algorithms. In order to solve #2, we rewrote nearly all candidate-generating code, and found some sections that could be optimized. In other words, the solution to #2 was to optimize the code in special sections, and then use the resulting performance increase to compensate for the performance decrease. Also, some of the algorithms do not require a lot of registers. All in all, the losses are less than expected.

To solve #3, we had to rewrite the workload dispatcher from scratch. There's a special subsection about that topic below, since it contains some other important information as well. As a positive side effect of this solution, the host memory requirements are now less than those of v0.14!

The slow hash types listed below will not drop in speed, except for phpass and md5crypt:
  • phpass, MD5(Wordpress), MD5(phpBB3)
  • md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5
  • md5apr1, MD5(APR), Apache MD5
  • sha512crypt, SHA512(Unix)
  • Domain Cached Credentials2, mscash2
  • WPA/WPA2
  • bcrypt, Blowfish(OpenBSD)
  • Password Safe SHA-256
  • TrueCrypt
  • 1Password
  • Lastpass
  • sha256crypt, SHA256(Unix)

If you're curious about real-world performance losses and gains between v0.14 and v0.15, we've compiled a table that compares the two versions for each architecture:


[Image: speed_v14_to_v15.png]


This table shows single-hash MD5, which has been chosen since it's a very sensitive hash-type. Changes to this hash-type typically carry over to other hashes as well.

Note: Because we've had to make a lot of deep changes to the core of hashcat, there can be no "old" method for < 16 character support, as many people have already suggested. The changes we made are too deep and they are no longer compatible with old-style kernels, and we really don't want to maintain two different tools.

One last note about performance. There was a change to the status display of the speed value which does not affect the real performance. With the new oclHashcat-plus v0.15, the speed that is shown gets divided by the number of uncracked unique salts. Older oclHashcat-plus versions did not take this into account. Don't be shocked when you're cracking a large salted hashlist and the displayed speed has dropped by a factor of hundreds (or, to be exact, by the number of hashes/salts); the total time will stay the same.
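
To illustrate the new display, here is a minimal sketch of the division described above. The function name and the numbers are hypothetical and only show the arithmetic; this is not the actual status-screen code.

```python
def displayed_speed(raw_hashes_per_sec: float, uncracked_unique_salts: int) -> float:
    """Speed as shown in v0.15: raw hash rate divided by the uncracked unique salts."""
    # Guard against division by zero once every salt has been cracked.
    salts = max(uncracked_unique_salts, 1)
    return raw_hashes_per_sec / salts

# Hypothetical example: 1 GH/s of raw work against 1,000 uncracked unique salts.
print(displayed_speed(1_000_000_000, 1_000))  # 1000000.0
```

The GPU does the same amount of work in both versions; only the number on screen changes.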



Essential performance note

We can help hashcat compensate for the performance decrease by preparing our dictionaries!

Get hashcat-utils, and run your dictionaries through splitlen. Using rockyou.txt as an example:

Quote:
$ mkdir foo
$ ./splitlen.bin foo < rockyou.txt
$ cat foo/* > rockyou-sorted.txt

The example works on Windows, too.

This typically gives a 15% speed boost, because we get many more cache hits. This is essential, especially for VLIW4 and VLIW5 systems.
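
For readers who want to see what the length sort amounts to, here is a small Python sketch of the idea. It is not how splitlen is implemented; it simply groups words by length in memory and emits them shortest first, and it counts characters where a real tool would more likely count bytes.

```python
from collections import defaultdict

def sort_by_length(words):
    """Group words by length, then emit the groups from shortest to longest.

    Words of equal length keep their original relative order.
    """
    buckets = defaultdict(list)
    for word in words:
        buckets[len(word)].append(word)
    return [word for length in sorted(buckets) for word in buckets[length]]

sample = ["password", "abc", "123456", "xyz", "iloveyou"]
print(sort_by_length(sample))
# ['abc', 'xyz', '123456', 'password', 'iloveyou']
```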



Passwords less than 55 characters

Adding support for passwords longer than 15 characters does not completely remove the length limitations in hashcat. Generally speaking, the new maximum length is 55 characters, except in the following cases:

For slow hashes:
  • mode 400 is limited to 40 characters
  • mode 5200 is limited to 24 characters
  • modes 500, 1600, 1800, 3200, 5300, 6300 and 7400 are limited to 16 characters
NOTE: We're planning to remove the limits for these modes in the next version.

For fast hashes, the important factor is the attack mode:
  • attack-mode 0, the maximum length is 31
  • attack-mode 1, the maximum size of the words of both dictionaries is 31
  • attack-mode 6 and 7, the maximum size of the words of the dictionary is 31

Just to make this clear: We can crack passwords up to length 55, but in case we're doing a combinator attack, the words from both dictionaries can not be longer than 31 characters. But if the word from the left dictionary has the length 24 and the word from the right dictionary is 28, it will be cracked, because together they have length 52.

Also note that algorithms based on unicode only support a maximum of 27 characters from the plaintext point of view. This is because unicode uses two bytes per character, making it 27 * 2 = 54.
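
The fast-hash limits above can be expressed as a short check. This is only a sketch of the rules as described in this post; the constant and function names are invented for the example.

```python
MAX_TOTAL = 55  # general maximum candidate length
MAX_WORD = 31   # per-dictionary word limit for fast hashes

def combinator_candidate_ok(left: str, right: str) -> bool:
    """True if both words and their concatenation fit within the stated limits."""
    return (len(left) <= MAX_WORD
            and len(right) <= MAX_WORD
            and len(left) + len(right) <= MAX_TOTAL)

def unicode_plain_limit(max_bytes: int = MAX_TOTAL, bytes_per_char: int = 2) -> int:
    """Maximum plaintext characters when every character takes two bytes."""
    return max_bytes // bytes_per_char

print(combinator_candidate_ok("a" * 24, "b" * 28))  # True  (24 + 28 = 52)
print(combinator_candidate_ok("a" * 32, "b" * 4))   # False (left word is longer than 31)
print(unicode_plain_limit())                        # 27
```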



TrueCrypt 5.0+

Finally, we have a TrueCrypt cracking mode that fully computes on GPU -- everything is written 100% on GPU! There is no copy overhead to the host, thus our CPU will stay cool at 0% and we can do whatever we want to do with it.

The current implementation supports the following hashes:
  • RipeMD160
  • SHA512
  • Whirlpool

For the ciphers, we're currently doing only AES, and of course the XTS block-cipher. Serpent and Twofish -- and their cascaded modes -- will be added next.

Here are some speeds from 2x hd6990:
  • PBKDF2-HMAC-RipeMD160 / AES: 223 kHash/s
  • PBKDF2-HMAC-SHA512 / AES: 95 kHash/s
  • PBKDF2-HMAC-Whirlpool / AES: 49 kHash/s *updated*
  • PBKDF2-HMAC-RipeMD160 boot-mode / AES: 451 kHash/s

These tests show oclHashcat-plus is the world's fastest TrueCrypt cracker!! :)

Here's the original article: http://hashcat.net/forum/thread-2301.html





Mask files

Mask files are very easy to understand: they are just plain text files with one mask per line. Supply the filename instead of a mask on the command line, and hashcat will automatically loop through the masks in the file -- see some examples here: http://hashcat.net/wiki/doku.php?id=mask...mask_files

Until now, to use a list of masks we had to iterate through the set of masks manually (or with a self-made wrapper script), fully restarting hashcat on each loop iteration. It worked, but we suffered from the startup and shutdown times of each invocation. Storing those masks in a hcmask file and using this new feature is much faster, especially when we have lots of small masks.
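
To make the format concrete, here is a small Python sketch that reads masks one per line and estimates the keyspace of each. It is a simplified model, not hashcat's parser: it only knows the built-in charsets ?l, ?u, ?d, ?s and ?a (with the sizes 26, 26, 10, 33 and 95 assumed here), treats ?? as a literal question mark, and rejects anything else, including custom charsets.

```python
# Assumed sizes of the built-in charsets; custom charsets are not handled.
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95}

def read_masks(lines):
    """Return the non-empty lines of a mask file, one mask per line."""
    return [line.strip() for line in lines if line.strip()]

def mask_keyspace(mask: str) -> int:
    """Number of candidates a mask generates under the simplified rules above."""
    total = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?" and i + 1 < len(mask):
            key = mask[i + 1]
            if key == "?":
                pass  # "??" is a literal question mark: exactly one choice
            elif key in CHARSET_SIZES:
                total *= CHARSET_SIZES[key]
            else:
                raise ValueError(f"unsupported charset ?{key} in mask {mask!r}")
            i += 2
        else:
            i += 1  # any other character is a literal: exactly one choice
    return total

for mask in read_masks(["?u?l?l?l?d?d\n", "\n", "?d?d?d?d\n"]):
    print(mask, mask_keyspace(mask))
# ?u?l?l?l?d?d 45697600
# ?d?d?d?d 10000
```

With many small masks like the second one, the per-invocation startup cost easily exceeds the time spent cracking, which is why looping inside a single hashcat run pays off.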

The coolest thing though is what iPhelix from Team Hashcat made out of this feature...

If we're a pentester, and we're to audit an AD domain, we typically face a password policy. Password policies aren't always very clever; most of the time, they force users to select passwords with predictable patterns (you can read more about this topic in Minga's Passwords13 talk). Using a specific set of masks we can avoid candidate passwords that do not match the policy, thus reducing the keyspace efficiently. There are a few ways to do this. For example, we could pre-check our password candidates against a policy, which requires some branching, and we all know GPUs are bad when it comes to branching.

Therefore, it's far more clever to simply never generate those candidates which do not match the password complexity policy, and that's where this new feature comes in. iPhelix, the author of PACK, wrote a tool to automatically create mask files for us based on a password policy. He also added some preset mask files for default AD password policies! I strongly recommend you take a look at PACK: http://thesprawl.org/projects/pack/
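
The idea of generating only policy-compliant masks can be sketched as follows. This is not PACK's code or its command-line interface; it is a toy generator for a hypothetical policy of the form "fixed length, at least N of the four character classes", and the function name is made up for this example.

```python
from itertools import product

CLASSES = ("?l", "?u", "?d", "?s")  # lower, upper, digit, special

def policy_masks(length: int, min_classes: int):
    """Yield every mask of the given length that uses at least min_classes classes."""
    for combo in product(CLASSES, repeat=length):
        if len(set(combo)) >= min_classes:
            yield "".join(combo)

masks = list(policy_masks(3, 3))
print(len(masks))          # 24 of the 64 possible three-position masks
print("?u?l?d" in masks)   # True
print("?l?l?l" in masks)   # False: only one character class
```

Each line of the result could be written to a mask file, so the policy check happens once on the host when the masks are built, not per candidate on the GPU.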

With Hashcat, PACK, and mask file support, we now have a completely unique and powerful feature that no other password cracking program supports!





Directories containing dictionaries

You already know that with straight mode, you can specify a directory directly on the command line, and hashcat will loop through all of the dictionaries in that directory automatically.

Now you can do this in hybrid modes, too!

The same is true for using multiple dictionaries on the command line. For hybrid mode, the mask is always just a single argument. That means that, for instance, for -a 6 all arguments except the last one can be dictionaries.
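
A sketch of how such an argument list could be split is shown below. It is an illustration only, not hashcat's argument parser. The -a 6 case follows the description above; the -a 7 case assumes the mirror image (mask first, dictionaries after), since that mode puts the mask in front of the word.

```python
def split_hybrid_args(attack_mode: int, args: list):
    """Split positional arguments into (dictionaries, mask) for a hybrid attack."""
    if len(args) < 2:
        raise ValueError("need at least one dictionary and one mask")
    if attack_mode == 6:
        # dict(s) + mask: the mask is the single last argument
        return args[:-1], args[-1]
    if attack_mode == 7:
        # Assumption: mask + dict(s), so the mask is the single first argument
        return args[1:], args[0]
    raise ValueError("not a hybrid attack mode")

print(split_hybrid_args(6, ["a.dict", "dicts/", "?d?d?d"]))
# (['a.dict', 'dicts/'], '?d?d?d')
```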





Rewrote workload dispatcher


This was required to solve problem #3 from the password length increase, but it quickly turned out to add some unexpected benefits. One benefit is the way smaller workloads are handled.

A problem that every one of us ran into was that, with small workloads, oclHashcat-plus did not fully utilize the power of all GPUs. To fully utilize the GPUs, there needs to be enough distinct input data for the calculation. For example, when oclHashcat-plus displays that it is cracking at a rate of 8.3 GH/s, the program actually requires us to feed it with 8,300,000,000 different words per second. We get that high of a number because of the attack modes. Our dictionary does not need to have 8,300,000,000 different words, but if we were to run it in -a 0 mode with 40,000 rules, we still need to feed it with 207,500 words from our dictionary per second (207,500 * 40,000 = 8,300,000,000). Well, actually that's not entirely correct. We still need to invoke the GPU kernel and compute the hashes on it, and this task takes 99.9% of the time. So to load this stuff, we actually have just a few milliseconds; otherwise our cracking speed will drop since the GPU is idling.
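
The feed-rate arithmetic from this example, written out as a tiny Python helper (the function name is made up for illustration):

```python
def words_per_second(target_hashes_per_sec: float, rules: int) -> float:
    """Dictionary words the host must deliver per second in -a 0 with rules."""
    return target_hashes_per_sec / rules

# 8.3 GH/s with 40,000 rules, as in the example above.
print(words_per_second(8_300_000_000, 40_000))  # 207500.0
```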

The typical process is that our host program parses the dictionary, then copies it to the GPU, then runs the kernel and collects the results. Now, because of the old caching structure, oclHashcat-plus was a bit more complicated than this. It was not loading 207,500 words from our dictionary per second -- it was loading much more. And then it was sorting the words by length into specific buffers. Once a buffer's threshold was reached, it pushed the buffer to GPU, processed it, and collected the results.
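
A heavily simplified model of that old caching scheme might look like the following Python sketch. It is not the v0.14 source code: the function name and the threshold are invented, and a real dispatcher works on binary buffers and multiple GPUs. It only shows the behaviour described above, where each length has its own buffer that is pushed once it is full, with the leftovers pushed at the very end.

```python
from collections import defaultdict

def dispatch_by_length(words, threshold):
    """Yield (length, batch) pairs the way a per-length cache would flush them."""
    buffers = defaultdict(list)
    for word in words:
        buf = buffers[len(word)]
        buf.append(word)
        if len(buf) >= threshold:
            yield len(word), list(buf)  # buffer full: push it to the GPU
            buf.clear()
    # End of the dictionary: flush whatever is left, even tiny batches.
    for length in sorted(buffers):
        if buffers[length]:
            yield length, list(buffers[length])

for length, batch in dispatch_by_length(["aaa", "bbbb", "ccc", "dddd", "eee"], 2):
    print(length, batch)
# 3 ['aaa', 'ccc']
# 4 ['bbbb', 'dddd']
# 3 ['eee']
```

The last line of the output is the kind of undersized batch that leaves the GPU mostly idle at the end of a run.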

The typical dictionary does not contain only words of one specific length (unless we run it through splitlen.bin from hashcat-utils). Typically, most of the words range from six to ten characters. We know these graphs, right? Let's say the distribution for all words of length 6 - 10 is all the same (which it isn't really, but it's easier to explain this way), then only every 5th word has a length of 8 characters. In other words, to feed oclHashcat-plus fast enough, the host program needed to parse 5 * 207,500 words from our dictionary per second.

Now think of a small dictionary, like milworm with about 80k words. We cannot even supply the required 207,500 words with it. And it's actually much worse, since the 80k has to be divided by 5 for the reason above. So we only provided oclHashcat-plus with 16k words from our dictionary per second, not with 207k, and the GPU wasn't able to exceed ~7% utilization. And then we still have a small number of words of length 22 or 43 left that need to be checked... what a waste of time! This is typically what we saw at the end of each run.

With oclHashcat-plus v0.15, this structure has radically changed. There is still some caching, but it's totally different. We don't want to go too much into detail, but typically the divisor is now just 2. We still cannot fully utilize all power with a tiny dictionary, but the requirement for huge dictionaries is much lower.
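
The effect of the divisor can be estimated with the numbers used in this section. The sketch below follows the same loose reasoning as the text (an 80k-word dictionary treated as 80k words per second, against a requirement of 207,500 words per second); the function names are invented and the result is only a rough estimate, not a measurement.

```python
def effective_feed(dictionary_words: float, divisor: float) -> float:
    """Words that actually reach the GPU once the length split is accounted for."""
    return dictionary_words / divisor

def utilization(feed: float, required: float) -> float:
    """Fraction of the required feed rate that is delivered, capped at 100%."""
    return min(feed / required, 1.0)

REQUIRED = 207_500  # words per second needed in the 40,000-rule example

old = utilization(effective_feed(80_000, 5), REQUIRED)  # v0.14-style divisor
new = utilization(effective_feed(80_000, 2), REQUIRED)  # v0.15-style divisor
print(f"{old:.1%}")  # 7.7%
print(f"{new:.1%}")  # 19.3%
```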





Thanks

Many thanks to our beta testers, and everyone reporting bugs on the hashcat TRAC system. This is really pushing the project forward, and we greatly appreciate your effort. Please continue to support us!

We also want to thank KoreLogic for the "Crack Me If You Can" contest and the organizers of PHD's "Hashrunner". These contests give us a good view of what a typical pentester or IT forensics specialist needs, and show us a direction to go.




Full Changelog

Quote:
* changes v0.14 -> v0.15:

type: driver
file: kernels
desc: added support for AMD Catalyst v13.4 (new GPU architectures: Bonaire, Kalindi, Hainan)
trac: #127

type: driver
file: kernels
desc: added support for AMD Catalyst v13.6 beta

type: driver
file: kernels
desc: added support for AMD Catalyst v13.8 beta

type: driver
file: kernels
desc: added support for NVidia ForceWare 319.37

type: driver
file: kernels
desc: added support for NVidia CUDA 5.5

type: feature
file: kernels
desc: added support for password length up to 55

type: feature
file: kernels
desc: allow variable salt length (length: 16 - 48) for aix hashes -m 6400, 6500, 6700

type: feature
file: kernels
desc: variable number of iterations for hash modes 500, 1600, 1800, 7400 (using e.g. $rounds=5000$)

type: feature
file: kernels
desc: added mode -m 1430 = sha256(unicode($pass).$salt)

type: feature
file: kernels
desc: added mode -m 1440 = sha256($salt.unicode($pass))

type: feature
file: kernels
desc: added mode -m 1441 = EPiServer 6.x > v4

type: feature
file: kernels
desc: added mode -m 1711 = SSHA-512(Base64), LDAP {SSHA512}

type: feature
file: kernels
desc: added mode -m 1730 = sha512(unicode($pass).$salt)

type: feature
file: kernels
desc: added mode -m 1731 = MSSQL(2012)

type: feature
file: kernels
desc: added mode -m 1740 = sha512($salt.unicode($pass))

type: feature
file: kernels
desc: added mode -m 5800 = Samsung Android Password/PIN
cred: Bjoern Kerler

type: feature
file: kernels
desc: added mode -m 6000 = RipeMD160

type: feature
file: kernels
desc: added mode -m 6100 = Whirlpool

type: feature
file: kernels
desc: added mode -m 6200 = TrueCrypt 5.0+

type: feature
file: kernels
desc: added mode -m 6300 = AIX {smd5}

type: feature
file: kernels
desc: added mode -m 6400 = AIX {ssha256}

type: feature
file: kernels
desc: added mode -m 6500 = AIX {ssha512}

type: feature
file: kernels
desc: added mode -m 6600 = 1Password

type: feature
file: kernels
desc: added mode -m 6700 = AIX {ssha1}

type: feature
file: kernels
desc: added mode -m 6800 = Lastpass

type: feature
file: kernels
desc: added mode -m 7100 = OS X v10.8

type: feature
file: kernels
desc: added mode -m 7200 = GRUB 2

type: feature
file: kernels
desc: added mode -m 7400 = sha256crypt, SHA256(Unix)

type: feature
file: kernels
desc: moved word-generator out of main kernels, as seen in oclHashcat-lite, reduces diskspace

type: feature
file: kernels
desc: changed the E rule to lowercase all input before processing, its more intuitive
trac: #110

type: bug
file: kernels
desc: rule * worked only for adjacent characters
trac: #110

type: bug
file: kernels
desc: fixed support for new GPU architectures: Oland
trac: #128

type: bug
file: kernels
desc: fixed a bug in Half MD5, in multi-hash mode it only found the first hash
trac: #136

type: feature
file: host programs
desc: added new rule function (JtR compatible): M - memorize the word (for use with "Q", "X", "4" and "6")

type: feature
file: host programs
desc: added new rule function (JtR compatible): Q - query the memory and reject the word unless it has changed

type: feature
file: host programs
desc: added new rule function (JtR compatible): X - extract substring NM from memory and insert into current word at I

type: feature
file: host programs
desc: added new rule function: 4 - appends word from memory to current word

type: feature
file: host programs
desc: added new rule function: 6 - prepends word from memory to current word

type: feature
file: host programs
desc: add rule file rules/rockyou-30000.rule based on PACK output
cred: iphelix

type: feature
file: host programs
desc: the .restore file now remembers also the last processed position in maskfiles

type: feature
file: host programs
desc: removed special VCL binaries - No longer required since VCL 1.21 works transparently

type: feature
file: host programs
desc: added support for Tesla Deployment Kit v2.295.2

type: feature
file: host programs
desc: added support for NVAPI R313

type: feature
file: host programs
desc: add presets for .hcmask files based on PACK output. One policy covering for default AD scheme and one best-of-in-time-x based on rockyou

type: feature
file: host programs
desc: dropped predefined charsets ?h, ?F, ?G and ?R
trac: #55

type: feature
file: host programs
desc: added a collection of language-specific charset-files for use with masks
trac: #55

type: feature
file: host programs
desc: allow users to specify multiple dicts and/or directories containing dicts
trac: #71

type: feature
file: host programs
desc: support added to allow salts of length 1-31 for mode 111 = nsldaps, SSHA-1(Base64), Netscape LDAP SSHA
trac: #78

type: feature
file: host programs
desc: allow users to add masks using a file containing one or multiple masks, one per line, called .hcmask files
trac: #79

type: feature
file: host programs
desc: Leave username in hash file when using --remove together w/ --username
trac: #116

type: feature
file: host programs
desc: Format output according to --outfile-format option when using --show
trac: #117

type: feature
file: host programs
desc: changed units of measurement in terms of H/s, KH/s, MH/s, etc...
trac: #130

type: feature
file: host programs
desc: do not allow loading of duplicate rules (like hashcat)
trac: #132

type: feature
file: host programs
desc: removed rules/perfect.rule
trac: #132

type: feature
file: host programs
desc: reject -r option when not using straight attack -a 0
trac: #169

type: feature
file: host programs
desc: show mask length in status screen
trac: #180

type: bug
file: host programs
desc: Runtime timer (--runtime) delayed the program exit when using a huge list of hashes
trac: #93

type: bug
file: host programs
desc: fixed a bug in NetNTLMv2 parser, did not take client challenge into sort for unique
trac: #106

type: bug
file: host programs
desc: rare triggered crash when file opening/writing fails
trac: #111

type: bug
file: host programs
desc: fixed a bug in NetNTLMv2 parser, only cracking first hash in file
trac: #114

type: bug
file: host programs
desc: fixed a bug in NetNTLMv2 parser, using --remove option replaces all the old users
trac: #115

type: bug
file: host programs
desc: ensure that DES output does not contain plains with lengths greater than 8

type: bug
file: host programs
desc: don't force to update .restore too often e.g. when using -i and/or maskfiles

type: bug
file: host programs
desc: fixed output of show/left when there are colons in the plaintext part (in hashcat.pot)

type: change
file: host programs
desc: updated default values for --gpu-accel and --gpu-loops for NVidia
cred: Rolf

type: change
file: host programs
desc: changed speed display, now divides speed by number of uncracked unique salts

type: change
file: host programs
desc: updated --help text regarding --username
cred: #121

type: feature
file: rules
desc: added a more complex leetspeak rule from unix-ninja
trac: #112

type: feature
file: rules
desc: added a more complex leetspeak rule from Incisive
trac: #119

--
atom
#2
at last, thanks atom :)
#3
Good job, may the cat be with all of you.
#4
Thank you very much for your efforts atom.
Amazing job.
#5
Absolutely amazing.

Nobody can imagine the time behind each optimization or new feature.

Thanks.
#6
Congrats another milestone. ;-)

How does hashcat work with the 1Password algorithm? The same as the JtR method?

http://pastebin.com/mNmNsxZW
#7
Wow, thx a bunch atom!!!
#8
great write up. thank you for all the hard work
#9
Amazing ! Thank you for this great release ! As a coder, I really understand what were the past 6 months like and I really appreciate your work.

But I'm still sad for a little bit. My dear HMAC-SHA256 is still not implemented as I requested on Trac :( I hope it will be implemented in the next release...
#10
Amazing release. Even passphrases are not safe anymore. A quick note for amd/ati users: Don't use catalyst 13.8 beta2, it's been reported to have some problems. You need 13.8 beta (the first released).