The case for $HEX[]
Since I (re)started working on hashes, I've been trying to find better ways to represent my "found" hashes, so that they can be re-used. One of the first questions I posed was "how do you represent passwords with embedded CR/LF/NUL characters (or other control characters)?"

In hashcat, the only way that seems to be available is to use hex output; but this means that you need to decide, in advance, which passwords might contain control characters, and output them to a separate file. In certain cases (like a CR or LF at the beginning or end of a password), you can use hex-salt, and one of -m 10 or -m 20, but this is a real pain.

A few months ago, I became frustrated enough to write some software for this, and implement it myself. As a result, I invented a new way to represent passwords that contain control characters. For "regular" passwords, no change is made. "password" still looks like "password" in my dictionary files. But if the password contains a control character (other than space or tab, of course), I "switch modes", and output the hex equivalent, enclosed in $HEX[].

So, a password of "foobar1[CRLF]" becomes:

$HEX[666f6f626172310d0a]

This has the added benefit of being able to use _any_ character in a password. Most hash algorithms (SHA1, MD5, etc.) work on bytes in any event, so creating an arbitrary-length string in a dictionary file, of any set of characters, becomes really easy.
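A minimal sketch of the encoding described above, in Python. This is not the mdxfind code (which isn't available); the function names are made up for illustration. It also makes two assumptions the post doesn't spell out: bytes above 0x7e are hex-encoded along with control characters, and a password that itself looks like a $HEX[...] wrapper is hex-encoded so it can't be confused with an encoded one.

```python
def needs_hex(password: bytes) -> bool:
    """Return True if the password cannot be written safely as a plain line."""
    for b in password:
        if b == 0x09:                # tab is left as-is, per the post
            continue
        if b < 0x20 or b > 0x7e:     # control byte, or (assumption) non-ASCII byte
            return True
    # Assumption: a literal that mimics the wrapper is encoded to avoid ambiguity.
    return password.startswith(b"$HEX[") and password.endswith(b"]")


def encode_password(password: bytes) -> str:
    """Plain passwords pass through unchanged; others become $HEX[...]."""
    if needs_hex(password):
        return "$HEX[" + password.hex() + "]"
    return password.decode("ascii")


print(encode_password(b"password"))       # password
print(encode_password(b"foobar1\r\n"))    # $HEX[666f6f626172310d0a]
```

Working on bytes rather than text keeps the sketch independent of any character encoding, which matches the point that the hash algorithms only see bytes anyway.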

But is it really needed?

In the isw 2012 challenge of 139,444,502 hashes, I was able to find 127,224,531 solutions. More than 260,000 required $HEX[] - for example:


and many more.

Adding rules to prepend/append CR/CRLF/NUL etc. is an easy way to find these, but without a regularized output format, it's impossible to re-use the data - this is particularly the case with passwords containing one or more CR/LF characters.
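To illustrate the point, here is a rough Python sketch of the prepend/append idea with unsalted MD5, writing each find in the $HEX[] form so it can go straight back into a dictionary. The affix list, the function names, and the choice of MD5 are all assumptions made for the example; real rule engines are far faster and more flexible than this.

```python
import hashlib

# Hypothetical affix list: CR, LF, CRLF and NUL, as mentioned in the post.
CONTROL_AFFIXES = [b"\r", b"\n", b"\r\n", b"\x00"]


def candidates(word: bytes):
    """Yield the word itself plus each affix prepended and appended."""
    yield word
    for affix in CONTROL_AFFIXES:
        yield affix + word
        yield word + affix


def to_dictionary_line(password: bytes) -> str:
    """Write printable ASCII as-is, anything else as $HEX[...]."""
    printable = all(0x20 <= b <= 0x7e or b == 0x09 for b in password)
    return password.decode("ascii") if printable else "$HEX[" + password.hex() + "]"


def crack(words, target_md5_hexdigests):
    """Map each cracked MD5 hex digest to its re-usable dictionary line."""
    targets = set(target_md5_hexdigests)
    found = {}
    for word in words:
        for cand in candidates(word):
            digest = hashlib.md5(cand).hexdigest()
            if digest in targets:
                found[digest] = to_dictionary_line(cand)
    return found
```

Without the last step, a found password ending in CRLF would be written out as a line followed by a stray blank line, and the trailing bytes would be lost the next time the file is read.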

But can't you just use --hex-output?

Again, yes, this is an option, but to effectively re-use the dictionaries, I would have to convert all of my existing dictionaries to hex-output format. This would double the size of the dictionaries, and increase the loading time.

But having to parse the $HEX[] format means that loading passwords would be slow!

That's not the case. Note: This is not a plug for my software; it's not available. I'm using this as an illustration only of time required to parse the input.

To check 258M passwords in 9 files against 2,894,100 MD5 hashes takes hashcat (without $HEX[] parsing):

real 1m46.145s

Using oclHashcat:

real 3m6.574s

Using mdxfind (my code, including the $HEX[] parsing):

real 1m8.213s

(So about 30% faster, even while doing the $HEX[] parsing. My system had a load of 30 while running this; it wasn't idle.)

So $HEX isn't slowing anything down. It just is a better way to represent passwords that contain unprintable characters.
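For a sense of why the parsing is cheap, here is a hedged Python sketch of a line parser. It is only an illustration of the format, not mdxfind's implementation, and the name parse_line is hypothetical. The fallback of treating a malformed $HEX[...] line as a literal password is an assumption; a real tool might reject it instead.

```python
import binascii

PREFIX = b"$HEX["
SUFFIX = b"]"


def parse_line(line: bytes) -> bytes:
    """Turn one dictionary line into the raw password bytes."""
    line = line.rstrip(b"\r\n")          # drop the line terminator
    if line.startswith(PREFIX) and line.endswith(SUFFIX):
        payload = line[len(PREFIX):-len(SUFFIX)]
        try:
            return binascii.unhexlify(payload)
        except (binascii.Error, ValueError):
            # Assumption: odd-length or non-hex payloads are kept as literals.
            return line
    return line                          # ordinary password, unchanged
```

Ordinary lines cost one prefix comparison, which fails on the first byte for nearly every password; only the rare $HEX[] lines pay for a hex decode.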

If you have a better idea, please let me know. I've been struggling with this for quite some time, and have yet to hear a better plan.

My comment is that this is a great idea.

I have been using the same technique as described by Waffle above when storing hashes with control characters and/or words from foreign languages with non-UTF-8 characters.