16-bit unicode?
This SW looks great, exactly what I wanted, except one problem. I'm trying to crack hashes that were hashed in 16 bit Unicode. Is there anyway to make this work in hashcat?
no, because it would make dictionary parser slower. it is easier and faster to parse normal 7bit/8bit dictionary and then expand the word in memory to utf. but this requires special -m mode (like NTLM to MD4). so in case you want to crack NTLM just use -m 1000. otherwise you cant crack them.
So it just doesn't support cracking sha-1 if the pass was hashed in unicode? That's kind of disappointing, I know some other SW does. Next version perhaps?

I tried making dictionary files that had the words in Unicode with normal 8-bit line returns, hoping it would just take the raw data between each line return and use that in the hash, but apparently not. I guess it makes sense that it would try to parse it as 8-bit characters for processing rules and such.
Even theoretically possible, to bruteforce passwords that consists of a character map with 2^16 chars (that is UCS-2, UTF-16 can take more than two bytes per code-point as it is not fixed-length) is extending the time extremely. So UTF-16 must be expanded to UTF-32 first when you want to deal with this strictly. I have no proof for this, but I just assume that the trade-off for a hashing algo would be too immense to deal with the actual character encoding as well instead of just crunching bytes in memory at the expected, non-variant position. So if you say UTF-16 we might need to talk about UCS-2 and/or UTF-32 memory wise. Let's imagine what this means compared to normal ASCII or LATIN-1 passwords. For a password that consists of four characters (only, probably in Chinese this is much, but I have no idea). A simplified view could be:

4 character UCS-2  password =  8 character ASCII/LATIN-1 password
4 character UTF-32 password = 16 character ASCII/LATIN-1 password

You might now think that's possible to bruteforce a 8 character ASCII password. But keep in mind that while bruteforcing, you only take some character, like 0-9, a-z and A-Z (62 possibilities). For a multi-byte charset you most likely want to look for more chars per code-point (most likely more than 3 844 (62^2) possibilities per code point in a 16-Bit charset).

I don't think that's something someone is really after as it does not scale well with bruteforcing. I can imagine cases where this makes sense though, but most likely, this is "too much" for bruteforce.

But the number of possible combinations which plays a if not the foremost role in bruteforcing is not everything to consider for character encoding while cracking passwords. Hashcat supports dictionary and rules as well, so this is not bruteforcing only.

Dictionary Attacks

For dictionary attacks, this would mean the cracking app needs to support multi-byte charset dictionaries as well. This is because it makes no sense to crack multi-byte char hashes with a single-byte char dictionary, because the dictionary consists only of a fraction of words.

The only exception I see here is for UTF-8 but that's only because UTF-8 has ASCII-7bit as subset. And it's the only one I know of so far.

Rule Files and Interpreter

And then the rules. The current rule engine is implemented to work on single byte characters. The rule interpreter as well must support multi-byte charsets to properly work. That would need one additional version for two- and one for four-byte charsets.

Probably the rule language must be even adopted to support such charsets.

Attack Strategy: Single-byte encoded as Multi-byte

As atom already proposed, there is an exception to all this. While some systems store hashes for passwords that are multi-byte charset encoded, most users enter the password via their western keyboards so most likely the passwords consist of characters that can be easily represented with a single byte character set like ASCII or LATIN-1.

So to attack those even multi-byte encoded passwords, one option is to convert the multi-byte charsets into single-byte charsets first and then single-byte attack them while maintaining the byte/charset relationship in the hashing algorithm. To maintain the relationship is necessary otherwise the algorithm won't find the hashes properly. So this method has an overhead but the overhead is most likely smaller then having everything (hashes, hash-algos, dictionaries and rules) in a multi-byte variant.

I assume that's what atom already was talking about. That means a specific mode not only for the algorithm but also the variant of it for a specific charset. I think this method is efficient but it has the trade-off not being able to crack multi-byte character passwords that consist of characters not covered by single-byte charsets / dictionaries.

My 2 cents.

-- hakre