Implement correct toggle of non-default charsets chars like german Umlaute
> Well, it depends on the encoding whether they are multibyte or not, doesn't it?
Yes, but most wordlists are in UTF-8, and it would be very unusual for a wordlist to store Ä as a single byte in an extended ASCII encoding (e.g. ISO-8859-1 or Windows-1252). Hashcat can't tell what encoding the wordlist it takes as input uses, so if it's UTF-8, as almost all wordlists are, toggling the umlaut would require a multibyte replace, which isn't possible. If your wordlist did happen to contain Ä as a single byte, then it would be possible, just by replacing hexadecimal e4 (ä) with c4 (Ä), like this: "s\xe4\xc4".
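
To make the byte-level difference concrete, here is a minimal Python sketch. It is not part of Hashcat, and the helper name `substitute_byte` is made up for illustration. It prints how ä and Ä are stored in UTF-8 and in a single-byte encoding (ISO-8859-1 is assumed here), then mimics what a one-byte substitution such as "s\xe4\xc4" does to a word's raw bytes.

```python
# Illustration only: the umlauts as raw bytes, which is all a rule ever sees.

def substitute_byte(word: bytes, old: int, new: int) -> bytes:
    """Hypothetical helper: replace every byte equal to `old` with `new`,
    roughly what a single-byte substitution rule does."""
    return bytes(new if b == old else b for b in word)

for ch in ("ä", "Ä"):
    print(ch, "utf-8:", ch.encode("utf-8").hex(),
          "latin-1:", ch.encode("latin-1").hex())

latin1_word = "bär".encode("latin-1")  # bytes 62 e4 72
utf8_word = "bär".encode("utf-8")      # bytes 62 c3 a4 72

# Single-byte wordlist: swapping e4 for c4 turns ä into Ä.
print(substitute_byte(latin1_word, 0xE4, 0xC4).decode("latin-1"))

# UTF-8 wordlist: there is no e4 byte at all, so the word comes back unchanged.
print(substitute_byte(utf8_word, 0xE4, 0xC4) == utf8_word)
```

The last line shows why the same rule silently does nothing on a UTF-8 wordlist: ä is the two-byte sequence c3 a4 there, and the rule only ever matches one byte at a time.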

> UTF8 and UTF16 use 2 bytes for the Umlaute. If I understand you correctly, rules do not allow something like "if byte n is x and byte n+1 is y replace x by a and y by b" at all, right ?
Correct, that isn't supported by Hashcat and likely won't be for a long time, due to the speed and structure of the rule processor.

> [a github issue states] that the utf16 encoding which is part of ntlm hashing does not work reliably for non-ascii chars.
That's correct, but only for the optimised kernels, i.e. when you use the "-O" argument, and only when you use NTLM (-m 1000). No other popular kernel has this issue. It exists because converting to UTF-16LE properly is extremely slow and would slow Hashcat down greatly, so it's left to the pure kernel instead of the speed-focused, less accurate optimised kernel. I generalised and said it will be hashed correctly because Hashcat supports over 500 hash modes and this is only a problem for the very few that involve both UTF-16LE and optimised kernels. Also, in your single-byte scenario, even optimised NTLM would work just fine; it's only multibyte characters that fail. Royce Williams has already opened a GitHub issue about better communicating each kernel's limitations to the end user, which would be very useful here. Sadly, it hasn't been implemented yet.
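
As a rough illustration of why single-byte input survives while multibyte input breaks, here is a small Python sketch. It assumes the fast path simply widens each input byte to two bytes by appending a zero; that is my simplification, not a description of the actual kernel code, and the names `naive_utf16le` and `proper_utf16le` are hypothetical.

```python
# Sketch under a stated assumption: the "fast" conversion just interleaves
# zero bytes, while the "proper" one really decodes and re-encodes the text.

def naive_utf16le(raw: bytes) -> bytes:
    """Hypothetical fast path: every input byte b becomes the pair (b, 00)."""
    return b"".join(bytes([b, 0]) for b in raw)

def proper_utf16le(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Real conversion: decode the candidate, then encode it as UTF-16LE."""
    return raw.decode(encoding).encode("utf-16-le")

candidates = {
    "ASCII 'abc'": ("abc".encode("utf-8"), "utf-8"),
    "single-byte Ä (c4)": (b"\xc4", "latin-1"),
    "UTF-8 Ä (c3 84)": ("Ä".encode("utf-8"), "utf-8"),
}

for label, (raw, enc) in candidates.items():
    fast = naive_utf16le(raw)
    slow = proper_utf16le(raw, enc)
    print(f"{label}: naive={fast.hex()} proper={slow.hex()} match={fast == slow}")
```

Under that assumption, ASCII and the single-byte Ä produce the same UTF-16LE bytes either way, while the UTF-8 Ä does not, so the value fed into the hash differs.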

Also, just so we're both on the same page, Ä can hash to different values depending on the encoding, so the MD5 hash of the UTF-8 version (\xc3 \x84) is:
b66491b03046f0846fe4206bc6a0f3c0
While the Extended ASCII version (\xc4) is:
ffd0ce5d00f597aa4f0d2bb60d17b6bc
UTF-8 is by far the most common encoding you'll see when dealing with hashes. Hashcat can crack them both (well, except with optimised NTLM), as it's just about getting the bytes right:
> ./hashcat -m 0 ffd0ce5d00f597aa4f0d2bb60d17b6bc -a 3 --hex-charset c4 --quiet --potfile-disable
> ./hashcat -m 0 b66491b03046f0846fe4206bc6a0f3c0 -a 3 --hex-charset c384 --quiet --potfile-disable
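
If you want to check this outside of Hashcat, a minimal Python sketch like the following hashes both byte sequences with the standard `hashlib` module. The helper name `md5_hex` is just for illustration, and ISO-8859-1 is assumed as the single-byte encoding.

```python
import hashlib

def md5_hex(raw: bytes) -> str:
    """Return the MD5 digest of the given raw bytes as a hex string."""
    return hashlib.md5(raw).hexdigest()

utf8_bytes = "Ä".encode("utf-8")      # c3 84
latin1_bytes = "Ä".encode("latin-1")  # c4

print("UTF-8   ", utf8_bytes.hex(), md5_hex(utf8_bytes))
print("latin-1 ", latin1_bytes.hex(), md5_hex(latin1_bytes))
```

The two printed digests can be compared against the hashes passed to the two commands above; the point is only that the same visible character yields two unrelated hashes once the bytes differ.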

Messages In This Thread
RE: Implement correct toggle of non-default charsets chars like german Umlaute - by penguinkeeper - 09-01-2024, 12:57 AM