> Well, it depends on the encoding whether they are multibyte or not, doesn't it?
Yes, but most wordlists are in UTF-8, and it would be very strange for a wordlist to store Ä as a single byte (extended ASCII). Hashcat can't tell what encoding the wordlist it takes as input is in, so if it is UTF-8, as almost all wordlists are, changing the character would require a multibyte replace, which isn't possible. If your wordlist did happen to contain the umlaut as a single byte, then it would be possible, just by replacing hexadecimal e4 (ä) with c4 (Ä), like this: "s\xe4\xc4".
> UTF8 and UTF16 use 2 bytes for the Umlaute. If I understand you correctly, rules do not allow something like "if byte n is x and byte n+1 is y replace x by a and y by b" at all, right ?
Correct, that isn't supported by Hashcat and likely won't be for a long time, due to the speed requirements and structure of the rule processor.
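To make the single-byte point concrete, here is a minimal Python sketch of a per-byte substitution. It is only an illustration of the idea and not Hashcat's rule engine; `substitute_byte` and the sample word "bär" are hypothetical names I made up for the example.

```python
def substitute_byte(word: bytes, old: int, new: int) -> bytes:
    """Replace one byte value with another, looking at a single byte at a time."""
    return bytes(new if b == old else b for b in word)


# "b\u00e4r" is the word "bär"; escapes are used so the file encoding doesn't matter.
latin1_word = "b\u00e4r".encode("latin-1")  # ä stored as the single byte e4
utf8_word = "b\u00e4r".encode("utf-8")      # ä stored as the two bytes c3 a4

# Single-byte encoding: e4 is found and replaced with c4, so ä becomes Ä.
print(substitute_byte(latin1_word, 0xE4, 0xC4).hex())

# UTF-8: there is no e4 byte in the word at all, so nothing changes.
print(substitute_byte(utf8_word, 0xE4, 0xC4).hex())
```

The second call leaves the word untouched, which is why the rule only helps when the wordlist really is single-byte encoded.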
> [a github issue states] that the utf16 encoding which is part of ntlm hashing does not work reliably for non-ascii chars.
That's correct, but only for the optimised kernels (when you use the "-O" argument) and only when you use NTLM (-m 1000). No other popular kernel has this issue. It exists because a proper conversion to UTF-16LE is extremely slow and would slow Hashcat down greatly, so it's left to the pure kernel instead of the speed-focused, less accurate optimised kernel. I generalised and said it will be hashed correctly because Hashcat supports over 500 hash modes, and this is only a problem for the very few that involve both UTF-16LE and optimised kernels. Also, in your single-byte scenario, even optimised NTLM would work just fine; it's only multibyte characters that fail. Royce Williams has already opened a GitHub issue asking for the limitations of each kernel to be communicated better to the end user, which would be very useful in this case. Sadly, it hasn't been implemented yet:
https://github.com/hashcat/hashcat/issues/3958
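Here's a rough Python sketch of why the single-byte case survives while the UTF-8 case doesn't. It assumes the optimised kernel widens the input by simply appending a zero byte after every input byte; that is my understanding of the shortcut and not something taken from the kernel source here, and `naive_widen` is a hypothetical helper name.

```python
def naive_widen(data: bytes) -> bytes:
    """Assumed shortcut: append a zero byte after every input byte."""
    out = bytearray()
    for b in data:
        out += bytes([b, 0])
    return bytes(out)


# The real UTF-16LE encoding of Ä (U+00C4).
correct = "\u00c4".encode("utf-16-le")

# Single-byte input c4: widening gives c4 00, which matches the real encoding.
print(naive_widen(b"\xc4") == correct)

# UTF-8 input c3 84: widening gives c3 00 84 00, which does not match.
print(naive_widen(b"\xc3\x84") == correct)
```

Under that assumption, the bytes fed into the hash are only right when every character fits in one byte.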
Also, just so we're both on the same page, Ä can produce different hashes depending on the encoding, so the MD5 hash of the UTF-8 version (\xc3 \x84) is:
b66491b03046f0846fe4206bc6a0f3c0
While the extended ASCII version (\xc4) is:
ffd0ce5d00f597aa4f0d2bb60d17b6bc
UTF-8 is by far the most common encoding you'll see when dealing with hashes. Hashcat can crack them both (well, except optimised NTLM), as it's just about getting the bytes right:
Code:
> ./hashcat -m 0 ffd0ce5d00f597aa4f0d2bb60d17b6bc -a 3 --hex-charset c4 --quiet --potfile-disable
ffd0ce5d00f597aa4f0d2bb60d17b6bc:$HEX[c4]
> ./hashcat -m 0 b66491b03046f0846fe4206bc6a0f3c0 -a 3 --hex-charset c384 --quiet --potfile-disable
b66491b03046f0846fe4206bc6a0f3c0:$HEX[c384]
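If you want to check which bytes a given hash was built from outside of Hashcat, something like this Python snippet should do it. It's just a sketch using the standard `hashlib` module, and it uses "latin-1" as a stand-in for the extended ASCII case, which is an assumption about which single-byte code page is meant.

```python
import hashlib

char = "\u00c4"  # Ä

digests = {}
for encoding in ("utf-8", "latin-1"):
    raw = char.encode(encoding)                   # the bytes that actually get hashed
    digests[encoding] = hashlib.md5(raw).hexdigest()
    print(encoding, raw.hex(), digests[encoding])
```

Each line prints the encoding, the raw bytes in hex (the same value you'd pass to --hex-charset) and the resulting MD5, so you can compare it against the hash you're trying to crack.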