Implement correct toggle of non-default charset chars like German Umlaute
#1
Hi there,
is there any way to make toggle rules work with non-default charset characters like the German Umlaute?
There are upper case ÄÖÜ and lower case aöu. (Well, technically there is also an upper case ẞ and a lower case ß, but the uppercase version is hardly used at all.)

Can hashcat be told to include these mappings in the default toggle syntax? Would I have to manually edit the rule files, or is it not possible at all?

Thank you.
#2
Obvious typo: aöu was meant to be äöü. It seems editing isn't possible here after a few minutes, sorry.
#3
No, unfortunately. These kinds of multibyte characters aren't really supported at all by rules. They will still be hashed correctly, but you couldn't do ä to Ä, for example.
#4
(08-31-2024, 02:46 PM)penguinkeeper Wrote: No, unfortunately. These kinds of multibyte characters aren't really supported at all by rules. They will still be hashed correctly but you couldn't do ä to Ä, for example

Well, it depends on the encoding whether they are multibyte or not, doesn't it?

So it should at least be possible to manually implement toggling for ISO-8859-15 and ISO-8859-1, as they are extended ASCII and use one byte per character, right?

UTF-8 and UTF-16 use 2 bytes for the Umlaute. If I understand you correctly, rules do not allow something like "if byte n is x and byte n+1 is y, replace x with a and y with b" at all, right?


You say they are hashed correctly. Now I happened to read https://github.com/hashcat/hashcat/issues/2121
which says that the UTF-16 encoding that is part of NTLM hashing does not work reliably for non-ASCII chars.
limits.txt still states that this is an issue for the optimized kernels and that the pure kernel is required to handle them correctly. This seems to contradict what you said. Or are the German Umlaute a special case that is handled correctly even by the optimized kernel, because of their high relevance for passwords compared to other non-ASCII symbols like emojis?

Thank you.
#5
> Well, it depends on the encoding whether they are multibyte or not, doesn't it?
Yes, but most wordlists are in UTF-8; it would be very unusual for a wordlist to store Ä as a single byte (extended ASCII). Hashcat also can't tell what encoding the wordlist it takes as input uses, so if it is UTF-8, as almost all wordlists are, toggling would require a multibyte replace, which isn't possible. If your wordlist did happen to contain Ä as a single byte, then it would be possible: just replace hexadecimal e4 with c4, like this: "s\xe4\xc4".
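To expand on the single-byte case: here is a small sketch (my own illustration, not an official hashcat tool) that generates such substitution rules for all three umlauts, assuming the wordlist really is ISO-8859-1/Latin-1 encoded so each umlaut is one byte. Note these are `s` (replace-all) rules, not positional toggles, and ß has no single-byte uppercase in Latin-1.

```python
# Sketch: generate hashcat substitution rules that uppercase German
# umlauts, assuming a Latin-1 (ISO-8859-1) encoded wordlist where
# each umlaut occupies a single byte.
pairs = [("ä", "Ä"), ("ö", "Ö"), ("ü", "Ü")]

rules = []
for lower, upper in pairs:
    lo = lower.encode("latin-1")[0]  # ä -> 0xe4, ö -> 0xf6, ü -> 0xfc
    up = upper.encode("latin-1")[0]  # Ä -> 0xc4, Ö -> 0xd6, Ü -> 0xdc
    rules.append(f"s\\x{lo:02x}\\x{up:02x}")

# One rule per line, ready to save as e.g. umlaut.rule for use with -r
print("\n".join(rules))  # s\xe4\xc4, s\xf6\xd6, s\xfc\xdc
```

The reverse direction (uppercase to lowercase) would just be the same pairs with the operands swapped.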

> UTF8 and UTF16 use 2 bytes for the Umlaute. If I understand you correctly, rules do not allow something like "if byte n is x and byte n+1 is y replace x by a and y by b" at all, right ?
Correct, that isn't supported by Hashcat and likely won't be for a long time, due to the speed requirements and structure of the rule processor.

> [a github issue states] that the utf16 encoding which is part of ntlm hashing does not work realiably for non-ascii chars.
That's correct, but only for the optimised kernels (when you use the "-O" argument) and only when you use NTLM (-m 1000). No other popular kernel has this issue; it exists because converting to UTF-16LE is extremely slow and would slow Hashcat down greatly, so it's left to the pure kernel instead of the speed-focused, inaccurate optimised kernel. I generalised and said it will be hashed correctly because Hashcat supports over 500 hash modes, and only the very few that involve both UTF-16LE and optimised kernels have this problem. Also, in your single-byte scenario, even optimised NTLM would work just fine; it's only multibyte characters that fail. Royce Williams has already opened a GitHub issue about communicating the limitations of any kernel to the end user more clearly, which would cover this case. Sadly, it hasn't been implemented yet:
https://github.com/hashcat/hashcat/issues/3958

Also, just so we're both on the same page: Ä can produce different hashes depending on the encoding. An MD5 hash using UTF-8 (\xc3 \x84) is:
b66491b03046f0846fe4206bc6a0f3c0
while the extended-ASCII version (\xc4) is:
ffd0ce5d00f597aa4f0d2bb60d17b6bc
UTF-8 is by far the most common encoding you'll see when dealing with hashes. Hashcat can crack them both (well, except with optimised NTLM), as it's just about getting the bytes right:
Code:
> ./hashcat -m 0 ffd0ce5d00f597aa4f0d2bb60d17b6bc -a 3 --hex-charset c4 --quiet --potfile-disable
ffd0ce5d00f597aa4f0d2bb60d17b6bc:$HEX[c4]
> ./hashcat -m 0 b66491b03046f0846fe4206bc6a0f3c0 -a 3 --hex-charset c384 --quiet --potfile-disable
b66491b03046f0846fe4206bc6a0f3c0:$HEX[c384]
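The same point can be checked outside hashcat. A quick sketch with Python's standard hashlib: the digests match the two values above because MD5 operates on bytes, and the two encodings produce different bytes for the same character.

```python
# Sketch: the same character Ä yields different MD5 digests depending
# on how it is encoded to bytes before hashing.
import hashlib

utf8_digest = hashlib.md5("Ä".encode("utf-8")).hexdigest()      # bytes c3 84
latin1_digest = hashlib.md5("Ä".encode("latin-1")).hexdigest()  # byte  c4

print(utf8_digest)    # b66491b03046f0846fe4206bc6a0f3c0
print(latin1_digest)  # ffd0ce5d00f597aa4f0d2bb60d17b6bc
```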
#6
Thank you.
I guess there is no way to tell the pure (non-optimized) kernel to ignore all passwords that do not contain a specific character type, or is there?

Example: NTLM uses UTF-16LE, so in order to crack Umlaut hashes I would have to use the non-optimized kernel.
If I understand correctly, this is also true for wordlists (if a wordlist is UTF-8 and contains Umlaute, optimized kernels will convert the words to incorrect UTF-16LE, won't they?).
So it should increase speed if all pure-ASCII passwords were skipped, as they can be checked faster using the optimized kernel ...
#7
(09-09-2024, 02:19 PM)fsdafsadfsdsdaf Wrote: If I understand correctly this is also true for wordlists (if wordlists are UTF8 and have Umlaute, optimized kernels will convert them to incorrect UTF-16LE, wouldn't they?)

Update: the optimized NTLM kernel does indeed fail on Umlaut passwords from a wordlist, while the pure (non-optimized) one succeeds.
#8
Update 2: I used Windows and dumped the SAM to obtain the hash of a password with 3 Umlauts and a ß for testing.

Code:
hashcat.exe -m1000 -a3 --increment --username -o out.txt --backend-ignore-hip -4 charsets\combined\German.hcchr ?4?4?4?4
succeeds in cracking the password. A UTF-8 wordlist does as well.
If I create a charset file in UTF-8, however, hashcat fails to crack it.
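A plausible explanation for that failure (an assumption on my part, consistent with the multibyte discussion above): each mask position like ?4 selects a single byte from the custom charset, so a UTF-8 encoded .hcchr file splits every two-byte umlaut into two independent charset bytes, and four mask positions can never reassemble them into four valid characters. A quick sketch of the byte counts:

```python
# Sketch: why a UTF-8 encoded .hcchr likely breaks a ?4?4?4?4 mask.
# Assumption: hashcat treats a custom charset as a set of raw single
# bytes, not as multi-byte characters.
chars = "äöüß"

latin1_bytes = chars.encode("latin-1")  # one byte per character
utf8_bytes = chars.encode("utf-8")      # two bytes per character

print(len(latin1_bytes))  # 4 bytes -> 4 one-byte charset entries
print(len(utf8_bytes))    # 8 bytes -> each umlaut split in two
print([f"{b:02x}" for b in utf8_bytes])
```

That would also explain why the shipped German.hcchr works: it presumably stores these characters as single bytes.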