[dev thoughts?] encoding inputfile defines outputfile, no default?


+- hashcat Forum (https://hashcat.net/forum)
+-- Forum: Developer (https://hashcat.net/forum/forum-39.html)
+--- Forum: hashcat (https://hashcat.net/forum/forum-40.html)
+--- Thread: [dev thoughts?] encoding inputfile defines outputfile, no default? (/thread-11880.html)

[dev thoughts?] encoding inputfile defines outputfile, no default? - Snoopy - 03-28-2024

Hiho, I need some hints or advice from a dev on a specific topic: encoding problems. I'm from Germany, so I deal a lot with passwords containing German umlauts.

I have 2 different MD5 hashes which are in fact the same German word (I know because I cracked both of them successfully).

Now to the fun part:

Depending on the encoding of the password input file (UTF-8 or Windows-1252), I can crack one or the other. That's straightforward so far. Now I noticed that when using --outfile together with --outfile-autohex-disable, the resulting text file takes on the encoding of the input file (UTF-8 -> UTF-8, Windows-1252 -> Windows-1252), while the potfile is always standard UTF-8. I think this is also the intended behavior.

I know hashcat has the --encoding-from/--encoding-to options for its internal encoding, but there is none for the output.

My workaround: I use a small Python script to split my UTF-8 password files into plain-ASCII passwords and non-ASCII passwords. The non-ASCII ones I store in 2 separate files, one UTF-8 and one Windows-1252 encoded, and run both of them (as a dictionary directory) against the targets.
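A minimal sketch of such a split script, assuming one password per line in a UTF-8 input file. The output file names (ascii.txt, nonascii_utf8.txt, nonascii_cp1252.txt) and the function name are hypothetical placeholders, not taken from the original script. Words that have no Windows-1252 representation at all end up only in the UTF-8 file.

```python
"""Split a UTF-8 wordlist into ASCII and non-ASCII parts (sketch).

The output file names are hypothetical placeholders.
"""
from pathlib import Path


def split_wordlist(src: Path, out_dir: Path) -> dict[str, int]:
    """Write ascii.txt, nonascii_utf8.txt and nonascii_cp1252.txt into out_dir.

    Returns how many lines were written to each file.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {"ascii": 0, "utf8": 0, "cp1252": 0}
    # newline="" on read keeps line endings untouched; rstrip removes them below.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(out_dir / "ascii.txt", "w", encoding="ascii", newline="\n") as f_ascii, \
         open(out_dir / "nonascii_utf8.txt", "w", encoding="utf-8", newline="\n") as f_utf8, \
         open(out_dir / "nonascii_cp1252.txt", "w", encoding="cp1252", newline="\n") as f_cp:
        for line in fin:
            word = line.rstrip("\r\n")
            if not word:
                continue  # skip empty lines
            if word.isascii():
                # ASCII bytes are identical in UTF-8 and Windows-1252.
                f_ascii.write(word + "\n")
                counts["ascii"] += 1
                continue
            f_utf8.write(word + "\n")
            counts["utf8"] += 1
            try:
                word.encode("cp1252")
            except UnicodeEncodeError:
                # e.g. Polish "ł" has no Windows-1252 byte; no variant possible.
                continue
            f_cp.write(word + "\n")
            counts["cp1252"] += 1
    return counts
```

Both non-ASCII files can then go into the dictionary directory, as described above.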

The question I have: is there a more elegant way of attacking such hashes that I'm not aware of, or an idea I haven't come up with yet? My goal would be to have just one UTF-8 file and still be able to crack both hashes.

I know the problem starts on the side that generated these hashes that way, but I hope to get some new input on this problem.
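One possible way to get closer to the one-file goal, sketched under an assumption: if the hashcat version in use decodes $HEX[...] entries in wordlists, each non-ASCII word can be written once as-is (UTF-8) and once as a $HEX[...] line holding its Windows-1252 bytes. Since the $HEX[...] line itself is pure ASCII, the resulting file is still plain UTF-8 text. The function names are hypothetical.

```python
"""Sketch: one wordlist that carries both encodings of each word.

Assumes the hashcat build in use decodes $HEX[...] entries in wordlists.
"""


def both_encodings(word: str) -> list[str]:
    """Return the word itself plus a $HEX[...] line for its Windows-1252 bytes."""
    lines = [word]
    if word.isascii():
        return lines  # same bytes in both encodings, one line is enough
    try:
        cp1252 = word.encode("cp1252")
    except UnicodeEncodeError:
        return lines  # the word has no Windows-1252 form
    lines.append(f"$HEX[{cp1252.hex()}]")
    return lines


def convert(src: str, dst: str) -> None:
    """Read a UTF-8 wordlist from src and write the combined list to dst."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            word = line.rstrip("\r\n")
            if word:
                fout.write("\n".join(both_encodings(word)) + "\n")
```

Usage would be something like convert("words_utf8.txt", "words_both.txt") with hypothetical file names. One caveat: a real password that literally looks like $HEX[...] would be misread under this scheme.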

edit:
So my main problem is:
ü in UTF-8 is 2 bytes: b'\xc3\xbc'
ü in Windows-1252 is 1 byte: b'\xfc'
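A quick way to see why the same word produces two different MD5 hashes: MD5 works on bytes, not characters, so the two encodings feed different input into it. This uses only Python's standard library; the digests are printed rather than quoted here.

```python
import hashlib

word = "ü"
utf8_bytes = word.encode("utf-8")     # b'\xc3\xbc' (2 bytes)
cp1252_bytes = word.encode("cp1252")  # b'\xfc' (1 byte)

# Same character, different bytes, therefore different MD5 digests.
print(utf8_bytes, hashlib.md5(utf8_bytes).hexdigest())
print(cp1252_bytes, hashlib.md5(cp1252_bytes).hexdigest())
```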

Sure, I could generate different masks with hex charsets for this problem, but some kind of built-in mapping from UTF-8 to Windows-1252 (or the other way around) would make things easier. I tried the --encoding-from/--encoding-to options, but with no success.
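For the hex-charset route, the byte values can be generated instead of looked up by hand. This sketch only prints the UTF-8 and Windows-1252 bytes for the German umlauts and ß; how they get combined into masks is left open. Every UTF-8 form here is two bytes starting with c3, while the Windows-1252 form is a single byte, so the two variants need masks of different length.

```python
# German umlauts and ß, the characters this thread is about.
CHARS = "äöüÄÖÜß"


def byte_table(chars: str = CHARS) -> dict[str, tuple[str, str]]:
    """Map each character to its (UTF-8 hex, Windows-1252 hex) byte string."""
    return {c: (c.encode("utf-8").hex(), c.encode("cp1252").hex()) for c in chars}


if __name__ == "__main__":
    for char, (utf8_hex, cp1252_hex) in byte_table().items():
        print(f"{char}  utf-8: {utf8_hex:<4}  windows-1252: {cp1252_hex}")
```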