Substitution rule with non-ASCII characters
#1
Hello,

I'm currently trying to write some custom mangling rules that involve non-ASCII characters, specifically focusing on the substitution ('s') command. However, it does not seem to work. For example, when I try to use a rule like "saą", I get an "unsupported rule" error (version 6.2.6).

Could you please clarify whether I am formatting the rule incorrectly, or whether Unicode characters are simply not supported in this context?

Thank you.
#2
If those non-ASCII characters are more than 1 byte when UTF-8 encoded, then this will not work currently, as the s rule operator only accepts single-byte substitutions.
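
A quick way to check whether a character is over that limit is to count its UTF-8 bytes. A minimal Python sketch (the helper name `utf8_len` is made up for illustration, it is not part of hashcat):

```python
def utf8_len(text: str) -> int:
    """Return how many bytes the string occupies when UTF-8 encoded."""
    return len(text.encode("utf-8"))


for ch in ("a", "ą", "µ"):
    # Print the character, its byte count, and its bytes in hex.
    print(ch, utf8_len(ch), ch.encode("utf-8").hex(" "))
```

Here "a" is 1 byte (61), while "ą" (c4 85) and "µ" (c2 b5) are 2 bytes each, so neither of the latter fits in a single-byte substitution.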
#3
I see, thanks for the prompt response.
I have only limited experience with Hashcat's source code. If one wanted to enable byte-to-bytes substitutions, how much work do you think it would take?

Please, use this scale:
  1. That's easy.
  2. It would take a week.
  3. Impossible, you're going to break everything.
Thanks.
#4
This is something that has come up regularly for years. Unfortunately, it would require quite a large refactor of the current (very complex) ruleprocessor. As you can imagine, it has to be able to generate upwards of hundreds of billions of candidates per second, so it has to be as efficient as possible. Enabling multibyte support would slow this down by an unknown amount, so dev time isn't the only problem; computational efficiency is one too.

Some rules can already be used with multibyte characters, though. For example, µ is c2 b5 in hexadecimal when UTF-8 encoded, so to append it the rule you want would be "$\xc2 $\xb5". Example:

Code:
$ echo "" | ./hashcat.exe --stdout -j '$\xc2 $\xb5'
µ
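
If you need this for more than one character, the rule string can be generated instead of typed by hand. A rough Python sketch, assuming only the `$\xNN` append form shown above (the function name `append_rule` is hypothetical):

```python
def append_rule(text: str) -> str:
    """Build a rule that appends `text` one UTF-8 byte at a time."""
    # Each byte becomes its own append operation: $\xNN
    return " ".join(f"$\\x{b:02x}" for b in text.encode("utf-8"))


print(append_rule("µ"))  # $\xc2 $\xb5
```

The bytes are emitted in their original order because each append adds to the end of the word.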
#5
got it, thanks.

There is no similar trick for 's', right? At least not for "s 1byte 2bytes". I guess we could do it only for "s nbytes nbytes".
#6
Yes, there is no trick for substitute.

I think the only workaround would be to do the substitution yourself beforehand, running a simple script (Python or whatever) over your input dictionary.
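
Something along these lines, as a minimal Python sketch. It works on raw bytes so that words that are not valid UTF-8 pass through untouched, and like 's' it replaces every occurrence. The function names and the sample words are just for illustration:

```python
def substitute(word: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new`, whatever their byte lengths."""
    return word.replace(old, new)


def preprocess(words, old: str, new: str):
    """Apply the substitution to each word, giving one output word per input word."""
    old_b = old.encode("utf-8")
    new_b = new.encode("utf-8")
    return [substitute(w, old_b, new_b) for w in words]


# Demo on an in-memory list; words without a match are returned unchanged.
for w in preprocess([b"banana", b"kot"], "a", "ą"):
    print(w.decode("utf-8"))
```

For a real wordlist you would open the file in binary mode, strip the line endings, and write the results to a new file that you then pass to hashcat as the dictionary.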
#7
Ack. Yes, that seems the only viable option at this point.

However, briefly returning to the option of working on the source code: do you think that adding a new command (i.e., a substitution that works on arbitrary characters) rather than modifying the existing 's' would be any easier? That is, ignoring the possible slowdown. (I would call it 'ș'.)
#8
It still goes against a lot of the core of the ruleprocessor, and every one of the rules would have to be refactored and re-tested, which is certainly not an easy feat. Possible, but not easy or fast. As previously mentioned, you can use a custom script or a ready-made tool like RuleProcessorY (https://github.com/0xVavaldi/ruleprocessorY), an external tool that already supports multibyte rules.
Couple of related bits of source code:
https://github.com/hashcat/hashcat/blob/.../inc_rp.cl
https://github.com/hashcat/hashcat/blob/...timized.cl
#9
I see, thanks.
In the end, I went with pre-processing the wordlist.

Thanks again!