Modifying hashcat's optimized kernel
#11
(04-21-2021, 02:14 AM)Chick3nman Wrote: "The main reason for this, which you can test yourself, is to just load in some rules that make LONG passwords, then run it with optimized kernels and compare the speed difference with and without."

I'm not sure what you mean here.

With and without the rules? That will always favor the rules from a workload perspective, but it will be slower than if you hadn't generated the failing words in the first place while keeping the same rule-related workload. With and without the optimized kernel also doesn't make sense, because optimized kernels are doing far more than simply rejecting words that are too long; that rejection is just a side effect of the optimization. Long passwords beyond the rejection point, in large numbers, will incur a penalty to your cracking speed, up to the point where no candidates are being hashed in most work groups. At that point you may see some speed return, but it will not be ideal or likely very performant overall.

With 360,000 rules you are well out of reasonable territory for a manageable keyspace, you are correct. That rule set is unreasonably large for an efficient attack setup, but to each their own; I have played with similarly large rule sets a fair amount, though never for a normal attack.

Sorry, I misspoke as I was trying to catch you while you were still online; apologies.
You are correct that the optimization impacts other areas, but I still believe the overall performance increase would take off a minimum of 5% of time taken, which is more than enough reason to use it.
#12
"...the overall performance increase would take off a minimum of 5% of time taken..."

This is where other optimizations will cause FAR larger increases, and I feel like the penalty of late rejections is actually fairly likely to cause an overall performance decrease. Consider what happens when a word of normal length (6-8) meets a rule (ZY) that couldn't possibly produce a plain short enough to fit into the buffer. It doesn't make sense to process that rule at all if it's clearly not going to work, right? So we can throw that rule out completely before we even get to the kernel. What if a rule is very unlikely to create a candidate that fits unless the words are very short? We can also throw that rule out completely, saving you (rules thrown away) * (words in wordlist) candidates immediately, at the cost of only the time it takes to parse your rule file and trash the obviously useless rules.

In fact, this is something I was looking into recently, because the way we handle these rules now leaves some of them in, and that either creates duplicate words where unnecessary or does what you are looking to do and generates a candidate that we have to reject in the kernel. Just implementing something like that would likely trash enough rules from the 360,000 list off the bat that it'd save you that 5%+ you are after.
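To make that concrete, here is a minimal Python sketch of such a pre-filter. This is not hashcat code; the MIN_WORD_LEN and MAX_PLAIN_LEN values and the handful of rule functions covered are just assumptions for the example.

    # Minimal sketch of a host-side rule pre-filter (not hashcat's implementation).
    # Assumption: the shortest word in the wordlist is MIN_WORD_LEN characters,
    # and the optimized kernel cannot use plains longer than MAX_PLAIN_LEN.

    MAX_PLAIN_LEN = 31   # e.g. the MD5 optimized-kernel limit
    MIN_WORD_LEN = 6     # assumed shortest word in the wordlist

    def numarg(c):
        # hashcat rule numeric arguments run 0-9, then A-Z for 10-35
        return int(c, 36)

    def min_output_len(rule, word_len):
        # Lower bound on the plain length this rule can produce from a word of
        # word_len. Only a few growth functions are modeled; unknown functions
        # are conservatively treated as not growing the word.
        length = word_len
        for tok in rule.split():
            func, arg = tok[0], tok[1:]
            if func in '$^':               # append / prepend one character
                length += 1
            elif func in 'Zz' and arg:     # duplicate last/first character N times
                length += numarg(arg[0])
            elif func == 'd':              # duplicate the entire word
                length *= 2
        return length

    def keep_rule(rule):
        return min_output_len(rule, MIN_WORD_LEN) <= MAX_PLAIN_LEN

    rules = ['$1 $2 $3', 'ZY', 'Z5 d d', 'l $!']
    print([r for r in rules if keep_rule(r)])
    # -> ['$1 $2 $3', 'l $!']  ('ZY' and 'Z5 d d' can never fit a 31-char buffer)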
#13
(04-21-2021, 02:36 AM)Chick3nman Wrote: "This is where other optimizations will cause FAR larger increases... Just implementing something like that would likely trash enough rules from the 360,000 list off the bat that it'd save you that 5%+ you are after."

That's definitely true. I focus mainly on analysis and on forcing out some of the leftovers that have been public for years (usually MD5, NTLM, and SHA1 + dash), which is extremely time consuming, so overall I'm just willing to try and test whatever ideas I can to bring in a little more speed.
The reason I run such a large rule set: I run it against around 500 GB worth of dictionaries and always run with the same rules using (--debug-mode=1 --debug-file=) so I can analyse the rules.
I have a more optimal rule file; however, that one is for testing on small wordlists due to how optimized it is.

I mean, we all want the same end results, right? Big Grin Optimization! No matter the cost.
#14
"I mean we all want the same end results right? Big Grin Optimization! No matter the cost."

We do significant optimization on the backend, but there's only so much we can do without being able to see into the future during an attack. Consider the following example:

You have a rule with multiple functions that affect the length, as many rules do. This is not uncommon.

Plaintext: pass
Rule: p5 p2

The above will duplicate the word 5 times, then duplicate that buffer 2 times. The first processing step takes in "pass" and produces "passpasspasspasspass". Then we do the next rule step: we take "passpasspasspasspass" and produce "passpasspasspasspasspasspasspasspasspass". But wait, this is now 40 characters long, and the optimized-kernel length limit for MD5 is 31 characters. What do we do? We actually don't process the second rule function at all and return the buffer as-is; we've already got code that optimizes that step completely away and returns the plain as it came out of the first rule step. https://github.com/hashcat/hashcat/blob/...d.cl#L1188
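Here is a toy Python model of that length check. This is an assumption-laden sketch, not the actual kernel logic (which is the OpenCL linked above); it follows the arithmetic used in this example, where pN turns the current buffer into N concatenated copies of itself.

    MAX_PLAIN_LEN = 31   # optimized-kernel limit for MD5

    def apply_rule(word, rule):
        buf = word
        for tok in rule.split():
            if tok[0] == 'p':
                n = int(tok[1:], 36)         # rule argument: 0-9, A-Z
                candidate = buf * n
                if len(candidate) > MAX_PLAIN_LEN:
                    continue                 # function skipped, buffer kept as-is
                buf = candidate
        return buf

    print(apply_rule("pass", "p5 p2"))
    # -> "passpasspasspasspass" (20 chars): the second step is optimized away,
    #    and the 20-character plain from the first step is what gets hashed.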

You'll see here, though, that this is NOT a rejection: it's still going to process the plain from the previous step, because that buffer is within bounds and it's basically too late. We've already incurred the penalty of loading the buffer and initializing the hash, and we still have to wait for the other threads to finish before we can get a new word. This leads to some interesting situations.

What if your rules look like this:

p5 p3
p5 p4
p5 p7

All three of those are valid, unique rules. All three will get processed and kicked back at the same stage, and all three will cause the same candidate to be tested each time. This is a nightmare if you are trying to optimize everything away, as the failure happens too late in the process to throw the candidate away and gain speed. Throwing these rules away before they reach the rule-processor step in the kernel is by far the best approach, since we can avoid the penalty of the word being duplicated at all.
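Reusing the apply_rule sketch from above, all three rules collapse onto the same tested plain for a short word:

    for r in ["p5 p3", "p5 p4", "p5 p7"]:
        print(r, "->", apply_rule("pass", r))
    # p5 p3 -> passpasspasspasspass
    # p5 p4 -> passpasspasspasspass
    # p5 p7 -> passpasspasspasspass
    # Three "unique" rules, one candidate, hashed three times.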

Now, it's possible that we could do some optimization prior to loading these rules, and that's what I was referring to when I said I had already looked into this a bit. Given the rule p8 p4, we know that what this actually means is p8*4, and we can store it condensed as pW (p32). Now, if the kernel we are running has a max password length of 31, like the MD5 optimized kernel does, we can effectively throw that rule away before it's even loaded. Likewise, if the rules p5 p4 and p4 p5 create the same plaintext (assuming it's valid), then running both of them just creates duplicate work, and there's little reason to do so. Removing rules that are "unique" but produce duplicate work is a serious optimization that can be made.

This doesn't work for everything, but it can work for a few simple cases like the example above, and it could shave significant amounts off the total keyspace, especially with huge wordlists like you are running. Doing this processing outside of hashcat would not be super difficult, though it would rely on you knowing the parameters of your future attack to know which rules to reject. Even without that info, condensing rules to identify duplicate final states could be done. I may release a tool to do some of this stuff before I work on implementing it in hashcat; I think we may have some of this already handled for certain rules.
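Host-side, that pre-processing could look something like the following minimal sketch. It is hypothetical, not an existing hashcat feature; it only handles pure chains of p functions and follows the same arithmetic as the example above, where pN multiplies the buffer length by N.

    BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    MAX_PLAIN_LEN = 31   # MD5 optimized kernel

    def condense_p_chain(rule):
        # Collapse a chain like 'p8 p4' into a single multiplier (32 here),
        # or return None if the rule contains anything other than pN functions.
        mult = 1
        for tok in rule.split():
            if len(tok) == 2 and tok[0] == 'p':
                mult *= BASE36.index(tok[1])
            else:
                return None
        return mult

    def preprocess(rules, min_word_len=1):
        seen, kept = set(), []
        for rule in rules:
            mult = condense_p_chain(rule)
            if mult is None:
                kept.append(rule)                # not a pure p-chain: keep as-is
            elif min_word_len * mult > MAX_PLAIN_LEN:
                continue                         # can never fit: e.g. 'p8 p4' == pW (x32)
            elif mult in seen:
                continue                         # same final state as a kept rule: duplicate work
            else:
                seen.add(mult)
                kept.append('p' + BASE36[mult])  # store the condensed form
        return kept

    print(preprocess(['p8 p4', 'p5 p4', 'p4 p5', 'p2 p3', '$1 $2']))
    # -> ['pK', 'p6', '$1 $2']
    #    'p8 p4' is rejected outright, 'p4 p5' is dropped as a duplicate of 'p5 p4'.

Anything that isn't a pure p chain is passed through untouched, which keeps the sketch safe at the cost of missing most other collapsible combinations.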
#15
(04-21-2021, 03:18 AM)Chick3nman Wrote: "We do significant optimization on the backend, but there's only so much we can do without being able to see into the future during an attack... Removing rules that are 'unique' but produce duplicate work is a serious optimization that can be made."

That's extremely interesting and definitely thought-provoking.
"Removing rules that are 'unique' but produce duplicate work is a serious optimization that can be made." - that's huge if it can be achieved.
About a week ago I ran some basic analysis on best64 + rockyou as an example of how much difference the repeated candidates make, outputting the results as a dictionary so I could compare speeds before and after dupes were removed.

1,090,173,188 lines before dupes were removed.

685,326,735 lines after removal.

Or, in this particular example, a 37.13% reduction in the candidates that get hashed.
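For reference, that percentage is just the relative shrink of the candidate list, which for a fast hash translates almost directly into time saved:

    before = 1_090_173_188   # lines before dupes were removed
    after = 685_326_735      # lines after removal
    print(f"{(1 - after / before) * 100:.1f}% of the candidates eliminated")   # -> 37.1%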
#16
Right, and there are some things that simply can't be avoided, so be careful with those numbers.

For example, take this password and these two rules:


password

Z1
$d

Both will produce the plaintext "passwordd", however they are very clearly NOT duplicates of each other and can NOT be thrown away unless the plaintext is known BEFORE we even load/process (hence my comment about seeing the future). So while they may produce some instances of duplication, like what you've measured there, it's not easily possible for us to throw them out prior to loading the wordlist. Some amount of that duplication could come down simply to your wordlist, or to neither the wordlist nor the rules on their own but to the combination of the two. Some amount of that 37% is essentially unavoidable. Processing to detect and avoid those duplicates is clearly infeasible for a wordlist+rule combo of even moderate size, as there's no way to store all the resulting words and sort/dedupe them; it simply costs less to run them through the attack than to try to remove them.
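A quick Python illustration of why those two rules can't be deduplicated ahead of time: they only collide on words that already end in 'd'.

    def z1(word):
        # the 'Z1' rule: duplicate the last character once
        return word + word[-1]

    def append_d(word):
        # the '$d' rule: append the literal character 'd'
        return word + 'd'

    for w in ["password", "monkey", "dragon"]:
        print(w, z1(w), append_d(w), z1(w) == append_d(w))
    # password passwordd passwordd True   (collision)
    # monkey monkeyy monkeyd False
    # dragon dragonn dragond False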
#17
(04-21-2021, 03:51 AM)Chick3nman Wrote: "Right, and there are some things that simply can't be avoided, so be careful with those numbers... It simply costs less to run them through the attack than to try to remove them."

Definitely agree; however, those rules can be 'avoided' in the sense that they can be applied separately in a different attack. Not all of them, of course, but the vast majority.
Along the same lines: if storage were cheaper, I'd have most of my lists pre-sorted as dictionaries for straight attacks. Oh, a robot can dream.
#18
I think some of those rules could be avoided, but that's something I would generally consider "acceptable inefficiency" for fast hashes. The trade-off in storage/processing needed to remove rules that collide like that is just too great to make it worthwhile. The rules we CAN identify through tricks like the ones I have mentioned so far should definitely be the focus of any optimization work, at least initially. Anything beyond that would be extremely situational and would likely require manually retuning/retooling your attacks on the fly, something that is generally not reasonable or feasible for average runs, especially when you are throwing huge rules and huge dicts at hashes.org leftovers. There's already so much to be gained just by cutting down 500 GB of wordlists and 360,000 rules into something remotely manageable that it doesn't make sense to start with the exotics for a few percent gain when you can start there.
#19
(04-21-2021, 04:57 AM)Chick3nman Wrote: "I think some of those rules could be avoided, but that's something I would generally consider 'acceptable inefficiency' for fast hashes... it doesn't make sense to start with the exotics for a few percent gain when you can start there."

I think that's where we differ Smile I consider the exotics to be passwords longer than 18 characters.
#20
I meant exotic optimization methods, like new situation-specific optimized kernels. Pulling down the length of passwords and editing or ordering the keyspace of an attack should be relatively simple optimization, though as we've seen, it's a little difficult in practice for other reasons, even though it's conceptually simple. It doesn't require much in terms of writing code, just a fair amount of resources if you want to do certain things, especially the most straightforward way.