12-05-2013, 07:55 PM
The combinator.bin process probably combines dictionaries as fast as or faster than we need. I ran it for one minute on my test machine combining two small dictionaries, and it created an 8.5GB file with 579,424,960 lines/combinations. That roughly equates to 9,657,083 candidates a second. Doing this with no rules would of course be ineffective: unless you are using a slow algorithm like WPA2, the GPU would be starved, and NTLM would slow way down waiting on combinations from the CPU.
Now if rules are applied (depending on the length of the ruleset, of course), that will increase the number of candidates the GPU needs to try. So look at it this way: the combinator engine pushes 9,657,083 combinations per second from the CPU, and then a rule set is applied to each one of those combinations.
We can use a fairly lengthy one, as shorter ones would be less effective; passwordspro.rule has around 3,200 lines of rules. So 9,657,083 * 3,200 equals 30,902,665,600 attempts per second, which would be enough to keep four 7970s busy cracking NTLM.
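To make the arithmetic above easy to re-run with other numbers, here is a rough Python sketch. The constants are the figures quoted in this post (one test run and an approximate rule count), and the function names are made up for illustration; nothing here measures real hardware.

```python
# Figures quoted in the post; this script measures nothing itself.
LINES_WRITTEN = 579_424_960  # combinations written by combinator.bin
RUN_SECONDS = 60             # length of the test run
RULE_COUNT = 3_200           # approximate rule count quoted for passwordspro.rule


def candidates_per_second(lines, seconds):
    """Base rate the CPU combinator delivers, rounded to a whole candidate."""
    return round(lines / seconds)


def attempts_per_second(base_rate, rules):
    """Each base candidate is expanded once per rule on the GPU side."""
    return base_rate * rules


base_rate = candidates_per_second(LINES_WRITTEN, RUN_SECONDS)
total = attempts_per_second(base_rate, RULE_COUNT)

print(f"{base_rate:,} combinations/s from the CPU")
print(f"{total:,} attempts/s once rules are applied")
```

Swapping in a different rule count shows how quickly a bigger rule set raises the load the GPUs see from the same CPU feed.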
In my mind this isn't bad. Of course, the guys running eight 7970s would basically have four idle GPUs, which is undesirable, but it would be a start. This assumes we could use the CPU for combining, then use the GPU to apply rules and crack. The key would be using larger rule sets when cracking a fast algorithm or when you have a lot of hardware.
I suppose the other option would be to break the wordlists into chunks and run multiple combinator threads on the CPU to feed the GPUs faster if needed. Depending on the CPU and the I/O available on the machine, I would think 3 or 4 cores could be tied up fairly easily. Then we could push 9,657,083 * 4 = 38,628,332 combinations per second (assuming it scales linearly with cores); add passwordspro.rule to that and we have 123,610,662,400 attempts per second, which would keep more than eight 7970s busy.
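A minimal Python sketch of the chunking idea, just to show that splitting the left wordlist loses nothing: every chunk combined with the full right list gives the same candidates as one big run. The helper names and the tiny wordlists are hypothetical, and this runs in a single process; in practice each chunk would be handed to its own combinator thread or process.

```python
from itertools import product


def split_chunks(words, parts):
    """Split words into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(words), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks


def combine(left, right):
    """Yield every left+right concatenation, as a combinator attack does."""
    for a, b in product(left, right):
        yield a + b


# Made-up sample data, one chunk per CPU core we want to tie up.
left = ["pass", "word", "admin", "root", "test"]
right = ["123", "2013"]
chunks = split_chunks(left, 4)

# Each inner list is the work one combinator worker would produce.
per_worker = [list(combine(chunk, right)) for chunk in chunks]
merged = [candidate for batch in per_worker for candidate in batch]

print(len(merged), "candidates across", len(chunks), "chunks")
```

Only the left list is split here, so each worker still reads the whole right list; whether that or disk I/O becomes the bottleneck is something the sketch does not answer.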
It seems to me that using the CPU and GPU in conjunction would be the best route, and maybe not that hard to implement compared with developing and maintaining separate kernels just for this attack.
Let me know your thoughts or if my logic is wrong.