6 months later...
AMD released some CPUs with both SHA256 and AES hardware instructions (the latest AMD EPYC line). Exploiting them I got to 125 M/s, which is 100x what I was hoping for in my post above.
However, looking at hashcat benchmarks for the 1080 leads me to believe I should get ~0.5 G/s with a single GPU.
Again, the difficulty is 2x SHA256 and 1x AES decode.
Hashcat benchmarks for the 1080 show 2.9 G/s for SHA256 (two rounds, so make it 1.45 G/s) and 140 k/s for 6000 AES rounds. That should translate to 840 M/s AES operations, but it's not that simple: looking at the code, the AES structures for KeePass are initialised only once, the key is set only once, etc., so it's likely to be worse than that.
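To make the arithmetic explicit, here is a rough sketch of how those benchmark figures could combine into the ~0.5 G/s estimate. It assumes the SHA256 and AES stages run one after the other for each candidate, so their per-candidate times simply add up. That is a simplification, and the real number is likely lower for the reasons given above.

```python
# Rough throughput estimate from the hashcat benchmark figures quoted above.
# Assumption: the two stages run back to back for each candidate, so the
# time per candidate is the sum of the time spent in each stage.

SHA256_RATE = 2.9e9        # single SHA256 hashes per second (1080 benchmark)
SHA256_PER_CANDIDATE = 2   # 2x SHA256 per candidate
AES_BENCH_RATE = 140e3     # candidates per second at 6000 AES rounds
AES_ROUNDS = 6000          # rounds used in that benchmark

sha_stage_rate = SHA256_RATE / SHA256_PER_CANDIDATE  # candidates/s, SHA256 only
aes_stage_rate = AES_BENCH_RATE * AES_ROUNDS         # AES operations/s

seconds_per_candidate = 1 / sha_stage_rate + 1 / aes_stage_rate
combined_rate = 1 / seconds_per_candidate

print(f"SHA256 stage: {sha_stage_rate / 1e9:.2f} G/s")
print(f"AES stage:    {aes_stage_rate / 1e6:.0f} M/s")
print(f"Combined:     {combined_rate / 1e9:.2f} G/s")
```

Under that assumption the slower AES stage dominates, and the combined figure lands a little above 0.5 G/s.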
Anyway, the AES and SHA256 primitives seem to be already implemented and it's just a matter of gluing them together. I would again be happy to pay someone to do that, and I could open that feature request on GitHub (again, I would like to send the money in advance to someone or use a multisig address).
---
However, the reason I skipped GitHub is that I wanted to talk about password generation for a bit first. My current password generation strategy is to start with an initial word list, divided into groups, apply some word mangling to them, and feed the mangled groups of words to the password cracking program.
So say you have 10 groups with 10 base words each. Once mangled, those could be 10 x 150 words (each group is its own dictionary, with different words from the others). You would need to try 150^10 combinations, which of course isn't feasible, but that was just an example.
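To put a number on "isn't feasible", here is a quick back-of-the-envelope calculation. The two rates are the CPU figure I measured and the GPU estimate from above; the assumption that either rate could be sustained over the whole keyspace is optimistic.

```python
# Size of the example keyspace and time to exhaust it at a given rate.
# Assumption: the cracking rate stays constant over the whole run.

GROUPS = 10
WORDS_PER_GROUP = 150
SECONDS_PER_YEAR = 365.25 * 24 * 3600

keyspace = WORDS_PER_GROUP ** GROUPS  # one word picked from each group


def years_to_exhaust(keyspace, rate_per_second):
    """Return the years needed to try every candidate at the given rate."""
    return keyspace / rate_per_second / SECONDS_PER_YEAR


print(f"keyspace: {keyspace:.3e} candidates")
for label, rate in [("CPU, 125 M/s", 125e6), ("GPU, ~0.5 G/s", 0.5e9)]:
    print(f"{label}: {years_to_exhaust(keyspace, rate):,.0f} years")
```

Even at the GPU rate this is hundreds of thousands of years, so the number of groups or the mangled list size has to come down a lot before raw speed matters.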
My understanding is that hashcat doesn't allow complex dictionaries because they would kill performance. So I'm sort of back to square one if I can't find a way to combine 6+ dictionaries in a GPU kernel (the CPU bus is limited according to my calculations, and would be too slow anyway).
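One way I could imagine combining several dictionaries without streaming every candidate over the bus is to number the candidates and decode each number into one word per group, like digits of a mixed-radix number. The sketch below shows the idea in Python on the CPU. It is not how hashcat works, `candidate_from_index` is a name I made up, and a real version would need the same logic ported to a GPU kernel with the (small) dictionaries uploaded once.

```python
# Map a linear index to one combination of words, one word per group.
# Hypothetical sketch: only an index range would need to be sent to each
# worker, and every worker rebuilds its own candidates from the indices.

from math import prod


def candidate_from_index(groups, index):
    """Decode index as a mixed-radix number; the last group varies fastest."""
    parts = []
    for group in reversed(groups):
        index, digit = divmod(index, len(group))
        parts.append(group[digit])
    return "".join(reversed(parts))


# Tiny made-up dictionaries, just to show the enumeration order.
groups = [["a", "b"], ["1", "2", "3"]]
total = prod(len(group) for group in groups)

for i in range(total):
    print(i, candidate_from_index(groups, i))
```

With this layout each worker only needs a start index and a count, so the traffic over the bus is independent of how many candidates are generated.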
TL;DR: GPU cracking is super cool and fast, but should I focus on it to get a 10x increase in speed when using complex dictionaries?