01-09-2018, 08:09 PM
I'm just afraid the CPU would become the bottleneck here. My unoptimised password generation algorithm does around 5-10 M candidates/s per core on my laptop, so I would need a 10-20x speedup to keep up with a single GPU, even assuming 4 beefy CPU cores.
Bandwidth also seemed like a problem to me: if the passwords are 20 chars long, 20 bytes * 800e6 per second is 16 GB/s. GPU buses are slower than that, right? A quick search seems to show a single PCIe 2.0 x16 link is less than 4 GB/s.
Are you suggesting sending only the combinations and keeping the dictionaries in the GPU's RAM? Maybe that way I could reduce the traffic to, say, 8 bytes per password, but I would still be way off. Why not do the password generation inside the GPU when you get to that stage anyway?
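To put rough numbers on the bandwidth worry, here is a back-of-the-envelope sketch in Python. The 800e6 candidates/s rate and the 4 GB/s bus figure are simply the values quoted above, not measurements, and the function name is made up for illustration.

```python
def required_bandwidth_gb_s(bytes_per_candidate: int, candidates_per_s: float) -> float:
    """Host-to-GPU traffic in GB/s if every candidate is sent over the bus."""
    return bytes_per_candidate * candidates_per_s / 1e9


RATE = 800e6     # candidates per second, the figure used in the post
BUS_GB_S = 4.0   # assumed bus ceiling from the post, not a measured value

for label, size in (("full 20-char password", 20), ("8-byte combination index", 8)):
    need = required_bandwidth_gb_s(size, RATE)
    print(f"{label}: {need:.1f} GB/s needed, {need / BUS_GB_S:.1f}x the assumed bus limit")
```

Under these assumptions, even the 8-byte encoding needs more than the bus can carry, which is why generating the candidates on the GPU itself looks attractive.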