Bitslice DES S-boxes with LOP3.LUT instructions
(07-14-2016, 12:20 PM)atom Wrote: Bitsliced DES makes sense in brute-force (-a 3) mode. Otherwise the register pressure is too high and your data gets spilled to global memory, making everything slower than it would be without bitslicing.
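To make the thread title concrete: in a bitsliced kernel, each 32-bit register holds one bit from each of 32 candidates. An S-box then becomes a circuit of boolean gates that runs on all 32 candidates at once. On NVIDIA GPUs that support LOP3.LUT, any 3-input gate is a single instruction, and its 8-bit immediate is the gate's truth table. The PTX documentation for `lop3.b32` computes that immediate by applying the gate to 0xF0, 0xCC and 0xAA. The sketch below is plain Python that emulates this on the CPU, not GPU code. The helper names (`lop3`, `lut_immediate`, `to_planes`, `bitsliced_gate`) are hypothetical, and the example gates are arbitrary rather than real DES S-box gates.

```python
import random

MASK32 = 0xFFFFFFFF  # one 32-bit register = 32 independent lanes


def lop3(a, b, c, imm):
    """Emulate a 3-input lookup table on 32-bit words.

    For every bit position, the output bit is bit (4*a + 2*b + c) of imm,
    i.e. imm is the truth table of the gate.
    """
    r = 0
    for i in range(8):
        if (imm >> i) & 1:
            # Minterm i: lanes where (a, b, c) equal bits (2, 1, 0) of i.
            r |= (a if i & 4 else ~a) & (b if i & 2 else ~b) & (c if i & 1 else ~c)
    return r & MASK32


def lut_immediate(gate):
    """Truth-table byte for a bitwise gate, computed as gate(0xF0, 0xCC, 0xAA)."""
    return gate(0xF0, 0xCC, 0xAA) & 0xFF


def to_planes(values, width):
    """Bitslice: plane k holds bit k of every candidate (candidate j -> bit j)."""
    planes = [0] * width
    for lane, v in enumerate(values):
        for k in range(width):
            planes[k] |= ((v >> k) & 1) << lane
    return planes


def bitsliced_gate(inputs, gate):
    """Evaluate a 3-input gate for up to 32 candidates with one lop3 call."""
    c, b, a = to_planes(inputs, 3)  # bit 0 -> c, bit 1 -> b, bit 2 -> a
    out = lop3(a, b, c, lut_immediate(gate))
    return [(out >> lane) & 1 for lane in range(len(inputs))]


def scalar_gate(inputs, gate):
    """Reference: evaluate the same gate one candidate at a time."""
    return [gate((v >> 2) & 1, (v >> 1) & 1, v & 1) & 1 for v in inputs]


if __name__ == "__main__":
    rng = random.Random(0)
    inputs = [rng.randrange(8) for _ in range(32)]  # 32 candidates, 3 bits each
    gate = lambda a, b, c: (a & b) ^ c  # an arbitrary example gate
    print(hex(lut_immediate(gate)), bitsliced_gate(inputs, gate) == scalar_gate(inputs, gate))
```

A real bitsliced S-box keeps many such planes live at the same time, one per input bit, intermediate gate and output bit, for every candidate group. That is where the register pressure atom describes comes from.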

I forgot to mention: I'm mainly (actually only) interested in brute-forcing.
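One property that makes brute-force generation convenient for bitslicing is worth showing here. The 32 lanes of a word can enumerate 5 key bits using fixed constant patterns, so those candidates never need to be transposed one by one. This is an illustration, not necessarily how hashcat's kernels do it. The sketch assumes a 56-bit key with bits numbered 0..55 and ignores DES parity bits and the PC-1 bit ordering. `key_planes` and `lane_key` are hypothetical helper names.

```python
# Bit j of LANE_PATTERNS[k] equals bit k of the lane index j.
LANE_PATTERNS = [0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 0xFFFF0000]
ALL_ONES = 0xFFFFFFFF


def key_planes(base_key, key_bits, varying):
    """Bitsliced key planes for 32 candidates that differ only in 5 key bits.

    Lane j receives base_key with bit varying[k] replaced by bit k of j;
    every other key bit is the same in all lanes (plane all-0 or all-1).
    """
    if len(varying) != 5:
        raise ValueError("32 lanes enumerate exactly 5 key bits")
    planes = []
    for bit in range(key_bits):
        if bit in varying:
            planes.append(LANE_PATTERNS[varying.index(bit)])
        else:
            planes.append(ALL_ONES if (base_key >> bit) & 1 else 0)
    return planes


def lane_key(planes, lane):
    """Read back the key that a given lane is testing (for checking only)."""
    return sum(((p >> lane) & 1) << bit for bit, p in enumerate(planes))


if __name__ == "__main__":
    planes = key_planes(0x0123456789ABCD, 56, [0, 1, 2, 3, 4])
    print([hex(lane_key(planes, j)) for j in (0, 1, 31)])
```

The remaining key bits come from the outer brute-force loop, which only ever writes all-0 or all-1 planes.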

Quote: That's why I don't use bitslicing in the -a 0 and -a 1 kernels. In other words, if your kernel does anything other than DES, you will end up with register pressure and it will be slower than it is now.

My algorithm also uses a number of hash algorithms besides DES.

Quote: It also explains why I don't use it for -m 3100, even though that is pure DES, and use it only for -m 1500 and -m 3000 in -a 3 mode.

Again, the algorithm also performs a number of hash calculations.

Quote: However, 360 MH/s on a Titan X for a single DES encryption or decryption sounds too slow, even with the vanilla DES code. If it's just one round, it should be 1000 MH/s or more. That tells me that whatever you do before or after the DES creates too much register pressure, so with bitsliced code your kernel would run slower than it does now.

I use three DES encryptions/decryptions plus several hash calculations, so ~360 MH/s sounds pretty OK, right?
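A quick sanity check of that reasoning. It assumes each candidate costs exactly three DES operations and treats the hash calculations as additional work on top:

```latex
360\ \text{MH/s} \times 3\ \frac{\text{DES ops}}{\text{candidate}}
  = 1080 \times 10^{6}\ \text{DES ops/s}
  \;\geq\; 1000 \times 10^{6}\ \text{DES ops/s}
```

Even before the hashes are counted, the kernel already performs about as many DES operations per second as the ~1000 MH/s single-DES figure quoted above.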

What would you recommend to improve speed? The main loop is already optimized in a (IMHO) pretty decent way. My first impression is that a speedup of less than 400x for one high-end 3k+ core GPU versus one outdated CPU is pretty poor :(

Thanks, John


RE: Bitslice DES S-boxes with LOP3.LUT instructions - by John Doe - 07-15-2016, 02:47 AM