Bitslice DES S-boxes with LOP3.LUT instructions
Bitsliced DES only makes sense in brute-force mode (-a 3). Otherwise the register pressure is too high and your data gets spilled to global memory, which makes everything slower than it would be without bitslicing. That's why I don't use bitslice in the -a 0 and -a 1 kernels. In other words, if your kernel does anything other than DES, you will run into register pressure and it will be slower than it is now. It also explains why I don't use it for -m 3100, even though that is just pure DES, and use it only for -m 1500 and -m 3000 in -a 3 mode.
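
To make the register-pressure argument a bit more concrete, here is a minimal sketch in plain C. It is not hashcat's kernel code. The names lop3, lane_bit, lop3_self_check and the enum constants are made up for illustration. The only things taken as given are that DES has a 64-bit block and a 56-bit key, and that a LOP3-style instruction evaluates an arbitrary 3-input boolean function selected by an 8-bit truth table (the bit ordering below follows the convention documented for PTX lop3.b32). On a GPU the lop3 helper would be a single instruction; here it is emulated in software so the example runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: software model of a 3-input LUT operation.
 * Bit i of `lut` is the output for inputs a=(i>>2)&1, b=(i>>1)&1, c=i&1.
 * Every bit position of the 32-bit words is evaluated independently. */
static uint32_t lop3(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if ((lut >> i) & 1u) {
            /* Select the positions where (a,b,c) equals input pattern i. */
            uint32_t ta = (i & 4) ? a : ~a;
            uint32_t tb = (i & 2) ? b : ~b;
            uint32_t tc = (i & 1) ? c : ~c;
            r |= ta & tb & tc;
        }
    }
    return r;
}

/* Truth tables, obtained by applying each expression to 0xF0, 0xCC, 0xAA. */
enum {
    LUT_XOR3   = 0x96, /* a ^ b ^ c                   */
    LUT_SELECT = 0xCA, /* (a & b) | (~a & c)          */
    LUT_MAJ    = 0xE8  /* (a & b) | (a & c) | (b & c) */
};

/* Bitslice layout assumed here: one 32-bit word per DES bit, and one bit
 * position ("lane") per password candidate. */
enum {
    LANES       = 32,
    BLOCK_WORDS = 64,                       /* one word per block bit */
    KEY_WORDS   = 56,                       /* one word per key bit   */
    LIVE_WORDS  = BLOCK_WORDS + KEY_WORDS   /* 120, before S-box temporaries */
};

/* Extract the bit that belongs to candidate `lane` from a bitsliced word. */
static unsigned lane_bit(uint32_t word, unsigned lane)
{
    return (word >> lane) & 1u;
}

/* Checks the emulation against the plain boolean expressions. */
static void lop3_self_check(void)
{
    const uint32_t a = 0xF0F0F0F0u, b = 0xCCCCCCCCu, c = 0xAAAAAAAAu;
    assert(lop3(a, b, c, LUT_XOR3) == (a ^ b ^ c));
    assert(lop3(a, b, c, LUT_SELECT) == ((a & b) | (~a & c)));
    assert(lop3(a, b, c, LUT_MAJ) == ((a & b) | (a & c) | (b & c)));
    /* Lane 5 of the result depends only on lane 5 of the inputs. */
    assert(lane_bit(lop3(a, b, c, LUT_XOR3), 5) ==
           (lane_bit(a, 5) ^ lane_bit(b, 5) ^ lane_bit(c, 5)));
}
```

The number to look at is LIVE_WORDS: under this layout a bitsliced DES already wants about 120 words resident per thread before any S-box temporaries are counted, and NVIDIA GPUs of that generation allow at most 255 registers per thread. Anything else the kernel has to keep around competes for what is left, which is where the spilling comes from.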

However, 360 MH/s on a Titan X for a single DES encrypt or decrypt sounds too slow, even with the vanilla DES code. If it's just one round, it should be 1000 MH/s or more. That tells me that whatever you do before or after the DES will create too much register pressure, and with bitsliced code your kernel will run slower than it does now.


Messages In This Thread
RE: Bitslice DES S-boxes with LOP3.LUT instructions - by atom - 07-14-2016, 12:20 PM