09-15-2014, 02:26 AM
Good stuff. I just have to ask: would you care to share any details (high-level, low-level, or just conceptual) on how you managed to speed up RC4 on the GPU?
A while ago, when I was playing with 40-bit RC4 brute-forcing (for even older Office versions) on the GPU, I tried these obvious things:
- Using char or uint for the state array
- Putting the state array in local memory (or not)
- memcpy'ing the IV from a constant array [instead of using swap_state() in a for loop]
- Unrolling set_key() for a fixed key length
- Fixed-length decryption, unrolled to 32-bit stores
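For anyone who hasn't tried these, here is a minimal plain-C sketch of what the last three items might look like. It is not my actual kernel or anyone else's. KEYLEN, DATALEN, RC4_IDENTITY, rc4_set_key_fixed() and rc4_decrypt_fixed() are hypothetical names. The sketch assumes a 16-byte RC4 key, a 16-byte block to decrypt, and a little-endian target. I'm reading "memcpy the IV" as copying the initial 0..255 permutation from a constant table, which is an assumption on my part. The char-vs-uint and local-memory choices are GPU/OpenCL specific, so the sketch leaves them out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, both fixed at compile time (assumptions). */
#define KEYLEN 16  /* RC4 key length in bytes */
#define DATALEN 16 /* bytes decrypted per candidate key */
static_assert(256 % KEYLEN == 0, "KEYLEN must divide 256 for this KSA loop");
static_assert(DATALEN % 4 == 0, "DATALEN must be a multiple of 4");

/* Identity permutation 0..255 generated by macros, so the initial state
   is one memcpy instead of a per-byte initialization loop. */
#define R4(n)  (n), (n) + 1, (n) + 2, (n) + 3
#define R16(n) R4(n), R4((n) + 4), R4((n) + 8), R4((n) + 12)
#define R64(n) R16(n), R16((n) + 16), R16((n) + 32), R16((n) + 48)
static const uint8_t RC4_IDENTITY[256] = {
    R64(0), R64(64), R64(128), R64(192)
};

/* Key schedule for a fixed key length: the inner loop has a constant trip
   count and indexes key[k] directly (no i % keylen), so the compiler can
   fully unroll it. */
static void rc4_set_key_fixed(uint8_t s[256], const uint8_t key[KEYLEN])
{
    memcpy(s, RC4_IDENTITY, 256);
    unsigned j = 0;
    for (unsigned i = 0; i < 256; i += KEYLEN) {
        for (unsigned k = 0; k < KEYLEN; k++) {
            uint8_t t = s[i + k];
            j = (j + t + key[k]) & 0xff;
            s[i + k] = s[j];
            s[j] = t;
        }
    }
}

/* Fixed-length decryption: four keystream bytes are packed into one
   32-bit word (little-endian order assumed), so each load, XOR and store
   handles 4 bytes at a time. */
static void rc4_decrypt_fixed(const uint8_t key[KEYLEN],
                              const uint8_t in[DATALEN],
                              uint8_t out[DATALEN])
{
    uint8_t s[256];
    unsigned i = 0, j = 0;
    rc4_set_key_fixed(s, key);
    for (unsigned n = 0; n < DATALEN; n += 4) {
        uint32_t ks = 0;
        for (unsigned b = 0; b < 4; b++) {
            i = (i + 1) & 0xff;
            j = (j + s[i]) & 0xff;
            uint8_t t = s[i];
            s[i] = s[j];
            s[j] = t;
            ks |= (uint32_t)s[(s[i] + s[j]) & 0xff] << (8 * b);
        }
        uint32_t w;
        memcpy(&w, in + n, 4);  /* 32-bit load (alignment-safe) */
        w ^= ks;
        memcpy(out + n, &w, 4); /* 32-bit store */
    }
}
```

In a brute-force loop, each candidate key would go through rc4_decrypt_fixed() and the output would be compared against whatever known plaintext or verifier the format gives you. Since RC4 is a plain XOR stream, the same function also encrypts.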