04-18-2013, 05:02 PM
You mean having 4 tables rather than one plus rotates?
In the APP SDK sample, they implement it exactly as described in the papers.
However, you can combine SubBytes, ShiftRows and MixColumns into a sequence of table lookups (either one 256-element uint table plus rotate operations, or four separate 256-element lookup tables). Although this involves more memory accesses, it is also faster. OpenSSL, for example, does this (and I'd guess most implementations do too).
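A minimal sketch in C of the two table variants mentioned above, next to the byte-oriented round for comparison. It assumes the state is held as four 32-bit column words with the first byte of each column in the high bits, the layout used by OpenSSL's portable C code. The table names Te0..Te3 are borrowed from OpenSSL's aes_core.c, but all functions here (`build_tables`, `round_reference`, `round_four_tables`, `round_one_table`) are hypothetical illustrations, not OpenSSL's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: AES S-box plus T-tables Te0..Te3 (names borrowed from
   OpenSSL). Each state word is one column, first byte in the high bits. */
static uint8_t sbox[256];
static uint32_t Te0[256], Te1[256], Te2[256], Te3[256];

/* Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Rotate right; only called with n = 8, 16 or 24. */
static uint32_t rotr(uint32_t w, unsigned n) {
    return (w >> n) | (w << (32 - n));
}

static void build_tables(void) {
    uint8_t exp_t[255], log_t[256] = {0};
    uint8_t p = 1;
    for (int i = 0; i < 255; i++) {    /* {03} generates the nonzero field elements */
        exp_t[i] = p;
        log_t[p] = (uint8_t)i;
        p = (uint8_t)(p ^ xtime(p));   /* p *= {03} */
    }
    for (int x = 0; x < 256; x++) {
        /* SubBytes: multiplicative inverse (0 maps to 0), then the affine map. */
        uint8_t inv = x ? exp_t[(255 - log_t[x]) % 255] : 0;
        uint8_t s = inv;
        for (int k = 1; k <= 4; k++)
            s ^= (uint8_t)((inv << k) | (inv >> (8 - k)));
        sbox[x] = (uint8_t)(s ^ 0x63);
    }
    assert(sbox[0x00] == 0x63 && sbox[0x01] == 0x7c); /* FIPS-197 values */
    for (int x = 0; x < 256; x++) {
        uint32_t s = sbox[x], s2 = xtime(sbox[x]), s3 = s2 ^ s;
        /* Te0[x] = MixColumns column (02, 01, 01, 03) scaled by S[x]. */
        Te0[x] = (s2 << 24) | (s << 16) | (s << 8) | s3;
        Te1[x] = rotr(Te0[x], 8);   /* (03, 02, 01, 01) */
        Te2[x] = rotr(Te0[x], 16);  /* (01, 03, 02, 01) */
        Te3[x] = rotr(Te0[x], 24);  /* (01, 01, 03, 02) */
    }
}

/* Byte-oriented round as the spec describes it. `out` must not alias `in`. */
static void round_reference(const uint32_t in[4], const uint32_t rk[4],
                            uint32_t out[4]) {
    for (int c = 0; c < 4; c++) {
        uint8_t b[4];
        for (int r = 0; r < 4; r++)  /* SubBytes + ShiftRows (they commute) */
            b[r] = sbox[(in[(c + r) & 3] >> (24 - 8 * r)) & 0xff];
        /* MixColumns with the circulant matrix rows (02 03 01 01), ... */
        uint8_t m0 = (uint8_t)(xtime(b[0]) ^ xtime(b[1]) ^ b[1] ^ b[2] ^ b[3]);
        uint8_t m1 = (uint8_t)(b[0] ^ xtime(b[1]) ^ xtime(b[2]) ^ b[2] ^ b[3]);
        uint8_t m2 = (uint8_t)(b[0] ^ b[1] ^ xtime(b[2]) ^ xtime(b[3]) ^ b[3]);
        uint8_t m3 = (uint8_t)(xtime(b[0]) ^ b[0] ^ b[1] ^ b[2] ^ xtime(b[3]));
        out[c] = (((uint32_t)m0 << 24) | ((uint32_t)m1 << 16) |
                  ((uint32_t)m2 << 8) | m3) ^ rk[c];  /* AddRoundKey */
    }
}

/* Same round with four 256-entry tables: 16 lookups, no rotates. */
static void round_four_tables(const uint32_t in[4], const uint32_t rk[4],
                              uint32_t out[4]) {
    for (int j = 0; j < 4; j++)
        out[j] = Te0[in[j] >> 24] ^ Te1[(in[(j + 1) & 3] >> 16) & 0xff] ^
                 Te2[(in[(j + 2) & 3] >> 8) & 0xff] ^
                 Te3[in[(j + 3) & 3] & 0xff] ^ rk[j];
}

/* Same round with only Te0 plus rotates: a quarter of the table memory. */
static void round_one_table(const uint32_t in[4], const uint32_t rk[4],
                            uint32_t out[4]) {
    for (int j = 0; j < 4; j++)
        out[j] = Te0[in[j] >> 24] ^
                 rotr(Te0[(in[(j + 1) & 3] >> 16) & 0xff], 8) ^
                 rotr(Te0[(in[(j + 2) & 3] >> 8) & 0xff], 16) ^
                 rotr(Te0[in[(j + 3) & 3] & 0xff], 24) ^ rk[j];
}
```

Call `build_tables()` once, then use either table round for each of the first Nr-1 rounds. The final AES round has no MixColumns, so it cannot use these tables directly and is done with the plain S-box (or a separate table) instead. With 32-bit entries, the four-table variant needs 4 KiB of tables versus 1 KiB for the single table, in exchange for dropping the three rotates per column.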