How does 15 character limitation help in a speed up?
#2
Making assumptions about the input size allows us to do length-specific optimizations.

Using MD5 as an example, an MD5 block only holds 64 bytes of input data if you are using a multi-block implementation. If you do a single block implementation with with no multi-block support, then you can only support 55 bytes of input data (block[0]-block[13] hold the 55 bytes of input data plus the 0x80 terminator, block[14] holds the length, and block[15] is unused.) So by assuming an input of 55 bytes or less, we can gain quite a bit of speed by optimizing out all of the bits that are only used in a multi-block implementation.

But we can do better than simply not using a multi-block implementation by making further assumptions about the input size. By assuming an input size of up to 15 bytes (which becomes 16 bytes when you append the 0x80 terminator), we only need to operate on four 32-bit words. We can skip all that padding that you mention, because it is simply not needed. This allows the entire algorithm to fit nicely into registers without spilling. And since we skipped all that padding, we can ALSO optimize out 40 add instructions on that null data.

If you would like something to study, have a look at http://bindshell.nl/pub/md5substr_simd.c (one of my rare examples of public code), in which I optimize for length 8 (as there's absolutely no reason to support brute force for longer lengths on CPU.)


Messages In This Thread
RE: How does 15 character limitation help in a speed up? - by epixoip - 03-10-2013, 12:58 PM