03-10-2013, 02:20 PM
NeonFlash, here's an illustration of how much this optimization is worth.
Using the example code above, optimized for length 8:
Code:
epixoip@ike:$ ./md5substr_len8 babeface
Using 4 threads, 12x XOP
Elapsed: 17s Progress: 1935998064/377149515625 (0.5%), Speed: 117.33 M/s virt, 113.88 M/s real
babeface:>2HX2l
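
To show why a fixed length leaves ADDs to remove, here is a rough Python sketch of the padded MD5 block layout. It assumes "length 8" means an 8-byte input in a single 64-byte block; the function names are made up for illustration and are not from the code discussed in this thread. Every message word that is zero for all candidates of that length is an ADD that can be dropped, and MD5 uses each of the 16 words once in each of its 4 rounds.

```python
import struct

def md5_single_block_words(msg: bytes) -> list[int]:
    """Return the 16 little-endian 32-bit words of the padded MD5 block."""
    if len(msg) > 55:
        raise ValueError("message does not fit in a single MD5 block")
    # 0x80 terminator, zero fill up to byte 56, then the 64-bit bit length.
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack("<Q", len(msg) * 8)
    return list(struct.unpack("<16I", block))

def zero_word_indices(length: int) -> list[int]:
    """Indices of words that are zero for every message of this length."""
    # A word is always zero only if it is zero for both all-0x00 and all-0xff input.
    lo = md5_single_block_words(b"\x00" * length)
    hi = md5_single_block_words(b"\xff" * length)
    return [i for i in range(16) if lo[i] == 0 and hi[i] == 0]

if __name__ == "__main__":
    for length in (8, 55):
        zeros = zero_word_indices(length)
        # Each word is consumed once per round, so 4 steps per zero word.
        print(f"length {length}: {len(zeros)} zero words, "
              f"{4 * len(zeros)} of 64 steps need no message ADD")
```

For length 8 only words 0 and 1 carry the candidate; word 2 holds the 0x80 terminator and word 14 the bit length (64), both constants that could be folded into the step constant ahead of time. For length 55 only word 15 is guaranteed to be zero, so almost nothing can be dropped.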
Same code, modified for a full single block (not optimizing out the ADDs):
Code:
epixoip@ike:$ ./md5substr_len55 babeface
Using 4 threads, 12x XOP
Elapsed: 29s Progress: 2015997984/404567235136 (0.5%), Speed: 74.67 M/s virt, 71.11 M/s real
babeface:>2HX2l
So by optimizing out the ADDs we gained roughly 57-60% in throughput (put the other way, the full-block build is about 36-38% slower). And mind you, this is on a low-end CPU (FX-4100).
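
For anyone who wants to check the arithmetic, a quick sketch that computes the relative difference from the speeds reported in the two runs above. The helper names are illustrative only.

```python
def gain(fast: float, slow: float) -> float:
    """Fractional throughput gain of the fast build over the slow one."""
    return fast / slow - 1.0

def loss(fast: float, slow: float) -> float:
    """Fractional throughput loss of the slow build relative to the fast one."""
    return 1.0 - slow / fast

# Speeds in M/s taken from the two runs: (len8 build, full-block build).
runs = {"virt": (117.33, 74.67), "real": (113.88, 71.11)}

if __name__ == "__main__":
    for name, (fast, slow) in runs.items():
        print(f"{name}: +{gain(fast, slow):.1%} gain, -{loss(fast, slow):.1%} loss")
```

The two numbers differ because they use different baselines: the gain is measured against the slower full-block build, the loss against the faster length-8 build.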