All right gents, here's a SIMD-accelerated version that uses SSE2 and, where available, XOP. It's ~350% faster than the code posted above.
http://bindshell.nl/pub/md5substr_simd.c
I used an unconventional interleaving technique (interleaving each step, rather than each operation) which ended up being a bit faster than current implementations.
On an FX-4100 the XOP code is ~240% faster than hashcat 0.42, and ~1.2% faster than John the Ripper.
On a Xeon X7350, the SSE2 code is ~260% faster than hashcat 0.42, and ~1.1% faster than John the Ripper.
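For readers unfamiliar with the distinction, here is a minimal sketch of step-level versus operation-level interleaving for a single MD5 round-1 step, written with SSE2 intrinsics. It is not taken from md5substr_simd.c; the names `NVEC`, `step_interleaved`, and `op_interleaved` are hypothetical, and `NVEC` = 3 (12 lanes) is only a guess at what "12x SSE2" in the output means. Both functions compute the same result; they differ only in the order in which the work for the independent vectors is issued.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

#define NVEC 3  /* hypothetical: independent SSE2 vectors in flight (3 x 4 lanes) */

/* Rotate each 32-bit lane left by s bits (0 < s < 32); SSE2 has no native rotate. */
static inline __m128i rotl32(__m128i x, int s)
{
    return _mm_or_si128(_mm_slli_epi32(x, s), _mm_srli_epi32(x, 32 - s));
}

/* MD5 round-1 function F(b,c,d) = (b & c) | (~b & d), computed as d ^ (b & (c ^ d)). */
static inline __m128i md5_f(__m128i b, __m128i c, __m128i d)
{
    return _mm_xor_si128(d, _mm_and_si128(b, _mm_xor_si128(c, d)));
}

/* One complete round-1 step for one vector: b + rotl(a + F(b,c,d) + m + k, s). */
static inline __m128i md5_step_f(__m128i a, __m128i b, __m128i c, __m128i d,
                                 __m128i m, uint32_t k, int s)
{
    __m128i t = _mm_add_epi32(a, md5_f(b, c, d));
    t = _mm_add_epi32(t, m);
    t = _mm_add_epi32(t, _mm_set1_epi32((int)k));
    return _mm_add_epi32(b, rotl32(t, s));
}

/* Step-level interleaving: finish the whole step for vector 0, then vector 1, ... */
void step_interleaved(__m128i a[NVEC], const __m128i b[NVEC], const __m128i c[NVEC],
                      const __m128i d[NVEC], const __m128i m[NVEC], uint32_t k, int s)
{
    for (int v = 0; v < NVEC; v++)
        a[v] = md5_step_f(a[v], b[v], c[v], d[v], m[v], k, s);
}

/* Operation-level interleaving: issue each primitive operation for every vector
 * before moving on to the next operation. */
void op_interleaved(__m128i a[NVEC], const __m128i b[NVEC], const __m128i c[NVEC],
                    const __m128i d[NVEC], const __m128i m[NVEC], uint32_t k, int s)
{
    __m128i t[NVEC];
    const __m128i kv = _mm_set1_epi32((int)k);
    int v;

    for (v = 0; v < NVEC; v++) t[v] = md5_f(b[v], c[v], d[v]);
    for (v = 0; v < NVEC; v++) t[v] = _mm_add_epi32(t[v], a[v]);
    for (v = 0; v < NVEC; v++) t[v] = _mm_add_epi32(t[v], m[v]);
    for (v = 0; v < NVEC; v++) t[v] = _mm_add_epi32(t[v], kv);
    for (v = 0; v < NVEC; v++) t[v] = rotl32(t[v], s);
    for (v = 0; v < NVEC; v++) a[v] = _mm_add_epi32(t[v], b[v]);
}
```

Whether one ordering beats the other depends on how the compiler schedules the instructions and on register pressure, so any speed difference has to be measured on the target CPU. On XOP hardware the shift/shift/or rotate could presumably be replaced by a single rotate instruction, which this sketch does not attempt.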
Code:
epixoip@db:~$ ./md5substr aaaaaaaa
Using 16 threads, 12x SSE2
Elapsed: 3s Progress: 1237332096/377149515625 (0.3%), Speed: 426.67 M/s virt, 412.44 M/s real
aaaaaaaa:8*pU]]
epixoip@db:~$ ./md5substr deadbeef
Using 16 threads, 12x SSE2
Elapsed: 8s Progress: 3327996672/377149515625 (0.9%), Speed: 426.67 M/s virt, 416.00 M/s real
deadbeef:L,ud<P
epixoip@db:~$ ./md5substr deadface
Using 16 threads, 12x SSE2
Elapsed: 7s Progress: 2901330432/377149515625 (0.8%), Speed: 426.67 M/s virt, 414.48 M/s real
deadface:V,l?,E
epixoip@db:~$ ./md5substr deadfa11
Using 16 threads, 12x SSE2
Elapsed: 8s Progress: 3327996672/377149515625 (0.9%), Speed: 426.67 M/s virt, 416.00 M/s real
deadfa11:G,vzY;
epixoip@db:~$ ./md5substr cafebabe
Using 16 threads, 12x SSE2
Elapsed: 7s Progress: 2901330432/377149515625 (0.8%), Speed: 426.67 M/s virt, 414.48 M/s real
cafebabe:t,gr>N
epixoip@db:~$ ./md5substr ffffffff
Using 16 threads, 12x SSE2
Elapsed: 9s Progress: 3754662912/377149515625 (1.0%), Speed: 426.67 M/s virt, 417.18 M/s real
ffffffff:e-\1Go
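The totals in the output above can be sanity-checked with a little arithmetic. 377149515625 is exactly 85^6, which would be consistent with a search over six-character candidates drawn from an 85-symbol set; that reading is inferred from the numbers, not stated in the post. The helper names below are hypothetical, and the sketch only reproduces the progress percentage and a rough time to sweep the whole space at the reported "real" speed.

```c
#include <stdint.h>

/* Integer power by repeated multiplication (no overflow check; fine for 85^6). */
uint64_t ipow(uint64_t base, unsigned exp)
{
    uint64_t r = 1;
    while (exp--)
        r *= base;
    return r;
}

/* Progress as a percentage of the total candidate count. */
double percent_done(uint64_t done, uint64_t total)
{
    return 100.0 * (double)done / (double)total;
}

/* Seconds needed to try every candidate at a constant rate (candidates/second). */
double seconds_to_exhaust(uint64_t total, double rate)
{
    return (double)total / rate;
}
```

With the first run's figures, `percent_done(1237332096, 377149515625)` comes out at about 0.33%, matching the 0.3% shown, and `seconds_to_exhaust(377149515625, 412.44e6)` is roughly 914 seconds, so a full sweep at that speed would take a little over 15 minutes.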