Optimize MD5 hashcat for triple-core (MMX+ALU)
I disassembled hashcat_cli and checked the MD5 procedure.
A "triple-core" MD5 in assembly has existed for at least 6 years. It runs two MMX and one ALU MD5 instruction streams in parallel.
This idea is applicable to any 32-bit hash or cipher (e.g. DES).

I have attached example source.

Attached Files
.zip   Md5asm.zip (Size: 55.43 KB / Downloads: 17)
hashcat uses SSE2 instructions, which are nearly twice as fast as MMX instructions.
Sorry, my mistake (to me every instruction with the 0F prefix is MMX :)), but hashcat still does not use SSE2 + ALU together.
As is well known, the CPU executes instructions in parallel when, e.g., the destination operand of the first is not a source operand of one of the following instructions.
If you use only MMX, SSE, ..., the ALU block "does not work" (it sits idle). This is about optimizing the pipeline...
IMHO this is one way to optimize.
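
To make the point about independent instructions concrete, here is a rough sketch in plain C (not the attached asm, and not hashcat's code). It runs the first four MD5 round-1 steps for three independent candidates, interleaved so that neighbouring steps never depend on each other. The names md5_state, run_single, run_interleaved and self_check are made up for this post. In real assembly two of the streams would live in MMX/SSE2 registers and the third in general-purpose registers; here the compiler and CPU decide the actual scheduling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for one candidate's MD5 working registers. */
typedef struct { uint32_t a, b, c, d; } md5_state;

/* First four MD5 round-1 constants and shift amounts (RFC 1321). */
static const uint32_t T[4] = {0xd76aa478u, 0xe8c7b756u, 0x242070dbu, 0xc1bdceeeu};
static const unsigned S[4] = {7, 12, 17, 22};
static const md5_state MD5_INIT = {0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u};

/* Rotate a 32-bit word left by s bits (0 < s < 32). */
static uint32_t rotl32(uint32_t x, unsigned s) {
    return (x << s) | (x >> (32 - s));
}

/* One round-1 step: new = b + rotl(a + F(b,c,d) + m + t, s), then rotate registers. */
static void md5_step(md5_state *st, uint32_t m, uint32_t t, unsigned s) {
    uint32_t f = (st->b & st->c) | (~st->b & st->d);   /* F(b, c, d) */
    uint32_t n = st->b + rotl32(st->a + f + m + t, s);
    st->a = st->d; st->d = st->c; st->c = st->b; st->b = n;
}

/* Reference: one candidate alone. Every step waits for the previous one. */
md5_state run_single(const uint32_t m[4]) {
    md5_state st = MD5_INIT;
    for (int i = 0; i < 4; i++) md5_step(&st, m[i], T[i], S[i]);
    return st;
}

/* Interleaved: step i of all three candidates is issued before step i+1 of any.
   The three steps inside the loop body share no operands, so the CPU is free
   to overlap them. */
void run_interleaved(const uint32_t m[3][4], md5_state out[3]) {
    md5_state s0 = MD5_INIT, s1 = MD5_INIT, s2 = MD5_INIT;
    for (int i = 0; i < 4; i++) {
        md5_step(&s0, m[0][i], T[i], S[i]);
        md5_step(&s1, m[1][i], T[i], S[i]);
        md5_step(&s2, m[2][i], T[i], S[i]);
    }
    out[0] = s0; out[1] = s1; out[2] = s2;
}

/* Returns 1 if two states hold the same register values. */
int state_eq(const md5_state *x, const md5_state *y) {
    return x->a == y->a && x->b == y->b && x->c == y->c && x->d == y->d;
}

/* Sanity check: interleaving changes the order of work, not the results. */
void self_check(void) {
    const uint32_t m[3][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
    md5_state out[3];
    run_interleaved(m, out);
    for (int k = 0; k < 3; k++) {
        md5_state ref = run_single(m[k]);
        assert(state_eq(&out[k], &ref));
    }
}
```

Calling self_check() from main is enough to try it. The sketch only shows that interleaving keeps the results identical while removing back-to-back dependencies; whether it is actually faster on a given CPU would have to be measured.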