(10-17-2010, 10:42 AM)atom Wrote: i guess this was already optimized by compilers. hashcat supports pw length up to 55 chars and i think it should so that means that i can not apply this optimization on hashcat. thanks again! your stuff rock :-)
It CAN be applied, and you will still be able to compute hashes for passwords up to 55 chars. By implementing the previous idea, you insert only ONE IF statement (overhead of approx. 4 instructions) for each group of 4 SSE2-packed passwords, to decide whether the full 55-char MD5 or the simplified MD5 function is executed. MD5 simplified for up to 31 chars means 28 FEWER instructions, much more than the previous tips! If only I could have access to the source code...
Also, it can't be optimized by the compiler (still speaking about the CPU version), because the compiler doesn't know the lengths of the passwords being computed: they are known at run time, not at compile time.
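To show what that single run-time check could look like, here is a minimal sketch in C. It assumes a batch of 4 passwords whose lengths are already known; `choose_md5_path`, `SHORT_MAX_LEN` and the `md5_path` enum are names I made up for illustration and are not taken from the hashcat source.

```c
#include <stddef.h>

/* Hypothetical limit: passwords up to this length can use the simplified MD5. */
#define SHORT_MAX_LEN 31

/* Hypothetical tags for the two code paths. */
enum md5_path { MD5_PATH_SHORT, MD5_PATH_FULL };

/* Decide once per batch of 4 packed passwords which MD5 variant to run.
 * The simplified variant is only safe if ALL four passwords are short,
 * so the decision is based on the longest one. */
static enum md5_path choose_md5_path(const size_t len[4])
{
    size_t max_len = len[0];
    for (int i = 1; i < 4; i++) {
        if (len[i] > max_len) {
            max_len = len[i];
        }
    }
    /* This is the one extra IF per batch. */
    return (max_len <= SHORT_MAX_LEN) ? MD5_PATH_SHORT : MD5_PATH_FULL;
}
```

If the candidate generator produces passwords sorted by length, the maximum is already known and the loop disappears, leaving only the comparison.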
If you want something specific to GPU, here it is: I have examined the *.kernel files which are provided with oclHashcat, and also the document ptx_isa_1.3.pdf by nVidia to see the whole instruction set, and voila, an idea...
There is a MAD instruction (page 51) which performs a multiplication and an addition in one step, so you can improve the bit rotation operation. Shifting left by n is the same operation as multiplying by 2^n, and because the low 7 bits of the product are zero while tmp is always below 128, the OR can be replaced by an addition. An example written in C (a is a 32-bit unsigned value):
Code:
tmp = a >> 25;
a = a << 7;
a = a | tmp;
can be rewritten as:
Code:
tmp = a >> 25;
a = a * 128;
a = a + tmp;
Before optimization:
Code:
shr.u32 %r201, %r200, 25;
shl.b32 %r202, %r200, 7;
or.b32 %r203, %r202, %r201;
After optimization:
Code:
shr.u32 %r201, %r200, 25;
mad.lo.u32 %r202, %r200, 128, %r201;
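If you want to convince yourself that both forms give the same result, here is a small C sketch of the two variants for a rotation by 7. The function names are mine, chosen only for this example. It relies on unsigned 32-bit arithmetic wrapping modulo 2^32, which is what the low half of the multiply gives you as well.

```c
#include <stdint.h>

/* Rotate left by 7 with two shifts and an OR (the original form). */
static uint32_t rotl7_shift_or(uint32_t a)
{
    uint32_t tmp = a >> 25;   /* top 7 bits move down to bits 0..6 */
    return (a << 7) | tmp;
}

/* Same rotation with one shift and a multiply-add.
 * a * 128 wraps modulo 2^32, so it equals a << 7 and its low 7 bits
 * are zero; tmp fits in those 7 bits, so + never carries and acts like |. */
static uint32_t rotl7_mul_add(uint32_t a)
{
    uint32_t tmp = a >> 25;
    return a * 128u + tmp;
}
```

Whether the MAD version is actually faster on a given GPU is something only a benchmark can tell; the sketch only shows that the results are identical.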