(10-17-2010, 10:42 AM) atom Wrote: I guess this was already optimized by compilers. hashcat supports passwords up to 55 chars, and I think it should, so that means I cannot apply this optimization to hashcat. Thanks again! Your stuff rocks :-)
It CAN be applied, and you will still be able to compute 55-char hashes. By implementing the previous idea, you insert only ONE if statement (an overhead of approx. 4 instructions) per group of 4 SSE2-packed passwords to decide whether the full 55-char MD5 or a simplified MD5 function can be executed. A simplified MD5 for passwords of up to 31 chars needs 28 FEWER instructions, much more than the previous tips! If only I had access to the source code :)
Also, the compiler can't do this optimization for you (still speaking about the CPU version), because it doesn't know the lengths of the passwords being hashed: they are known at run time, not at compile time.
If you want something GPU-specific, here it is :)
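There is a MAD instruction (page 51) which performs a multiplication and an addition in one instruction, so you can shorten the bit rotation. Shifting a 32-bit value left by n is the same as multiplying it by 2^n (modulo 2^32), and since the bits brought in by the right shift never overlap the left-shifted bits, the OR can be replaced by an addition. An example in C (a is a 32-bit unsigned integer), first with shift + OR, then with multiply + add: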
Code:
tmp = a >> 25;
a = a << 7;
a = a | tmp;
Code:
tmp = a >> 25;
a = a * 128;
a = a + tmp;
Compiled to PTX, before optimization:
Code:
shr.u32 %r201, %r200, 25;
shl.b32 %r202, %r200, 7;
or.b32 %r203, %r202, %r201;
After optimization:
Code:
shr.u32 %r201, %r200, 25;
mad.lo.u32 %r202, %r200, 128, %r201;
:D