10-14-2010, 03:02 PM 
		
	
	
		Hi, sorry, I don't have much time right now, so just briefly, another tip. Again, a 'clever' compiler may do this optimization for you, but your code doesn't look like the compiler is doing that job well. 
First two steps of round 3:
a += ((b ^ c) ^ d) + const + pass;
a = ((a<<const) | (a>>(32-const)) ) + b;
d += (a ^ (b ^ c)) + const + pass;
d = ((d<<const) | (d>>(32-const)) ) + a;
Optimized first two steps of round 3:
tmp = b ^ c;
a += (tmp ^ d) + const + pass;
a = ((a<<const) | (a>>(32-const)) ) + b;
d += (tmp ^ a) + const + pass;
d = ((d<<const) | (d>>(32-const)) ) + a;
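You can rewrite every pair of steps in round 3 in this manner, i.e. compute tmp in each (2n)th step and reuse it in the following (2n+1)th step. This works because the two variables XORed into tmp are not modified by either step of the pair. Just to be safe, declare tmp as 'register' to encourage the compiler to keep it in a register instead of writing it to and reading it back from memory.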
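If you want to convince yourself that the shared-tmp form gives the same result, here is a minimal sketch in C. The function names, the k1/k2 parameters (standing in for "const + pass") and s1/s2 (standing in for the rotation amounts) are my own placeholders, not anything from your code, so adapt them as needed.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by s bits (valid for 0 < s < 32). */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

/* Two consecutive round-3 steps written out directly.
   k1, k2, s1, s2 are hypothetical placeholders for the
   per-step "const + pass" values and rotation amounts. */
static void pair_plain(uint32_t *a, uint32_t b, uint32_t c, uint32_t *d,
                       uint32_t k1, unsigned s1, uint32_t k2, unsigned s2)
{
    *a += ((b ^ c) ^ *d) + k1;
    *a = rotl32(*a, s1) + b;
    *d += (*a ^ (b ^ c)) + k2;
    *d = rotl32(*d, s2) + *a;
}

/* Same two steps, but b ^ c is computed once and reused.
   b and c are not modified by either step, so tmp stays valid. */
static void pair_shared(uint32_t *a, uint32_t b, uint32_t c, uint32_t *d,
                        uint32_t k1, unsigned s1, uint32_t k2, unsigned s2)
{
    uint32_t tmp = b ^ c;
    *a += (tmp ^ *d) + k1;
    *a = rotl32(*a, s1) + b;
    *d += (tmp ^ *a) + k2;
    *d = rotl32(*d, s2) + *a;
}

/* Returns 1 if both variants produce the same a and d, 0 otherwise. */
static int pairs_agree(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                       uint32_t k1, unsigned s1, uint32_t k2, unsigned s2)
{
    uint32_t a1 = a, d1 = d;
    uint32_t a2 = a, d2 = d;
    pair_plain(&a1, b, c, &d1, k1, s1, k2, s2);
    pair_shared(&a2, b, c, &d2, k1, s1, k2, s2);
    return a1 == a2 && d1 == d2;
}
```

Calling pairs_agree() with any state words, constants and rotation amounts between 1 and 31 should return 1. Whether the shared form is actually faster depends on your compiler and flags, so it is worth checking the generated assembly.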
I'm not familiar with IRC yet, so maybe I'll join later. I'm using Jabber for now.
	
	
	
	
 
 

 
