OK, nice to see progress, so we are attacking 10M/s and 1000M/s respectively.
Here is another idea for reducing the instruction count... If I'm not wrong, you are already doing this in oclHashcat. So... In every step of MD5 there is something like this (shown here with the XOR function used in the third round):
a += ((b ^ c) ^ d) + const[n] + password[n];
a = ((a<<const) | (a>>(32-const)) ) + b;
Let's take a look at this part: const[n] + password[n]
I think that even in a dictionary attack, most passwords don't exceed a certain length. For example, let's presume that most passwords will be shorter than 20 bytes. That means that most of the time only password[0..4] will be changing (the 0x80 padding byte still fits in there), password[14] will hold the length in bits, and password[5..13] and password[15] will contain only zeroes. Why should the PC keep adding zero to a constant?
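To make the layout concrete, here is a minimal sketch in C of how a short password ends up in the 16 message words. The name md5_pad_block is just something I made up for illustration, and it assumes a single-block message (at most 55 bytes) with the usual MD5 padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the single 64-byte MD5 block for a password
   of at most 55 bytes and store it as 16 little-endian words in w[0..15]. */
void md5_pad_block(const char *pw, size_t len, uint32_t w[16])
{
    uint8_t buf[64] = {0};
    int i;

    assert(len <= 55);              /* longer input would need a second block */
    memcpy(buf, pw, len);
    buf[len] = 0x80;                /* padding: a single 1 bit after the data */

    for (i = 0; i < 16; i++)        /* MD5 reads its message words little-endian */
        w[i] = (uint32_t)buf[4 * i]
             | (uint32_t)buf[4 * i + 1] << 8
             | (uint32_t)buf[4 * i + 2] << 16
             | (uint32_t)buf[4 * i + 3] << 24;

    w[14] = (uint32_t)(len * 8);    /* message length in bits (low word) */
    /* w[15], the high word of the length, stays zero */
}
```

For a 19-byte password the 0x80 byte lands in the top byte of w[4], so w[5] is still zero. At 20 bytes it moves into w[5].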
So we can have another version of the MD5 function in which password[n] = 0 for every n other than 0, 1, 2, 3, 4 and 14, so those steps can be written without the unnecessary addition as:
a += ((b ^ c) ^ d) + const[n];
a = ((a<<const) | (a>>(32-const)) ) + b;
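As a quick sanity check, the two variants of the step can be sketched in C like this. The names rotl32, md5_h_step_full and md5_h_step_zero are my own, and only the third-round XOR function is shown; the other rounds would get the same treatment.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by s bits (0 < s < 32). */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    assert(s > 0 && s < 32);
    return (x << s) | (x >> (32 - s));
}

/* Full step: adds the round constant k and the message word m. */
uint32_t md5_h_step_full(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                         uint32_t m, uint32_t k, unsigned s)
{
    a += (b ^ c ^ d) + k + m;
    return rotl32(a, s) + b;
}

/* Simplified step for a message word known to be zero: one addition fewer. */
uint32_t md5_h_step_zero(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                         uint32_t k, unsigned s)
{
    a += (b ^ c ^ d) + k;
    return rotl32(a, s) + b;
}
```

Both functions return the same value whenever m is 0, so the simplified one can replace the full one in every step whose message word is known to be zero.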
Before starting MD5, you have to decide which version to use. That can be done simply by testing
if (password[5]==0) MD5_simplified();
else MD5_full();
In SSE2, you can test whether the whole 128-bit password[5] variable is zero, so the result will depend on the longest of the four passwords being tested.
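A rough sketch of that SSE2 test with compiler intrinsics, assuming an x86 target and that word 5 of the four passwords is packed one per 32-bit lane. The function names all_lanes_zero and can_use_simplified are made up for illustration.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Returns 1 only if all four 32-bit lanes of w5 are zero. */
int all_lanes_zero(__m128i w5)
{
    /* Each lane that equals zero becomes 0xFFFFFFFF, any other lane becomes 0. */
    __m128i eq = _mm_cmpeq_epi32(w5, _mm_setzero_si128());
    /* movemask collects the top bit of each of the 16 bytes. */
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* word5[i] is message word 5 of the i-th password in the batch. */
int can_use_simplified(const uint32_t word5[4])
{
    assert(word5 != NULL);
    return all_lanes_zero(_mm_loadu_si128((const __m128i *)word5));
}
```

A single long password in the batch forces the full version for all four, which is why it pays to group candidates by length.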
Of course, 20 chars was only my example. You can make a version for a different password length, based on statistics of which password lengths dominate... (maybe 32, to fit md5(md5(pass)) ). What do you think?