10-15-2010, 01:11 PM
again, this is awesome! i've added it to hashcat on complete round 3. speed changed from 9.34M/s -> 9.47M/s. i think i can port it to oclHashcat, too. more info later.
----Version 0.23----
Threads...: 6
Mode.Left.: Mask '?l?l?l?l' (456976)
Mode.Right: Dict 'example.dict' (129988)
Speed.GPU1: 1009.4M/s (finished)
Speed.GPU2: 1008.8M/s (finished)
Speed.GPU3: 1012.6M/s (finished)
Speed.GPU4: 991.8M/s (finished)
Speed.GPU5: 1013.3M/s (finished)
Speed.GPU6: 1015.4M/s (finished)
Speed.GPU*: 6051.3M/s
Recovered.: 1358/6494 Digests, 0/1 Salts
Progress..: 59400876336/59401396288 (100.00%)
Running...: 10 secs
----Version 0.24----
Threads...: 6
Mode.Left.: Mask '?l?l?l?l' (456976)
Mode.Right: Dict 'example.dict' (129988)
Speed.GPU1: 1097.1M/s (finished)
Speed.GPU2: 1115.3M/s (finished)
Speed.GPU3: 1117.6M/s (finished)
Speed.GPU4: 1100.7M/s (finished)
Speed.GPU5: 1118.8M/s (finished)
Speed.GPU6: 1120.3M/s (finished)
Speed.GPU*: 6669.8M/s
Recovered.: 1358/6494 Digests, 0/1 Salts
Progress..: 59400876336/59401396288 (100.00%)
Running...: 10 secs
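The numbers in the two runs above can be cross-checked with a little arithmetic. The helpers below are only a sketch for that purpose; the function names are made up for illustration and are not part of hashcat or oclHashcat.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combined keyspace of a left-side mask and a
   right-side dictionary (every mask candidate paired with every word). */
static uint64_t combined_keyspace(uint64_t mask_candidates, uint64_t dict_words)
{
    return mask_candidates * dict_words;
}

/* Hypothetical helper: relative speed change in percent. */
static double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after / before - 1.0) * 100.0;
}

/* Hypothetical helper: sum of n per-GPU speeds (same unit as the inputs). */
static double total_speed(const double *per_gpu, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        sum += per_gpu[i];
    }
    return sum;
}
```

With the figures from the output, combined_keyspace(456976, 129988) gives 59401396288, which matches the Progress total, and speedup_percent(6051.3, 6669.8) comes out at roughly 10.2% going from 0.23 to 0.24.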
(10-17-2010, 10:42 AM)atom Wrote: i guess this was already optimized by compilers. hashcat supports pw lengths up to 55 chars, and i think it should, so that means i cannot apply this optimization to hashcat. thanks again! your stuff rocks :-)
tmp = a >> 25;
a = a << 7;
a = a | tmp;
tmp = a >> 25;
a = a * 128;
a = a + tmp;
shr.u32 %r201, %r200, 25;
shl.b32 %r202, %r200, 7;
or.b32 %r203, %r202, %r201;
shr.u32 %r201, %r200, 25;
mad.u32 %r202, %r200, 128, %r201;
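The two C variants above give the same result because a * 128 leaves the low 7 bits zero, so adding the 7 bits from a >> 25 can never carry and behaves exactly like the OR. A minimal sketch to check that, assuming 32-bit unsigned arithmetic; the function names are only illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits, shift/or form (valid for 0 < n < 32). */
static uint32_t rotl_shift_or(uint32_t a, unsigned n)
{
    assert(n > 0 && n < 32);
    return (a << n) | (a >> (32 - n));
}

/* Same rotate in multiply/add form: a * 2^n wraps modulo 2^32 just like
   the shift, and the low n bits of the product are zero, so '+' acts
   like '|'. */
static uint32_t rotl_mul_add(uint32_t a, unsigned n)
{
    assert(n > 0 && n < 32);
    return a * ((uint32_t)1 << n) + (a >> (32 - n));
}
```

Both functions return the same value for any input, e.g. rotl_shift_or(x, 7) == rotl_mul_add(x, 7). Whether the multiply/add form is any faster depends entirely on what the compiler and hardware make of it.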
(10-17-2010, 04:19 PM)Dalibor Wrote: There is a MAD instruction (page 51) which performs multiplication and addition, so you can improve the bit rotation operation.
Unfortunately this won't work, as integer MAD is just a virtual instruction. PTX isn't a real assembler, so it may look like you're optimizing some instructions, but after the real compilation to ISA you won't find any gain. As there is not even a 32-bit integer multiplication on G80-GT200 hardware, MAD will be replaced by several 24-bit multiplications and additions (or even turned back into SHL+ADD if the compiler notices the 2^n value).
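To illustrate the point about narrower multipliers, here is a rough sketch of how a 32-bit product can be assembled from smaller pieces. It splits the operands into 16-bit halves (which fit easily into a 24-bit multiplier); this is an assumption made for illustration only, not the actual instruction sequence the NVIDIA compiler emits, and the function names are hypothetical.

```c
#include <stdint.h>

/* Low 32 bits of a * b, built from three narrow multiplies.
   The ah * bh term would only affect bits 32 and up, so it is dropped. */
static uint32_t mul32_from_halves(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    return al * bl + ((ah * bl + al * bh) << 16);
}

/* When the multiplier is a power of two, the whole product collapses
   to a single shift, which is the SHL+ADD case mentioned above. */
static uint32_t mul_by_128(uint32_t a)
{
    return a << 7;
}
```

Either way the multiply costs several real instructions or just becomes the shift again, so the MAD form of the rotate has nothing left to save.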
(10-17-2010, 08:12 PM)IvanG Wrote: Unfortunately this won't work as integer MAD is just a virtual instruction. PTX isn't real assembler...
Thanks for the explanation, Ivan. So it was a false alarm, same as with the BFI_INT instruction, which I found in the AMD Evergreen ISA reference guide. It can compute (A & B) | (~A & C), and the bitselect() function in OpenCL should be translated into this instruction... But it seems that compilers are better than I presumed.
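For reference, the select operation mentioned above can be written in plain C as follows. This is only a sketch with illustrative function names; it says nothing about which instruction a given compiler actually picks.

```c
#include <stdint.h>

/* Bitwise select: each result bit comes from b where a has a 1 bit
   and from c where a has a 0 bit. */
static uint32_t select_and_or(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (~a & c);
}

/* Equivalent form with three operations instead of four. */
static uint32_t select_xor(uint32_t a, uint32_t b, uint32_t c)
{
    return c ^ (a & (b ^ c));
}
```

Note that OpenCL's bitselect(a, b, c) takes the mask as its last argument (it picks bits of b where c is 1 and bits of a where c is 0), so the expression above would correspond to bitselect(C, B, A).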
In short, the NVCC compiler is smart enough to make rotates as fast as possible; no special action is needed from the programmer.
0xc040b340 = 11000000010000001011001101000000
d += G(a,b,c) + password[6];
d += ...01000000;
d = ((d<<9) | (d>>23));
d += a;
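The lines above look like the second step of MD5 round 2 (step 18 in RFC 1321), where 0xc040b340 is the additive constant and password[6] is the seventh 32-bit word of the input block. Assuming that reading is right, the whole step can be sketched as below; the function names are mine, and the small binary parser is only there to check the bit pattern quoted above.

```c
#include <stdint.h>

#define MD5_T18 0xc040b340u  /* constant quoted in the post */

/* MD5 round-2 auxiliary function G(x, y, z). */
static uint32_t md5_g(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & ~z);
}

static uint32_t rotl32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

/* One step: d = a + rotl(d + G(a,b,c) + x6 + T18, 9).
   x6 stands for password[6] from the post. */
static uint32_t md5_step18(uint32_t a, uint32_t b, uint32_t c,
                           uint32_t d, uint32_t x6)
{
    d += md5_g(a, b, c) + x6;
    d += MD5_T18;
    d = rotl32(d, 9);
    d += a;
    return d;
}

/* Parse a string of '0'/'1' characters (most significant bit first). */
static uint32_t parse_bin32(const char *bits)
{
    uint32_t v = 0;
    while (*bits == '0' || *bits == '1') {
        v = (v << 1) | (uint32_t)(*bits++ - '0');
    }
    return v;
}
```

parse_bin32("11000000010000001011001101000000") equals 0xc040b340, so the binary expansion in the post is consistent with the hex constant.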