OK, News:
1) I got the speeds into roughly the same range as the ones from DeepLearningJohnDoe. That is 766 MH/s vs. 967 MH/s (on a 980Ti +250MHz) using CUDA. Note that we do a Markov-optimized search, so some speed is lost to the database lookups for each candidate.
2) I solved the fixed salt problem. But it's interesting how different the speed loss is on the 290x and the 980Ti. For example, the 980Ti drops from 766 MH/s to 736 MH/s, which is "ok", but the 290x... OMG... The reason I didn't get it working in the first place was that the AMD OpenCL SDK strikes back again, but not in a good way.
Fixed salt speed: 470 MH/s
Dynamic salts require a macro:
#define mysel(a,b,c) ((c) ? a : b) -- Drops 470 -> 33 MH/s !!!
Luckily I played around and found the following workarounds:
#define mysel(a,b,c) (select (a,b,c)) -- Getting 110 MH/s
#define mysel(a,b,c) (bitselect (a,b,(c) ? 0xffffffff : 0)) -- Getting 251 MH/s
So there's a drop from 470 MH/s to 251 MH/s for dynamic salt support. I'll take it for now, as we also get multihash support with it.
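For anyone who wants to play with the three mysel variants on the host, here is a minimal sketch in plain C. It is not the kernel code: bitselect_u32, select_u32 and mysel_self_check are hypothetical stand-ins I made up for illustration, written to follow the OpenCL C definitions of scalar select and bitselect. One thing worth double-checking with it: per the spec, select(a,b,c) and bitselect(a,b,mask) both return b when the condition or mask is set, which is the mirror image of ((c) ? a : b). Whether the call sites compensate for that is not shown in this post, so treat it as something to verify rather than a confirmed bug.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for OpenCL bitselect on 32-bit values:
 * each result bit is taken from b where the mask bit is 1, else from a. */
static uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & ~mask) | (b & mask);
}

/* Hypothetical host-side stand-in for OpenCL scalar select:
 * returns b if c is nonzero, else a. */
static uint32_t select_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return c ? b : a;
}

/* The three mysel variants from the post, written as functions. */
static uint32_t mysel_ternary(uint32_t a, uint32_t b, uint32_t c)
{
    return c ? a : b;
}

static uint32_t mysel_select(uint32_t a, uint32_t b, uint32_t c)
{
    return select_u32(a, b, c);
}

static uint32_t mysel_bitselect(uint32_t a, uint32_t b, uint32_t c)
{
    /* Expand the condition to an all-ones or all-zeros mask first. */
    return bitselect_u32(a, b, c ? 0xffffffffu : 0u);
}

/* Checks that the select and bitselect variants agree with each other,
 * and that both match the ternary variant with a and b swapped. */
static void mysel_self_check(void)
{
    const uint32_t a = 0x11111111u;
    const uint32_t b = 0x22222222u;

    for (uint32_t c = 0; c < 2; c++) {
        assert(mysel_select(a, b, c) == mysel_bitselect(a, b, c));
        assert(mysel_bitselect(a, b, c) == mysel_ternary(b, a, c));
    }
}
```

Calling mysel_self_check() from a main() is enough to run it. The all-ones/all-zeros mask is what lets bitselect act as a whole-word select here, since every bit of the result then comes from the same operand.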