From a technical perspective, and for our use case, OpenCL is clearly superior to CUDA. The problem is more of a political one. NV (currently the leader in producing hardware practical for password cracking) may decide to artificially reduce OpenCL performance to make their own CUDA product look stronger. That happened in the past (around driver 180.x), and there's no guarantee they won't do it again. Currently they only do it indirectly, for example by having the CPU spin in a busy loop while waiting for the GPU kernel to finish and giving the OpenCL user no option to disable this, while CUDA has such an option.
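For reference, the CUDA-side option is, as far as I know, the `cudaDeviceScheduleBlockingSync` device flag, which asks the runtime to put the host thread to sleep while it waits instead of spinning. A minimal sketch, assuming the CUDA runtime API; `dummy_kernel` and the launch sizes are hypothetical placeholders, not a real cracking kernel:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical placeholder kernel standing in for a real cracking kernel.
__global__ void dummy_kernel(unsigned int *out)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

// Print the CUDA error (if any) and report whether the call succeeded.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int blocks = 64;    // assumed launch configuration
    const int threads = 256;

    // Request blocking synchronization instead of a CPU spin loop.
    // Set this before any call that initializes the device context.
    if (!ok(cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync), "cudaSetDeviceFlags"))
        return 1;

    unsigned int *d_out = nullptr;
    if (!ok(cudaMalloc(&d_out, blocks * threads * sizeof(unsigned int)), "cudaMalloc"))
        return 1;

    dummy_kernel<<<blocks, threads>>>(d_out);
    if (!ok(cudaGetLastError(), "kernel launch"))
        return 1;

    // With the flag above, the host thread should sleep here rather than
    // keep a CPU core busy until the kernel finishes.
    if (!ok(cudaDeviceSynchronize(), "cudaDeviceSynchronize"))
        return 1;

    cudaFree(d_out);
    return 0;
}
```

Watching CPU usage of the process with and without the `cudaSetDeviceFlags` call should show the difference on long-running kernels.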
Otherwise the actual performance (with a perfectly written kernel) is the same. Both "interfaces" just compile to an intermediate code (PTX in the case of NV). But this introduces a new problem: NV can, for example, use different LLVM compiler versions for OpenCL and CUDA, so that the same kernel ends up as different intermediate code. The two may then differ in optimization or even in bugs.
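One way to check this yourself is to dump the PTX from both sides and diff it. Below is a rough sketch for the CUDA side only, assuming NVRTC (the CUDA runtime compilation library) is available and linked with `-lnvrtc`; the `rotl_demo` kernel is a made-up toy, not taken from any real cracker. On the OpenCL side, NV's implementation normally hands back PTX text when you query `CL_PROGRAM_BINARIES` with `clGetProgramInfo`, so the two outputs can be compared directly.

```cuda
#include <cstdio>
#include <string>
#include <nvrtc.h>

// Hypothetical toy kernel; a real comparison would use the actual kernel source.
static const char *kSource =
    "extern \"C\" __global__ void rotl_demo(unsigned int *data)\n"
    "{\n"
    "    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "    unsigned int v = data[i];\n"
    "    data[i] = (v << 7) | (v >> 25);\n"
    "}\n";

int main()
{
    nvrtcProgram prog;
    nvrtcResult res = nvrtcCreateProgram(&prog, kSource, "rotl_demo.cu",
                                         0, nullptr, nullptr);
    if (res != NVRTC_SUCCESS) {
        fprintf(stderr, "create failed: %s\n", nvrtcGetErrorString(res));
        return 1;
    }

    // Compile with default options; add e.g. an architecture option as needed.
    res = nvrtcCompileProgram(prog, 0, nullptr);
    if (res != NVRTC_SUCCESS) {
        fprintf(stderr, "compile failed: %s\n", nvrtcGetErrorString(res));
        nvrtcDestroyProgram(&prog);
        return 1;
    }

    // Fetch the generated PTX (the reported size includes the trailing NUL).
    size_t ptx_size = 0;
    res = nvrtcGetPTXSize(prog, &ptx_size);
    if (res != NVRTC_SUCCESS || ptx_size == 0) {
        fprintf(stderr, "no PTX available\n");
        nvrtcDestroyProgram(&prog);
        return 1;
    }
    std::string ptx(ptx_size, '\0');
    res = nvrtcGetPTX(prog, &ptx[0]);
    nvrtcDestroyProgram(&prog);
    if (res != NVRTC_SUCCESS) {
        fprintf(stderr, "get PTX failed: %s\n", nvrtcGetErrorString(res));
        return 1;
    }

    // Print the PTX so it can be redirected to a file and diffed.
    fputs(ptx.c_str(), stdout);
    return 0;
}
```

The comment header and the `.version` line at the top of each PTX dump usually hint at which compiler build produced it, which is a quick first check before comparing the instruction streams themselves.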