Does hashcat not actually use CUDA for Nvidia cards now? I'm puzzled by the fact that there are no .cu files on GitHub but there are many .cl files for OpenCL. I keep thinking back to the (now outdated) cudaHashcat vs. oclHashcat split and find myself quite confused by this.

I'm doing a research project for school where I'm making a CUDA MD5 cracker. While programming my own MD5 implementation in CUDA, I was thinking of using hashcat's CUDA MD5 as a well-implemented version to compare against, since mine is bound to be less than ideal.

Now that I actually look, however, there doesn't seem to be a CUDA version. I've already committed to using CUDA specifically for my project, so I don't want to just switch to OpenCL. Also, I thought OpenCL was noticeably slower than CUDA on Nvidia cards because of Nvidia's snail's pace at supporting it? Doesn't it still lack OpenCL 2.0 support?

Thanks! (I wasn't sure whether this belonged in the developer or the support forum; please move it if I chose incorrectly.)
That's right, hashcat no longer uses CUDA; it uses OpenCL. OpenCL causes some speed drop, but in hashcat we can either avoid it with workarounds in the code (which require SM >= 50), or it simply has no effect because the functions used in crypto don't involve floating-point operations (it's all pure integer operations). Switching to OpenCL also enabled hashcat to make use of CPUs, FPGAs, etc. There are a lot of good reasons. For cards below SM 50 there's a performance drop, but those are old GPUs, and since this is an application used in high-performance environments we don't care much. OpenCL 2.0 brings no advantage for hashcat, which is why we can stick to OpenCL 1.2 (actually OpenCL 1.1 would do, but sticking to OpenCL 1.2 makes it easier to live with the drivers).
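To illustrate the "all integer operations" point, here is a minimal sketch of a single MD5 round-1 step, following the definition in RFC 1321. It is not taken from hashcat's kernels; the names rotl32, md5_f, md5_step_f and check_md5_helpers are made up for this example. It is written as plain C++ so it can be compiled on a CPU, on the assumption that the same expressions would carry over to a CUDA device function or an OpenCL kernel more or less unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit word left by s bits (valid for 0 < s < 32).
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned s) {
    return (x << s) | (x >> (32u - s));
}

// MD5 round-1 auxiliary function F(x, y, z) from RFC 1321:
// picks bits from y where x is 1 and from z where x is 0.
constexpr std::uint32_t md5_f(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return (x & y) | (~x & z);
}

// One round-1 step: a = b + rotl(a + F(b, c, d) + m + k, s).
// m is a message word, k a round constant, s a shift amount.
// All additions wrap modulo 2^32, which unsigned arithmetic gives for free.
constexpr std::uint32_t md5_step_f(std::uint32_t a, std::uint32_t b,
                                   std::uint32_t c, std::uint32_t d,
                                   std::uint32_t m, std::uint32_t k,
                                   unsigned s) {
    return b + rotl32(a + md5_f(b, c, d) + m + k, s);
}

// Hypothetical sanity check for the helpers above; call it from main() if desired.
inline void check_md5_helpers() {
    assert(rotl32(0x80000000u, 1) == 0x00000001u);  // top bit wraps around
    assert(md5_f(0xFFFFFFFFu, 0x12345678u, 0u) == 0x12345678u);
    assert(md5_step_f(1u, 0u, 0u, 0u, 0u, 0u, 1) == 2u);
}
```

Everything here is a shift, a bitwise operation, or a wrapping 32-bit add, so no floating-point units are involved. A full MD5 implementation repeats steps like this 64 times per block with the other auxiliary functions, constants, and shift amounts from the RFC.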
Very interesting, thanks for such a detailed response! So in your personal opinion, do you think OpenCL is overall a better choice than CUDA unless you explicitly need CUDA-specific features? Your post makes me want to consider using OpenCL instead if it is a more "well-rounded" experience. The cross-platform benefit is nice, but since I'm just beginning to be exposed to scientific computing I also want to have the best base to start with. Any opinion on the difficulty of either one?
From a technical perspective and for our use case, OpenCL is clearly superior to CUDA. The problem is more of a political one. NV (being the current leader in producing hardware practical for password cracking) may decide to artificially drop OpenCL performance to make their own CUDA product look stronger. That happened in the past (around driver 180.x) and there's no guarantee they will never do it again. Currently they only do it indirectly, for example by having the CPU spin in a busy loop while waiting for the GPU kernel to finish and giving the OpenCL user no option to disable this, while in CUDA there is such an option.

The actual performance otherwise (in a perfectly written kernel) is the same. Both "interfaces" just compile to an intermediate code (PTX in the case of NV). But this introduces a new problem: if OpenCL and CUDA use different LLVM compiler versions, they can end up with different intermediate code. It's possible that they have differences in optimization or even bugs.