Making DES faster

I'm currently digging into DES (-m 14000) to understand what happens at the bit level. I have a 2080 SUPER, and my own hand-written CUDA kernels reach a ridiculous 500 MKeys/s. I'm trying to optimize my code and am using hashcat as a reference (I know, it's OpenCL). Looking at hashcat's status screen while it works on DES, I see a salt.

To what extent can a salt help make DES faster?
Is the salt the only optimization hashcat uses for DES?

Hints or links are sufficient and would greatly help.

