CudaHashcat vs 195M MD5s - a new world record
Previous record was 155M MD5s.
New record is 195M MD5s.
I used a special version of cudaHashcat, which does not sort or remove hashes.
A normal version would take about twice as long to start, but everything else would be the same.

Time breakdown for a normal version:
~11s - initialization
295.26s - loading/parsing of hashes
306.79s - sorting/removing hashes
3.28s - structuring salts for cracking tasks
1.94s - generating bitmaps
+INF - cracking
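
A quick check of the "twice as long to start" figure, using the numbers from the breakdown above. This is only a sketch: it assumes the special version skips just the sorting/removing phase and that every other phase takes the same time. The variable names are mine.

```python
# Startup phases of a normal build, in seconds (from the breakdown above).
phases = {
    "initialization": 11.0,        # listed as "~11s", so approximate
    "loading/parsing": 295.26,
    "sorting/removing": 306.79,
    "structuring salts": 3.28,
    "generating bitmaps": 1.94,
}

# Assumption: the special build skips only the sorting/removing phase.
normal_start = sum(phases.values())
special_start = normal_start - phases["sorting/removing"]
ratio = normal_start / special_start

print(f"normal build:  {normal_start:.2f}s")
print(f"special build: {special_start:.2f}s")
print(f"ratio:         {ratio:.2f}x")
```

That works out to a bit over 10 minutes of startup for a normal build versus a bit over 5 minutes for the special one.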

Key points from the image:
1. 6051 MB of vRAM was used during the cracking phase.
2. Peak private byte usage by cudahashcat64.exe was 15.3 GB, during the salt-structuring phase.
3. Speed was 2.5518B p/s. Wicked!

Now, a Quadro K6000 or Tesla K40 could load about twice as much (<= 400M MD5s), since each has 12 GB of vRAM, but those GPUs are not meant for desktop PCs.
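
The 400M estimate can be sanity-checked with a rough back-of-the-envelope calculation. This is a sketch under loud assumptions: that vRAM usage scales linearly with the number of hashes, that all 6051 MB was hash-related data, and that a 12 GB card could devote all of its memory to the hash list. None of that is exact in practice, so treat the result as an upper bound.

```python
hashes_loaded = 195_000_000
vram_used_mb = 6051            # vRAM used during cracking, from the key points above
target_vram_mb = 12 * 1024     # 12 GB card, assuming 1 GB = 1024 MB

# Assumption: memory use grows linearly with the number of hashes.
bytes_per_hash = vram_used_mb * 1024 * 1024 / hashes_loaded
max_hashes = hashes_loaded * target_vram_mb / vram_used_mb

print(f"approx. vRAM per hash: {bytes_per_hash:.1f} bytes")
print(f"approx. max hashes on 12 GB: {max_hashes / 1e6:.0f}M")
```

Under those assumptions the ceiling lands just under 400M, which matches the "<= 400M" figure.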

What's the limit of the public version of cuda/oclHashcat?
RAM / vRAM is the limit. I was using 1.31b1, which differs little from the 1.31 release.
The special version had no memory optimizations; it only started the attack faster.