The short answer is "parallelization".
Keep in mind that using GPUs only makes sense because they parallelize work extremely well, i.e. the power of a GPU comes only from running all of its cores in parallel.
(The FAQ, in my opinion, summarizes this very well: "Those small compute devices on GPU (shader) they are relatively slow and dumb compared to a CPU. ... What makes a GPU so fast is that there is a lot of those slow and dumb shaders. That means to make use of it, we have to parallelize the problem.", see https://hashcat.net/faq#how_to_create_mo...full_speed.)
The problem with scrypt is that each scrypt computation needs a lot of memory, i.e. every hash that is computed in parallel (every "core" doing work) needs its own relatively huge amount of VRAM. This is a main design property of scrypt: it is meant to be unfriendly to GPUs (and FPGAs etc). Note that the amount of memory depends a lot on the scrypt settings: N, r, p.
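To put rough numbers on this, here is a small sketch. It uses the size of scrypt's large scratch array from the scrypt specification (128 * r * N bytes per computation) and ignores the much smaller per-hash buffers as well as any driver or hashcat overhead. The 8 GiB memory size and the function names are just assumptions for illustration, not anything hashcat reports.

```python
GiB = 1024 ** 3

def scrypt_bytes_per_hash(N: int, r: int) -> int:
    """Size of scrypt's large scratch array V for ONE computation.

    Per the scrypt specification V holds N blocks of 128 * r bytes.
    The smaller buffers (which scale with r and p) are ignored here.
    """
    return 128 * r * N

def max_parallel_hashes(memory_bytes: int, N: int, r: int) -> int:
    """Upper bound on how many computations fit in memory at once."""
    return memory_bytes // scrypt_bytes_per_hash(N, r)

if __name__ == "__main__":
    vram = 8 * GiB  # hypothetical GPU, assumed fully usable for scrypt
    for N, r in [(1024, 1), (16384, 8), (1048576, 8)]:
        per_hash = scrypt_bytes_per_hash(N, r)
        fit = max_parallel_hashes(vram, N, r)
        print(f"N={N} r={r}: {per_hash} bytes per hash, at most {fit} in parallel")
```

With N=16384 and r=8 each computation needs 16 MiB, so even 8 GiB of memory can hold only 512 of them at the same time, far fewer than the number of threads a GPU wants to keep busy.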
There are other problems with scrypt that are especially notable on GPU devices, e.g. OpenCL limits how much memory can be allocated at once (especially notable on Nvidia, where the limit is 1/4 of VRAM). hashcat works around this limit by allocating several blocks of memory...
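The idea of "several blocks instead of one big allocation" can be sketched like this. This is not hashcat's actual code, only an illustration of the principle; plan_chunks and the sizes in the example are made up, and in a real OpenCL program the per-allocation limit would come from the device (CL_DEVICE_MAX_MEM_ALLOC_SIZE).

```python
GiB = 1024 ** 3

def plan_chunks(total_bytes: int, max_alloc_bytes: int) -> list[int]:
    """Split one large buffer into chunks that each respect the
    per-allocation limit. Hypothetical helper, not a hashcat function."""
    if max_alloc_bytes <= 0:
        raise ValueError("max_alloc_bytes must be positive")
    if total_bytes <= 0:
        return []
    # Smallest number of chunks that keeps every chunk within the limit
    # (integer ceiling division).
    count = -(-total_bytes // max_alloc_bytes)
    base, rest = divmod(total_bytes, count)
    # Spread the remainder so the chunk sizes differ by at most one byte.
    return [base + (1 if i < rest else 0) for i in range(count)]

if __name__ == "__main__":
    # Assumed example: we want 6 GiB for scrypt, but a single allocation
    # may be at most 2 GiB (1/4 of a hypothetical 8 GiB card).
    for size in plan_chunks(6 * GiB, 2 * GiB):
        print(size)
```

The kernel then has to work out which block a given piece of scratch memory lives in, which is extra bookkeeping but gets around the single-allocation limit.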
Furthermore, the scrypt TMTO (time-memory trade-off) setting is used to work around some memory allocation limits, but it comes with disadvantages too (e.g. if you want to use less memory per hash, the speed drops as well, because the values that are no longer stored have to be recomputed).
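A rough model of that trade-off, as a sketch only: assume each TMTO step halves the scratch memory by storing only every 2^tmto-th block, and that a lookup of a block that was not stored has to recompute it from the nearest stored one. Both function names are hypothetical and the cost model is my simplification, it is not how hashcat measures or reports anything.

```python
def tmto_memory_bytes(N: int, r: int, tmto: int) -> int:
    """Scratch memory per hash when only every 2**tmto-th block is kept.
    Assumes memory is halved per TMTO step (simplified model)."""
    return (128 * r * N) >> tmto

def tmto_avg_extra_steps(tmto: int) -> float:
    """Average number of blocks to recompute per lookup, assuming the
    looked-up index is uniformly distributed within a stride."""
    stride = 1 << tmto
    return (stride - 1) / 2

if __name__ == "__main__":
    N, r = 16384, 8  # example settings, chosen only for illustration
    for tmto in range(4):
        mem = tmto_memory_bytes(N, r, tmto)
        extra = tmto_avg_extra_steps(tmto)
        print(f"tmto={tmto}: {mem} bytes per hash, ~{extra} extra steps per lookup")
```

So every step frees memory for more parallel computations, but each single computation gets slower; which setting wins depends on the device.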
There are many technical reasons why scrypt is GPU-unfriendly. GPUs getting more and more VRAM might help a little bit in the future, but beware that scrypt has cost factors (N, r, p), so one could just increase the cost and make it slow to crack again.
Needless to say, without the parallelization (those thousands of cores we have on GPUs) cracking hashes wouldn't be that fast on GPUs. We need the parallelization, otherwise it would be faster to crack the hashes with the CPU only (which is already the case in some situations, e.g. for scrypt/bcrypt with high cost factors).
So the short answer is that your OpenCL CPU device also uses a lot of RAM, it just (probably) doesn't have those thousands of cores but only a few (e.g. 16 cores). Furthermore, today we might still have more allocatable RAM than VRAM (but maybe this will change in the near future; the trend already is to have more and more VRAM on GPUs).