05-13-2017, 07:53 PM
8 threads * 8-way SIMD = 64 32-bit ops in parallel
16 threads * 4-way SIMD = 64 32-bit ops in parallel
32 threads * 4-way SIMD = 128 32-bit ops in parallel
24 threads * 8-way SIMD = 192 32-bit ops in parallel
That's why 256-bit registers are better than more cores: 8 threads with 8-way SIMD match 16 threads with 4-way. And leveraging AVX2 is trivial; the OpenCL compiler even does it for you.
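For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above. The function name `parallel_ops` is made up for illustration, and it assumes 4-way means 128-bit registers and 8-way means 256-bit registers holding 32-bit values. It only counts peak lanes; it says nothing about clock speed, execution ports, or memory bandwidth.

```python
def parallel_ops(threads, register_bits, op_bits=32):
    """Peak count of op_bits-wide operations in flight across all threads.

    Assumption: every thread can issue one full-width SIMD operation at a time.
    """
    lanes = register_bits // op_bits  # SIMD width, e.g. 256 // 32 = 8
    return threads * lanes


# (threads, register width in bits) for the four cases listed in the post
configs = [(8, 256), (16, 128), (32, 128), (24, 256)]
results = [parallel_ops(threads, bits) for threads, bits in configs]

for (threads, bits), ops in zip(configs, results):
    print(f"{threads} threads * {bits // 32}-way SIMD = {ops} 32-bit ops in parallel")
```

Doubling the register width has the same effect on the peak count as doubling the thread count, which is the whole point of the comparison.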