09-15-2023, 05:35 PM
This is a known limitation caused by the way work is distributed across the GPUs. There are a few tricks you can use if your algorithm is a primitive algorithm that supports various constructions, but in your case there's nothing you can do.