Non-trivial CUDA

The current snippet of code calling CUDA is:

    cuda_step<<< 1, 1 >>>

This uses only one CUDA thread, and is probably extremely inefficient. To verify this, let’s add some way to time the program. I could use NVIDIA’s nsys, or just the...
pbhnblog.ballif.eu, 8 min read