Welcome!
This practical tutorial gives you an easy way to explore the features of GPUs based on the Kepler architecture, such as:
- Ways to deploy massive parallelism via blocks, kernels and streams for Hyper-Q.
- Computing power (GFLOPS) and data bandwidth (GB/s).
- Latency for operands (int/float/double…) and operators (add/mul/div…).
As a CUDA programmer, you face several challenges here:
- Analyze performance for all the features described above, relating the results to the GPU hardware to better understand its architecture.
- Investigate ways and mechanisms to get closer to the theoretical peak performance of a GPU.
- Use the roofline model to see how GFLOPS and bandwidth behave and relate to each other.
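
The roofline model named in the last challenge can be summarized in one formula. This is a sketch with symbols chosen here for illustration, not taken from the tutorial: P is the attainable performance in GFLOPS, P_peak is the theoretical peak in GFLOPS, B is the memory bandwidth in GB/s, and I is the arithmetic intensity of a kernel in FLOP per byte moved to or from memory.

```latex
P(I) = \min\left(P_{\text{peak}},\; B \cdot I\right),
\qquad
I_{\text{ridge}} = \frac{P_{\text{peak}}}{B}
```

Kernels with I below I_ridge are limited by bandwidth, and kernels above it are limited by computing power. As an example with illustrative round numbers (assumed here, not the specification of any particular card), P_peak = 4000 GFLOPS and B = 250 GB/s give I_ridge = 16 FLOP/byte, so a kernel with I = 2 FLOP/byte can reach at most 250 × 2 = 500 GFLOPS.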
Enjoy the hands-on session and... good luck with CUDA!