Welcome!

This practical tutorial gives you an easy way to explore features of GPUs based on the Kepler architecture, such as:

  • Ways to deploy massive parallelism through blocks, kernels, and streams (exploiting Hyper-Q).
  • Computing power (GFLOPS) and data bandwidth (GB/s).
  • Latency of operand types (int/float/double…) and operators (add/mul/div…).

As a CUDA programmer, you will face several challenges here:

  • Analyze the performance of all the features listed above, relating the results to the GPU hardware to better understand its architecture.
  • Investigate techniques and mechanisms for getting closer to a GPU's theoretical peak performance.
  • Use the roofline model to see how GFLOPS and bandwidth behave and how they relate to each other.
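As a reminder of how the roofline model ties GFLOPS to bandwidth, here is a short worked sketch. It bounds a kernel's attainable performance P by the lower of the compute peak P_peak and the product of its arithmetic intensity I (FLOPs per byte moved to or from DRAM) and the peak bandwidth B_peak. The figures used (4000 GFLOPS, 250 GB/s) are hypothetical round numbers chosen for illustration, not the specifications of a particular Kepler board. The SAXPY kernel (y = a*x + y in single precision) is assumed to perform 2 FLOPs per element while reading 8 bytes and writing 4.

```latex
P = \min\left(P_{\text{peak}},\; I \cdot B_{\text{peak}}\right),
\qquad
I_{\text{ridge}} = \frac{P_{\text{peak}}}{B_{\text{peak}}}
                 = \frac{4000~\text{GFLOPS}}{250~\text{GB/s}}
                 = 16~\text{FLOP/byte}

% SAXPY: 2 FLOPs per element, 12 bytes of DRAM traffic per element
I_{\text{saxpy}} = \frac{2}{12} \approx 0.167~\text{FLOP/byte} < I_{\text{ridge}}
\;\Rightarrow\;
P \approx 0.167 \times 250 \approx 41.7~\text{GFLOPS}
```

Because SAXPY's intensity lies far to the left of the ridge point, it is bandwidth-bound. Under these assumed numbers, only kernels with an intensity above about 16 FLOP/byte could approach the compute peak.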

Enjoy the hands-on and... good luck with CUDA!