The game

We provide you with a CUDA program, mytoy.cu, that is simple yet complete enough to let you change a variety of features in your software and quickly see how they affect performance on the underlying GPU architecture.

The kernel performs operations in parallel on every nonzero element of a sparse matrix. It is about the shortest irregular program you can write, which makes it easy to benefit from the new capabilities of the Kepler architecture (such as the SMX multiprocessor and Hyper-Q), as well as from other mechanisms already available in CUDA. As a CUDA programmer, you must select the parameters that maximize performance and, more importantly, decide on the parallelization strategy. Depending on your programming skills and the GPU hardware available to you, you can choose among several levels of difficulty:
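The following is a minimal sketch of what such a per-nonzero kernel might look like, assuming the nonzero values are stored contiguously in an array of length nnz and each thread handles one of them. The names DATA_TYPE, applyOps and toyKernel are hypothetical and are not taken from mytoy.cu; the typedef plays the role of MU1 and the loop body the role of MU2.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for MU1: the element type (int, float or double).
typedef double DATA_TYPE;

// Hypothetical stand-in for MU2: applies numOps dependent operations to one value.
// Replace "acc + x" with "acc * x" or "acc / x" to study other latencies.
__host__ __device__ inline DATA_TYPE applyOps(DATA_TYPE x, int numOps)
{
    DATA_TYPE acc = x;
    for (int i = 0; i < numOps; ++i)
        acc = acc + x;   // "add" variant
    return acc;
}

// One thread per nonzero, assuming the values are stored contiguously in val[].
__global__ void toyKernel(DATA_TYPE *val, int nnz, int numOps)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < nnz)   // the last block may contain threads beyond nnz
        val[idx] = applyOps(val[idx], numOps);
}
```

Marking applyOps as `__host__ __device__` lets the same arithmetic be checked on the CPU, which is convenient when validating the GPU results.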

Basic: Play with warps, threads and blocks.
Intermediate: Change the kernel launches.
Advanced: Assign kernels to streams the right way.
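At the basic level, the main arithmetic is choosing a block size and deriving how many blocks cover all nonzeros. The sketch below assumes one thread per nonzero; numBlocks and isValidBlockSize are hypothetical helpers, and 1024 is the maximum number of threads per block on Kepler-class GPUs.

```cuda
// Hypothetical helper: blocks needed so that blocks * blockSize >= nnz,
// i.e. one thread per nonzero (ceiling division).
inline int numBlocks(int nnz, int blockSize)
{
    return (nnz + blockSize - 1) / blockSize;
}

// Hypothetical check: a multiple of the warp size (32) avoids partially
// filled warps, and 1024 is the maximum number of threads per block.
inline bool isValidBlockSize(int blockSize)
{
    return blockSize > 0 && blockSize <= 1024 && blockSize % 32 == 0;
}
```

A launch would then look like `someKernel<<<numBlocks(nnz, bs), bs>>>(...)`, where `someKernel` and `bs` are placeholders. Every block size listed for MU3 passes isValidBlockSize.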

Points of interest are tagged in the code with the “MU” label followed by a number, according to the following table:

Control points (within mytoy.cu)
Tag	Description/purpose	Choices	Investigate
MU1	Selects data type	int/float/double (initially: double)	ALU/FPU performance
MU2	Selects type of operation	add/mul/div (initially: add)	Arithmetic latency
MU3	Calculates the number of blocks and the block size	32, 64, 128, 192, 256, 384, 512, 1024 threads per block (initially: 1024)	Parallel deployment
MU4	Declares streams	All kernels in the same stream, or one stream per kernel (initially: one stream per kernel)	Parallel deployment
MU5	Launch kernels on streams	This is tightly coupled with MU4	Parallel deployment
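The MU4/MU5 choice can be sketched as follows, assuming the work is split into numKernels independent chunks of chunk elements each. Every identifier here (chunkKernel, streamFor, launchAll) is hypothetical; only the CUDA runtime calls are real. With one stream per kernel the launches may overlap (Hyper-Q on Kepler helps avoid false dependencies between streams); with a single stream they execute in issue order.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel standing in for one independent piece of work.
__global__ void chunkKernel(double *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0;
}

// MU4/MU5 policy: kernel k goes to stream k, or everything goes to stream 0.
inline int streamFor(int k, bool oneStreamPerKernel)
{
    return oneStreamPerKernel ? k : 0;
}

// Launches numKernels kernels, kernel k working on d_data + k * chunk.
cudaError_t launchAll(double *d_data, int numKernels, int chunk,
                      int blockSize, bool oneStreamPerKernel)
{
    int numStreams = oneStreamPerKernel ? numKernels : 1;
    std::vector<cudaStream_t> streams(numStreams);
    for (int s = 0; s < numStreams; ++s)
        cudaStreamCreate(&streams[s]);                 // MU4: declare streams

    int blocks = (chunk + blockSize - 1) / blockSize;
    for (int k = 0; k < numKernels; ++k) {
        cudaStream_t st = streams[streamFor(k, oneStreamPerKernel)];
        chunkKernel<<<blocks, blockSize, 0, st>>>(d_data + k * chunk, chunk);  // MU5
    }

    cudaError_t err = cudaGetLastError();              // report launch errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();                 // wait for all streams
    for (int s = 0; s < numStreams; ++s)
        cudaStreamDestroy(streams[s]);
    return err;
}
```

Timing both policies on the same data is a direct way to see whether one stream per kernel pays off on your GPU.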

Input parameters to mytoy.cu (provided from the Linux shell)
Position	Meaning	Comments/hints
First	The index of the GPU used to run the code	Usually 0 (particularly when using cloud computing)
Second	The number of operations performed on each matrix nonzero	Determines the operational intensity; varying it yields the points needed to draw the roofline model for the target GPU
Third	The file name of the input sparse matrix (sample matrices are in the sparsematrices directory)	sparsematrices/samplematrix.rua is a sample
Fourth	The file name where the output results are written	Useful to validate GPU computations
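Reading these four parameters could look like the sketch below. ToyArgs, parseArgs and selectGpu are hypothetical names, not the actual code in mytoy.cu, and the validation rules (exactly four arguments, non-negative numeric values) are assumptions.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical container for the four command-line parameters.
struct ToyArgs {
    int gpu;              // 1st: GPU index passed to cudaSetDevice
    int numOps;           // 2nd: operations per nonzero (operational intensity)
    const char *inFile;   // 3rd: input sparse matrix, e.g. sparsematrices/samplematrix.rua
    const char *outFile;  // 4th: output file used to validate results
};

// Returns false (printing usage if the count is wrong) on invalid input.
bool parseArgs(int argc, char **argv, ToyArgs *a)
{
    if (argc != 5) {
        fprintf(stderr, "usage: %s <gpu> <numOps> <matrix.rua> <output>\n", argv[0]);
        return false;
    }
    a->gpu = atoi(argv[1]);
    a->numOps = atoi(argv[2]);
    a->inFile = argv[3];
    a->outFile = argv[4];
    return a->gpu >= 0 && a->numOps >= 0;
}

// Selects the device requested on the command line, if it exists.
bool selectGpu(int gpu)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || gpu < 0 || gpu >= count)
        return false;
    return cudaSetDevice(gpu) == cudaSuccess;
}
```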

An example command for running the program from the shell:

/home/ujaldon> ./mytoy 0 4 sparsematrices/samplematrix.rua myoutputfile.txt
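To turn the second parameter into a roofline coordinate, a rough estimate is enough. The sketch below assumes each nonzero is read once and written once from device memory (2 * sizeof(element) bytes of traffic) and receives numOps operations; the helper names are hypothetical, and the peak figures must come from your own GPU's specifications. Under those assumptions, the command above (4 operations on double data) gives 4 / 16 = 0.25 operations per byte.

```cuda
#include <cstddef>

// Hypothetical estimate: operations per byte, ASSUMING one read and one
// write of each nonzero to device memory and no other traffic.
inline double opIntensity(int numOps, std::size_t bytesPerElem)
{
    return static_cast<double>(numOps) / (2.0 * bytesPerElem);
}

// Roofline model: attainable rate = min(peak compute, intensity * peak bandwidth).
// Units: peakGops in Gop/s, peakGBs in GB/s, result in Gop/s.
inline double attainableGops(double intensity, double peakGops, double peakGBs)
{
    double memoryBound = intensity * peakGBs;
    return memoryBound < peakGops ? memoryBound : peakGops;
}
```

Sweeping the second parameter (1, 2, 4, 8, ...) and plotting measured performance against opIntensity places each run on the roofline; the ridge point where the two terms meet is where the kernel stops being memory bound.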

CUDA challenge Manuel Ujaldón @ NVIDIA
