The input data set

The features of the input matrix, chiefly the number of columns and the number of nonzeros per column, affect how much the GPU benefits from its parallelization mechanisms, so the matrix has to be chosen carefully to optimize performance. You may use any of the matrices available at the Matrix Market repository (Harwell-Boeing sparse matrix collection), or create your own using the sparse matrix generator we provide along with mytoy.cu.

Some interesting examples we have created with this generator are:

Filename	Matrix rows	Matrix columns	Nonzeros	Workload	Target
mat-200-200-27k.rua	200	200	27,000	Baseline	Kepler
mat-4000-200-540k.rua	4,000	200	540,000	20 x Baseline	Kepler
mat-32000-200-4320k.rua	32,000	200	4,320,000	160 x Baseline	Kepler
mat-512000-200-69120k.rua	512,000	200	69,120,000	2560 x Baseline	Kepler
mat-300-100-22k.rua	300	100	22,000	Baseline	Fermi
mat-6000-100-440k.rua	6,000	100	440,000	20 x Baseline	Fermi
mat-24000-100-1760k.rua	24,000	100	1,760,000	80 x Baseline	Fermi
mat-96000-100-7040k.rua	96,000	100	7,040,000	320 x Baseline	Fermi
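
Because the number of nonzeros per column is one of the features that matters, it can help to derive it from the table. The following is a minimal C++ sketch, not part of mytoy.cu: the struct and function names are hypothetical, and it assumes nonzeros are spread evenly across columns, which the table does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of the table. Field names are illustrative, not taken from mytoy.cu.
struct MatrixInfo {
    std::string filename;
    std::int64_t rows;
    std::int64_t cols;
    std::int64_t nonzeros;
};

// Average nonzeros per column, ASSUMING an even spread across columns
// (the source does not state how nonzeros are distributed).
std::int64_t avg_nonzeros_per_column(const MatrixInfo& m) {
    assert(m.cols > 0);
    return m.nonzeros / m.cols;
}

// Workload relative to a baseline matrix, measured in nonzeros.
std::int64_t workload_factor(const MatrixInfo& m, const MatrixInfo& baseline) {
    assert(baseline.nonzeros > 0);
    return m.nonzeros / baseline.nonzeros;
}

// The four Kepler entries from the table, smallest first.
std::vector<MatrixInfo> kepler_matrices() {
    return {
        {"mat-200-200-27k.rua", 200, 200, 27000},
        {"mat-4000-200-540k.rua", 4000, 200, 540000},
        {"mat-32000-200-4320k.rua", 32000, 200, 4320000},
        {"mat-512000-200-69120k.rua", 512000, 200, 69120000},
    };
}
```

Under that even-spread assumption, the Kepler baseline averages 135 nonzeros per column, while the largest Kepler matrix averages 345,600, so the per-column work grows by the same factor as the Workload column.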

Notes:

Sparse matrices use the Compressed Sparse Column (CSC) storage format.
One CUDA stream is used for each column of the matrix. The matrices have either 100 or 200 columns but, in most cases, many more rows, and up to 69,120,000 nonzero elements overall.
The input matrix can be selected from the sparsematrices directory. Users can also generate their own matrices, tailoring the structure and the number of nonzeros to whatever they expect to yield the highest GFLOPS on the GPU. To do so, use the HB-libraries directory.
All nonzeros hold the numerical value 0.999999 to prevent underflow/overflow when the values are operated on with one another.
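
To make the notes above concrete, here is a minimal C++ sketch of a CSC container and a toy generator. It is not the generator shipped in HB-libraries: the names `CscMatrix`, `make_uniform_csc` and `column_nnz` are hypothetical, and placing the same number of evenly spaced nonzeros in every column is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSC container. Names are illustrative, not from mytoy.cu or HB-libraries.
struct CscMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> col_ptr;  // size cols + 1; column j spans [col_ptr[j], col_ptr[j+1])
    std::vector<std::size_t> row_idx;  // row index of each nonzero, column by column
    std::vector<double> values;        // value of each nonzero, same order as row_idx
};

// Build a matrix in which every column holds nnz_per_col nonzeros,
// spaced evenly down the column and all equal to 0.999999.
// The uniform layout is an assumption, not the behaviour of the real generator.
CscMatrix make_uniform_csc(std::size_t rows, std::size_t cols, std::size_t nnz_per_col) {
    assert(nnz_per_col > 0 && nnz_per_col <= rows);
    CscMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.col_ptr.reserve(cols + 1);
    m.col_ptr.push_back(0);
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t k = 0; k < nnz_per_col; ++k) {
            // Strictly increasing row indices because nnz_per_col <= rows.
            m.row_idx.push_back(k * rows / nnz_per_col);
            m.values.push_back(0.999999);
        }
        m.col_ptr.push_back(m.row_idx.size());
    }
    return m;
}

// Number of nonzeros in column j, i.e. the amount of data
// a per-column CUDA stream would have to process.
std::size_t column_nnz(const CscMatrix& m, std::size_t j) {
    assert(j < m.cols);
    return m.col_ptr[j + 1] - m.col_ptr[j];
}
```

Calling `make_uniform_csc(300, 100, 220)` yields a matrix with the same dimensions and nonzero count (22,000) as mat-300-100-22k.rua, though not necessarily the same sparsity pattern. The `col_ptr` array is what makes the one-stream-per-column scheme natural: each column's nonzeros are contiguous in `row_idx` and `values`.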

CUDA challenge Manuel Ujaldón @ NVIDIA
