The input data set

The features of the input matrix, chiefly the number of columns and the number of nonzeros per column, affect how much the GPU benefits from its parallelization mechanisms, so they have to be chosen carefully to optimize performance. One may use any of the matrices available at the Matrix Market repository (Harwell-Boeing sparse matrix collection), or create one's own with the sparse matrix generator we also provide along with mytoy.cu.

Some interesting examples we have created with this generator are:

Filename                     Matrix rows   Matrix columns     Nonzeros   Workload          Target
mat-200-200-27k.rua                  200              200       27,000   Baseline          Kepler
mat-4000-200-540k.rua              4,000              200      540,000   20 x Baseline     Kepler
mat-32000-200-4320k.rua           32,000              200    4,320,000   160 x Baseline    Kepler
mat-512000-200-69120k.rua        512,000              200   69,120,000   2560 x Baseline   Kepler
mat-300-100-22k.rua                  300              100       22,000   Baseline          Fermi
mat-6000-100-440k.rua              6,000              100      440,000   20 x Baseline     Fermi
mat-24000-100-1760k.rua           24,000              100    1,760,000   80 x Baseline     Fermi
mat-96000-100-7040k.rua           96,000              100    7,040,000   320 x Baseline    Fermi
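
The workload column can be checked directly from the nonzero counts. The following is a minimal host-side sketch, not code taken from mytoy.cu: the `MatrixInfo` struct and the three helper functions are hypothetical names introduced only for illustration. It derives the average nonzeros per column, the workload relative to a baseline, and the fill density from one row of the table.

```cpp
#include <cassert>
#include <cstdint>

// One row of the table above (hypothetical helper type, not part of mytoy.cu).
struct MatrixInfo {
    const char*  filename;
    std::int64_t rows;
    std::int64_t cols;
    std::int64_t nonzeros;
};

// Average number of nonzeros stored in each column.
double nonzeros_per_column(const MatrixInfo& m) {
    assert(m.cols > 0);
    return static_cast<double>(m.nonzeros) / static_cast<double>(m.cols);
}

// Workload of m relative to a baseline matrix, measured in nonzeros.
double workload_factor(const MatrixInfo& m, const MatrixInfo& baseline) {
    assert(baseline.nonzeros > 0);
    return static_cast<double>(m.nonzeros) / static_cast<double>(baseline.nonzeros);
}

// Fraction of the rows x cols entries that are nonzero.
double density(const MatrixInfo& m) {
    assert(m.rows > 0 && m.cols > 0);
    return static_cast<double>(m.nonzeros) /
           (static_cast<double>(m.rows) * static_cast<double>(m.cols));
}
```

Taking the table's figures at face value, the Kepler baseline has 135 nonzeros per column and the Fermi baseline 220. Within each family the density stays constant (0.675 for Kepler, about 0.733 for Fermi), so the larger matrices increase the work per column rather than change the fill ratio.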

Notes:

  • Sparse matrices use the Compressed Sparse Column (CSC) storage format.
  • One CUDA stream is used for each column of the matrix. The matrices have 100 or 200 columns, but up to 512,000 rows and up to 69,120,000 nonzero elements overall.
  • The input matrix can be selected from the sparsematrices directory. Users can also generate their own matrices, tailoring the structure and the number of nonzeros to whatever they expect to yield the most GFLOPS on the GPU. To do so, use the HB-libraries directory.
  • All nonzeros hold the value 0.999999 to prevent underflow or overflow when they operate on themselves.
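
To make the CSC layout and the per-column work concrete, here is a minimal host-side sketch. It is not the generator shipped in HB-libraries and not code from mytoy.cu: `CscMatrix`, `make_uniform_csc` and `column_nnz` are hypothetical names, the even spacing of nonzeros within a column is an assumption, and indices are 0-based in memory (Harwell-Boeing files store 1-based indices).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSC container (illustrative field names, not taken from mytoy.cu).
struct CscMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> col_ptr;  // size cols + 1; column j spans [col_ptr[j], col_ptr[j + 1])
    std::vector<std::size_t> row_idx;  // row index of each nonzero, stored column by column
    std::vector<double>      values;   // value of each nonzero, same order as row_idx
};

// Build a matrix in which every column holds nnz_per_col nonzeros, spaced
// evenly over the rows (an assumption), all set to fill_value.
CscMatrix make_uniform_csc(std::size_t rows, std::size_t cols,
                           std::size_t nnz_per_col, double fill_value = 0.999999) {
    assert(nnz_per_col <= rows);
    CscMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.col_ptr.reserve(cols + 1);
    m.col_ptr.push_back(0);
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t k = 0; k < nnz_per_col; ++k) {
            // Strictly increasing row indices because nnz_per_col <= rows.
            m.row_idx.push_back(k * rows / nnz_per_col);
            m.values.push_back(fill_value);
        }
        m.col_ptr.push_back(m.row_idx.size());
    }
    return m;
}

// Number of nonzeros in column j, i.e. the amount of data the stream
// assigned to that column would have to process.
std::size_t column_nnz(const CscMatrix& m, std::size_t j) {
    assert(j < m.cols);
    return m.col_ptr[j + 1] - m.col_ptr[j];
}
```

For example, `make_uniform_csc(200, 200, 135)` yields 27,000 nonzeros, the same size as the Kepler baseline; with one stream per column, each of the 200 streams would then handle the 135 entries between `col_ptr[j]` and `col_ptr[j + 1]`.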