The input data set
The features of the input matrix, chiefly the number of columns and the number of nonzeros per column, determine how much the GPU benefits from its parallelization mechanisms, so they have to be chosen carefully to optimize performance. You can use any of the matrices available in the Matrix Market repository (Harwell-Boeing sparse matrix collection), or create your own with the sparse matrix generator we provide along with mytoy.cu.
Some interesting examples we have created with this generator are:
Filename | Matrix rows | Matrix columns | Nonzeros | Workload | Target |
---|---|---|---|---|---|
mat-200-200-27k.rua | 200 | 200 | 27,000 | Baseline | Kepler |
mat-4000-200-540k.rua | 4,000 | 200 | 540,000 | 20 x Baseline | Kepler |
mat-32000-200-4320k.rua | 32,000 | 200 | 4,320,000 | 160 x Baseline | Kepler |
mat-512000-200-69120k.rua | 512,000 | 200 | 69,120,000 | 2560 x Baseline | Kepler |
mat-300-100-22k.rua | 300 | 100 | 22,000 | Baseline | Fermi |
mat-6000-100-440k.rua | 6,000 | 100 | 440,000 | 20 x Baseline | Fermi |
mat-24000-100-1760k.rua | 24,000 | 100 | 1,760,000 | 80 x Baseline | Fermi |
mat-96000-100-7040k.rua | 96,000 | 100 | 7,040,000 | 320 x Baseline | Fermi |
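The Workload column can be reproduced from the table itself: each factor equals the matrix's nonzero count divided by the nonzero count of the baseline for the same target. The sketch below checks that relationship and rebuilds the file name pattern visible in the first column. All function names are hypothetical helpers written for this illustration; they are not part of mytoy.cu or the generator.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: workload relative to the baseline matrix of the same
// target, measured as the ratio of nonzero counts (an assumption that matches
// every row of the table).
std::int64_t workload_factor(std::int64_t nonzeros, std::int64_t baseline_nonzeros) {
    assert(baseline_nonzeros > 0 && nonzeros % baseline_nonzeros == 0);
    return nonzeros / baseline_nonzeros;
}

// Hypothetical helper: fraction of matrix entries that are nonzero.
double density(std::int64_t rows, std::int64_t cols, std::int64_t nonzeros) {
    assert(rows > 0 && cols > 0);
    return static_cast<double>(nonzeros) /
           (static_cast<double>(rows) * static_cast<double>(cols));
}

// Hypothetical helper: rebuilds the naming pattern seen in the table,
// mat-<rows>-<columns>-<nonzeros in thousands>k.rua.
std::string matrix_filename(std::int64_t rows, std::int64_t cols, std::int64_t nonzeros) {
    return "mat-" + std::to_string(rows) + "-" + std::to_string(cols) + "-" +
           std::to_string(nonzeros / 1000) + "k.rua";
}
```

For example, `workload_factor(540000, 27000)` gives 20 for the second Kepler matrix. `density` is about 0.675 for all four Kepler matrices, so in these examples the workload grows only through the number of rows while the fill ratio stays fixed.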
Notes:
- Sparse matrices are stored in Compressed Sparse Column (CSC) format.
- One CUDA stream is used for each column of the matrix. The matrices have either 100 or 200 columns, but up to 512,000 rows and up to 69,120,000 nonzero elements.
- The input matrix can be selected from the sparsematrices directory. Users can also generate their own matrices, tailoring the structure and the number of nonzeros to whatever they expect to yield the highest GFLOPS on the GPU. To do so, use the HB-libraries directory.
- All nonzeros hold the numerical value 0.999999, which prevents underflow and overflow when they are combined arithmetically with one another.
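To make the CSC layout from the notes concrete, here is a minimal host-side sketch of a CSC container. The struct, its field names, and both functions are assumptions made for this illustration and are not taken from mytoy.cu. Indices are zero-based here, whereas Harwell-Boeing files store one-based indices, so a real loader would have to shift them. In CSC, column j owns the contiguous range `[col_ptr[j], col_ptr[j+1])` of the index and value arrays, which is presumably the slice that a per-column CUDA stream would process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative CSC container; names are assumptions, not those of mytoy.cu.
struct CscMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> col_ptr;  // cols + 1 entries; column j spans [col_ptr[j], col_ptr[j+1])
    std::vector<std::size_t> row_idx;  // row index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Builds a CSC matrix from a dense column-major array, keeping only nonzeros.
CscMatrix dense_to_csc(const std::vector<double>& dense, std::size_t rows, std::size_t cols) {
    assert(dense.size() == rows * cols);
    CscMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.col_ptr.push_back(0);  // the first column always starts at offset 0
    for (std::size_t j = 0; j < cols; ++j) {
        for (std::size_t i = 0; i < rows; ++i) {
            const double v = dense[j * rows + i];
            if (v != 0.0) {
                m.row_idx.push_back(i);
                m.values.push_back(v);
            }
        }
        m.col_ptr.push_back(m.values.size());  // end of column j, start of column j + 1
    }
    return m;
}

// Number of nonzeros stored in column j, i.e. the size of that column's slice.
std::size_t column_nnz(const CscMatrix& m, std::size_t j) {
    assert(j < m.cols);
    return m.col_ptr[j + 1] - m.col_ptr[j];
}
```

For a 3 x 2 matrix whose first column holds nonzeros in rows 0 and 2 and whose second column holds one nonzero in row 1, `col_ptr` is `{0, 2, 3}` and `row_idx` is `{0, 2, 1}`. Because every column's work is delimited by two consecutive `col_ptr` entries, the columns can be handed out independently without any further bookkeeping.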