Frequently asked questions
QUESTION:
Is there a limit for the number of operations to be performed on each element of the sparse matrix?
ANSWER:
We consider a maximum of 1024 operations. Beyond that number, there is little room left for improving hardware performance, provided the code has already been at least minimally tuned in other ways. Nevertheless, we can run experiments above this threshold to verify this claim and to complete the chart required by the roofline model. However, the GFLOPS attained that way will not qualify for the higher scores in our quiz.
QUESTION:
Can I change the arithmetic expression in the line of code that is executed repeatedly for each nonzero element of the sparse matrix?
ANSWER:
It is allowed to change the operator in that line of code (add, sub, mul, div, ...), but we CANNOT replace the expression with an equivalent one, because the arithmetic operations are counted "manually" in the formula we use to measure GFLOPS. Let us illustrate this with an example. In the computational loop, the starting point is:
    for (int j = 0; j < numOperations; ++j)
        dvalues[vi] += dvalues[vi];
and, assuming numOperations to be a multiple of 4, the following transformation might be applied:
    for (int j = 0; j < numOperations/4; ++j)
        dvalues[vi] *= 16;   // 16 = 2^4: four doublings folded into one multiplication
This produces the same result, but the new program performs 4 times fewer arithmetic operations, and the formula measuring the GFLOPS attained does not take that into account. This is one of many cases in which mytoy can report more GFLOPS than the theoretical peak of the target GPU. Other cases arise when optimizations are driven by #pragma directives, which enable the compiler to rearrange numeric expressions without any matching correction in the formula used to measure GFLOPS.