What’s measured improves

– Peter Drucker

I’ve been doing a fair amount of thinking about metrics for computer performance. The question is of supreme importance for supercomputing moving forward. The Drucker quote captures both why the metrics question matters for the future and what has happened in the recent past. I’ve come to the conclusion that the devotion to “weak scaling” is probably doing a lot of damage to high performance computing.

What is weak scaling? From http://en.wikipedia.org/wiki/Scalability

  • The first is strong scaling, which is defined as how the solution time varies with the number of processors for a fixed total problem size.

  • The second is weak scaling, which is defined as how the solution time varies with the number of processors for a fixed problem size per processor.
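To make the two definitions concrete, here is a minimal sketch of how each efficiency is usually computed from measured run times. The function names and the timings are hypothetical, chosen only for illustration. Note that both numbers are relative to the single-processor baseline, so neither says anything about how well each processor is actually being used.

```python
def strong_scaling_efficiency(t1, tp, p):
    """Fixed total problem size: the ideal time on p processors is t1 / p."""
    return t1 / (p * tp)


def weak_scaling_efficiency(t1, tp):
    """Fixed problem size per processor: the ideal time on p processors stays at t1."""
    return t1 / tp


# Hypothetical timings in seconds, not measurements from any real machine.
# Strong scaling: the same total problem on 1 processor and on 8 processors.
strong = strong_scaling_efficiency(t1=100.0, tp=20.0, p=8)

# Weak scaling: the problem grows with the processor count, so the time
# should ideally stay flat at the single-processor value.
weak = weak_scaling_efficiency(t1=10.0, tp=12.5)

print(f"strong scaling efficiency: {strong:.3f}")
print(f"weak scaling efficiency:   {weak:.3f}")
```

With these made-up numbers the same code could report a mediocre strong scaling result and a respectable weak scaling result, which is part of why the choice of metric matters.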

The problem is that weak scaling has allowed a number of issues to be papered over. The biggest of these is the steadily diminishing on-node performance, that is, the ability to use the full computing capacity of each node efficiently. We get “better” by using more nodes, but each node is used very poorly. A big part of this is memory bandwidth and its cost. Basically, the processors are starved for data and for useful work to do.

The other thing that we are missing is algorithmic development. The acclaim given to weak scaling has starved the development of better algorithms. We are spending a lot of effort implementing old algorithms to achieve weak scaling. Improving algorithmic scaling is actually much harder, and given how warmly a weak scaling result is greeted, the emphasis on weak scaling is effectively set by fiat. The fact is that implementing an old algorithm on a new computer is low risk compared to developing new algorithms. Given our propensity toward low risk (i.e., risk management), we get today’s situation: computational science in the doldrums and a lack of real innovation and progress.

It all comes down to what we measure. As long as weak scaling is deemed a success, we can continue to ignore the problem. We can hope that power consumption will ultimately put it squarely in the frame, since it creates costs that can’t be ignored (the electric bill).

I think the real metric should be something like the time or cost to produce a solution of a given, credible quality. It should include wall clock time and the cost of labor. Finally, we should consider the costs associated with the lack of basic vitality in the community. We could stand to have some significantly greater inspiration.
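The machine-plus-labor part of such a metric can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the rates, and the two scenarios are all hypothetical, and it assumes both runs reach the same credible solution quality. It does not attempt to price the community vitality mentioned above.

```python
def cost_to_solution(wall_hours, nodes, node_hour_rate, labor_hours, labor_rate):
    """Total cost of one solution: machine time plus the labor needed to get it.

    All arguments are hypothetical inputs; the rates are in the same currency.
    """
    machine_cost = wall_hours * nodes * node_hour_rate
    labor_cost = labor_hours * labor_rate
    return machine_cost + labor_cost


# Two made-up scenarios that are assumed to reach the same solution quality.
# (a) An old algorithm, weak-scaled across many nodes, with little new labor.
old_algorithm = cost_to_solution(
    wall_hours=10, nodes=1000, node_hour_rate=1.0, labor_hours=40, labor_rate=100.0
)

# (b) A new algorithm that needs far fewer nodes but much more development time.
new_algorithm = cost_to_solution(
    wall_hours=10, nodes=50, node_hour_rate=1.0, labor_hours=80, labor_rate=100.0
)

print(f"old algorithm, many nodes: {old_algorithm:.0f}")
print(f"new algorithm, few nodes:  {new_algorithm:.0f}")
```

Which scenario wins depends entirely on the numbers plugged in; the point is only that a metric of this shape puts node count, wall clock time, and labor on the same scale.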