The real zombie-apocalypse is the pandemic of drama and mediocrity.

― Bryant McGill

A while back I referred to supercomputing as being a zombie

(https://wjrider.wordpress.com/2014/10/06/supercomputing-is-a-zombie/). Everything I have experienced in the past few months has led me to reconsider that line of thinking: I was wrong. It is much worse than I had ever anticipated. We continue to favor computing hardware over more innovative problem solving despite the end of Moore’s law being upon us. The cost is wasted money and significant under-utilization of computing’s benefits.

This morning I awoke thinking about the same thing again and realized that I had missed part of the analogy: not only is the supercomputing emphasis brainless, but it eats our brains too, just like a zombie would. It is rotting the thought out of computational science. The present trends in high performance computing are offensively belittling toward the degree to which human ingenuity plays a role in progress. The program that funds much of what I work on, the ASC program, is twenty years old. It was part of a larger American effort toward science-based stockpile stewardship, envisioned to provide confidence in nuclear weapons when they are not being tested.

Orthodoxy means not thinking–not needing to think. Orthodoxy is unconsciousness.

― George Orwell

The name science-based is now on the verge of being ironic. Science is based on evidence, and the current approach to supercomputing is not. It is a faith-based program. The faith is founded primarily on the eminently reasonable prospect that faster, bigger computers bring better solutions in computational modeling and simulation. The whole concept rests on “convergence,” which implies that the computed solution approaches the “true” solution as the amount of computational effort increases. Computational effort is typically associated with a mesh, or grid, that defines how the real world is chopped up and represented on the computer.

For example, think about weather or climate modeling and how to improve it. If we model the Earth with a grid of cells 100 kilometers on a side (so about 25 mesh cells would describe New Mexico), we would assume that a grid of cells 10 kilometers on a side would be better because it now uses 2,500 cells for New Mexico. The problem is that a lot else in the model needs to change to take advantage of the finer grid: the way clouds, wind, sunlight, plant life, and a whole host of other things are represented. This is true far beyond weather and climate; almost every model that connects a simulation to reality needs to be significantly reworked as the grid is refined. Right now, insufficient work is being funded to do this. This is a big reason why the benefit of faster computers is not being realized. There’s more.
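The bookkeeping behind those cell counts is worth making concrete. Here is a small sketch of how cell counts grow under refinement; the roughly 500 km by 500 km square footprint for New Mexico is my own back-of-the-envelope assumption, used only to reproduce the numbers above.

```python
# Rough sketch of 2-D grid-cell counts under refinement.
# The ~500 km x 500 km footprint for New Mexico is an assumed approximation.
def cells_2d(region_km, dx_km):
    """Number of square cells of side dx_km covering a square region."""
    per_side = region_km / dx_km
    return int(per_side ** 2)

coarse = cells_2d(500.0, 100.0)  # 100 km resolution
fine = cells_2d(500.0, 10.0)     # 10 km resolution
print(coarse, fine)              # 25 2500 -- refining 10x per side costs 100x the cells
```

In three dimensions the growth is worse still: refining ten-fold per side multiplies the cell count by 1,000, before counting the extra time steps a finer grid typically demands.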

The more pieces used to represent the world, the smaller the pieces are and the greater the effort; this is the drive for bigger computers. It’s not nearly so simple, but simplicity is what Americans do best these days. The promise would hold if we weren’t working toward this end with one hand tied behind our backs (maybe both hands). We have to do more than just make faster computers; we have to think, a lot more, about what we are doing. The power of computers needs wisdom that we sorely lack.

There is safety in numbers. And science. Clone your way to being safe. Nobody can protect you like you. And you and you and you.

― Jarod Kintz

Beyond better models, we can do a better job of solving the balance laws that define the models and connect one grid cell with another. We solve these laws with numerical methods that produce errors in the solution. Better methods produce smaller errors, and beyond that, all errors are not equal. Some errors are closer to what is physical, while other errors are decidedly unphysical. Better methods tend to make errors that are more physical (e.g., numerical diffusion, which at least mimics a real dissipative process). One of the major problems of modern supercomputing is the lack of effort devoted to improving the solution of balance laws. We need to create methods with smaller errors, and when errors are made, bias them toward physical errors. There’s more.
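A minimal sketch of this physical-versus-unphysical distinction, using the textbook linear advection equation (my own illustrative setup, not any production code): a first-order upwind scheme commits a diffusive error that smears the solution in a physical-looking way, while the second-order Lax-Wendroff scheme commits a dispersive error that rings with unphysical over- and undershoots near sharp fronts.

```python
def advect(u, c, steps, scheme):
    """March periodic 1-D linear advection; c = a*dt/dx is the CFL number."""
    n = len(u)
    for _ in range(steps):
        v = u[:]                         # snapshot of the previous time level
        for i in range(n):
            um, up = v[i - 1], v[(i + 1) % n]   # periodic neighbors
            if scheme == "upwind":       # first-order: diffusive (physical-looking) error
                u[i] = v[i] - c * (v[i] - um)
            else:                        # Lax-Wendroff: second-order, dispersive error
                u[i] = v[i] - 0.5 * c * (up - um) + 0.5 * c * c * (up - 2.0 * v[i] + um)
    return u

square = [1.0 if 10 <= i < 30 else 0.0 for i in range(100)]  # sharp-edged pulse
uw = advect(square[:], 0.5, 40, "upwind")
lw = advect(square[:], 0.5, 40, "lax-wendroff")
print(max(uw), max(lw))  # upwind stays bounded by 1; Lax-Wendroff overshoots above 1
```

The upwind result is smeared but never exceeds the initial bounds; the Lax-Wendroff result is sharper on average but oscillates beyond them, the kind of decidedly unphysical error the text is talking about.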

As we use finer meshes, the computer must handle more data. The amount of work the computer needs to do to solve a problem is not necessarily proportional to the amount of data; most of the time the work grows faster than the data. A typical problem we solve on the computer is the simultaneous solution of linear equations, i.e., linear algebra. The classical way of solving such a problem is Gaussian elimination, where the work scales with the cube of the number of equations. Therefore a thousand-times-larger problem requires a billion times the work to solve.
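The arithmetic in that last sentence is worth spelling out:

```python
# Work growth for a solver whose cost scales like n**exponent.
def work_ratio(size_growth, exponent):
    """Relative work after the problem grows by size_growth, given n**exponent scaling."""
    return size_growth ** exponent

print(work_ratio(1000, 3))  # 1000000000 -- O(n^3) elimination: a billion times the work
print(work_ratio(1000, 1))  # 1000 -- what an ideal linearly scaling solver would cost
```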

For the special sorts of linear systems associated with balance laws we can do a lot better. This has been a major part of the advance of computing, and the best we can do is for the amount of work to scale exactly like the number of equations (i.e., linearly). As the number of equations grows large, the difference between cubic and linear growth is astounding. These linear algorithms were enabled by multigrid, or multilevel, algorithms invented by Achi Brandt almost 40 years ago, which came into widespread use 25 or 30 years ago.
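As a concrete, if toy, illustration of the multilevel idea, here is a minimal geometric multigrid V-cycle for the 1-D Poisson equation; this is my own textbook-style sketch, not the original formulation or any production code. Smooth the error on the fine grid, transfer the residual to a coarser grid, correct, and smooth again; each level does work proportional to its number of unknowns, so a whole cycle is linear in the problem size.

```python
import math

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing for the discrete -u'' = f with fixed endpoints."""
    for _ in range(sweeps):
        v = u[:]
        for i in range(1, len(u) - 1):
            u[i] = (1 - omega) * v[i] + omega * 0.5 * (v[i - 1] + v[i + 1] + h * h * f[i])
    return u

def v_cycle(u, f, h):
    """One recursive V-cycle; work per level is O(n), so a cycle is O(n) overall."""
    n = len(u)
    if n <= 3:                                   # coarsest grid: one unknown, solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    u = jacobi(u, f, h, sweeps=2)                # pre-smooth
    r = [0.0] * n                                # residual of -u'' = f
    for i in range(1, n - 1):
        r[i] = f[i] + (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
    nc = (n + 1) // 2                            # coarse grid (n must be 2**k + 1)
    rc = [0.0] * nc
    for j in range(1, nc - 1):                   # full-weighting restriction
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)        # coarse-grid correction
    for j in range(nc):                          # prolong by linear interpolation
        u[2 * j] += ec[j]
    for j in range(nc - 1):
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    return jacobi(u, f, h, sweeps=2)             # post-smooth

n = 129                                          # 2**7 + 1 points on [0, 1]
h = 1.0 / (n - 1)
exact = [math.sin(math.pi * i * h) for i in range(n)]
f = [math.pi ** 2 * e for e in exact]            # -u'' = f has solution sin(pi x)
u = [0.0] * n
for _ in range(10):
    u = v_cycle(u, f, h)
err = max(abs(a - b) for a, b in zip(u, exact))
print(err)  # well below 1e-3: a handful of cycles reaches discretization-level accuracy
```

A direct elimination of the same tridiagonal system would also be fast in 1-D, but the point of the sketch is the structure: nothing in the cycle does more than a constant amount of work per unknown, which is where the linear scaling comes from.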

The desire for safety stands against every great and noble enterprise.

― Tacitus

We can’t really do any better today. The effort of the intervening three decades of supercomputing has focused on making multilevel methods work on modern parallel computers, with no algorithmic improvement. Perhaps linear is the best that can be done, although I doubt it. Work with big data is spurring the development of methods that scale sublinearly; perhaps those ideas can improve on multigrid’s performance. The key would be to allow inventiveness to flourish. In addition, risky and speculative work would need to be encouraged instead of the safe and dull work of porting methods to new computers.

As I’ve said before, risk avoidance is killing research in many fields, and scientific computing is no different (https://wjrider.wordpress.com/2014/12/05/is-risk-aversion-killing-innovation/, https://wjrider.wordpress.com/2014/03/03/we-only-fund-low-risk-research-today/, https://wjrider.wordpress.com/2014/12/12/whats-your-backup-plan/). One sign of risk aversion is the inability to start new computer codes, the implementations of the algorithms, methods, and models. We continue to work and rework old codes because of the capability they offer compared to a new code. We see these old codes as investments that we must continue to remodel. It’s time to tear them down and put up fresh structures built on new ideas instead of continually applying a fresh coat of paint to the tired old ones.

The potential for good I’ve touched on here is the tip of the iceberg. Algorithms and models can add vastly more value to computational science than faster machines. The only issue is that we aren’t brave enough to take advantage of the opportunity. That is the saddest thing about all of this.

Writing is thinking. To write well is to think clearly. That’s why it’s so hard.

― David McCullough