
A better question is how do we improve computational simulations most effectively?  Should we focus more on creating better computations instead of faster ones?

On Christmas morning I unwrapped a new iPhone 5S and got rid of my clunky ole iPhone 4S.  Amazingly enough, the Linpack benchmark runs on iPhones with a cool little app (https://itunes.apple.com/us/app/linpack/id380883195 or https://itunes.apple.com/us/app/linpack-benchmark/id390185550?mt=8 or https://play.google.com/store/apps/details?id=com.greenecomputing.linpack&hl=en for the Android).  More amazing still, the clunky iPhone 4S clocks in at around 130 Mflops, which coincidentally enough is just about the same as the Cray XMP I used as a brand new professional in 1989.  Moreover, the XMP I had access to in 1989 was one of the most powerful computers in the World.   Yet today I happily chose to recycle my little “XMP” without a second thought.  In 1989 that would have been unthinkable, just as unthinkable as holding that computational horsepower in the palm of my hand!  The new iPhone 5S is just shy of a Gigaflop, and it mostly plays music and surfs the Internet rather than computing turbulent fluid flows.  What a World we live in!

One of the things that the XMP had was an awesome operating system called CTSS.  In some ways it was a horror show with a flat file system, but in other ways it was a wonder.  It could create something called a “drop file” that saved the complete state of a code, which could then be picked up by a debugger.  You could change values of variables, or figure out exactly why your calculation had a problem.  Of course this power could be misused, but you had the power.  Soon Cray replaced CTSS with Unicos, their version of Unix, and we had a modern hierarchical file system, but lost the power of the drop file.  A lot of computational scientists would enjoy the power of drop files much more than a brand new supercomputer!

What’s the point of this digression?  We have focused on supercomputing as the primary vehicle for progress in computational science for the past 25 years, while not putting nearly so much emphasis on how computations are done.  Computing power comes without asking for it, and yesterday’s supercomputing power provides gaming today, and today’s supercomputing will power the games of tomorrow.  None of this has really changed what we do on supercomputers, and changing what we do on supercomputers has real scientific value and importance.   

The truth of the matter is that the most difficult problems in simulation will not be solved through faster computers alone.  In areas I know a great deal about this is true: direct numerical simulation of turbulence has not yielded understanding, and the challenge of climate modeling is more dependent upon modeling.  Those who claim that a finer mesh will provide clarity have been shown to be overly optimistic.  Some characterized stockpile stewardship as being underground nuclear testing in a “box,” but it, like the other examples, depends on greater acuity in modeling, numerical methods and physical theory.  Computational simulation is a holistic undertaking dependent upon all the tools available, not simply the computer.  Likewise, improvement in this endeavor is dependent on all the constituent tools.

Most of the money flowing into scientific computing is focused on making computations faster through providing faster computers.  In my opinion we should be more focused upon improving the calculations themselves.  Improving them includes improving algorithms, methods, efficiency, and models, not to mention improved practice in conducting and analyzing computations.  The standard approach to improving computational capability is the development of faster computers.  In fact, developing the fastest computer in the world is treated as a measure of economic and military superiority.  The US government has made the development of the fastest computers a research priority with the exascale program gobbling up resources.  Is this the best way to improve?  I’m fairly sure it isn’t, and our overemphasis on speed is extremely suboptimal.

Moore’s law has provided a fifty year glide path for supercomputing to ride.  Supercomputers weathered the storm of the initial generation of commodity-based computing development, and continued to provide the exponential growth in computing power.  The next ten years represent a significant challenge to the nature of supercomputing.  Computers are changing dramatically as current technology runs into fundamental physical limits.  To achieve higher performance, levels of parallelism need to grow to unprecedented levels.  Moreover, existing problems with computer memory, disk access and communication all introduce additional challenges.  The power consumed by computers also poses a difficulty.  All of these factors are conspiring to make the development of supercomputing in the next decade an enormous challenge, and by no means a sure thing. 

I am going to question the default approach.

The signs pointing to the wastefulness of this approach have been with us for a while.  During the last twenty years the actual performance for the bulk of computational simulations has been far below the improvements that Moore’s law would have you believe.  Computational power is measured by the Linpack benchmark, which papers over many of the problems in making “real” applications work on computers.  It solves a seemingly important problem: a dense system of linear equations, using dense linear algebra.  The problem in a nutshell is that dense linear algebra is not terribly important, and makes the computers look a lot better than they actually are.  The actual performance as a proportion of the peak Linpack measured performance has been dropping for decades.  Many practical applications run at much less than 1% of the quoted peak speed.  Everything I mentioned above makes this worse, much worse. 
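To make the "fraction of peak" idea concrete, here is a minimal sketch in plain Python.  It solves a small dense system by Gaussian elimination with partial pivoting and credits the solve with the nominal operation count conventionally used for Linpack-style runs, (2/3)n^3 + 2n^2.  The function names, the test matrix, and the `HYPOTHETICAL_PEAK` value are assumptions made up for illustration; this is not the actual benchmark code.

```python
import time

def linpack_flops(n):
    """Nominal operation count credited for solving an n x n dense system."""
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Choose the largest pivot in column k to keep the elimination stable.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def fraction_of_peak(achieved_flops_per_s, peak_flops_per_s):
    """Ratio of the measured rate to the quoted peak rate."""
    return achieved_flops_per_s / peak_flops_per_s

if __name__ == "__main__":
    n = 60
    # Diagonally dominant test matrix (an arbitrary choice for this sketch).
    a = [[float(n) if i == j else 1.0 / (1 + abs(i - j)) for j in range(n)]
         for i in range(n)]
    b = [1.0] * n
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = time.perf_counter() - start
    achieved = linpack_flops(n) / elapsed
    HYPOTHETICAL_PEAK = 1.0e9  # assumed quoted peak in flop/s, not a measured number
    print(f"{achieved:.3e} flop/s, {100 * fraction_of_peak(achieved, HYPOTHETICAL_PEAK):.3f}% of peak")
```

The same bookkeeping applies to a real application: count (or estimate) the useful floating-point work, divide by wall-clock time, and compare against the quoted peak rather than taking the benchmark number on faith.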

Part of the problem is that many of the methods and algorithms used on computers are not changing or adapting to make the best use of the new hardware.  In a lot of cases we simply move old codes onto new computers.  The codes run faster, but nowhere near as fast as the Linpack benchmark would lead us to believe.  The investment in computer hardware isn’t paying off to the degree that people advertise. 

Computational modeling is extremely important to modern science.  It reflects substantial new capability to the scientific community.  Modeling is a reflection of our understanding of a scientific field.  If we can model something, we tend to understand that thing much better.  Lack of modeling capability usually reflects a gap in our understanding.  Better put, computational modeling is important to the progress of science, and its status reflects the degree of understanding that exists in a given field.  That said, faster computers do not provide any greater understanding in and of themselves.  Period.  Faster, more capable computers allow more complex models to be used, and those more complex models may yield better predictions.  These complex models can be contemplated with better computers, but their development is not spurred by the availability of supercomputing power.     Complex models are the product of physical understanding and algorithmic guile allowing for their solution. 

I am going to suggest that there be a greater focus on the development of better models, algorithms and practice instead of vast resources focused on supercomputers.  The lack of focus on models, algorithms and practice is limiting the effectiveness of computing far more than the power of the computers.  A large part of the issue is the overblown degree of improvement attributed to new supercomputers; real applications see only a fraction of the reported power.  There is a great deal of potential headroom for greater performance with computers already available and plugged in.  If we can achieve greater efficiency, we can compute much faster without any focus at all on hardware.  Restructuring existing methods or developing new methods with greater accuracy and/or greater data locality and parallelism can gain efficiency.  Compilers are another way to improve code, and great strides could be made there to the good of any code using computers.

One of the key areas where supercomputing is designed to make a big impact is direct numerical simulation (DNS), or first principles physical simulation.  These calculations have endless appetites for computing power, but limited utility in solving real problems.  Turbulence, for example, has generally eluded understanding and our knowledge seems to be growing slowly.  DNS is often at the heart of the use case for cutting edge computing.  Given its limited ability to provide results, the case for supercomputing is weakened.  Perhaps now we ought to focus more on modeling and physical understanding instead of brute force.

Advances in algorithms are another fruitful path for improving results.  Algorithmic advances are systematically underestimated in terms of their impact.  Several studies have demonstrated that algorithmic improvements have added as much or more to computational power than Moore’s law.  Numerical linear algebra is one area where the case is clear; optimization methods are another.  Numerical discretization approaches may be yet another.  Taken together the gains from algorithms may dwarf those from pure computing power.  Despite this, algorithmic research is conducted as a mere afterthought, and more often than not is cut first from a computational science program. 

One of the key issues with algorithmic research is the “quantum” nature of the improvements.  Rather than coming in a steady, predictable stream, like Moore’s law, algorithm improvements are more like a phase transition where the performance jumps up, changing by an order of magnitude when a breakthrough is made.  Such breakthroughs are rare and the consequence of many less fruitful research directions.  Once the breakthrough is made the efficiency of the method is improved in a small steady stream, but nothing like the original discovery.  Many examples of these quantum, phase-transition-type improvements exist: conjugate gradient, multigrid, flux-limited finite differences, artificial viscosity, Karmarkar’s method, and others. 
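A small illustration of such a jump, offered as a sketch rather than a benchmark: the code below solves the same one-dimensional Poisson-type system (the tridiagonal matrix with 2 on the diagonal and -1 off it) with plain Jacobi iteration and with conjugate gradient, and counts iterations to the same relative residual tolerance.  The problem size, tolerance, right-hand side and function names are arbitrary choices for this example.

```python
import math

def apply_laplacian(x):
    """Multiply by the tridiagonal matrix with 2 on the diagonal and -1 off it."""
    n = len(x)
    y = [2.0 * x[i] for i in range(n)]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def jacobi(b, tol=1e-8, max_iter=200000):
    """Jacobi iteration; returns (solution, iterations used)."""
    n = len(b)
    x = [0.0] * n
    b_norm = norm(b)
    for k in range(1, max_iter + 1):
        # Each unknown is updated from its neighbours' previous values.
        x = [(b[i] + (x[i - 1] if i > 0 else 0.0)
                   + (x[i + 1] if i < n - 1 else 0.0)) / 2.0 for i in range(n)]
        r = [bi - yi for bi, yi in zip(b, apply_laplacian(x))]
        if norm(r) <= tol * b_norm:
            return x, k
    return x, max_iter

def conjugate_gradient(b, tol=1e-8, max_iter=10000):
    """Unpreconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    p = r[:]
    rs = sum(t * t for t in r)
    b_norm = math.sqrt(rs)
    for k in range(1, max_iter + 1):
        ap = apply_laplacian(p)
        alpha = rs / sum(pi * ai for pi, ai in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, ap)]
        rs_new = sum(t * t for t in r)
        if math.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x, max_iter

if __name__ == "__main__":
    b = [1.0] * 30
    _, jacobi_iters = jacobi(b)
    _, cg_iters = conjugate_gradient(b)
    print("Jacobi iterations:", jacobi_iters)
    print("CG iterations:    ", cg_iters)
```

Both solvers run on the same hardware and do comparable work per iteration; the difference in iteration counts comes entirely from the algorithm.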

The final area I will touch on is computational practice.  This is where things like verification and validation (V&V) come into the picture.  Modern computational science ought to be about being honest and straightforward about our capability, and V&V is one of the things at the heart of this.  Too often computations are steered into agreement with reality by the heavy hand of calibration.  In fact, calibration is almost always necessary in practice, but the magnitude of its impact is far too infrequently measured.  Even more importantly, the physical nature of the calibration is not identified.  In a crude sense calibration is a picture of our uncertainty.   Too often calibration uses one sort of physics to cover up our lack of knowledge of something else.  My experience has told me to look at turbulence and mixing physics as the first place for calibration to be identified.

If calibration is the public face of uncertainty, what is the truth?  In fact, the truth is hard to find.  Many investigations of uncertainty focus upon the lack of knowledge, which is distinctly different from physical uncertainty.  Lack of knowledge is often explored via parametric uncertainty of the models used to close the physics.  This lack of knowledge studied from parametric uncertainty often does not look like the physical sources of uncertainty, which arise from a lack of knowledge of precise initial conditions that blow up to large scale differences in physical states.  These distinctions loom large in many applications such as climate and weather modeling.  Unraveling the differences between the two types of uncertainty should be one of computational science’s greatest foci because of its distinct policy implications.  It also figures greatly in the determination of the proper placement of future scientific resources. 

Calibration is also used to paper over finite computational resolution.  Many models need to be retuned (i.e., recalibrated) when computational resolution changes.   This effect can easily be measured, but we stick our collective head in the sand.  All one has to do is take a calibrated solution and systematically change the resolution.  Repeatedly, people respond, “I can’t afford a refined calculation!”  Then coarsen the mesh and see how big the changes are.  If you can’t do this, you have big problems, and any predictive capability is highly suspect.  This sort of estimation should provide a very good idea of how much calibration is impacting your solution.   In most big computational studies calibration is important, and unmeasured.  It is time to stop this, and come clean.  Ending this sort of systematic delusion is far more important than buying bigger, faster computers.  In the long run “coming clean” will allow us to improve computational science’s positive impact on society far more than short-term focus on keeping Moore’s law alive.
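The resolution check described above can be sketched in a few lines.  In this minimal example, `run_model` is a hypothetical stand-in for a real simulation (it is just a trapezoid-rule integral of sin on [0, pi], chosen so the sketch runs on its own); in practice it would be your calibrated code returning a quantity of interest on a mesh with `n_cells` cells.  The function reports how much the answer moves as the mesh is coarsened, plus the observed order of convergence from three meshes.

```python
import math

def run_model(n_cells):
    """Hypothetical stand-in for a simulation: trapezoid rule for sin on [0, pi]."""
    h = math.pi / n_cells
    interior = sum(math.sin(i * h) for i in range(1, n_cells))
    return h * (0.5 * math.sin(0.0) + interior + 0.5 * math.sin(math.pi))

def resolution_study(model, n_fine, ratio=2):
    """Coarsen the mesh twice and measure how the quantity of interest changes."""
    n_medium = n_fine // ratio
    n_coarse = n_medium // ratio
    q_fine = model(n_fine)
    q_medium = model(n_medium)
    q_coarse = model(n_coarse)
    return {
        # Relative change of each coarser result with respect to the finest mesh.
        "change_medium": abs(q_medium - q_fine) / abs(q_fine),
        "change_coarse": abs(q_coarse - q_fine) / abs(q_fine),
        # Observed convergence order from the ratio of successive differences.
        "observed_order": math.log(abs(q_coarse - q_medium) / abs(q_medium - q_fine))
                          / math.log(ratio),
    }

if __name__ == "__main__":
    for name, value in resolution_study(run_model, n_fine=32).items():
        print(f"{name}: {value:.4f}")
```

If the changes under coarsening are comparable to the adjustments made during calibration, then the calibration is at least partly compensating for numerical error rather than for physics.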

Computational science isn’t just computers: it is modeling, it is physical theory, it is algorithmic innovation and efficiency, it is mathematics, it is programming languages and programming practice, it is validation against experiments and measurements, it is statistical science and data analysis.  Computer hardware is only one of the things we should focus on, and that focus shouldn’t choke resources away from things that would actually make a bigger difference in the quality of our computations.  Today it does.  A balanced approach would recognize that greater opportunities exist in other aspects of computational science.