No.
Scientific computing and high performance computing are virtually synonymous. Should they be? Is this even a discussion worth having?
It should be. It shouldn’t be an article of faith.
I’m going to argue that perhaps they shouldn’t be so completely intertwined. The energy in the computing industry is almost completely divorced from HPC. HPC is trying to influence the computing industry to little avail. In doing so, scientific computing is probably missing opportunities to ride the wave of technology that is transforming society. That societal transformation brings with it economic forces HPC never had, forces that will have a profound impact on how our society and economy look for decades to come.
Computing is increasingly mobile and increasingly networked. Access to information and computational power is omnipresent in today’s world. It is no exaggeration to say that computers and the Internet are reshaping our social, political, and scientific worlds. Why shouldn’t scientific computing be similarly reshaped?
HPC is trying to maintain the connection between scientific computing and supercomputing. Increasingly, supercomputing seems passé, a relic of the past just as mainframes are relics. Once upon a time scientific computing and mainframes dominated the computer industry. Government labs had the ear of the computing industry and, to a large extent, drove the technology. No more. Computing has become a massive element of the world’s economy, with science only a speck on the windshield. The extent to which scientific research is attempting to drive computing is becoming ever more ridiculous and shortsighted.
At a superficial level all the emphasis on HPC is reasonable, but it leads to a groupthink that is quite damaging in other respects. We expect all of our simulations of the real world to get better if we have a bigger, faster computer. In fact, for many simulations we have ended up relying upon Moore’s law to do all the heavy lifting. Our simulations just get better because the computer is faster and has more memory. All we have to do is make sure we have a convergent approximation as the basis of the simulation. This entire approach is reasonable, but it suffers from intense intellectual laziness.
There I said it. The reliance on Moore’s law is just plain lazy.
Rather than focus on smarter, better, faster solution methods, we just let the computer do all the work. It is lazy. As a result, the most common approach is to simply take the old-fashioned computer code and port it to the new computer. Occasionally this requires us to change the programming model, but the intellectual guts of the program remain fixed. Because consumers of simulations are picky, the sales pitch is simple: “You get the same results, only faster. No thinking required!” It is lazy, and it serves science, particularly computational science, poorly.
Not only is it lazy, it is inefficient. We are failing to properly invest in advances in algorithms. Study after study has shown that the gains from better algorithms exceed those from the computers themselves, in spite of the relatively high investment in computing compared to algorithms. Think what a systematic investment in better algorithms could do.
It is time for this to end. Moreover, there is a dirty little secret under the hood of our simulation codes: for the most part, they are utilizing an ever-decreasing fraction of the potential performance offered by modern computing, and this inability to use the hardware is only getting worse. Recently, I was treated to a benchmark of the newest chips, and for the first time the actual runtimes for the codes started to get longer. The new chips won’t even run the code faster, efficiency be damned. A large part of the reason for such poor performance is that we have been immensely lazy in moving simulation forward for the last quarter of a century.
For example, I ran the Linpack benchmark on the laptop I’m writing this on. The laptop is about a generation behind the top of the line, but it rates as a 50 GFLOP machine! It is equivalent to the fastest computer in the world 20 years ago, one that cost millions of dollars. My iPad 4 is equivalent to a Cray-2 (about 1 GFLOP), and I just use it for email, web browsing, and note taking. Twenty years ago I would have traded my first born simply to have access to this. Today it sits idle most of the day. We are surrounded by computational power, and most of it goes to waste.
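For a sense of where numbers like these come from, here is a minimal back-of-the-envelope sketch in Python/NumPy. It is not the official HPL benchmark, just a timed dense solve converted to GFLOP/s using the standard LU operation count; the matrix size is an arbitrary choice for illustration.

```python
import time
import numpy as np

# Rough LINPACK-style estimate: time a dense solve and convert the
# standard operation count (2/3*n^3 + 2*n^2 flops) into GFLOP/s.
# A sketch, not the official HPL benchmark.
n = 4000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)   # LU factorization plus triangular solves
elapsed = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
print(f"n={n}: {elapsed:.2f} s, ~{flops / elapsed / 1e9:.1f} GFLOP/s sustained")
```

Because NumPy hands the solve to a tuned BLAS/LAPACK, even a laptop sustains a rate that would have topped the world rankings twenty years ago.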
The ubiquity of computational power is actually an opportunity to overcome our laziness and start doing something. Most of our codes are using about 1% of the available power. Worse yet, the 1% utility may look fantastic very soon. Back in the days of Crays we could expect to squeeze 25-50% of the power with sufficiently vectorized code. Let’s just say that I could run a code that got 20% of the potential of my laptop, now my 50 GFLOP laptop is acting like a one TeraFLOP computer. No money spent, just working smarter.
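Spelled out with the rough numbers above (the 1% and 20% figures are the estimates from this post, not measurements):

```python
# Utilization arithmetic from the paragraph above: a code sustaining 20%
# of a 50 GFLOP laptop delivers what a 1%-efficient code would need a
# 1 TFLOP machine to match.
peak_gflops = 50.0
typical_utilization = 0.01   # ~1% of peak, typical of many codes
improved_utilization = 0.20  # ~20%, plausible for well-tuned code

sustained = peak_gflops * improved_utilization        # 10 GFLOP/s
equivalent_peak = sustained / typical_utilization     # 1000 GFLOP = 1 TFLOP
print(f"Equivalent machine for the 1%-efficient code: {equivalent_peak:.0f} GFLOP peak")
```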
Beyond the laziness of porting old codes with old methods, we also expect the answers to simply get better by having less discretization error (i.e., a finer mesh). This should be true, and normally it is, but it ignores the role that a better method can play. Again, the reliance on brute force through a better computer is an aspect of outright intellectual laziness. To get this performance we need to write new algorithms and new implementations; it is not sufficient to simply port the codes. We need to think, we need to ask the users of simulation results to think, and we need to have faith in the ability of the human mind to create new, better solutions to old and new problems. And this only applies to the areas of science where computing is firmly established; beyond them are the new areas and opportunities that our intimately connected and computationally rich world has to offer.
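A toy illustration of “better method beats finer mesh,” my own example rather than anything from a production code: the composite trapezoid rule is second-order accurate while Simpson’s rule is fourth-order, so on a smooth integrand the higher-order method on a coarse grid outruns the lower-order one on a grid many times finer.

```python
import numpy as np

def trapezoid(f, a, b, n):
    # Composite trapezoid rule: error is O(h^2)
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

def simpson(f, a, b, n):
    # Composite Simpson rule: error is O(h^4); n must be even
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return (h / 3.0) * (y[0] + y[-1] + 4.0 * y[1:-1:2].sum() + 2.0 * y[2:-1:2].sum())

exact = 2.0  # integral of sin(x) over [0, pi]
for n in (8, 64, 512):
    e_trap = abs(trapezoid(np.sin, 0.0, np.pi, n) - exact)
    e_simp = abs(simpson(np.sin, 0.0, np.pi, n) - exact)
    print(f"n={n:4d}  trapezoid error={e_trap:.2e}  Simpson error={e_simp:.2e}")
```

Simpson with 64 panels is already more accurate than trapezoid with 512, and the trapezoid rule would need thousands of panels to catch up: raising the order of the method does work that brute-force refinement cannot.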
These points are just the tip of the proverbial iceberg. The deluge of data and our increasingly networked world offer other opportunities, most of which haven’t even been thought of. It is time to put our thinking caps back on. They’ve been gathering dust for too long.
While I agree with the point you’re making, I’d like to push back slightly. In the last 15 years we haven’t been entirely lazy: we have improved model fidelity and physics integration, and made 3D practical, because of platform stability and Moore’s law. That golden age is coming to a close, and it is long past time to get clever about algorithms again. My concern is that the resources and patience are lacking for the necessary re-tooling.
Brian, I don’t think anyone has been lazy in the sense that you or we don’t work hard. We all do, and we have been working hard for a long time. Moore’s law has allowed us to accomplish many exciting, impressive things. But are we working hard at the right things? I think not. I’d strongly argue that we could be so much further along if we had a different strategy. In other words, our strategy has been lazy: too much pure reliance on Moore’s law for improved capability, and not enough challenge to use the computers better.
We should be more aggressive with the things under our control, and less aggressive with those that aren’t. We control our algorithms and our physical models. We do not, in large part, control the computers, and what control we do have is vanishing. I worry that too much of our focus is on simply moving our codes (algorithms intact) to the new machines. Most of the algorithmic improvement has been in areas that don’t impact the actual quality of the answers, e.g., numerical linear algebra, with too little focus upon improving the code’s answers. Again, it isn’t that we haven’t improved our answers (at constant resolution), but our focus there has been insufficient, and therefore implicitly inefficient.
So what I’m saying is that we should have put more emphasis on being clever, ambitious, and reliant on algorithms all of these years, rather than only now. There is a pretty clear case that being clever with algorithms from the beginning of ASC would have put us in an even better place today. Our computers might be a little less massive, but we would use the ones we have so much better.
It is possible that with “exascale” or “extreme” computing we will double down on the previous strategy. We are attempting to recreate Moore’s law. It isn’t just ASC; it is the Office of Science and climate modeling too. All are too dependent on Moore’s law for progress in computing. We leave a huge amount of performance on the table because we are not focused on using these computers efficiently. Our collective creative energies can forge a different path to even greater accomplishments.
Bill, I totally agree. The way I see it, from my limited perspective, platform stability and Moore’s law opened up a large amount of unclaimed land in model fidelity and physics integration that we’ve been busily claiming without a whole lot of regard for efficiency. For mission-focused efforts there was (and is) tremendous pressure to improve our models and extend our application space, so we were well rewarded for our efforts. Our customers were very impatient during the early ASCI days because model fidelity stalled while we moved our old algorithms and models to the parallel architectures. Because we didn’t have the discipline to keep up our algorithm investment (which the customers have a hard time appreciating, which takes time to develop, and which is generally risky), we are probably going to stall again in improving model fidelity while we play catch-up.

When I look back, my project has benefited from key algorithmic investments that took years to come to fruition, but that seed corn is largely consumed at this point. We are now being disciplined in an unpleasant way by advanced architectures and tight budgets, while the expectation of expanding mission space and improving model fidelity continues unabated. If we had been more balanced, as you suggest, we would not be feeling the pinch so acutely now, and we’d have a portfolio of alternatives to shift to. Your manifesto needs to be heard so that people understand that the past decade and a half was anomalous (and detrimental in some ways), that the same old strategy is not going to work (this is not ASCI reborn), and that expectations need to change.
I’ll just say that we need to get “ahead of the curve” instead of behind it. Twenty years ago we probably did the right thing; today the situation is different. Our strategy is to recreate what happened then, but the circumstances have changed enormously. If we continue along this path we will flail against a tidal wave and end up with damn little to show for it. It is time to try a different path to success. In other words, nostalgia is not a program plan; it is a recipe for disaster.