“Theories might inspire you, but experiments will advance you.” ― Amit Kalantri
This week I have a couple of opportunities to speak directly with my upper management. At one level this is nothing more than an enormous pain in the ass, but that is my short-sighted monkey-self speaking. I have to prepare two talks and spend time vetting them with others. It is enormously disruptive to getting “work” done.
On the other hand, a lot of my “work” is actually a complete waste of time. Really. Most of what I get paid for is literally a complete waste of a very precious resource: time. So it might be worthwhile to make good use of these opportunities. Maybe something can be done to give my work more meaning, or perhaps I need to stop feeling duty-bound to waste my precious time on the stupid, meaningless stuff some idiot calls work. Most of the time-wasting crap feeds the limitless maw of the bureaucracy that infests our society.
Now we can return to the task at hand. The venues for both engagements are somewhat artificial and neither is ideal, but it’s what I have to work with. At the same time, it is a chance to say things that might influence change for the better. Making this happen to the extent possible has occupied my thoughts. If I do it well, the whole thing will be worth the hassle. So with hope firmly in my grasp, I’ll charge ahead.
I always believe that things can get better, which could be interpreted as whining, but I prefer to think of it as a combination of the optimism of continuous improvement and the quest for excellence. I firmly believe that actual excellence is in starkly short supply. Part of the reason is the endless stream of crap that gets in the way of doing things of value. I’m reminded of the recently described phenomenon of “bullshit jobs” (http://www.salon.com/2014/06/01/help_us_thomas_piketty_the_1s_sick_and_twisted_new_scheme/). The problem with bullshit jobs is that they have to create more work to stay in business, and their bullshit creeps into everyone’s life as a result. Thus, we have created a system that works steadfastly to keep excellence at bay. Nonetheless, in keeping with this firmly progressive approach, I need to craft a clear narrative arc that points the way to a brighter, more productive future.
High performance computing is one clear over-arching aspect of what I work on. Every single project I work on connects to it. The problem is that, to a large extent, HPC is becoming increasingly disconnected from reality. Originally, computing was an important element in various applied programs, starting with the Manhattan Project. Computing grew in prominence and capability through the (first) nuclear age, supporting weapons and reactors alike. NASA also relied heavily on contributions from computing, and computational modeling improved the efficiency with which science and engineering were delivered. Throughout this period computing was never the prime focus, but rather a tool for effective delivery of a physical product. In other words, there was always something real at stake that was grounded in the physical “real” world. Today, more and more, there seems to have been a transition to a world where the computers became the reality.
More and more, arguments about the lack of support for the next supercomputer take on the tone and language of the past, as if we have a “supercomputer gap” with other countries. The approach is reminiscent of the “missile gap” of a generation ago, or the “bomber gap” two generations ago. Both of those gaps were BS to a very large degree, and I firmly believe the supercomputer gap is too. These gaps are effective marketing ploys to garner support for building more high performance computers. Instead we should focus on the good high performance computing can do for real problem-solving capability, and let the computing chips fall where they may.
There is a gap, but it isn’t measured in FLOPS, CPUs, or memory; it is measured in terms of our practice. Our supercomputers have lost touch with reality. Supercomputing needs to be connected to a real, tangible activity where modeling assists experiments, observations, and design in producing something that serves a societal need. These needs could be anything from national defense, cyber-security, and space exploration to designing more fuel-efficient aircraft or safer, more efficient energy production. The reality we are seeing is that each of these has become secondary to the need for the fastest supercomputer.
A core problem is that supercomputing efforts are horribly imbalanced, having become primarily a quest for hardware capable of running the LINPACK benchmark the fastest. LINPACK does not reflect the true computational character of the real applications supercomputers run. In many ways it is almost ideally suited to demonstrating a high operation count; ironically, it is nearly optimal in its lack of correspondence to applications. The dynamic that has emerged is one in which real application performance has become a secondary, optional element in our thinking about supercomputing.
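The mismatch can be made concrete with a back-of-the-envelope comparison of arithmetic intensity (floating-point operations per byte of memory traffic) for a LINPACK-style dense factorization versus the sparse matrix-vector products that dominate many real simulation codes. The byte counts below are illustrative assumptions, not measurements of any particular machine or application.

```python
# Rough arithmetic-intensity estimates (flops per byte of memory traffic),
# illustrating why LINPACK flatters hardware relative to typical applications.
# All constants here are illustrative assumptions.

def dense_lu_intensity(n):
    """LINPACK-style dense LU: ~(2/3)n^3 flops over an n-by-n double matrix.
    Assumes (optimistically) roughly one pass over the 8n^2 bytes of data."""
    flops = (2.0 / 3.0) * n**3
    bytes_moved = 8.0 * n**2
    return flops / bytes_moved

def sparse_matvec_intensity(nnz_per_row=7):
    """Sparse matrix-vector product (typical of PDE codes): about 2 flops per
    nonzero, and roughly 12 bytes per nonzero (8-byte value + 4-byte index)."""
    flops = 2.0 * nnz_per_row
    bytes_moved = 12.0 * nnz_per_row
    return flops / bytes_moved

print(f"dense LU (n=10000): {dense_lu_intensity(10000):.0f} flops/byte")
print(f"sparse matvec:      {sparse_matvec_intensity():.2f} flops/byte")
```

Even with these crude assumptions the two workloads differ by orders of magnitude in how much they reward raw floating-point hardware versus memory bandwidth, which is why a machine tuned to win at LINPACK can still be mediocre at the applications that justify buying it.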
These developments highlight our disconnect from reality. In the past, the reality of the objective was the guiding element in computing. If the computing program got out of balance, reality would intercede to slay any hubris that developed. This formed a virtuous cycle where experimental data would push theory, or computed predictions would drive theorists to explain, or design experiments to provide evidence.
In fact, we have maimed this virtuous cycle by taking reality out of the picture.
The Stockpile Stewardship program was founded as the alternative to underground testing of nuclear weapons, and supercomputing was its flagship. We even had a certain official say that a computer could be “Nevada* in a box,” and that pushing the return key would be akin to pressing the button on a nuclear test. It was a foolish and offensive thing to say, and almost everyone else in the room knew it; yet this point of view has taken root, and continues to wreak havoc. Then and now, the computer hardware has become nearly the sole motivation, and the loss of purpose for the entire activity is far too common. Everything else needed for success has been short-changed in the process. With the fully integrated experiment of the nuclear test removed from the process, the balance in everything else needed to be carefully guarded. Instead, this balance was undermined almost from the start. We have not put together a computing program with sufficient balance, support, and connections to theory and experiment to succeed as the Country should demand.
“The real world is where the monsters are.” ― Rick Riordan
I have come to understand that there is something essential in building something new. In the nuclear reactor business, the United States continues to operate old reactors and fails to build new ones. Given the maturity of the technology, the tendency in high performance computing is to allow highly calibrated models to be used. These models are focused on working within a parameter space that is well trodden and continues to be the focus. If the United States were building new reactors with new designs, the modeling would be taxed by changes in the parameter space. The same is true for nuclear weapons. In the past there were new designs and tests that either confirmed existing models or delivered a swift kick to the head with an unexplained result. It is the continued existence of the inexplicable that jars models and modeling out of intellectual slumber. Without it we push ourselves into realms of unreasonable confidence in our ability to model things. Worse yet, we allow ourselves to pile all our uncertainty into calibration, and then declare confidently that we understand the technology.
At the core of the problem is the simple, easy, and incorrect view that bigger, faster supercomputers are the key. The key is deep thought and a problem-solving approach devised by brilliant scientists exercising the full breadth of scientific tools available. The computer is in many ways the least important element in successful stewardship; it is necessary, but woefully insufficient to provide success.
“Never confuse movement with action.” ― Ernest Hemingway
Supercomputing was originally defined as the use of powerful computers to solve problems. Problem solving was the essence of the activity. Today this is only true by fiat. Supercomputing has become almost completely about the machines, and the successful demonstration of their power on stunt applications or largely irrelevant benchmarks. Instead of defining the power of computing by the problems being solved, the raw power of the computer has become the focus. This has led to a diminished focus on algorithms and methods, which actually have a better track record than Moore’s law for improving computational problem-solving capability. The consequence of this misguided focus is a real diminishment of our actual capability to solve problems with supercomputers. In other words, our quest for the fastest computer is ironically undermining our ability to use computers as effectively as possible.
The figure below shows how improvements in numerical linear algebra have competed with Moore’s law over a period of nearly forty years. This figure was created in 2004 as part of a DOE study (the Scales workshop URL?). The figure has several distinct problems: the dates are not included, and the algorithm curve is smooth. Adding that texture is illuminating, because the last big algorithmic breakthrough occurred in the mid-1980s (twenty years prior to the report). Previous breakthroughs occurred on an even more frequent time scale, every 7-10 years. Therefore, in 2004 we were already overdue for a new breakthrough, which has not come yet. One might conclude that multigrid is the ultimate linear algebra algorithm for computing (I for one don’t believe this). Another plausible theory is that our attention was drawn away from improving the fundamental algorithms toward making those algorithms work on massively parallel supercomputers. Perhaps improving on multigrid is simply a difficult problem, and we have already snatched all the low-hanging fruit. I’d even grudgingly admit that multigrid might be the ultimate linear algebra method, but my faith is that something better is out there waiting to be discovered. New ideas and differing perspectives are needed to advance. Today, we are a full decade further along without a breakthrough, and even more overdue. The problem is that we aren’t thinking along the lines of driving for algorithmic advances.
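The scale of those algorithmic gains is easy to underestimate. The Python sketch below compares textbook asymptotic operation counts for solving a model Poisson-type problem with n unknowns; the exponents are standard asymptotics, and constants are deliberately ignored, so the numbers indicate trends rather than actual timings.

```python
# Textbook asymptotic operation counts for a model linear solve with n unknowns.
# Constants and lower-order terms are ignored; this shows trends, not timings.

SCALINGS = {
    "Gaussian elimination (dense)": lambda n: n**3,    # O(n^3)
    "Conjugate gradients":          lambda n: n**1.5,  # roughly O(n^{3/2}) for model problems
    "Multigrid":                    lambda n: n,        # optimal O(n)
}

def speedup_over_dense(n):
    """Operation-count speedup of each method relative to dense elimination."""
    base = SCALINGS["Gaussian elimination (dense)"](n)
    return {name: base / f(n) for name, f in SCALINGS.items()}

for n in (10**4, 10**6):
    print(n, {name: f"{s:.1e}" for name, s in speedup_over_dense(n).items()})
```

At a million unknowns the asymptotic gap between dense elimination and multigrid is a factor of about 10^12, which is the sense in which algorithmic breakthroughs have outpaced Moore’s law: no plausible amount of hardware compounding delivers twelve orders of magnitude on one problem size.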
I believe in progress; I think there are discoveries to be made. The problem is that we are putting all of our effort into moving our old algorithms to the massively parallel computers of the past decade. Part of the reason for this is the increasingly perilous nature of Moore’s law. We have had to increase the level of parallelism in our codes by immense degrees to continue following Moore’s law. Around 2005 the clock speeds of microprocessors stopped their steady climb. For Moore’s law this is the harbinger of doom. The end is near: the combination of microprocessor limits and parallelism limits is conspiring to make computers amazingly power-intensive, and the steady rise of the past cannot continue. At the same time, we are suffering from the failure to keep supporting the improvements in problem-solving capability from algorithm and method investments, which had provided more than Moore’s-law-worth of increased capability.
A second problematic piece of this figure is the smooth curve of advances in algorithmic power. That is not how it happens. Algorithms advance by breakthroughs, and in numerical linear algebra the breakthrough is in how the solution time scales with the number of unknowns. This results in quantum leaps in performance when a method gives us access to a new scaling. In between these leaps there are small improvements as the new method is made more efficient or procedural refinements are made. This is characteristically different from Moore’s law in a key way. Moore’s law is akin to a safe bond investment that provides steady returns in a predictable manner. Program managers and politicians love this because it is safe, whereas algorithmic breakthroughs are like tech stocks: sometimes the payoff is huge, but most of the time the return is small. This dynamic is beginning to fall apart; Moore’s law will soon fail (or maybe it won’t).
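The bond-versus-tech-stock contrast can be caricatured numerically. In the Python sketch below, Moore’s law is modeled as smooth exponential compounding, while algorithmic progress is flat between breakthroughs and jumps at each one; the breakthrough years and step sizes are invented purely for illustration.

```python
# Caricature of the two kinds of progress: smooth compounding vs. rare jumps.
# The breakthrough schedule and factors below are invented, not historical data.

def moores_law_gain(years, doubling_period=1.5):
    """Smooth compounding: performance doubles roughly every 18 months."""
    return 2.0 ** (years / doubling_period)

def algorithmic_gain(years, breakthroughs=((5, 30.0), (12, 100.0))):
    """Flat between breakthroughs, with a large multiplicative jump at each.
    `breakthroughs` is a sequence of (year, speedup factor) pairs (invented)."""
    gain = 1.0
    for year, factor in breakthroughs:
        if years >= year:
            gain *= factor
    return gain

for t in (3, 6, 15):
    print(f"year {t:2d}: hardware x{moores_law_gain(t):8.1f}, "
          f"algorithms x{algorithmic_gain(t):8.1f}")
```

The step function loses to steady compounding for years at a stretch and then leapfrogs it overnight, which is exactly why a funding culture optimized for predictable annual returns systematically undervalues it.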
I might even forecast that the demise of Moore’s law, even for a short while, might be good for us. Instead of relying on endlessly growing power, we might have to think a bit harder about how we solve problems. We won’t have an enormously powerful computer that simply crushes problems into submission. That doesn’t happen in reality, but listening to supercomputing proponents you’d think it were common. Did I mention bullshit jobs earlier?
The truth of the matter is that computing might benefit from a discovery that allows the massive progress of the past 70 years to continue, but there is no reason to believe some new technology will bail us out. The deeper issue is the overall balance of our efforts. Hardware and software technologies have always worked together in a sort of tug-of-war that bears similarity to the tension between theoretical and experimental science. One field drives the other depending on the question and the availability of emergent ideas or technologies that open new vistas. Insofar as computing is concerned, my concern is plain: hardware has had preeminence for twenty or thirty years while the focus on algorithms and methods has waned. The balance has been severely compromised, and enormous value has been lost as a result.
This gets to the core of what computing is about. Computing is a tool. It is a different way to solve problems, to manage or discover information, and to communicate. For some, computing has become an end unto itself rather than a tool for modern society. We have allowed this perspective to infect scientific computing as a discipline because the utility of acquiring new supercomputers outweighs using them effectively. This is the root of the problem and the cause of the lack of balance we see at present. It is coupled to a host of other issues in society, not least a boundless superficiality that drives a short-term focus and disallows real achievement because the risk of failure has been deemed unacceptable.
We should work steadfastly to restore the balance and perspective necessary for success. We need to allow risk into our research agenda and set more aggressive goals. Commensurate with this risk, we should provide greater freedom and autonomy to those striving for the goals. Supercomputing should recognize that the core of its utility is computing as a problem-solving approach that relies upon computing hardware for success. There is an unfortunate tendency to simply declare supercomputing a national security resource regardless of the actual utility of the computer for problem solving. These claims border on being unethical. We need computers that are primarily designed to solve important problems. Problems don’t become important because a computer can solve them.
* Nevada is the location of the site the United States used for underground nuclear testing.