Action expresses priorities.
― Mahatma Gandhi
Success and competence at high performance computing (HPC) is an essential enabler for many scientific, military and industrial activities. HPC plays an important role in national defense, economics and cyber-everything, and it is a measure of national competence. Being the top nation in high performance computers is therefore an important benchmark in defining national power. It does not measure overall success or competence, but rather a component of those things. Success and competence in high performance computing depend on a number of things including physics modeling and experimentation, applied mathematics, many types of engineering including software engineering, and computer hardware. In this list, computing hardware is among the least important aspects of competence. It generally enables everything else, but hardly defines competence. In other words, hardware is necessary but far from sufficient.
Claiming that you are what you are not will obscure the strengths you do have while destroying your credibility.
― Tom Hayes
Being a necessity for competence, hardware must receive some support for national success. Being insufficient, it cannot be the only thing supported, and it is not the determining factor for HPC supremacy. In other words, we could have the very best hardware and still be inferior to the competition. Indeed, success in HPC has always been a multidisciplinary endeavor, predicated on a high degree of balance across the spectrum of activities needed for competence. If one examines the state of affairs in HPC, one can easily see that all this experience and previous success has been ignored and forgotten. Instead of following a path blazed by previous funding success (i.e., ASCI), we have chosen a road to success solely focused on computing hardware and its direct implications. Worse, the lessons of the past are plain, yet the current management ignores them. Excellence in other areas has been eschewed in favor of whatever follows in the hardware’s wake. The danger in the current approach is dampening progress in a host of essential disciplines in favor of a success completely dependent on hardware.
The fundamental cause of the trouble is that in the modern world the stupid are cocksure while the intelligent are full of doubt.
― Bertrand Russell
Unfortunately, the situation is far worse than this. If computer hardware were in an era where huge advances in performance were primed to take place, the focus might be forgivable. Instead we are in an era where advances in hardware are incredibly strained. It is easy to see that huge advances in hardware are grinding to a halt, at least relative to the past half century. The focus of the current programs, the “exascale” initiatives, runs in exactly the opposite direction. We are attempting to continue growth in computing power at tremendous cost where the very physics of computers is working against us. The focus on hardware is actually completely illogical; if opportunity were the guide, hardware would be a sideshow instead of the main event. The core of the problem is the field’s complete addiction to Moore’s law for approximately 50 years, and like all addicts, it finds kicking the habit hard. In a sense, under Moore’s law computer performance skyrocketed for free, and people are not ready to see it go.
Most of us spend too much time on what is urgent and not enough time on what is important.
― Stephen R. Covey
Moore’s law is dead and HPC is suffering from the effects of withdrawal. Instead of accepting the death of Moore’s law and shifting the focus to other areas for advancements, we are holding onto it like a junkie’s last fix. In other words, the current programs in HPC are putting an immense amount of focus and resources into keeping Moore’s law alive. It is not unlike the sort of heroic measures taken to extend the life of a terminal patient. Such measures only delay the patient’s death, and the quality of the life they buy is usually terrible. In the same way, the performance of HPC is more zombie-like than robust. Achieving the performance comes at the cost of utility and general ease of use for the computers. Moreover, the nature of the hardware inhibits advances in other areas due to its difficulty of use. This goes above and beyond the vast resource sink the hardware is.
The core truth of HPC is that we’ve been losing this war for twenty years, and the current effort is simply the final apocalyptic battle in a war that is about to end. The bottom line is that we are in a terrible place where all progress is threatened by supporting a dying trend that has benefitted HPC for decades.
I work on this program and quietly make all these points. They fall on deaf ears because the people committed to hardware dominate the national and international conversations. Hardware is an easier sell to the political class, who are not sophisticated enough to smell the bullshit they are being fed. Hardware has worked to get funding before, so we go back to the well. Hardware advances are easy to understand and sell politically. The more naïve and superficial the argument, the better fit it is for our increasingly elite-unfriendly body politic. All the other things needed for HPC competence and advances are supported largely by pro bono work. They are simply added effort that comes down to doing the right thing. There is a rub that puts all this good faith effort at risk. The balance and all the other work are not a priority or emphasis of the program. Generally this work is not considered important, is not measured in the success of the program, and is not defined in the tasking from the funding agencies.
We live in an era where we are driven to be unwaveringly compliant with rules and regulations. In other words, you work on what you’re paid to work on, and you’re paid to complete the tasks spelled out in the work orders. As a result, all of the things you do out of good faith and responsibility can be viewed as violating these rules. Success might depend on doing all of these unfunded and unstated things, but success as defined in the work contracts is missing these elements. As a result, the things that need to be done do not get done. More often than not, you receive little credit or personal success from doing the right thing. You do not get management or institutional support either. Expecting these unprioritized, unintentional things to happen is simply magical thinking.
We have a situation where the priorities of the program are arrayed toward success in a single area, which puts the other areas needed for success at risk. Management then asks people to do good faith pro bono work to make up the difference. This good faith work violates the letter of the law on compliance with contracted work. There appears to be no intention of supporting all of the other disciplines needed for success. We rely upon people’s sense of responsibility for closing this gap even when we drive a sense of duty that pushes against doing any extra work. In addition, the hardware focus levies an immense tax on all other work because the hardware is so incredibly user-unfriendly. The bottom line is a systematic abdication of responsibility by those charged with leading our efforts. Moreover, we exist within a time and system where grassroots dissent and negative feedback are squashed. Our tepid and incompetent leadership can rest assured that their decisions will not be questioned.
Before getting to my conclusion, one might reasonably ask, “what should we be doing instead?” First we need an HPC program with balance between the impact on reality and the stream of enabling technology. The single most contemptible aspect of current programs is the nature of the hardware focus. The computers we are building are monstrosities, largely unfit for scientific use and vomitously inefficient. They are chasing a meaningless summit of performance measured through an antiquated and empty benchmark. We would be better served by building computers tailored to scientific computation that solve real, important problems with efficiency. We should be building computers and software that spur our productivity and are easy to use. Instead we levy an enormous penalty on any useful application of these machines because of their monstrous nature. A refocus away from the meaningless summit defined by an outdated benchmark could have vast benefits for science.
We could then free up resources to provide a holistic value stream from computing that we know by experience. Real applied work focusing on modeling and solution methods produces the greatest possible benefit. These immensely valuable activities are completely and utterly unsupported by the current HPC program and paid little more than lip service. Hand-in-hand with the lack of focus on applications and answers is the absence of any focus on verification or validation. Verification deals with the overall quality of the calculations, which is just assumed from the magnitude of the calculations (it used so much computer power, it has to be awesome, right?). The lack of validation underpins a generic lack of interest in the quality of the work in terms of real world congruence and impact.
Next down the line of unsupported activities is algorithmic research. The sort of algorithmic research that yields game-changing breakthroughs is unsupported. Algorithmic breakthroughs make the impossible possible and create capabilities undreamed of. They create a better future we couldn’t even dream of. We are putting no effort into this. Instead we have the new buzzword of “co-design,” where we focus on figuring out how to put existing algorithms on the monstrous hardware we are pursuing. The benefits are hardly game changing; the work simply fights the tidal wave of entropy from the horrific hardware. Finally we get to the place where funding exists: code development that ports existing models, methods and algorithms onto the hardware. Because little or no effort is put into making this hardware scientifically productive (in fact it’s the opposite), the code can barely be developed and its quality suffers mightily.
A huge tell in the actions of those constructing current HPC programs is their inability to learn from the past (or care about the underlying issues). If one looks at the program for pursuing exascale, it is structured almost identically to the original ASCI program, except that it is even more relentlessly hardware obsessed. The original ASCI program needed to add significant efforts in support of physical modeling, algorithm research and V&V on top of the hardware focus. This reflected a desire and necessity to produce high quality results with high confidence. All of these elements are conspicuously absent from the current HPC efforts. This sends two clear and unambiguous messages to anyone paying attention. The first message is a steadfast belief that the only quality needed is the knowledge that a really big expensive computer did the calculation at great cost. Somehow the mere utilization of such exotic and expensive hardware will endow the calculations with legitimacy. The second message is that no advances other than computer power are needed.
The true message is that connection to credibility and physical reality has no importance whatsoever to those running these programs. The actions and focus of the work, spelled out plainly in the activities funded, make their plans clear. The current HPC efforts make no serious attempt to make sure calculations are high quality or impactful in the real world. If the calculations are high quality, there will be scant evidence to prove this, and any demonstration will be done via authority. We are at the point where proof is granted by immensely expensive calculations rather than convincing evidence. There will be no focused or funded activity to demonstrate quality. There will be no focused activity to improve the physical, mathematical or algorithmic basis of the codes either. In other words, all the application code related work in the program is little more than a giant porting exercise. The priority and intents regarding quality are clear to those of us working on the project: quality is not important and not valued.
I’ve been told to assume that the leadership supports the important things that our current programs ignore. Seeing how our current programs operate, this is hardly plausible. Every single act by the leadership constructs an ever-tightening noose of planning, reporting and constraint about our collective necks. Quality, knowledge and expertise are all seriously devalued in the current era, and we can expect the results to reflect our priorities. We see a system put in place that will punish any attempt to do the right thing. The “right thing” is to do exactly what you’re told to do. Of course, one might argue that the chickens will eventually come home to roost, and the failures of the leadership will be laid bare. I’d like to think this is inevitable, but recent events seem to indicate that all facts are negotiable, and any problems can be spun through innovative marketing and propaganda into success. I have a great deal of faith that the Chinese will mop the floor with us in HPC, and our current leadership should shoulder the blame. I also believe the blame will not fall on the guilty. Today it never does; the innocent will be scapegoated for the leadership’s mistakes.
Nothing in this World is Static…Everything is Kinetic..
If there is no ‘progression’…there is bound to be ‘regression’…
― Abha Maryada Banerjee
I am left with the feeling that an important opportunity for reshaping the future is being missed. Rather than admit the technological limitations we are laboring under and transform HPC towards a new focus, we continue along a path that appears to be completely nostalgic. The acceptance of the limitations in the growth of computer power in the commercial computing industry led to a wonderful result. Computer hardware shifted to mobile computing and unleashed a level of impact and power far beyond what existed at the turn of the century. Mobile computing is vastly more important and pervasive than the computing that preceded it. The same sort of innovation could unleash HPC to produce real value far beyond anything conceivable today. Instead we have built a program devoted to nostalgia and largely divorced from objective reality.
Doing better would be simple, at least at a conceptual level. One would need to commit to a balanced program where driving modeling and simulation to impact the real world is a priority. The funded and prioritized activities would need to reflect this focus. Those leading and managing the program would need to ask the right questions and demand progress in the right areas. Success would need to be predicated on the same holistic balanced philosophy. The people working on these programs are smart enough to infer the intent of the programs. That intent is patently obvious from examining the funding profiles.
Programs are funded around their priorities. The results that matter are connected to money. If something is not being paid for, it is not important. If one couples steadfast compliance with only working on what you’re funded to do, any call to do the right thing despite funding is simply comical. The right thing becomes complying, and the important thing in this environment is funding the right things. As we work to account for every dime of spending in ever finer increments, the importance of sensible and visionary leadership becomes greater. The very nature of this accounting tsunami is to blunt and deny visionary leadership’s ability to exist. The end result is spending every dime as intended and wasting the vast majority of it on shitty, useless results. Any other outcome in the modern world is implausible.
You never change things by fighting the existing reality.
To change something, build a new model that makes the existing model obsolete.
― R. Buckminster Fuller