The problem with incompetence is its inability to recognize itself.
― Orrin Woodward
My wife has a very distinct preference in late night TV shows. First, the show cannot actually be on late at night; she is fast asleep by 9:30 most nights. Second, she is quite loyal. More than twenty years ago she was essentially forced to watch late night TV while breastfeeding our newborn daughter. Conan O’Brien kept her laughing and smiling through many late night feedings. He isn’t the best late night host, but he is almost certainly the silliest. His shtick is simply stupid with a certain sophisticated spin. One of the dumb bits on his current show is “Why China is kicking our ass”. It features Americans doing all sorts of thoughtless and idiotic things on video, with the premise that our stupidity is the root of any loss of American hegemony. As sad as this might inherently be, the principle is rather broadly applicable and generally right on the money. The loss of national preeminence is due more to sheer hubris, manifest overconfidence, and sprawling incompetence on the part of Americans than to anything being done by our competitors.
The conventional view serves to protect us from the painful job of thinking.
― John Kenneth Galbraith
High performance computing is no different. By our chosen set of metrics, we are losing to the Chinese rather badly through a series of self-inflicted wounds rather than superior Chinese execution. In effect, we are handing the crown of international achievement to them because we have become so incredibly incompetent at intellectual endeavors. Today, I’m going to unveil how we have thoughtlessly and idiotically run our high performance computing programs in a manner that undermines our success. My key point is that stopping the self-inflicted damage is the first step toward success. One must take careful note that the measure of superiority is based on a benchmark that has no practical value. Having a metric of success with no practical value is a large part of the underlying problem.
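The gap between benchmark standing and practical value can be made concrete with a back-of-the-envelope roofline model. The sketch below is illustrative only: the machine numbers are made-up assumptions, not figures for any real system. It shows why a dense-linear-algebra benchmark like the Top500’s HPL, with its enormous data reuse, runs near peak, while the memory-bound stencil updates typical of real simulation codes attain only a tiny fraction of that peak.

```python
# Illustrative roofline model (all machine numbers are assumptions).
PEAK_FLOPS = 1e18   # assumed compute peak: 1 exaFLOP/s
PEAK_BW = 1e16      # assumed memory bandwidth: 10 PB/s

def attainable(intensity):
    """Roofline: performance is capped by either compute or memory traffic.

    `intensity` is arithmetic intensity in FLOPs per byte moved.
    """
    return min(PEAK_FLOPS, PEAK_BW * intensity)

# Dense factorization (HPL-like) reuses each byte heavily: assume 1000 FLOPs/byte.
hpl_fraction = attainable(1000.0) / PEAK_FLOPS
# A 3-D stencil update touches each byte only a few times: assume 0.25 FLOPs/byte.
stencil_fraction = attainable(0.25) / PEAK_FLOPS

print(hpl_fraction)      # near 1.0: the benchmark flatters the machine
print(stencil_fraction)  # a small fraction of peak for a typical science code
```

Under these assumed numbers the benchmark code runs at peak while the stencil code reaches a fraction of a percent of it, which is the sense in which a “#1” ranking can say almost nothing about real modeling and simulation capability.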
Never attribute to malevolence what is merely due to incompetence
As a starting point I’ll state that the program now kicking off, the Exascale Computing Project (ECP), is a prime example of how we are completely screwing things up. It is basically a lexicon of ignorance and anti-intellectual thought paving the way to international mediocrity. The biggest issue is the lack of intellectual depth in the whole basis of the program: “The USA must have the fastest computer”. The fastest computer does not mean anything unless we know how to use it. The fastest computer does not matter if it is fastest at doing meaningless things, or if it isn’t fast at doing things that are important. The fastest computer is simply a tool in a much larger “ecosystem” of computing. The fastest computer is the modern day equivalent of the “missile gap” from the Cold War, which ended up being nothing but a political vehicle.
If part of this ecosystem is unhealthy, the power of the tool is undermined. The extent to which it is undermined should be a matter of vigorous debate. This current program is inadvertently designed to further unbalance an ecosystem that has been under duress for decades. We have been focused on computer hardware for the past quarter century while failing to invest in the physics, engineering, modeling and mathematics essential to the utility of the tool of computing. We have starved innovation in the use of computing and in the most impactful aspects of the computing ecosystem. The result is an intellectually hollow and superficial program that will be a relatively poor investment in terms of benefit to society per dollar spent. In essence, the soul of computing is being lost. Our quest for exascale computing belies a program that is utterly and unremittingly hardware focused. This hardware focus is myopic in the extreme and starves the ecosystem of major elements of its health: the tie to experiments, modeling, numerical methods and solution algorithms. The key to Chinese superiority, or the lack of it, is whether they are making the same mistakes we are. If they are, their “victory” is hollow; if they aren’t, their victory will be complete.
If you conform, you miss all of the adventures and stand against the progress of society.
― Debasish Mridha
Scientific computing has been a thing for about 70 years, having been born during World War II. Throughout that history there has been a constant push and pull among the capabilities of computers, software, models, mathematics, engineering, methods and physics. Experimental work has been essential to keep computations tethered to reality. An advance in one area would spur advances in another in a flywheel of progress. A faster computer would make problems that previously seemed impossible to solve suddenly tractable. Mathematical rigor might suddenly give people faith in a method that had seemed ad hoc and unreliable. Physics might ask new questions counter to previous knowledge, or experiments would confirm or invalidate a model’s applicability. The ability to express ideas in software allows algorithms and models to be used that may have been too complex for older software systems. Innovative engineering provides new applications for computing that extend its scope and reach to new areas of societal impact. Every single one of these elements is subdued in the present approach to HPC, and that robs the ecosystem of vitality and power. We learned these lessons in the recent past, yet swiftly forgot them when composing this new program.
Control leads to compliance; autonomy leads to engagement.
― Daniel H. Pink
This alone could be a recipe for disaster, but it’s the tip of the iceberg. We have been mismanaging and undermining our scientific research in the USA for a generation, both at research institutions like the Labs and at our universities. Our National Laboratories are mere shadows of their former selves. When I look at how I am managed, the conclusion is obvious: I am well managed to be compliant with a set of conditions that have nothing to do with succeeding technically. Good management is applied to following rules and avoiding any obvious “fuck ups”. Good management is not applied to successfully executing a scientific program. With this as the prime directive, the entire scientific enterprise is under siege. The assault on scientific competence is broad-based and pervasive, as expertise is viewed with suspicion rather than respect. Part of this problem is the lack of intellectual stewardship reflected in numerous empty, thoughtless programs. The second piece is the way we manage science. Two practices ingrained in the way we do things lead to systematic underachievement: inappropriately applied project planning and intrusive micromanagement of the scientific process. The issue isn’t management per se, but its utterly inappropriate application, with priorities that are orthogonal to technical achievement.
One of the key elements in the downfall of American supremacy in HPC is the inability to tolerate failure as a natural outgrowth of any high-end endeavor. Our efforts are simply not allowed to fail at anything, lest it be seen as a scandal or a waste of money. In the process we deny ourselves the high-risk but high-payoff activities that yield great leaps forward. Of course, a deep-seated fear is at the root of the problem. As a direct result of this attitude, we end up not trying very hard. Failure is the best way to learn anything, and if you aren’t failing, you aren’t learning. Science is nothing more than a giant learning exercise. The lack of failure means that science simply doesn’t get done. All of this is obvious, yet our management of science has driven failure out. It is evident across a huge expanse of scientific endeavors, and HPC is no different. The death of failure is also the death of accomplishment. Correcting this problem alone would allow for significantly greater achievement, yet our current governance attitude seems utterly incapable of making progress here.
Tied like a noose around the neck is the problem of short-term focus. The short-term focus is the twin of the “don’t fail” attitude. We have to produce results and breakthroughs on a quarterly basis. We have virtually no idea where we are going beyond an annual horizon, and the long-term plans continually shift with political whims. This short-term, myopic view is being driven harder with each passing year. We effectively have no big long-term goals as a nation beyond simple survival. It’s as if we have forgotten how to dream big and produce any sort of inspirational societal goals. Instead we create big, soulless programs in place of big goals. Exascale computing is a perfect example. It is a goal without a real connection to anything societally important, crafted solely for the purpose of getting money. It is absolutely vacuous and anti-intellectual at its core, viewing supercomputing as a hardware-centered enterprise. Then it is being managed like everything else, with relentless short-term focus and failure avoidance. Unfortunately, even if it succeeds, we will continue our tumble into mediocrity.
This tumble into mediocrity is fueled by an increasingly compliance-oriented attitude toward all work. Instead of working to conduct a balanced and impactful program to drive the capacity of computing to impact the real world, our programs simply comply with the intellectually empty directives from above. There is no debate about how the programs are executed because PIs and Labs are just interested in getting money. The program is designed to be funded rather than to succeed, and the Labs no longer act as honest brokers, being primarily interested in filling their own coffers. In other words, the program is designed as a marketing exercise, not a science program. Instead of a flywheel of innovative excellence and progress, we produce a downward spiral of compliance-driven mediocrity serving intellectually empty and unbalanced goals. If everyone gets their money, fills out their time sheets, and collects a paycheck, it is counted a success.
At the end of the Cold War in the early 1990s, the USA’s nuclear weapons Labs were in danger of a funding free fall. Nuclear weapons testing ended in 1992, and the prospect of maintaining the nuclear weapons stockpile without testing loomed large. A science-based stockpile stewardship (SBSS) program was devised to serve as a replacement, and HPC was one of the cornerstones of the program. SBSS provided a backstop against financial catastrophe at the Labs and long-term funding stability. The HPC element in SBSS was the ASCI program (which became the ASC program as it matured). The original ASCI program was relentlessly hardware focused, with lots of computer science, along with activities to port older modeling and simulation codes to the new computers. This should seem very familiar to anyone looking at the new ECP program; the ASCI program is the model for the current exascale program. Within a few years it became clear that ASCI’s emphasis on hardware and computer science was inadequate to support modeling and simulation for SBSS with sufficient confidence. Important scientific elements were added to ASCI, including algorithm and method development, verification and validation, and physics model development, as well as stronger ties to experimental programs. These additions were absolutely essential for the success of the program. That said, these elements remain subcritical in terms of support, but they are much better than nothing.
If one looks at the ECP program, its composition and emphasis look just like the original ASCI program without the changes made shortly into its life. It is clear that the lessons learned by ASCI were ignored or forgotten by the new ECP program. It’s a reasonable conclusion that the main lesson taken from the ASC program was how to get money by focusing on hardware. Two issues dominate the analysis of this connection:
- None of the lessons learned by ASC that are necessary to conduct science have been learned by the exascale program. The exascale program is designed like the original ASCI program and fails to implement any of the programmatic modifications necessary for applied success. It is reasonable to conclude that the program has no serious expectation of applied scientific impact. Of course they won’t say this, but actions speak louder than words!
- The premise that exascale computing is necessary for science is an a priori assumption that has been challenged repeatedly (see the JASON reviews, for example). The unfunded and neglected aspects of modeling, methods and algorithms all provide historically validated means of answering these challenges. Rather than being addressed, the challenges were rejected out of hand and never technically engaged. We simply see an attitude that bigger is better by definition, and it has been sold more as a patriotic call to arms than as a balanced scientific endeavor. It remains true that faster computers are better if you do everything right; we are not supporting the activities needed to do everything right (V&V, the experimental connection and model development being paramount in this regard).
Beyond the troubling failure to learn from past mistakes, other issues remain. Perhaps the most obviously damning aspect of our current programs is their lack of connection to massive national goals. We simply don’t have any large national goals beyond being “great” or being “#1”. The HPC program is a perfect example: the whole program is tied to simply making sure the USA is #1. In the past, when computing came of age, the supercomputer was merely a tool that demonstrated utility in accomplishing something important to the nation or the world. It was not an end unto itself. This assured a definite balance in how HPC was executed, because success was measured by HPC’s impact on a goal beyond itself. Today there is no goal beyond the HPC itself, and supercomputing as an activity suffers greatly. It has no measure of success outside itself. Any science done by supercomputer is largely for marketing and press releases. Quite often the results have little or no importance aside from their capacity to generate a flashy picture to impress people who know little or nothing about science.
Taken in sufficient isolation, the objectives of the exascale program are laudable. An exascale computer is useful if it can reasonably be used. The issue is that such a computer does not live in isolation; it exists in a complex trade space where other options exist. My premise has never been that better or faster computer hardware is inherently bad. My premise is that the opportunity cost of such hardware is too high. The focus on hardware is starving other activities essential for modeling and simulation success. Producing an exascale computer is not a goal of opportunity, but rather one we should actively divest ourselves of. Gains in supercomputing are overly expensive and hamper progress in related areas simply through the implicit tax imposed by how difficult the new computers are to use. Improvements in real modeling and simulation capability would be far greater if we invested our efforts in different aspects of the ecosystem.
The key to holding a logical argument or debate is to allow oneself to understand the other person’s argument no matter how divergent their views may seem.
― Auliq Ice