Any physical theory is always provisional, in the sense that it is only a hypothesis: you can never prove it. No matter how many times the results of experiments agree with some theory, you can never be sure that the next time the result will not contradict the theory.
― Stephen Hawking
Over the past few decades there has been a lot of Sturm und Drang around the prospect that computation changed science in some fundamental way. The proposition was that computation formed a new way of conducting scientific work to complement theory and experiment/observation; in essence, computation had become the third way for science. I don’t think this proposition stands the test of time, and it should be rejected. A more proper way to view computation is as a new tool that aids scientists. Traditional computational science is primarily a means of investigating theoretical models of the universe in ways that classical mathematics could not. Today this role is expanding to include augmentation of data acquisition, analysis, and exploration well beyond the capabilities of unaided humans. Computers make for better science, but recognizing that they do not change science itself is important for making good decisions.
The key to my rejection of the premise is a close examination of what science is. Science is a systematic endeavor to understand and organize knowledge of the universe in a testable framework. Standard computation is conducted in a systematic manner to study solutions of theoretical equations, but the solutions always depend entirely on the theory. Computation also provides more general ways of testing theory and making predictions, well beyond the approaches available before computing. Computation frees us from limitations in solving the equations comprising the theory, but changes nothing about the fundamental dynamic in play. The key point is that computation is an enhanced tool set for conducting science in an otherwise standard way.
I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
― Abraham H. Maslow
Why is this discussion worth having now?
Some of the best arguments for the current obsession with exascale computing are couched in advertising computing as a new, somehow game-changing way of doing science. It just isn’t a game changer; computation is an incredible tool that opens new options for progress. Looking at computing as simply a really powerful tool that enhances standard science just doesn’t sound as good, or as compelling for generating money. The problem is that computing is just that: a really useful and powerful tool, and little more. The proper context for computing carries with it important conclusions about how it should and should not be used; neither is evident in today’s common rhetoric. As with any tool, computation must be used correctly to yield its full benefits.
This correct use and full benefit is the rub with current computing programs. The current programs focus almost no energy on doing computing correctly. None. They treat computing as a good unto itself rather than as a deep, skillful endeavor that must be completely entrained within the broader scientific themes. Ultimately science is about knowledge and understanding of the World. This can come from only two places: the observation of reality, and theories to explain those observations. We judge theory by how well it predicts what we observe. Computation only serves as a vehicle for more effectively applying theoretical models and for wrangling our observations practically. Models are still the wellspring of human thought. Computation does little to free us from the necessity that progress be based on human creativity and inspiration.
Observations still require human ingenuity and innovation to be achieved. This can take the form of the mere inspiration to measure or observe a certain factor in the World. Another form is the development of devices that make new measurements possible. Here is a place where computation is playing a greater and greater role: in many cases computation allows the management of mountains of data that are unthinkably large by former standards. A complementary, and sometimes entirely distinct, way computation changes data is analysis. New methods are available to enhance diagnostics or reveal effects that were previously hidden or invisible; in essence, the ability to drag signal from noise and make the unseeable clear and crisp. All of these uses are profoundly important to science, but science still operates as it did before. We just have better tools to apply to its conduct.
One of the big ways for computation to reflect the proper structure of science is verification and validation (V&V). In a nutshell, V&V is the classical scientific method applied to computational modeling and simulation in a structured, disciplined manner. The high performance computing programs being rolled out today ignore verification and validation almost entirely. Science is supposed to arrive via computation as if by magic; if V&V is present at all, it is an afterthought. The deeper and more pernicious danger is the belief by many that modeling and simulation can produce data of equal (or even greater) validity than nature itself. This is not a recipe for progress, but rather a recipe for disaster. We are priming ourselves to believe some rather dangerous fictions.
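To make the verification half of V&V concrete, here is a minimal sketch of the kind of check verification demands: confirm that a numerical method converges to a known answer at its advertised rate. The method, test problem, and step sizes here are illustrative choices, not drawn from any particular code.

```python
import math

def central_diff(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Verify against a problem with a known exact answer:
# d/dx sin(x) = cos(x), evaluated at x = 1.
exact = math.cos(1.0)
step_sizes = [0.1, 0.05, 0.025]
errors = [abs(central_diff(math.sin, 1.0, h) - exact) for h in step_sizes]

# The observed order of convergence between successive halvings of h
# should approach the method's formal order of 2.
orders = [math.log(errors[i] / errors[i + 1]) / math.log(2.0)
          for i in range(len(errors) - 1)]
print(orders)  # both entries should be close to 2
```

The point is not the arithmetic but the discipline: a claim of accuracy is tested against a known answer rather than asserted, which is exactly what current programs skip.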
Einstein expressed a healthy attitude in the exchange below. Replace “theory” with “computation,” ask the same question, and then inquire whether our attitudes toward modeling and simulation are equally healthy.
You make experiments and I make theories. Do you know the difference? A theory is something nobody believes, except the person who made it. An experiment is something everybody believes, except the person who made it.
― Albert Einstein
The archetype of this thought process is direct numerical simulation (DNS). DNS is most prominently associated with turbulence, but the mindset presents itself in many fields. The logic behind DNS is the following: if we solve the governing equations without any modeling in a very accurate manner, the solutions are essentially exact. These very accurate and detailed solutions are taken to be just as good as measurements of nature; some would contend that DNS data is better because it has no measurement error. Many modelers are eager to use DNS data to validate their models, and eagerly await more powerful computers to expand the grasp of DNS to more complex situations. This entire mindset is unscientific and prone to the creation of bullshit. A big part of the problem is a lack of V&V with DNS, but the core issue is deeper: the belief that the equations are exact, rather than simply models drawn from currently accepted theory.
Let me explain why I would condemn such a potentially useful and powerful activity so strongly. The problem with DNS used in this manner is that it does include a model of reality: the equations themselves are a model of reality, a fact the DNS argument ignores. The argument behind DNS is that the equations being solved are beyond question. This lack of questioning is itself unscientific on its face, but let me go on. Others will argue that the equations being solved have been formally validated, and thus their validity for modeling reality is established. Again, this has some truth to it, but the validation is invariably for quantities that can be observed directly, and generally statistically. In this sense the data being drawn from DNS is validated by inference, not directly. Using such indirectly validated data for modeling is dangerous (it may be useful too, but needs to be taken with a big grain of salt). DNS data needs to be used with caution and applied in a circumspect manner, and that caution is not in evidence today.
Perhaps one of the greatest issues with the application of DNS is its failure to utilize V&V systematically. The first leap of faith with DNS is the belief that no modeling is happening; the equations being solved are not exact, but rather models of reality. Next, the error associated with the numerical integration of the equations is rarely, if ever, quantified; it is simply assumed to be negligibly small. Even if we were to accept DNS as equivalent to experimental data, the error needs to be defined as part of the data set (in essence, the error bar). Other uncertainties required of almost any experimental dataset are also lacking with DNS. Data from DNS should be treated with greater caution than experimental data, reflecting the care such artificial information demands. Instead, DNS computations are treated with less caution. In this way standard practice today veers all the way into the cavalier.
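One standard way to attach the missing error bar to a computed quantity is Richardson extrapolation combined with a grid convergence index (GCI). The sketch below uses made-up functional values and an assumed refinement ratio purely for illustration; it is not data from any actual DNS.

```python
import math

def observed_order(f_coarse, f_medium, f_fine, r):
    """Observed order of convergence p from solutions on three grids
    with a constant refinement ratio r."""
    return math.log(abs(f_coarse - f_medium) / abs(f_medium - f_fine)) / math.log(r)

def gci_fine(f_medium, f_fine, r, p, Fs=1.25):
    """Grid convergence index on the fine grid: a relative uncertainty
    estimate for the fine-grid value (Fs is a safety factor)."""
    return Fs * abs((f_medium - f_fine) / f_fine) / (r**p - 1.0)

# Illustrative values of some computed functional (say, a drag
# coefficient) on grids each refined by a factor of 2:
f_coarse, f_medium, f_fine = 0.9713, 0.9700, 0.9697
p = observed_order(f_coarse, f_medium, f_fine, r=2.0)
uncertainty = gci_fine(f_medium, f_fine, r=2.0, p=p)
print(f"observed order ~ {p:.2f}, fine-grid uncertainty ~ {100 * uncertainty:.4f}%")
```

A computed value reported with a GCI-style uncertainty is the beginning of an error bar; a DNS value reported without one is just a number on faith.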
The deepest issue with current programs pushing forward on computing hardware is their balance. The practice of scientific computing requires the interaction and application of great swathes of scientific disciplines. Computing hardware is a small component in the overall scientific enterprise, and among the aspects least responsible for its success. The single greatest element in the success of scientific computing is the nature of the models being solved. Nothing else we can focus on has anywhere close to this impact. To put this differently: if a model is incorrect, no amount of computer speed, mesh resolution, or numerical accuracy can rescue the solution. This is how scientific theory applies to computation. Even when the model is correct, the method and approach to solving it is the next largest aspect in terms of impact. The damning thing about exascale computing is the utter lack of emphasis on either of these activities. Moreover, without the application of V&V in a structured, rigorous, and systematic manner, these shortcomings will remain unexposed.
In summary, we are left to draw a couple of big conclusions: computation is not a new way to do science, but rather an enabling tool for doing standard science better. Getting the most out of computing requires a deep and balanced portfolio of scientific activities. The current drive for computing hardware performance ignores the most important aspects of that portfolio, if science is indeed the objective. If we want to get the most science out of computation, a vigorous V&V program is one way to inject the scientific method into the work. V&V is the scientific method, and gaps in V&V reflect gaps in scientific credibility. Simply recognizing how scientific progress occurs and following that recipe can achieve a similar effect. The lack of scientific vitality in current computing programs is utterly damning.
A computer lets you make more mistakes faster than any other invention with the possible exceptions of handguns and Tequila.
― Mitch Ratcliffe