It doesn’t matter how beautiful your theory is … If it doesn’t agree with experiment, it’s wrong.
― Richard Feynman
It is an oft-stated maxim that we should grasp the lowest-hanging fruit. In real life this fruit is often hidden in plain sight, with modeling and simulation being a prime example in my mind. Even a casual observer can see that the emphasis today is on computing speed and power as the path to the future. At the same time one can also see that the push for faster computers is foolhardy and hardly comes at an opportune time. Moore’s law is dead, and may be dead at all scales of computation. Faster computing may be the highest-hanging fruit, pursued at great cost while lower-hanging fruit rots away without serious attention, or even through conscious neglect. Perhaps nothing typifies this issue more than the state of validation in modeling and simulation.
Validation can be simply stated, but is immensely complex to do correctly. Simply put, validation is the comparison of observations with modeling and simulation results with the intent of understanding the fitness of the model for its intended purpose. More correctly, it is an assessment of modeling correctness, which demands observational data to ground the comparison in reality. It involves deep understanding of experimental and observational science including inherent error and uncertainty. It also involves equally deep understanding of errors and uncertainty of the model. It must be couched in the proper context philosophically including understanding what a model is. Each of these endeavors is in itself a complex and difficult professional activity, and validation is the synthesis of all of it. Being so complex and difficult it is rarely done correctly, and its value is grossly underappreciated. A large part of the reason for this state of affairs is the tendency to completely accept genuinely shoddy validation. I used to give a talk on the validation horrors in the published literature and finding targets for critique basically comes down to looking at almost any paper that does validation. The hard part is finding examples where the validation is done well.
One of the greatest tenets of modeling is the quote by George Box, “all models are wrong, but some are useful.” We have failed to recognize one of the most important but poorly appreciated maxims of modeling and simulation, a corollary to Box’s observation. It is that no amount of computer speed, algorithmic efficiency or accuracy of approximation can make a bad model better. If the model is wrong, solving it faster or more accurately or more efficiently will not improve it. Questions that should immediately come to mind are “what is useful?”, “what is bad?” and “what is better?” In a deep sense all of these questions are answered by a comprehensive validation assessment of the simulation of a model. One needs to define what is bad and what is better. Both concepts depend deeply upon deciding what one wants from a model: what is its point and purpose, and most likely what question is it designed to answer? The question to start with is, “what is a model?”
“What is a model?”
A model is virtually everything associated with a simulation including the code itself, the input to the code, the computer used for the computation, and the analysis of the results. Together all these elements comprise the model. At the core of the model and the code are the theoretical equations being used to simulate the real World. More often than not, this is a system of differential equations or something more complex (integro-differential equations, for example). These equations are then solved using methods, approximations and algorithms, all of which leave their imprint on the results. Putting all of this together involves creating a computer code, creating a discrete description of the World and computing the result. Each of these steps constitutes a part of the model. Once the computation has been completed, the results need to be analyzed and conclusions drawn out of the mountain of numbers produced by the computer. All of these comprise the model we are validating. To separate one thing from another requires good disciplined work and lots of rigor. Usually this discipline is lacking and rigor is replaced by assumptions and slothful practices. In very many cases we are watching willful ignorance in action, or simple negligence. We know how to do validation; we simply don’t demand that people practice it. People are often comforted by not knowing and don’t want to actually understand the depth of their structural ignorance.
Science is not about making predictions or performing experiments. Science is about explaining.
― Bill Gaede
Observing and understanding are two different things.
― Mary E. Pearson
To conduct a validation assessment you need observations to compare to. This is an absolute necessity; if you have no observational data, you have no validation. Once the data is at hand, you need to understand how good it is. This means understanding how uncertain the data is. This uncertainty can come from three major aspects of the process: errors in measurement, errors in statistics, and errors in interpretation. In the order they were mentioned, each of these categories becomes more difficult to assess and less commonly assessed in practice. Most commonly assessed is measurement error, that is, the uncertainty of the value of a measured quantity. This is a function of the measurement technology or the inference of the quantity from other data. The second aspect is associated with the statistical nature of the measurement. Is the observation or experiment repeatable? If it is not, how much might the measured value differ due to changes in the system being observed? How typical are the measured values? In many cases this issue is ignored in a willfully ignorant manner. Finally, the hardest part is observational bias, often defined as answering the question, “how do we know that we are measuring what we think we are?” Is there something systematic in our observed system that we have not accounted for that might be changing our observations? This may come from some sort of problem in calibrating measurements, or from looking at the observed system in a manner that is inconsistent. These all lead to potential bias and distortion of the measurements.
The intrinsic benefit of this approach is a systematic investigation of the ability of the model to produce the features of reality. Ultimately the model needs to produce the features of reality that we care about, and can measure. It is good to balance two things in the process of validation: the ability to reproduce the reality necessary to conduct engineering and science, and the ability to reproduce general observations. A really good confidence builder is the ability of the model to produce proper results on things that we care about as well as those we don’t care about. One of the core issues is the high probability that many of the things we care about in a model cannot be observed, and the model acts as an inference device for science. In this case the observations act to provide confidence that the model’s inferences can be trusted. One of the keys to the whole enterprise is understanding the uncertainty intrinsic to these inferences, and good validation provides essential information for this.
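As a small illustration of the three sources of data uncertainty named earlier (measurement, statistics, interpretation), here is a minimal sketch in Python. It assumes the three contributions are independent and can be combined in quadrature, which is a common convention but not something the discussion above prescribes. The function name `data_uncertainty` and its parameters `instrument_sigma` and `bias_bound` are hypothetical, and the numbers are made up for illustration.

```python
import math
import statistics

def data_uncertainty(repeats, instrument_sigma, bias_bound):
    """Return (mean, combined uncertainty) for a repeated measurement.

    Assumptions (not from the original text): the three contributions
    are independent and are combined by root-sum-square.
      instrument_sigma : measurement error of the instrument itself
      scatter          : statistical variability across repeated runs
      bias_bound       : an estimated bound on unaccounted systematic bias
    """
    if len(repeats) < 2:
        raise ValueError("need at least two repeated measurements")
    mean = statistics.fmean(repeats)
    scatter = statistics.stdev(repeats)  # sample standard deviation
    combined = math.sqrt(instrument_sigma**2 + scatter**2 + bias_bound**2)
    return mean, combined

# Illustrative, made-up numbers: five repeats of the same experiment.
mean, sigma = data_uncertainty(
    [9.8, 10.1, 10.0, 10.3, 9.8], instrument_sigma=0.1, bias_bound=0.2
)
print(f"observation: {mean:.2f} +/- {sigma:.2f}")
```

The point of the sketch is only that every term gets a number. Setting `bias_bound` to zero is a choice that should be defended with evidence, not a default.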
One of the things few people recognize is the inability of other means to provide remediation from problems with the model. If a model is flawed there is no amount of computer power that can rectify its shortcomings. A computer of infinite speed would (should) only make the problems more apparent. This obvious outcome only becomes available with a complete, rigorous and focused validation of the model. Slipshod validation practices simply allow the wrong model to be propagated without necessary feedback. It is bad science plain and simple. No numerical method or algorithm in the code could provide relief either. The leadership in high performance computing is utterly oblivious to this. As a result almost no effort whatsoever is being put into validation, and models are being propagated forward without any thought regarding their validity. No serious effort exists to put the models to the test either. If our leadership is remotely competent this is an act of willful ignorance, i.e., negligence. While our models today are wonderful in many regards, they are far from perfect (remember what George Box said!). A well-structured scientific and engineering enterprise would make this evident, and employ means to improve them. These new models would open broad new vistas of utility in science and engineering. A lack of recognition of this opportunity makes modeling and simulation self-limiting in its impact.
A prime example where our modeling and simulation are deficient is reproducing the variability seen in the real World. In many cases the experimental practice is equally deficient. For most phenomena, events and engineered products of genuine interest and challenge, the exact same response cannot be reproduced. There are variations in the response because of small differences in the system being studied coming from external conditions (boundary conditions) or the state of the system (initial conditions), or simply a degree of heterogeneous character in the system itself. In many cases the degree of variation in response is very large and terribly important. In engineered systems this leads to the application of large and expensive safety factors along with the risk of disaster. This depends to some extent on the nature of the response being sought. The more localized the response, the greater the tendency to be variable, while global-integrated responses can be far more reliably reproduced.
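The contrast between local and global responses can be seen even in a toy problem. The sketch below is purely illustrative and is not a model of any physical system: it assumes a system made of independent cells with a Gaussian-distributed property, takes the average over cells as the "global" response and the largest cell value as the "local" extreme, and measures how much each varies from one realization to the next. All names and parameter values are hypothetical.

```python
import random
import statistics

def response_spread(n_realizations=500, n_cells=100, sigma=0.1, seed=0):
    """Return (spread of global mean, spread of local peak).

    Toy assumption: each realization is a set of independent cells
    whose property is drawn from a Gaussian with mean 1.0 and standard
    deviation sigma. Spread is the standard deviation of each response
    across realizations.
    """
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    global_means = []
    local_peaks = []
    for _ in range(n_realizations):
        cells = [rng.gauss(1.0, sigma) for _ in range(n_cells)]
        global_means.append(statistics.fmean(cells))  # integrated response
        local_peaks.append(max(cells))                # localized extreme
    return statistics.stdev(global_means), statistics.stdev(local_peaks)

global_spread, local_spread = response_spread()
print(f"spread of global mean: {global_spread:.4f}")
print(f"spread of local peak:  {local_spread:.4f}")
```

In this toy setting the averaged response varies far less between realizations than the extreme does, which is the behavior described above: integrated quantities are reproducible while localized ones are not.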
Our scientific and engineering attention is being drawn increasingly to the local responses for significant events, and their importance is growing. These are often worst-case conditions that we strive to avoid. At the same time our models are completely ill suited to address these responses. Our models cannot effectively simulate these sorts of features. Our models are almost without exception focused on a mean-field model producing a model of the average system involving far more homogeneous properties and responses than seen in reality. As such the extremes in response are removed a priori. By the same token our observational and experimental practices are not arrayed to unveil this increasingly essential aspect of reality. The ability of modeling and simulation to impact the real World effectively suffers and its impact is limited by failing to progress.
…if you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated.
― Richard Feynman
One of the greatest issues in validation is “negligible” errors and uncertainties. In many cases these errors are negligible by assertion and no evidence is given. A standing suggestion is that any negligible error or uncertainty be given a numerical value along with evidence for that value. If this cannot be done, the assertion is most likely to be specious, or at least poorly thought through. If you know it is small then you should know how small and why. It is more likely that the assertion is based on some combination of laziness and wishful thinking. In other cases this practice is an act of negligence, and worse yet it is simply willful ignorance on the part of practitioners. This is an equal opportunity issue for computational modeling and experiments. Often (almost always!) numerical errors are completely ignored in validation. The most brazen violators will simply assert that the errors are small or the calculation is converged without offering any evidence beyond authority.
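One way to attach a number to a supposedly negligible numerical error is a mesh refinement study with Richardson extrapolation. The sketch below is a generic textbook version, not a procedure taken from this essay. It assumes three solutions on grids refined by a constant ratio `r`, and monotone convergence in the asymptotic range; the function name and the sample values are hypothetical.

```python
import math

def richardson_estimate(f_coarse, f_medium, f_fine, r=2.0):
    """Estimate observed convergence order and fine-grid error.

    Assumes f(h) = f_exact + C * h**p on three grids with spacing
    ratio r between successive grids (coarse -> medium -> fine).
    Returns (observed order p, extrapolated value, error estimate).
    """
    d_cm = f_coarse - f_medium
    d_mf = f_medium - f_fine
    if d_mf == 0.0 or d_cm / d_mf <= 1.0:
        # Not monotonically converging: the estimate is meaningless.
        raise ValueError("solutions are not in the convergent range")
    p = math.log(d_cm / d_mf) / math.log(r)
    err_fine = d_mf / (r**p - 1.0)  # estimate of f_fine - f_exact
    return p, f_fine - err_fine, abs(err_fine)

# Made-up values that follow f(h) = 1 + 0.01 * h**2 with h = 4, 2, 1.
order, extrapolated, error = richardson_estimate(1.16, 1.04, 1.01)
print(f"observed order {order:.2f}, numerical error about {error:.3g}")
```

If the function raises, or the observed order is far from the method's theoretical order, that is itself evidence that "the calculation is converged" cannot simply be asserted.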
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.
― Daniel J. Boorstin
Similarly, in experiments, measurements will be offered without any measurement error, often accompanied by an unsupported assertion that the error is too small to be concerned about. Experimental or observational results are also highly prone to ignore variability in outcomes and treat each case as a well-determined result even when the physics of the problem is strongly dependent on the details of the initial conditions (or the prevailing models strongly imply this!). Similar sins are committed with modeling uncertainties, where an incomplete assessment is made of uncertainty, and no accounting is made of the incompleteness and its impact. To make matters worse, other obvious sources of uncertainty are ignored. The result of these patterns of conduct is an almost universal underestimate of uncertainty from both modeling and observations. This underestimate results in modeling and simulation being applied in a manner that is non-conservative from a decision-making perspective.
The result of these rather sloppy practices is a severely limited capacity to properly offer an assessment of model validation. Reasonably complete uncertainty estimates can yield the definitive results that offer feedback on modeling. If uncertainties can be driven small enough we can drive improvement in the underlying science and engineering. For example, very precise and well-controlled experiments with small uncertainties can produce evidence that models must be improved. Exceptionally small modeling uncertainty could produce a similar effect in pushing experiments. Too often the work is conducted with a strong confirmation bias that takes the possibility of model incorrectness off the table. The result is a stagnant situation where models are not improving and shoddy professional practice is accepted. All of this stems from a lack of understanding of, or priority for, proper validation assessment.
Confidence is ignorance. If you’re feeling cocky, it’s because there’s something you don’t know.
― Eoin Colfer
A mature realization for scientists is that validation is never complete. Models are validated, not codes. The model is a broad set of simulation features, including the model equations and the code, but also a huge swath of other things. The validation is simply an assessment of all those things. This assessment looks at whether the model and the data are consistent with each other given the uncertainties in each. This assessment is predicated on the completeness of the uncertainty estimation. In the grand scheme of things one wants to drive the uncertainties down in either the model or the observations of reality. The big scientific endeavor is locating the source of error in the model: is it in how the model is solved, or are the model equations flawed? A flawed theoretical model can be a major scientific result requiring a deep theoretical response. Repairing these flaws can open new doors of understanding and drive our knowledge forward in miraculous ways. We need to adopt practices that allow us to identify problems for which new models are needed. The current modeling and simulation practice removes this outcome as a possibility at the outset.
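The consistency question above (do the model and the data agree given the uncertainties in each?) can be written down in a few lines. This is a minimal sketch under stated assumptions: the two uncertainties are treated as independent standard uncertainties combined in quadrature, and a coverage factor `k = 2` is used as the threshold. Both choices are conventions chosen for the example, and the function name and numbers are hypothetical.

```python
import math

def validation_check(model_value, model_unc, data_value, data_unc, k=2.0):
    """Compare a model result with an observation.

    Returns (difference, combined uncertainty, consistent?), where
    'consistent' means |difference| <= k * combined uncertainty.
    Assumes independent uncertainties combined by root-sum-square.
    """
    combined = math.hypot(model_unc, data_unc)
    if combined == 0.0:
        raise ValueError("both uncertainties are zero; give them values")
    difference = model_value - data_value
    return difference, combined, abs(difference) <= k * combined

# Made-up numbers: one case that is consistent and one that is not.
print(validation_check(10.4, 0.3, 10.0, 0.4))
print(validation_check(12.0, 0.3, 10.0, 0.4))
```

Note how the check depends on the completeness of the inputs. Underestimated uncertainties flag a sound model as inconsistent, while inflated ones make almost any model look consistent; only when the uncertainties are both complete and small does an inconsistent result become real evidence that the model needs to change.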
A man is responsible for his ignorance.
― Milan Kundera
Rider, William J. A Rogue’s Gallery of V&V Practice. No. SAND2009-4667C. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States), 2009.
Rider, William J. What Makes A Calculation Good? Or Bad?. No. SAND2011-7666C. Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States), 2011.
Rider, William J. What is verification and validation (and what is not!)?. No. SAND2010-1954C. Sandia National Laboratories, 2010.