It is useless to attempt to reason a man out of a thing he was never reasoned into.

― Jonathan Swift

Most computer modeling and simulation results are subject to bias in their solutions. This bias comes from the numerical solution, modeling inadequacy, and bad assumptions, to name a few of the sources. In contrast, uncertainty quantification is usually applied in a statistical and explicitly unbiased manner. This is a serious difference in perspective. With bias, the difference between simulation and reality is one-sided, and the deviation can be cured by calibrating parts of the model to compensate. Unbiased uncertainty is common in measurement error, and it ends up dominating the approach to UQ in simulations. The result is a mismatch between the dominant mode of uncertainty and how it is modeled. Coming up with a more nuanced model that acknowledges bias and deals with it appropriately would be great progress.

Climate simulations (and their brethren, weather simulations) are among the archetypes of modern modeling and simulation. These simulations carry significant bias associated with a lack of computational resolution. The computational mesh is always far too coarse for comfort, and the numerical errors are significant. There are also issues associated with initial conditions, energy balance, and representing physics at and below the level of the grid. In both cases the models are invariably calibrated heavily. This calibration compensates for the lack of mesh resolution, the lack of knowledge of initial data and physics, and the problems with representing the energy balance essential to the simulation (especially for climate). A serious modeling deficiency is that all of these uncertainties are merged into the calibration, with an associated loss of information.

We all see only that which we are trained to see.

― Robert Anton Wilson

The issues with calibration are profound. Without calibration the models are effectively useless. For these models to contribute to our societal knowledge and decision-making or raw scientific investigation, the calibration is an absolute necessity. Calibration depends entirely on existing data, and this carries a burden of applicability. How valid is the calibration when the simulation is probing outside the range of the data used to calibrate? We commonly include the intrinsic numerical bias in the calibration, and most commonly a turbulence or mixing model is adjusted to account for the numerical bias. A colleague familiar with ocean models quipped that if the ocean were as viscous as we modeled it, one could drive to London from New York. It is well known that numerical viscosity stabilizes calculation, and we can use numerical methods to model turbulence (implicit large eddy simulation), but this practice should at the very least make people uncomfortable. We are also left with the difficult matter of how to validate models that have been calibrated.

I just touched on large eddy simulation, which is a particularly difficult topic because numerical effects are always in play. With classical LES the mesh itself is part of the model. With implicit LES the numerical method itself provides the physical modeling, or some part of it. This issue plays out in weather and climate modeling, where the mesh is part of the model rather than an independent aspect of it. It should surprise no one that LES was born from weather-climate modeling (at a time when the distinction didn’t exist). In other words, the chosen mesh and the model are intimately linked. If the mesh is modified, the modeling must also be modified (recalibrated) to get the balance of the solution correct. This tends to happen in simulations where a delicate balance is essential to the phenomena. In these cases the system is, in one respect or another, in a nearly equilibrium state, and the deviations from this equilibrium are what matter. Aspects of the modeling related to the scales of interest, including the grid itself, affect the equilibrium to such a degree that an uncalibrated model is nearly useless.

If numerical methods are being used correctly and at a resolution where the solution can be considered remotely mesh converged, the numerical error is a pure bias error. A significant problem is the standard approach to solution verification, which treats numerical error as unbiased. This treatment is applied even where no evidence exists for the error being unbiased! Well-behaved numerical error is intrinsically biased. This matters because treating a biased error as unbiased represents a significant loss of information. Those who either must or do calibrate their models to account for numerical error rarely estimate that error explicitly, but they account for the bias as a matter of course. Ultimately, the failure of the V&V community to treat well-behaved numerical error as a one-sided bias is counter-productive. It is particularly problematic in the endeavor to deal proactively with the issues associated with calibration.

Science is about recognizing patterns. […] Everything depends on the ground rules of the observer: if someone refuses to look at obvious patterns because they consider a pattern should not be there, then they will see nothing but the reflection of their own prejudices.

― Christopher Knight

Let me outline how we should be dealing with well-behaved numerical error. If one has a quantity of interest for which a sequence of meshes produces a monotonic approach to a value (assuming the rest of the model is held fixed), then the error is well behaved. The sequence of solutions on the meshes can then be used to estimate the solution to the mathematical problem, that is, the solution where the mesh resolution is infinite (absurd as that might be). Along with this estimate of the “perfect” solution, the error can be estimated for any of the meshes. For this well-behaved case the error is one-sided: a bias between the ideal solution and the one computed on a mesh. Any fuzz in the estimate would be applied to the bias. In other words, any uncertainty in the error estimate is centered about the extrapolated “perfect” solution, not the finite-grid solutions. The problem with the currently accepted methodology is that the error is given as a standard two-sided error bar of the kind appropriate for statistical errors. We use a two-sided accounting for this error even though there is no evidence for it. This is a problem that should be corrected. I should note that many models (e.g., climate or weather) are invariably recalibrated after every mesh change, which invalidates the entire verification exercise, because everything in the model aside from the grid should be held fixed across the mesh sequence.
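A minimal sketch of the procedure just described, written in Python for illustration. It assumes three meshes with a constant refinement ratio and uses standard Richardson extrapolation, which is one common way to estimate the mesh-converged value; the function names and the `fuzz_fraction` parameter are hypothetical choices made here, not part of any established tool.

```python
import math


def richardson_bias(coarse, medium, fine, ratio=2.0):
    """Estimate the mesh-converged value and the one-sided bias on the fine mesh.

    Assumes a constant refinement ratio (> 1) between the three meshes and a
    quantity of interest that approaches its limit monotonically.
    """
    d_cm = medium - coarse  # change from coarse to medium mesh
    d_mf = fine - medium    # change from medium to fine mesh
    if d_cm * d_mf <= 0.0:
        raise ValueError("sequence is not monotonic; error is not well behaved")
    if abs(d_mf) >= abs(d_cm):
        raise ValueError("differences are not shrinking; not in the convergent range")

    # Observed convergence rate from the ratio of successive differences.
    order = math.log(d_cm / d_mf) / math.log(ratio)
    # Extrapolated estimate of the infinite-resolution solution.
    extrapolated = fine + d_mf / (ratio ** order - 1.0)
    # One-sided bias: the fine-mesh value minus the extrapolated value.
    bias = fine - extrapolated
    return order, extrapolated, bias


def one_sided_report(fine, extrapolated, fuzz_fraction=0.25):
    """Center the uncertainty band on the extrapolated value, not the mesh value.

    fuzz_fraction is an arbitrary illustrative assumption: the fuzz is taken
    as a fraction of the estimated bias.
    """
    bias = fine - extrapolated
    fuzz = fuzz_fraction * abs(bias)
    return {
        "fine": fine,
        "bias": bias,
        "best_estimate": extrapolated,
        "band": (extrapolated - fuzz, extrapolated + fuzz),
    }


if __name__ == "__main__":
    # Synthetic data generated from f(h) = 1 + 0.5 * h**2 at h = 0.4, 0.2, 0.1.
    order, limit, bias = richardson_bias(1.08, 1.02, 1.005, ratio=2.0)
    print(order, limit, bias)
    print(one_sided_report(1.005, limit))
```

With the synthetic data the fine-mesh value sits entirely on one side of the band, which is the point: the band describes where the converged answer is believed to lie, and the mesh solution is offset from it by the bias rather than sitting at the center of a symmetric error bar.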

When we get to the heart of the matter at hand, dealing with uncertainty in calibrated models, we rapidly come to the conclusion that we need to keep two sets of books. If the first thing that comes to mind is, “that’s what criminals do,” you’re on the right track. You should feel uneasy about this conclusion, and we should all feel a sense of dis-ease regarding this outcome. What do we put in these two books? In the first book we have the calibrated model, and we can rely upon it to interpolate the data it was calibrated with. So for the quantities of interest used to calibrate a model, the model is basically useless, or perhaps it unveils uncertainty and inconsistency within the data used for calibration.

A model is valuable for inferring other things from simulation. It is good for looking at quantities that cannot be measured. In this case the uncertainty must be approached carefully. The uncertainty in these values must almost invariably be larger than the uncertainty in the quantities used for calibration. One needs to look at the modeling connections for these values and adopt a reasonable approach to treating the quantities with an appropriate “grain of salt”. This includes the numerical error discussed above. In the best case there is data available that was not used to calibrate the model. Maybe these are values that are not as highly prized or as important as those used to calibrate. The discrepancy between these measured values and the simulation gives a very strong indication of the uncertainty in the simulation. In other cases some of the data potentially available for calibration has been left out, and it can be used for validating the calibrated model. This assumes that the hold-out data is sufficiently independent of the data used for calibration.

A truly massive issue with simulations is the extrapolation of results beyond the data used for calibration. This is a common and important use of simulations. One should expect the uncertainty to grow substantially with the degree of extrapolation from the data. A common and pedestrian way to see what this looks like is least squares fitting of data. The variation and uncertainty within the calibrated range is the basis of the estimates, but depending on the nature of that range and the degree of extrapolation, the uncertainty can grow to be very large. This makes perfectly reasonable sense: as we depart from our knowledge and experience, we should expect the uncertainty in our knowledge to grow.
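The least squares picture can be made concrete with a small sketch. The code below fits a straight line to a handful of points and evaluates the textbook standard error of the fitted value at a query point. The data and function names are made up for illustration, and the formula assumes the linear model itself remains valid outside the data, which is exactly the assumption that becomes doubtful under extrapolation, so the growth shown here is if anything an underestimate.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(ss_res / (n - 2))
    return {"slope": slope, "intercept": intercept, "s": s,
            "n": n, "x_bar": x_bar, "sxx": sxx}


def fitted_value_std_error(fit, x0):
    """Standard error of the fitted line at x0.

    The (x0 - x_bar)**2 term makes the error grow with distance from the
    center of the calibration data.
    """
    return fit["s"] * math.sqrt(1.0 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"])


if __name__ == "__main__":
    # Hypothetical calibration data covering x in [0, 4].
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [0.1, 0.9, 2.2, 2.8, 4.1]
    fit = fit_line(xs, ys)
    for x0 in (2.0, 4.0, 10.0, 40.0):
        print(x0, fitted_value_std_error(fit, x0))
```

The standard error is smallest at the center of the data, larger at the edge of the calibrated range, and keeps growing roughly linearly with distance once the query point is well outside it.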

A second issue to consider is our second set of books where the calibration is not taken quite so generously. In this case the most honest approach to uncertainty is to apply significant variation to the parameters used to calibrate the model. In addition we should include the numerical error in the uncertainty. In the case of deeply calibrated models these sources of uncertainty can be quite large and generally paint an overly pessimistic picture of the uncertainty. Conversely we have an extremely optimistic picture of uncertainty with calibration. The hope and best possible outcome is that these two views bound reality, and the true uncertainty lies between these extremes. For decision-making using simulation this bounding approach to uncertainty quantification should serve us well.
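As a rough illustration of the two sets of books, the sketch below builds an optimistic band from calibration residuals and a pessimistic band from a sweep over the calibrated parameter plus a one-sided numerical bias. Everything here is a hypothetical toy: the single-parameter model, the parameter range, and the choice to use the largest calibration residual as the optimistic spread are assumptions made for illustration only.

```python
def optimistic_band(prediction, calibration_residuals):
    """First book: trust the calibration.

    The spread is taken as the largest absolute residual against the
    calibration data (an illustrative choice, not a recommendation).
    """
    spread = max(abs(r) for r in calibration_residuals)
    return prediction - spread, prediction + spread


def pessimistic_band(model, x, k_low, k_high, numerical_bias, samples=11):
    """Second book: vary the calibrated parameter and add the numerical bias.

    model(x, k) is any callable with one calibrated parameter k.
    numerical_bias is the finite-mesh value minus the mesh-converged
    estimate, so it widens the band on one side only.
    """
    step = (k_high - k_low) / (samples - 1)
    values = [model(x, k_low + i * step) for i in range(samples)]
    low, high = min(values), max(values)
    if numerical_bias > 0.0:
        low -= numerical_bias   # mesh solution overshoots; truth may be lower
    else:
        high -= numerical_bias  # mesh solution undershoots; truth may be higher
    return low, high


if __name__ == "__main__":
    def toy_model(x, k):
        # Hypothetical model with a single calibrated coefficient k.
        return k * x

    print(optimistic_band(2.0, [0.05, -0.02]))
    print(pessimistic_band(toy_model, 2.0, 0.8, 1.2, numerical_bias=0.1, samples=5))
```

In this toy case the pessimistic band contains the optimistic one, and the hope expressed above is that the true uncertainty lies somewhere between the two.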

There are three types of lies: lies, damn lies, and statistics.

― Benjamin Disraeli