Good judgment comes from experience, and experience – well, that comes from poor judgment.

― A.A. Milne

To avoid the implicit assumption of ZERO uncertainty, one can use (expert) judgment to fill in the information gap. This can be done in a principled fashion, and it always works better with a basis in evidence. The key is recognizing that we base our uncertainty on a model (a model that carries its own error too). The models are fairly standard and need a certain minimum amount of information to be solvable, and we are always better off with more information than the minimum, which makes the problem over-determined. Here we look at several forms of models that lead to uncertainty estimation, including discretization error models and statistical models applicable to epistemic or experimental uncertainty.

Maturity, one discovers, has everything to do with the acceptance of ‘not knowing.’

― Mark Z. Danielewski

For discretization error the model is quite simple: A = S_k + C h_k^p, where A is the mesh-converged solution, S_k is the solution on the k-th mesh, h_k is the mesh length scale, p is the (observed) rate of convergence, and C is a proportionality constant. We have three unknowns (A, C and p), so we need at least three meshes to solve the error model exactly, or more if we solve it in some sort of optimal manner. We recently had a method published that discusses how to include expert judgment in the determination of numerical error and uncertainty using models of this type. The model can be solved along with the data using minimization techniques, with the expert judgment entering as constraints on the solution for the unknowns. In both the over- and under-determined cases, different minimizations can produce multiple solutions to the model, and robust statistical techniques may be used to find the “best” answers. This means that one needs to resort to more than simple curve fitting and least squares procedures; one needs to solve a nonlinear problem associated with minimizing the fitting error (i.e., residuals) with respect to error measures other than least squares.
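As a minimal sketch of the exactly determined case, suppose we have three meshes with a constant refinement ratio and monotone convergence (both are assumptions made here to keep the algebra closed-form). The function name and the manufactured numbers are illustrative only, not taken from the published method.

```python
import math

def solve_error_model(s, h):
    """Solve A = S_k + C*h_k**p exactly from three solutions.

    Assumes (for this sketch) that h is ordered coarse to fine with a
    constant refinement ratio and that the solutions converge monotonically.
    """
    s1, s2, s3 = s
    h1, h2, h3 = h
    r = h1 / h2
    if not math.isclose(r, h2 / h3, rel_tol=1e-9):
        raise ValueError("this sketch needs a constant refinement ratio")
    if s2 == s3:
        raise ValueError("the two finest solutions are identical")
    ratio = (s1 - s2) / (s2 - s3)
    if ratio <= 0.0:
        raise ValueError("non-monotone data: the observed rate is undefined")
    p = math.log(ratio) / math.log(r)    # observed rate of convergence
    c = (s2 - s1) / (h1**p - h2**p)      # proportionality constant
    a = s3 + c * h3**p                   # mesh-converged estimate
    return a, c, p

# Manufactured data with A = 1, C = 0.5, p = 2 (so S_k = A - C*h_k**p).
h = [0.4, 0.2, 0.1]
s = [1.0 - 0.5 * hk**2 for hk in h]
a, c, p = solve_error_model(s, h)
print(a, c, p)
```

With real data the sequence is often not monotone or the ratio is not constant, which is exactly where the minimization-based approach with constraints earns its keep.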

For extreme under-determined cases unknown variables can be completely eliminated by simply choosing the solution based on expert judgment. For numerical error an obvious example is assuming that calculations are converging at an expert-defined rate. Of course the rate assumed needs an adequate justification based on a combination of information associated with the nature of the numerical method and the solution to the problem. A key assumption that often does not hold up is the achievement of the method’s theoretical rate of convergence for realistic problems. In many cases a high-order method will perform at a lower rate of convergence because the problem has a structure with less regularity than necessary for the high-order accuracy. Problems with shocks or other forms of discontinuities will not usually support high-order results and a good operating assumption is a first-order convergence rate.
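Here is a small sketch of what fixing the rate by judgment buys you when only two meshes are available. With p chosen, the model A = S_k + C h_k^p has two unknowns left and two solutions determine them. The function name and sample values are hypothetical, and the refinement ratio r = h_coarse / h_fine is assumed known.

```python
def extrapolate_with_assumed_rate(s_coarse, s_fine, r, p_assumed):
    """Estimate A and the fine-mesh error from two solutions and a chosen rate.

    From S_fine - S_coarse = C*h_fine**p * (r**p - 1) we get the fine-mesh
    error term C*h_fine**p, and then A = S_fine + C*h_fine**p.
    """
    if r <= 1.0 or p_assumed <= 0.0:
        raise ValueError("need refinement ratio r > 1 and rate p > 0")
    error_fine = (s_fine - s_coarse) / (r**p_assumed - 1.0)
    return s_fine + error_fine, abs(error_fine)

# Same two solutions, two different expert choices for the rate.
a_second, err_second = extrapolate_with_assumed_rate(0.98, 0.995, 2.0, 2.0)
a_first, err_first = extrapolate_with_assumed_rate(0.98, 0.995, 2.0, 1.0)
print(a_second, err_second)
print(a_first, err_first)
```

Note how the more conservative first-order assumption triples the error estimate for the same data, which is why the justification for the assumed rate matters so much.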

To make things concrete, let’s tackle a couple of examples of how all of this might work. In the paper published recently we looked at solution verification when people use two meshes instead of the three needed to fully determine the error model. This seems kind of extreme, but in this post the example is the case where people use only a single mesh. Seemingly we can do nothing at all to estimate uncertainty, but as I explained last week, this is the time to bear down and include an uncertainty, because it is the most uncertain situation and the most important time to assess it. Instead people throw up their hands and do nothing at all, which is the worst thing to do. So we have a single solution S_1 at h_1 and need to add information to allow the solution of our error model, A = S_k + C h_k^p. The simplest way to get a solvable error model is to propose a value for the mesh-converged solution, A, which can then be used to provide an uncertainty estimate, F_s |A - S_1|, that is, the difference |A - S_1| multiplied by an appropriate safety factor F_s.

This is a rather strong assumption to make. We might be better served by providing a range of values for either the convergence rate or the solution itself. In this way we show a bit more deference in what we are suggesting as the level of uncertainty, which is definitely called for in this case since we are so information poor. Again the use of an appropriate safety factor is called for, on the order of 2 to 3 in value. From statistical arguments the safety factor of 2 has some merit, while 3 is associated with the solution verification practice proposed by Roache. All of this is strongly associated with the need to make an estimate in a case where too little work has been done to make a direct estimate. If we are adding information that is only weakly related to the actual problem we are solving, the safety factor is essential to account for the lack of knowledge. Furthermore, we want to enable the circumstance where more work in active problem solving will allow the uncertainties to be reduced!
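A minimal sketch of the single-mesh estimate, F_s |A - S_1|, extended to a range of expert-proposed values for A. Taking the worst case over the range is my own simplifying assumption here, one of several defensible choices, and the function name and numbers are illustrative.

```python
def single_mesh_uncertainty(s1, a_candidates, safety_factor):
    """Return F_s * max|A - S_1| over the expert-proposed values of A."""
    if not a_candidates:
        raise ValueError("need at least one proposed value for A")
    if safety_factor < 1.0:
        raise ValueError("a safety factor below 1 shrinks the estimate")
    worst_gap = max(abs(a - s1) for a in a_candidates)
    return safety_factor * worst_gap

s1 = 0.98                      # the only solution we have
a_range = [0.99, 1.02]         # expert-proposed bracket for A (hypothetical)
u_fs2 = single_mesh_uncertainty(s1, a_range, safety_factor=2.0)
u_fs3 = single_mesh_uncertainty(s1, a_range, safety_factor=3.0)
print(u_fs2, u_fs3)
```

The estimate is crude on purpose. Running a second or third mesh replaces the bracket with data and lets the safety factor, and the uncertainty, come down.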

A lot of this information is probably good to include as part of the analysis even when you have enough information. The right way to think about this information is as constraints on the solution. If the constraints are active, they have been triggered by the analysis and help determine the solution. If the constraints have no effect on the solution, then they are shown to be consistent with the data. In this way the solution can be shown to be consistent with the views of the experts. If one is in the circumstance where the expert judgment is completely determining the solution, one should be very wary, as this is a big red flag.
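To illustrate the idea of active versus inactive constraints, here is a sketch that fits the error model A = S_k + C h_k^p by least squares while restricting p to an expert-supplied interval. It uses a plain grid search over p (for fixed p the model is linear in A and C), which is a stand-in I chose for simplicity, not the robust method from the paper. All names and data are hypothetical.

```python
def fit_with_rate_bounds(h, s, p_min, p_max, n_grid=2001):
    """Least-squares fit of S_k = A - C*h_k**p with p limited to [p_min, p_max].

    Each trial p is solved in closed form for A and C, and the p with the
    smallest residual wins. Requires distinct mesh sizes h_k.
    """
    if not (0.0 < p_min < p_max):
        raise ValueError("need 0 < p_min < p_max")
    if len(h) != len(s) or len(h) < 3:
        raise ValueError("need at least three (h, S) pairs")
    n = len(h)
    s_mean = sum(s) / n
    best = None
    for i in range(n_grid):
        p = p_min + (p_max - p_min) * i / (n_grid - 1)
        x = [-(hk**p) for hk in h]            # model becomes S_k = A + C*x_k
        x_mean = sum(x) / n
        sxx = sum((xk - x_mean) ** 2 for xk in x)
        sxy = sum((xk - x_mean) * (sk - s_mean) for xk, sk in zip(x, s))
        c = sxy / sxx
        a = s_mean - c * x_mean
        resid = sum((a + c * xk - sk) ** 2 for xk, sk in zip(x, s))
        if best is None or resid < best[0]:
            best = (resid, a, c, p, i)
    _, a, c, p, i = best
    active = i in (0, n_grid - 1)             # best rate sits on an expert bound
    return a, c, p, active

# Manufactured second-order data: A = 1, C = 0.5, p = 2.
h = [0.4, 0.2, 0.1, 0.05]
s = [1.0 - 0.5 * hk**2 for hk in h]
loose = fit_with_rate_bounds(h, s, 1.0, 3.0)   # bounds bracket the true rate
tight = fit_with_rate_bounds(h, s, 0.5, 1.5)   # bounds exclude the true rate
print(loose)
print(tight)
```

In the first call the bounds never bind, so the data and the expert agree. In the second the fitted rate is pinned to the upper bound, which is the signal that the judgment, not the data, is setting the answer.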

Other numerical effects need models for their error and uncertainty too. Linear and nonlinear solver error plus round-off error can all contribute to the overall uncertainty. A starting point would be the same model as for the discretization error, but using the tolerances from the linear or nonlinear solution as h. The starting assumption is often that these errors are dominated by discretization error, or tied to the discretization. Evidence in support of these assumptions is generally weak to nonexistent. For round-off errors the modeling is similar, but all of these errors can be magnified in the face of instability. A key is to provide some sort of assessment of their aggregate impact on the results rather than ignoring them.

Other parts of the uncertainty estimation are much more amenable to statistical structures for uncertainty. This includes the type of uncertainty that too often provides (wrongly!) the entirety of uncertainty estimation: parametric uncertainty. This problem is a direct result of the availability of tools that allow the magnitude of parametric uncertainty to be estimated. In addition to parametric uncertainty, aleatory uncertainties, experimental uncertainty and deep model form uncertainty may all be examined using statistical approaches. In many ways the situation is far better than for discretization error, but in other ways the situation is more dire. Things are better because statistical models can be evaluated using less data, and errors can be estimated using standard approaches. The situation is dire because what is being radically under-sampled is often reality itself, not the model of reality that simulations are based on.

Uncertainty is a quality to be cherished, therefore – if not for it, who would dare to undertake anything?

― Villiers de L’Isle-Adam

In the same way as numerical uncertainty, the first thing to decide upon is the model. A standard modeling assumption is the use of the normal or Gaussian distribution as the starting assumption. This is almost always chosen as a default. A reasonable blog post title would be “The default probability distribution is always Gaussian”. A good thing about a distribution is that we can start to assess it beginning with two data points. A bad and common situation is that we have only a single data point. Then uncertainty estimation is impossible without adding information from somewhere, and expert judgment is the obvious place to look. When we do have statistical data, we can apply the standard error estimate, which uses the sample size to scale the additional uncertainty driven by poor sampling by 1/\sqrt{N}, where N is the number of samples.
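As a quick sketch of the standard error scaling, the snippet below computes the sample mean, the sample standard deviation, and \sigma/\sqrt{N} using only the Python standard library. The data values are made up for illustration.

```python
import math
import statistics

def mean_and_standard_error(samples):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    n = len(samples)
    if n < 2:
        raise ValueError("a sample standard deviation needs at least two points")
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)      # uses the N - 1 denominator
    return mean, sigma, sigma / math.sqrt(n)

samples = [1.0, 2.0, 3.0, 4.0]             # hypothetical measurements
mean, sigma, std_err = mean_and_standard_error(samples)
print(mean, sigma, std_err)
```

The function refuses a single sample on purpose: with N = 1 there is no spread to measure, and the missing information has to come from somewhere else.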

There are some simple ideas to apply in the case of the assumed Gaussian and a single data point. A couple of reasonable pieces of information can be added. One is an expert-judged standard deviation, with the single data point made the mean of the distribution by fiat. A second option is to define the mean of the distribution by expert judgment, which then defines the standard deviation, \sigma = |A - A_1|, where A is the defined mean and A_1 is the data point. In these cases the standard error estimate would be equal to \sigma/\sqrt{N} where N=1. Both of these approaches have strengths and weaknesses, and both include the strong assumption of the normal distribution.
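The two single-point options can be written down in a few lines. This is only a sketch of the bookkeeping, with hypothetical function names and numbers. The expert inputs (a standard deviation in one case, a mean in the other) are assumptions that need their own justification.

```python
import math

def gaussian_from_expert_sigma(a1, sigma_expert):
    """Option 1: the data point is the mean, the expert supplies sigma."""
    if sigma_expert <= 0.0:
        raise ValueError("sigma must be positive")
    return a1, sigma_expert

def gaussian_from_expert_mean(a1, mean_expert):
    """Option 2: the expert supplies the mean A, and sigma = |A - A_1|."""
    return mean_expert, abs(mean_expert - a1)

def standard_error_of_mean(sigma, n=1):
    """sigma / sqrt(N); with N = 1 there is no reduction at all."""
    return sigma / math.sqrt(n)

a1 = 10.2                                        # the single data point
mu1, sigma1 = gaussian_from_expert_sigma(a1, sigma_expert=0.5)
mu2, sigma2 = gaussian_from_expert_mean(a1, mean_expert=10.0)
print(mu1, sigma1, standard_error_of_mean(sigma1))
print(mu2, sigma2, standard_error_of_mean(sigma2))
```

With N = 1 the standard error equals the full standard deviation, so the estimate gets no credit for sampling that was never done.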

In a lot of cases a better simple assumption about the statistical distribution would be to use a uniform distribution. The issue with the uniform distribution is identifying its width. To define the basic distribution you need at least two pieces of information, just as for the normal (Gaussian) distribution. The subtleties are different and need some discussion. The width of a uniform distribution is defined by A_+ - A_-, where A_- and A_+ are its lower and upper bounds. A question is how representative a single piece of information A_1 would actually be. Does one center the distribution about A_1? One could be left needing to add two pieces of information instead of one by defining both A_- and A_+. This then allows a fairly straightforward assessment of the uncertainty.
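For the uniform case, once A_- and A_+ are chosen the basic statistics follow directly: the mean is the midpoint and the standard deviation is the width divided by \sqrt{12}. The sketch below also checks whether the single data point A_1 falls inside the proposed bounds. Names and values are illustrative.

```python
import math

def uniform_summary(a_minus, a_plus, a1=None):
    """Width, mean and standard deviation of a uniform distribution on [A_-, A_+]."""
    if a_plus <= a_minus:
        raise ValueError("need A_+ > A_-")
    width = a_plus - a_minus
    summary = {
        "width": width,
        "mean": 0.5 * (a_minus + a_plus),
        "std": width / math.sqrt(12.0),
    }
    if a1 is not None:
        # A data point outside the expert bounds contradicts the judgment.
        summary["data_inside"] = a_minus <= a1 <= a_plus
    return summary

bounds_only = uniform_summary(0.0, 12.0)
with_data = uniform_summary(9.5, 10.5, a1=10.2)
print(bounds_only)
print(with_data)
```

If the data point lands outside the bounds, the bounds were wrong, and that is worth knowing before any uncertainty is quoted.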

For statistical models one might eventually resort to using a Bayesian method to encode the expert judgment in defining a prior distribution. In general terms this might seem to be an absolutely key approach to structure the expert judgment where statistical modeling is called for. The basic form of Bayes’ theorem is P\left(a|b\right) = P\left(b|a\right) P\left(a\right)/ P\left(b\right) where P\left(a|b\right) is the probability of a given b, P\left(a\right) is the probability of a and so on. A great deal of the power of the method depends on having a good (or expert) handle on all the terms on the right hand side of the equation. Bayes’ theorem would seem to be an ideal framework for the application of expert judgment through the decision about the nature of the prior.
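One simple, concrete instance of this is the textbook conjugate update for a Gaussian mean: a Gaussian prior (the expert judgment) combined with Gaussian data of known noise level gives a Gaussian posterior in closed form. The known observation standard deviation is an assumption of this sketch, and all names and numbers are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_sd, data, obs_sd):
    """Posterior mean and standard deviation for a Gaussian mean.

    Assumes a Gaussian prior on the mean and independent Gaussian
    observations with a known standard deviation obs_sd.
    """
    if prior_sd <= 0.0 or obs_sd <= 0.0:
        raise ValueError("standard deviations must be positive")
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(data) / obs_sd**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_mean * prior_precision + sum(data) / obs_sd**2)
    return post_mean, math.sqrt(post_var)

# Expert prior centered at 0 with unit spread, then a single observation at 2.
post_mean, post_sd = normal_posterior(0.0, 1.0, [2.0], obs_sd=1.0)
print(post_mean, post_sd)
```

With one data point and equal weights, the posterior lands halfway between the prior and the observation. More data pulls the answer toward the measurements and away from the judgment, which is the behavior we want.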

The mistake is thinking that there can be an antidote to the uncertainty.

― David Levithan

A key to this entire discussion is the need to resist the default uncertainty of ZERO as a principle. It would be best if real, problem-specific work were conducted to estimate uncertainties: the right calculations, the right meshes and the right experiments. If one doesn’t have the time, money or willingness, the answer is to call upon experts to fill in the gap using justifiable assumptions and information, while taking an appropriate penalty for the lack of effort. This would go a long way toward improving the state of practice in computational science, modeling and simulation.

Children must be taught how to think, not what to think.

― Margaret Mead

Rider, William, Walt Witkowski, James R. Kamm, and Tim Wildey. “Robust verification analysis.” Journal of Computational Physics 307 (2016): 146-163.