The excuses for not doing verification of numerical solutions are myriad.  One of the best, although it is usually unstated, is that verification just doesn’t work all the time.  The results are not “robust” even though the scientist “knows” the results are OK.  A few things conspire to produce this outcome:

  1. The results are actually not OK.
  2. The mesh isn’t in the “asymptotic” range of convergence.
  3. The analysis of the verification data is susceptible to numerical problems.

I’ll comment on each of these in turn and suggest some changes to mindset and analysis approach.

First, I’d like to rant about a few more pernicious ways people avoid verification.  These are especially bad because the people involved often think they are doing verification.  For example, let’s say you have an adaptive mesh code (this happens without AMR too, but it is more common with AMR because changing the resolution is so easy).  You get solutions at several resolutions, and you “eyeball” the solutions.  The quantity you or your customer cares about isn’t changing much as you refine the mesh.  You declare victory.

What have you done?  It is not verification; it is a mesh sensitivity study.  You may well have a convergent result, but you don’t have proof.  The solution could be staying the same simply because it is mesh-insensitive, not because it is converging to the right answer.  What would verification bring to the table?  It would provide a convergence rate and an error estimate.  In fact, error estimates are the true core of verification.  The error estimate is a much stronger statement about the influence of the mesh on your solution, and you almost never see it.
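To make that concrete, here is a minimal sketch in Python (with made-up numbers and an assumed constant refinement ratio) of the observed rate and Richardson-style error estimate a proper verification study would report:

```python
import math

# Hypothetical solutions on three systematically refined meshes.
# h is the mesh spacing, f the quantity of interest.
h = [0.04, 0.02, 0.01]            # coarse, medium, fine spacings
f = [0.9712, 0.9928, 0.9982]      # corresponding solution values (made up)

r = h[0] / h[1]                   # refinement ratio, assumed constant
p = math.log(abs(f[1] - f[0]) / abs(f[2] - f[1])) / math.log(r)  # observed convergence rate
f_star = f[2] + (f[2] - f[1]) / (r**p - 1.0)   # Richardson-extrapolated value
err_fine = abs(f_star - f[2])                  # error estimate on the finest mesh

print(f"observed rate p = {p:.2f}")
print(f"extrapolated value = {f_star:.5f}, fine-mesh error estimate = {err_fine:.2e}")
```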

Why?  Quite often the act of computing the error estimate undermines your faith in the seemingly wonderful calculation and leaves you with questions you can’t answer.  It is much easier to simply exist in comfortable ignorance and believe that your calculations are awesome.  This state of ignorance is the most common way that people almost do verification of calculations, but fail at the final hurdle.  Even at the highest levels of achievement in computational science, the last two paragraphs describe the state of affairs.  I think it’s rather pathetic.

OK, rant off.

Let’s get the zeroth point out of the way first.  Verification is first and foremost about estimating numerical errors.  This runs counter to the oft-stated purpose of “order” verification, where the rate of convergence of a numerical method is computed.  Order verification is used exclusively with code verification, where a solution is known to be sufficiently differentiable to demonstrate that a method achieves the correct rate of convergence as a function of the mesh parameters.  It is an essential part of the verification repertoire, but it tends to crowd out a more important reason to verify: error quantification, whether exact or estimated.

The first point is the reason for doing verification in the first place.  You want to make sure you understand how large the impact of numerical error is on your solution.  If the numerical errors are large, they can overwhelm the modeling you are interested in doing.  If the numerical errors grow as the mesh is refined, something is wrong.  It could be the code, the model, or the mesh, but something is amiss and the solution isn’t trustworthy.  If it isn’t checked, you don’t know.

The second point is much more subtle.  Let’s name the elephant in the room: meshes used in practical numerical calculations are almost never in the asymptotic range.  This is true even for what is generously called “direct numerical simulation” (DNS), where it is claimed that the numerical effects are small.  Rarely is there an error estimate in sight.  I’ve actually looked into this, and the errors are much larger than the scientists would have you believe, and the rates of convergence are clearly not asymptotic.

All of this is bad enough, but there is more, and it is not easy to understand.  Unless the mesh parameters are small, the rate of convergence will systematically deviate from the theoretical rate in a non-trivial way.  Depending on the size of the parameter and the nature of the equation being solved, the observed convergence rate could be smaller or larger than expected.  All of this can be analyzed for ideal cases such as a linear ordinary differential equation, where, depending on the details of the ODE method and the solution, one can get radically different rates of convergence.
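Here is a small, self-contained illustration of what I mean (my own toy example, not taken from any particular code): forward Euler applied to y' = -y.  The observed order only settles toward the theoretical first-order rate once the step size is genuinely small.

```python
import math

def euler_error(h, lam=1.0, t_final=1.0):
    """Error at t_final for forward Euler applied to y' = -lam*y, y(0) = 1."""
    n = round(t_final / h)
    y = (1.0 - lam * h) ** n          # closed form of the forward Euler iteration
    return abs(y - math.exp(-lam * t_final))

# Observed order between successive step halvings; it deviates from the
# theoretical value of 1 until h is small.
steps = [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
errors = [euler_error(h) for h in steps]
for e_coarse, e_fine in zip(errors, errors[1:]):
    print(f"observed order = {math.log(e_coarse / e_fine) / math.log(2.0):.3f}")
```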

The third point is the analysis of the numerical solutions.  Usually we just take our sequence of solutions and apply standard regression to solve for the convergence rate and the estimated converged solution.  This simple approach carries many unstated assumptions that we shouldn’t be making without at least thinking about them.  Standard least squares relies on a strong assumption about the data and its errors to begin with: it assumes that the residuals from the regression are normally distributed (i.e., Gaussian).  Very little about numerical error leads one to believe this is true.  Perhaps in the case where the errors are dominated by dissipative dynamics a Gaussian would be plausible, but that analysis itself only holds in the limit where the mesh is asymptotic.  If one is granted the luxury of analyzing such data, the analysis methodology, frankly, matters little.
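For reference, the standard analysis I am describing looks something like this: a sketch with made-up data, assuming SciPy is available, fitting f(h) = f_inf + C * h**p by ordinary (implicitly Gaussian-error) nonlinear least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit f(h) = f_inf + C * h**p by ordinary nonlinear least squares.
# Mesh spacings and solution values below are made up.
def model(h, f_inf, C, p):
    return f_inf + C * h**p

h = np.array([0.08, 0.04, 0.02, 0.01])
f = np.array([0.8848, 0.9712, 0.9928, 0.9982])

params, _ = curve_fit(model, h, f, p0=[1.0, -1.0, 2.0])
f_inf, C, p = params
print(f"converged estimate = {f_inf:.5f}, rate = {p:.2f}, "
      f"fine-mesh error estimate = {abs(f[-1] - f_inf):.2e}")
```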

What would I suggest as an alternative?

One of the problems that plagues verification is bogus results caused by bad data, changes in convergence behavior, or outright failure of the (nonlinear) regression.  Any of these should be treated as an outlier and disregarded.  The most common outlier analyses themselves rely on the assumption of Gaussian statistics.  Again, making use of this assumption is unwarranted.  Standard statistics using the mean and standard deviation are the same story.  Instead one should use median statistics, which can withstand up to half the data being outliers without breaking down.  This is the definition of robust, and this is what you should use.
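A minimal sketch of median-based screening, applied to a hypothetical set of convergence-rate estimates in which one regression has gone off the rails:

```python
import numpy as np

# Robust (median/MAD) outlier screening for a set of rate estimates.
# The values are hypothetical; the last one is the kind of numerical
# garbage a failed nonlinear regression can produce.
rates = np.array([1.9, 2.1, 2.0, 1.8, 2.2, 25.0])

center = np.median(rates)
mad = np.median(np.abs(rates - center))     # median absolute deviation
robust_sigma = 1.4826 * mad                 # consistent with sigma for Gaussian data
keep = np.abs(rates - center) < 3.0 * robust_sigma

print("median rate:", center)
print("kept:", rates[keep], "rejected:", rates[~keep])
```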

Do not use a single regression to analyze the data; instead do many regressions using different formulations of the regression problem, and apply constraints to the solutions using your knowledge.  If you have the luxury of many mesh solutions, run the regression over various subsets of your data.  For example, you may know that a certain quantity is positive, or better yet that it must take on values between certain limits.  The same applies to convergence rates: you generally have an idea from analysis of what would be reasonable; e.g., a first-order method might converge at a rate between one-half and two.  Use these constraints to make your regression fits better behaved and more likely to produce results you can actually use.  Make sure you throw out results claiming that your second-order method is producing 25th-order convergence.  That is simply numerical garbage and there is no excuse for keeping it.

At the end of this you will have a set of numerical estimates for the error and the convergence rate.  Use median statistics to choose the best result and to characterize the variation about it, so that outliers along the way are disregarded naturally.
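Putting the last two paragraphs together, here is a sketch of the idea (hypothetical mesh data, SciPy assumed): fit f(h) = f_inf + C * h**p over subsets of the mesh levels with the rate bounded to a plausible range, then summarize the resulting estimates with medians.

```python
import itertools
import numpy as np
from scipy.optimize import least_squares

# Made-up solutions on five mesh levels; the bounds on the rate p reflect
# a nominally second-order method.
h_all = np.array([0.16, 0.08, 0.04, 0.02, 0.01])
f_all = np.array([0.5392, 0.8848, 0.9712, 0.9928, 0.9982])

def fit(h, f):
    """Constrained fit of f(h) = f_inf + C * h**p; returns (f_inf, C, p)."""
    resid = lambda x: x[0] + x[1] * h**x[2] - f
    sol = least_squares(resid, x0=[f[-1], -1.0, 2.0],
                        bounds=([-np.inf, -np.inf, 1.0], [np.inf, np.inf, 3.0]))
    return sol.x

# Run the regression over every subset of at least four mesh levels.
estimates = []
for size in (4, 5):
    for idx in itertools.combinations(range(len(h_all)), size):
        idx = list(idx)
        estimates.append(fit(h_all[idx], f_all[idx]))

estimates = np.array(estimates)
print("median converged value:", np.median(estimates[:, 0]))
print("median rate:", np.median(estimates[:, 2]))
```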

Verification should (almost) always produce a useful result, and injecting ideas from robust statistics is how to get there.

Going beyond this point leads us to some really beautiful mathematics that is hot property right now (L1 norms leading to compressed sensing, robust statistics, …).  The key is not using the standard statistical toolbox without at least thinking about it and justifying its use; in verification work that use is generally not justifiable.  For general purposes a constrained L1 regression would be the starting point; maybe it will become more widely available soon.
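As a rough sketch of what that could look like in practice (made-up data again, and with SciPy’s smooth “soft_l1” loss standing in for a true least-absolute-deviations solver):

```python
import numpy as np
from scipy.optimize import least_squares

# Constrained, L1-flavored fit of f(h) = f_inf + C * h**p.  The soft_l1
# loss downweights off-trend mesh levels; the bounds on the rate p are the
# analyst's constraint.  The coarsest level is deliberately off-trend.
h = np.array([0.16, 0.08, 0.04, 0.02, 0.01])
f = np.array([0.6100, 0.8848, 0.9712, 0.9928, 0.9982])

residual = lambda x: x[0] + x[1] * h**x[2] - f
sol = least_squares(residual, x0=[f[-1], -1.0, 2.0],
                    bounds=([-np.inf, -np.inf, 1.0], [np.inf, np.inf, 3.0]),
                    loss="soft_l1", f_scale=0.01)
f_inf, C, p = sol.x
print(f"robust estimate: converged value = {f_inf:.5f}, rate = {p:.2f}")
```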
