It feels almost dirty to put a “#” hashtag in my title, but what the hell! The production of predictive models is the holy grail of modeling and simulation. On the other hand, we have the situation where a lot of scientists and engineers think they have predictivity when in fact they have cheated. By “cheating” I usually mean one form or another of calibration, either mindfully or ignorantly applied to the model. The model itself ends up being an expensive interpolation and any predictivity is illusory.
When I say that you are modeling the real world, I really mean that you actually understand how well you compare. A model that seems worse but is honest about your simulation mastery is better than a model that seems to compare better but is highly calibrated. This seems counterintuitive, as I’m saying that a greater disparity is better. In the case where you’ve calibrated your agreement, you have lost the knowledge of how well you model anything. Having a good idea of what you don’t know is essential for progress.
A computational model sounds a lot better than an interpolation. In such circumstances simulation ends up being a way of appearing to add more rigor to the prediction when any real rigor was lost in making the simulation agree so well with the data. As long as one is simply interpolating, cost is the major victim of this approach, but in the case where one extrapolates there is danger in the process. In complex problems simulation is almost always extrapolating in some sense. A real driver for this phenomenon is mistakenly high standards for matching experimental data, which drive substantial overfitting of data (in other words, forcing a better agreement than the model should allow). In many cases well-intentioned standards of accuracy in simulation drive pervasive calibration that undermines the ability to predict, or to assess the quality of any prediction. I’ll explain what I mean by this and lay out what can be proactively done to conduct bona fide modeling.
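To make the interpolation-versus-extrapolation point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the linear "truth", the fixed noise values, and the helper names `lagrange_interpolant` and `linear_least_squares`. The "calibrated" model is forced through every noisy data point, while the "honest" model is a plain straight-line fit that leaves visible residuals.

```python
def lagrange_interpolant(xs, ys):
    """Return a polynomial that passes exactly through every (x, y) pair."""
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return p


def linear_least_squares(xs, ys):
    """Return the ordinary least-squares straight line through the data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def truth(x):
    # Hypothetical "real world": a simple linear response.
    return 2.0 * x + 1.0


# Hypothetical measurements: truth plus fixed, made-up measurement noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.10, -0.15, 0.12, -0.08, 0.11, -0.10]
ys = [truth(x) + e for x, e in zip(xs, noise)]

calibrated = lagrange_interpolant(xs, ys)  # forced to match every point
honest = linear_least_squares(xs, ys)      # allowed to disagree with the data

# Agreement on the data used for calibration.
max_resid_calibrated = max(abs(calibrated(x) - y) for x, y in zip(xs, ys))
max_resid_honest = max(abs(honest(x) - y) for x, y in zip(xs, ys))

# Prediction outside the calibrated range.
x_new = 8.0
err_calibrated = abs(calibrated(x_new) - truth(x_new))
err_honest = abs(honest(x_new) - truth(x_new))

print("max residual on data: calibrated", max_resid_calibrated,
      "honest", max_resid_honest)
print("extrapolation error:  calibrated", err_calibrated,
      "honest", err_honest)
```

Inside the data range the calibrated model looks perfect, because it has fit the noise along with the signal. Outside that range the same fitted noise dominates the prediction, and the honest fit with its visible residuals does far better.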
I suppose the ignorant can be absolved of the sin they don’t realize they are committing. Their main sin is ignorance, which is bad enough. In many cases the ignorance is utterly willful. For example, physicists tend to show a lot of willful ignorance of numerical side effects. They know these effects exist, yet continue to systematically ignore them or calibrate them away. The delusional calibrators are cheating purposefully and then claiming victory despite having gotten the answer by less than noble means. I’ve seen example after example of this in a wide spectrum of technical fields. Quite often nothing bad happens until a surprise leaps up from the data. The extrapolation finally becomes poor and the response of the simulated system surprises.
The more truly ignorant will find that they get the best answer by using a certain numerical method or grid resolution, and with no further justification declare this to be the best solution. This is the case for many, many engineering applications of modeling and simulation. For some people this would mean using a first-order method because it gives a better result than the second-order method. They could find that using a more refined mesh gives a worse answer and then use the coarser grid. This is easier than trying to track down why either of these dubious steps would give better answers, because they shouldn’t. In other cases, they will find that a dubious material or phenomenological model gives better results, or a certain special combination does. Even more troubling is the tendency to choose expedient techniques whereby mass, momentum or energy is simply thrown away, or added in response to a bad result. Generally speaking, the ignorant who apply these techniques have no general idea how accurate their model actually is, what its uncertainties are, or what the uncertainties are in the quantities they are comparing to.
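One way to look at the mesh question directly, rather than picking the grid that happens to agree best, is to estimate the observed order of convergence from three systematically refined grids. The sketch below uses the standard three-grid formula with a constant refinement ratio. The function names and the synthetic results (generated from an assumed error of the form 0.5 h^2) are hypothetical and only there to show the bookkeeping.

```python
import math


def observed_order(f_coarse, f_medium, f_fine, refinement_ratio):
    """Observed order of convergence from three grids with a constant ratio."""
    d_coarse = f_coarse - f_medium
    d_fine = f_medium - f_fine
    if d_fine == 0.0 or d_coarse / d_fine <= 0.0:
        # Oscillating or stalled results: no order can be extracted, which is
        # a reason to investigate, not a reason to keep the coarse grid.
        raise ValueError("results are not converging monotonically")
    return math.log(d_coarse / d_fine) / math.log(refinement_ratio)


def richardson_extrapolate(f_medium, f_fine, refinement_ratio, order):
    """Estimate the zero-mesh-size value from the two finest grids."""
    return f_fine + (f_fine - f_medium) / (refinement_ratio ** order - 1.0)


# Synthetic "simulation results" on grids with spacing h = 0.4, 0.2, 0.1,
# generated from an assumed second-order error: f(h) = 1.0 + 0.5 * h**2.
results = [1.0 + 0.5 * h ** 2 for h in (0.4, 0.2, 0.1)]
f_coarse, f_medium, f_fine = results

p = observed_order(f_coarse, f_medium, f_fine, refinement_ratio=2.0)
f_extrap = richardson_extrapolate(f_medium, f_fine, 2.0, p)
numerical_error_estimate = abs(f_extrap - f_fine)

print("observed order:", p)
print("extrapolated value:", f_extrap)
print("estimated numerical error on the fine grid:", numerical_error_estimate)
```

If the observed order comes out far from the formal order of the method, or the function raises because the sequence is not monotone, that is exactly the "why does the finer mesh look worse?" question that deserves tracking down.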
While dummies abound in science, charlatans are a bigger problem. While calibration, when mindfully done and acknowledged, is legitimate, the misapplication of calibration as mastery in modeling is rampant. Again, like the ignorant, the calibrators often have no working knowledge of many of the innate uncertainties in the model. They will joyfully go about calibrating over numerical error, model form, data uncertainty, and natural variability without a thought. Of course the worst form of this involves ignorant calibrators who believe they have mastery over things they understand poorly. This ultimately is a recipe for disaster, but the near-term benefits of these practices are profound. Moreover, the powers that be are woefully unprepared to unmask these pretenders.
At its worst, calibration will utilize unphysical, unrealizable models to navigate the solution into complete agreement with data. I’ve seen examples where fundamental physical properties (like equation of state or cross sections) are made functions of space, when they should be invariant of position. Even worse, the agreement will be better than it has a right to be, without even allowing for the possibility that the data being calibrated to is flawed. Other calibrations will fail to account for experimental measurement error or natural variability, and never even raise the question of what these might be. In the final analysis, the worst aspect of this entire approach is the lost opportunity to examine the state of our knowledge and seek to improve it.
How to do things right:
1. Recognize that the data you are comparing to isn’t perfectly accurate and is itself variable. Try to separate these uncertainties into their sources: measurement error, intrinsic variability, or unknown factors.
2. Your simulation results are similarly uncertain for a variety of reasons. More importantly you should be able to more completely and mindfully examine their sources and estimate their magnitude. Numerical errors arise from finite resolution, unconverged nonlinearities (the effects of linearization), unconverged linear solvers, and outright bugs. The models often can have their parameters change, or even change to other models. The same can be said of the geometric modeling.
3. Much of the uncertainty in modeling can be explored in a concrete manner by modifying the details of the models in a manner that is physically defensible. The values in or from the model can be changed in ways that can be defended in a strict physical sense.
4. In addition different models are often available for important phenomena and these different approaches can yield a degree of uncertainty. To some degree different computer codes themselves constitute different models and can be used to explore differences in what would be considered reasonable defensible models of reality.
5. A key concept in validation is a hierarchy of experimental investigations that cover different levels of system complexity and modeling difficulty. These sources of experimental (validation) data provide the ability to deconstruct the phenomenon of interest into its constituent pieces and validate them independently. When everything is put together for the full model, a fuller appreciation for the validity of the parts can be achieved, allowing greater focus on the source of discrepancy.
6. Be ruthless in uncovering what you don’t understand because this will define your theoretical and/or experimental program. If nothing else it will help you mindfully and reasonably calibrate while placing limits on extrapolation.
7. If possible work on experiments to help you understand basic things you know poorly and use the results to reduce or remove the scope of calibration.
8. Realize that the numerical solution to your system itself constitutes a model of one sort or another. This model is a function of the grid you use, and the details of the numerical solution.
9. Separate your uncertainties between the things you don’t know and the things that just vary. This is the separation of epistemic and aleatory uncertainty. The key to this separation is that epistemic errors can be removed through learning more. Aleatory uncertainty is part of the system that is harder to control.
10. Realize that most physical systems are not completely well determined problems. In other words if you do an experiment that should be the same over and over some of the variation in results is due to imperfect knowledge of the experiment. One should not try to exactly match the results of every experiment individually; some of the variation in results is real physical noise.
11. Put everything else into the calibration, but realize that it is just papering over what you don’t understand. This should provide you with the appropriate level of humility.
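As a rough illustration of the bookkeeping the list above asks for, here is a minimal sketch that compares a simulation to the mean of repeated experiments rather than to any single one, and keeps each uncertainty source as its own line item. The function name `validation_summary`, all of the numbers, the use of a half-range for the parameter spread, and the root-sum-square combination (which assumes the sources are independent) are my assumptions for the example, not a prescribed procedure.

```python
import math
import statistics


def validation_summary(experiments, measurement_sigma, simulation,
                       numerical_error, parameter_runs):
    """Compare one simulated value to repeated experiments, source by source."""
    exp_mean = statistics.mean(experiments)
    # Scatter between nominally identical experiments (items 9 and 10).
    exp_scatter = statistics.stdev(experiments)
    # Half-range of results from physically defensible parameter changes
    # (item 3). A crude stand-in for a real parametric study.
    parameter_spread = (max(parameter_runs) - min(parameter_runs)) / 2.0

    budget = {
        "experimental scatter": exp_scatter,
        "measurement error": measurement_sigma,
        "numerical error": numerical_error,
        "parameter spread": parameter_spread,
    }
    # ASSUMPTION: independent sources, combined by root-sum-square.
    combined = math.sqrt(sum(v ** 2 for v in budget.values()))
    discrepancy = simulation - exp_mean
    return {
        "experimental mean": exp_mean,
        "discrepancy": discrepancy,
        "budget": budget,
        "combined uncertainty": combined,
        "explained": abs(discrepancy) <= combined,
    }


# Hypothetical inputs, made up for illustration only.
summary = validation_summary(
    experiments=[10.2, 9.8, 10.5, 9.9, 10.1],  # repeated measurements
    measurement_sigma=0.2,                     # instrument uncertainty
    simulation=10.6,                           # nominal simulation result
    numerical_error=0.15,                      # e.g. from a mesh study
    parameter_runs=[10.4, 10.6, 10.9],         # defensible parameter changes
)

for name, value in summary["budget"].items():
    print(f"{name:22s} {value:.3f}")
print("discrepancy:", summary["discrepancy"])
print("combined uncertainty:", summary["combined uncertainty"])
print("explained by known uncertainties:", summary["explained"])
```

When the discrepancy exceeds what the known sources can account for, the leftover is the part you don’t understand. That leftover is what calibration would paper over (item 11), so it is worth writing down before any knobs get turned.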