The possession of knowledge does not kill the sense of wonder and mystery. There is always more mystery.
― Anaïs Nin
Let’s say you’ve completely bought into my advice and decide to test the hell out of your code. You found some really good problems that “go to eleven.” If you do things right, your code will eventually “break” in some way. The closer you look, the more likely it is to be broken. Heck, your code is probably already broken, and you just don’t know it! Once it’s broken, what should you do? How do you get the code back into working order? What can you do to figure out why it’s broken? How do you live with the knowledge of the code’s limitations? Those limitations are there, but usually you don’t know them very well. Essentially you’ve gone about the process of turning over rocks with your code until you find something awfully dirty and creepy-crawly underneath. You then have a mystery to solve, and/or ambiguity to live with.
An expert is someone who knows some of the worst mistakes that can be made in his subject, and how to avoid them.
― Werner Heisenberg
I’ll deal with the last question first: how to live with the knowledge of limitations. This is, in a sense, advice to grow up and be an adult about things. Any method or code is going to be limited in what it can do. Some of these limitations are imposed by theory, practicality, or expense; others simply reflect the limits of our knowledge and technology today. One of the keys to being an expert in a given field is a deep understanding of what can and cannot be done, and why. What better way of becoming an expert than purposefully making “mistakes”? Do you understand what the state of the art is? Do you know what challenges the state of the art? What are the limits imposed by theory? What do other people do to solve problems, and what are the pros and cons of their approaches? Exploring these questions, and stepping well outside the comfort provided by success, drives a deep knowledge of all of these considerations. Properly applied, the art of testing codes turns a deep knowledge of failures into the fuel for learning and the acquisition of knowledge.
So how might you see your code break? In the worst case the problem might induce an outright instability, and the code will blow up. Sometimes the blow-up happens through the production of wild solutions, or even violations of the floating-point limits of the computer (NaNs, “not a number,” will appear in the output). Other problems look like corrupted data: the solution doesn’t blow up, but it is clearly very wrong. Moving down the chain of bad things, we might simply see solutions outside the bounds of what is reasonable or admissible for valid solutions. As we walk through this gallery of bad things, each succeeding step is subtler than the last. In approaching breakage, an analytical solution to a problem can prove invaluable because it provides an unambiguous standard that can be made as accurate as you please.
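The first two rungs of this ladder, outright blow-up and inadmissible values, are easy to screen for automatically. A minimal sketch in Python (the helper name and bounds are my own illustrative assumptions, not anything from a particular code):

```python
import numpy as np

def check_solution_health(u, lo=None, hi=None):
    """Flag the obvious failure modes: NaN/Inf blow-up, and values
    outside physically admissible bounds (e.g. a negative density).
    Illustrative helper; the bounds are problem-dependent assumptions."""
    if not np.all(np.isfinite(u)):
        return "blow-up: NaN or Inf detected"
    if lo is not None and u.min() < lo:
        return "inadmissible: value below lower bound"
    if hi is not None and u.max() > hi:
        return "inadmissible: value above upper bound"
    return "ok"
```

Running a check like this after every time step catches the catastrophic failures at the moment they first appear, which is exactly when they are easiest to diagnose.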
Next, we simply see solutions that are oscillatory or wiggly. These wiggles can range from dangerous to cosmetic in character. Sometimes the wiggles interact with genuinely physical features in a model, and the imperfection becomes a real modeling problem. Next we get into the real weeds of solution problems and start to see failures that go unnoticed without expert attention. One of the key things is the loss of accuracy in a solution. This could be the numerical level of error being wrong, or the rate of convergence of the solution falling outside the theoretical guarantees for the method (convergence rates are a function of both the method and the nature of the solution itself). Sometimes this character is associated with an overly dissipative solution, where the numerical dissipation is too large to be tolerated. At this subtle level we are judging failure by a high standard, based on knowledge and expectations driven by deep theoretical understanding. These failings generally indicate you are at a good level of testing and quality.
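The convergence-rate check mentioned above is concrete enough to sketch. Given errors against an exact solution on two grids, the observed order follows from the standard assumption that error behaves like a power of the mesh spacing (a textbook relation, not specific to any one code):

```python
import numpy as np

def observed_order(err_coarse, err_fine, refinement=2.0):
    """Estimate the observed convergence rate from errors on two grids.
    For a p-th order method, err ~ C * h**p, so halving h gives
    p = log(err_coarse / err_fine) / log(refinement)."""
    return np.log(err_coarse / err_fine) / np.log(refinement)
```

If a nominally second-order method returns an observed order well below two on a smooth test problem, that is exactly the subtle failure the paragraph above warns about: nothing blows up, but the method is not delivering its theoretical guarantee.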
Once the code is broken in some way, it is time to find out why. The obvious breakage, where the solution simply falls apart, is the best case to deal with because the failings are so evident. The first thing you should always do is confirm that you’re solving the problem you think you are, and solving it the way you think you are. This means examining your input and control of the code to make certain that everything is what you expect it to be. Once you’re sure about this important detail, you can move on to the sleuthing. For the obvious code breakdowns you want to examine how the solution starts to fall apart, as early in the process as you can. Is the problem localized near a boundary or a certain feature? Does it happen suddenly? Is there a slow, steady buildup toward disaster? The answers all point at different sources for the problem; they tell you how and where to look.
One of the key things to understand with any failure is the stability of the code and its methods. You should be intimately familiar with the stability conditions for the code’s methods, and you should ensure that those conditions are not being exceeded. If a stability condition is missed, or calculated incorrectly, the impact is usually immediate and catastrophic. One way to check this on the cheap is to modify the code’s stability condition to a more conservative version, usually with a smaller safety factor. If the catastrophic behavior goes away, it points a finger at the stability condition with some certainty. Either the method is wrong, or it is not coded correctly, or you don’t really understand the stability condition properly. It is important to figure out which of these possibilities applies. Sometimes this needs to be studied theoretically, with analytical stability techniques.
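For an explicit hyperbolic scheme the “cheap check” above amounts to shrinking the safety factor in a CFL-type time-step limit. A minimal sketch, assuming a one-dimensional advection-style condition (the function name and default factor are illustrative, not from any particular code):

```python
def stable_dt(dx, max_speed, safety=0.9):
    """Time step from a CFL-type stability condition, dt <= safety * dx / |a|.
    Dropping `safety` (say from 0.9 to 0.5) is the cheap diagnostic:
    if the catastrophe disappears at the smaller factor, suspect the
    stability condition, its coding, or your understanding of it."""
    return safety * dx / max_speed
```

The experiment costs one input change and a rerun, which is why it is worth trying before any deeper analysis.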
One of the key things to understand extremely well is the state of the art in a given field. Are there codes and methods that can solve the problem well, or without problems? Nothing replaces an excellent working knowledge of what experts in the field are doing. The fastest way to solve a problem is to understand, and potentially adopt, what the best and brightest are already doing. You also gain a leg up on understanding the limits of today’s knowledge and technology, and on whether you’ve kept up to the boundary of what we know. Maybe it will take research to make your code functional, and if you fix it, you might have something publishable! If others can already solve the problem, what do they do differently than your code? Can you modify how your code runs to replicate their techniques? If you can do this and reproduce the results others are getting, then you have a blueprint for fixing your code.
Another approach is to systematically make the problem you’re solving easier until the results are “correct,” or the catastrophic behavior is replaced with something less odious. An important part of this process is understanding more deeply how the problems are being triggered in the code. What sort of condition is being exceeded, and how are the code’s methods going south? Is there something explicit that can be done to change the methodology so this doesn’t happen? Ultimately, the issue is a systematic understanding of how the code and its methods behave, their strengths and weaknesses. Once the weakness is exposed in testing, can you do something to get rid of it? Whether the weakness is a bug or a feature of the code is another question to answer. Through this process of successively harder problems one can make the code better and better until you’re at the limits of knowledge.
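This easier-until-it-works search can be automated when the problem difficulty is controlled by a single knob (a shock strength, a density ratio, a mesh distortion). A sketch under that assumption; `run_case` and the return convention are hypothetical names of mine:

```python
def hardest_passing(run_case, difficulties):
    """Walk a ladder of problem difficulties, easiest first, and report
    the last setting where the code still behaves and the first where
    it breaks. `run_case(d)` returns True on success (an assumed
    interface); the pair brackets where the trouble is triggered."""
    last_ok = None
    for d in sorted(difficulties):
        if run_case(d):
            last_ok = d
        else:
            return last_ok, d      # bracket: works at last_ok, fails at d
    return last_ok, None           # never broke on this ladder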
The foundation of data gathering is built on asking questions. Never limit the number of hows, whats, wheres, whens, whys and whos, as you are conducting an investigation. A good researcher knows that there will always be more questions than answers.
― Karl Pippart III
Knowing whether you are at the limits of knowledge takes a good deal of experience and study. You need to know the field and your competition quite well. You need to be willing to borrow from others and consider their successes carefully. There is little time for pride; if you want to get to the frontier of capability, you need to be brutal and focused along the path. You need to keep pushing your code with harder testing and not be satisfied with its quality. Eventually you will get to problems that cannot be overcome with what people know how to do. At that point your methodology probably needs to evolve. This is really hard work, prone to risk and failure. For this reason most codes never get to this level of endeavor; it’s simply too hard on the code developers, and worse on those managing them. Today’s management of science simply doesn’t allow the level of risk and failure necessary to reach the summit of our knowledge. Management wants sure results and cannot deal with ambiguity, yet striving at the frontier of knowledge is full of ambiguity and usually ends in failure.
The most beautiful experience we can have is the mysterious. It is the fundamental emotion that stands at the cradle of true art and true science.
― Albert Einstein
At the point you meet the frontier, it is time to be willing to experiment with your code (experimentation is great to do even safely within the boundaries of existing know-how). Often the only path forward is changing the way you solve problems. One key is to not undo all the good things you can already do in the process. Quite often one might actually solve the hard problem in some way (say, with a kludge), only to find that things that used to be correct and routine for easier problems have been wrecked in the process. That is a back-to-the-drawing-board moment! For the very hard problem you may simply be seeking robustness and stability (running the problem to completion), and the measures taken to achieve this do real damage to your bread and butter. You need to be prepared to instrument and study your output in new ways. You are now an explorer and an innovator. Sometimes you need to tackle the problem from a different perspective, and challenge your underlying beliefs and philosophies.
The true measure of success is how many times you can bounce back from failure.
― Stephen Richards
At this point it’s useful to point out that the literature is really bad at documenting what we don’t know. Quite often you are rediscovering something lots of experts already know but can’t publish. This is one of the worst things about the publishing of research: we really only publish success, not failure. As a result we have a very poor idea of what we can’t do; it’s only available through inference. Occasionally the state of what can’t be done is published, but usually not. What you may not realize is that you are crafting a lens on the problem, a perspective that will shape how you try to solve it. This process is a wonderful learning opportunity and the essence of research. For all these reasons it is very hard, and almost entirely unsupported.
Another big issue is finding general-purpose fixes for hard problems. Often the fix to a really difficult problem wrecks your ability to solve lots of other problems. Tailoring the solution to treat the difficulty without destroying the ability to solve other, easier problems is an art, and the core of the difficulty in advancing the state of the art. The skill to do this requires fairly deep theoretical knowledge of a field of study, along with an exquisite understanding of the root of the difficulties. The difficulty people don’t talk about is the willingness to attack the edge of knowledge and explicitly admit the limitations of current practice. This is an admission of weakness that our system doesn’t support. One clear sign that a fix isn’t general purpose is a narrow range of applicability. If it’s not general purpose and makes a mess of existing methodology, you probably don’t really understand what’s going on.
Let’s get to a general theme in fixing problems: add some dissipation to stabilize things and get rid of worrisome features. In the process you often end up destroying the very features of the solution you most want to produce. The key is to identify the bad stuff and keep the good stuff, and this comes from deep understanding plus some vigorous testing. Dissipation almost always results in a more robust code, but the dissipation needs to be selective, or the solution is arrived at wastefully. As one goes deeper into the use of dissipation, adherence to the second law of thermodynamics rears its head, defining a tool of immense power if wielded appropriately. A key is to use deep principles to achieve a balanced perspective on dissipation, where it is used in clearly defensible but limited ways. Even today applying dissipation is still an art, and we struggle to bring more science and principle to its application.
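The selectivity point can be made concrete with a toy sketch: second-difference (Laplacian) dissipation gated by a crude sensor so that smooth regions are untouched and only wiggles are damped. This is my own illustrative construction, not a production scheme; the coefficient and the sensor (flagging local extrema) are assumptions:

```python
import numpy as np

def selective_dissipation(u, coeff=0.1):
    """Apply second-difference artificial dissipation only where a crude
    sensor flags trouble (local extrema, i.e. wiggles), leaving smooth,
    monotone regions of the solution untouched. Toy sketch: the
    coefficient and sensor are illustrative assumptions."""
    d2 = np.zeros_like(u)
    d2[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]   # discrete Laplacian
    du = np.diff(u)
    sensor = np.zeros_like(u)
    # a sign change in the slope marks a local extremum (a wiggle)
    sensor[1:-1] = (du[:-1] * du[1:] < 0.0).astype(u.dtype)
    return u + coeff * sensor * d2
```

A blanket `u + coeff * d2`, by contrast, smears everything, which is exactly the “robust but wasteful” failure mode described above: the code survives, but the features you care about are dissipated away.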
I’ve presented a personally biased view of how to engage in this sort of work. I’m sure other fields have similar, but different, rules for fixing codes. The important thing is putting simulation codes to the sternest tests they can take, exposing their weaknesses, and repairing them. One wants to do this continually until hitting the proverbial wall of our knowledge and ability. Along the way you create a better code, learn the field of endeavor, and grow your own knowledge and capability. Eventually the endeavor leads to research and the ability to push the field ahead. This is also the way experts, and masters of a given field, are created. People move from being merely competent practitioners to masters and leaders. This is an unabashed good for everyone, and not nearly encouraged enough. It paves the way forward and produces exceptional results.
A pessimist sees the difficulty in every opportunity; an optimist sees the opportunity in every difficulty.
― Winston S. Churchill