From the moment I started graduate school I dealt with legacy code.  I started off by extending a modeling code written by my advisor’s previous student.  My hatred of legacy code had begun.  The existing code was poorly written, poorly commented, and obtuse.  I probably responded by adding more of the same.  The code was crap, so why should I write nice code on top of that basis?  Bad code encourages more bad code.  The only positive was contributing to the ultimate death of that code so that no other poor soul would be tortured by developing on top of my work.  Moving to a real professional job only hardened my views; a National Lab is teeming with legacy code.  I soon encountered code that made the legacy code of grad school look polished.  These codes were better documented, but written even more poorly.  I ran into lots of dusty-deck FORTRAN IV, complete with memory management techniques from the days of CDC supercomputers.  Programming devices created at the dawn of computer programming languages were common.  I encountered spaghetti code that would make your head spin; if I tried to flowchart the method, it would look like a Möbius strip.  What dreck!

On the positive side, all this legacy code powered my desire to write better code.  I started to learn about software development and good practices.  I didn’t want to leave people cleaning up my messes and cursing the code I wrote.  They probably did anyway.  The code I wrote was good enough to be reused for purposes I never intended, and as far as I know it is still in use today.  Nonetheless, legacy code is terrible and expensive, but necessary.  Replacing legacy code is terrible, expensive, and necessary too.  Software is just this way.  There is a deeper problem with legacy code: legacy ideas.  Software is a way of actualizing ideas and algorithms into action.  It is the way that computers can do useful things.  The problem is that the deeper ideas behind the algorithms often get lost in the process.

Writing code is a form of concrete problem solving.  Writing a code for general production use is a particularly difficult brand of problem solving because of the human element involved.  The code isn’t just for you to use, but for others to use.  Code should be written for humans, not the computer.  You have to provide users with a tool they can wield.  If the users of a code wield it successfully, the code begins to take on a character of its own.  If the problem is hard enough and the code is useful enough, the code becomes legendary.

A legendary code then becomes a legacy code that must be maintained.  Often the magic that makes it useful is shrouded in the mystery of the problem-solving techniques it employs.  It becomes a legacy code when the architect who made it useful moves on.  At this point the quality of the code and the clarity of its key ideas become paramount.  If the ideas are not clear, they become fixed, because subsequent stewards of the capability cannot change them without breaking them.  Too often the “wizards” who developed the code were too busy solving their users’ problems to document what they were doing.

These codes are a real problem for scientific computing.  They also form the basis of collective achievement and knowledge in many cases, the storehouse of powerful results.  They often become entrenched because they solve important problems for important users, and their legendary capability takes on the air of magic.  I’ve seen this over and over, and it is a pox on computational science.  It is one of the major reasons that ancient methods for computing solutions continue to be used long after they should have been retired.  For a lot of physics, particularly problems involving transport (first-order hyperbolic partial differential equations), the numerical method has a large impact on the physical model.

More properly, the numerical method is part of the model itself; the numerical solution and the physical modeling are not separable.  Part of the reason is the need to add some sort of stabilization mechanism to the solution (some form of numerical or artificial viscosity).  If the numerical method changes, the related models need to change too.  Any calibrations need to be redone (and there are always calibrations!).  If the existing code is useful, there is huge resistance to change because any new method is likely to be worse on the problems that count.  Again, I’ve seen this repeatedly over the past 25 years.  The end result is that old legacy codes simply keep going long after their appropriate shelf life.
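To make this concrete, here is a minimal sketch (my own illustration, not drawn from any particular legacy code) of why the discretization is part of the model.  First-order upwind differencing of the linear advection equation u_t + a u_x = 0 behaves like advection plus a diffusion term of roughly a*dx/2*(1 - CFL); change the scheme and you change the effective viscosity, and with it anything calibrated against the old results.

```python
# Sketch: the "artificial viscosity" hiding inside a first-order upwind scheme.
# Advecting a square pulse once around a periodic domain smears it out; that
# smearing is the numerical diffusion the scheme adds to the physical model.
import numpy as np

def upwind_advection(u, a, dx, dt, steps):
    """Advance u_t + a u_x = 0 with first-order upwind (assumes a > 0)."""
    cfl = a * dt / dx
    for _ in range(steps):
        u = u - cfl * (u - np.roll(u, 1))  # periodic boundaries
    return u

nx = 200
x = np.linspace(0.0, 1.0, nx, endpoint=False)
u0 = np.where((x > 0.4) & (x < 0.6), 1.0, 0.0)  # square pulse

# CFL = 0.5; 400 steps advects the pulse exactly one period.
u = upwind_advection(u0.copy(), a=1.0, dx=1.0 / nx, dt=0.5 / nx, steps=400)

# The pulse returns with a reduced peak: diffusion ~ a*dx/2*(1 - CFL) * u_xx.
print("initial max:", u0.max(), "final max:", round(u.max(), 3))
```

Swap in a higher-order or differently limited scheme and the effective diffusion changes, which is exactly why the numerical method cannot be separated from the model or its calibrations.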

Worse yet, a new code can be developed that simply puts the old method into a new code base.  The good thing is that the legacy “code” goes away, but the legacy method remains.  It is rather like getting a new body and simply moving the soul out of the old body into the new one.  If the method being transferred is well understood and documented, this process has some positives (i.e., fresh code).  But it also represents a lost opportunity to refresh the method along with the code.  In the time since the legacy code was started, the numerical solver technology has likely improved.  Not improving the solver is a lost opportunity to improve the code.

By the “soul” of the code, I mean the approximations made to the laws of physics.  These codes are differential equation solvers, and the quality of the approximation is one of their most important characteristics.  The nature of the approximations, and the errors made therein, often defines the code’s success.  It really is the soul or personality of the code.  Changes to this part of a successful legacy code are almost impossible.  The more useful or successful the code is, the harder such changes are to execute.  I might argue these are exactly the conditions under which such changes are most important to achieve.

Some algorithms are more of a utility.  An example is numerical linear algebra.  Many improvements have been made in the efficiency with which we can solve linear algebra problems on a computer.  These are important utilities that massively impact efficiency, but not the solution itself.  We can make the solution much faster without any effect on the nature of the approximations we make to the laws of physics.  Good software abstracts the interface to these methods so that improvements can be had independently of the core code.  There are fewer impediments to this sort of development because the answer doesn’t change.  If the solution has been highly calibrated and/or is highly trusted, getting it faster is naturally accepted.  Too often changes (i.e., improvements) in the solution itself are not accepted so naturally.
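As a sketch of what that abstraction might look like (my own illustration, with hypothetical function names, not any particular code’s API), the physics routine below assembles a linear system and accepts any solver that satisfies a simple interface.  A direct factorization can be swapped for an iterative method without touching the model, because the answer is the same up to solver tolerance.

```python
# Sketch: injecting the linear solver so it can be upgraded independently
# of the physics code that assembles the system.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve, cg

def solve_direct(A, b):
    """Direct sparse solve."""
    return spsolve(A.tocsc(), b)

def solve_cg(A, b):
    """Iterative conjugate-gradient solve (A is symmetric positive definite here)."""
    x, info = cg(A, b)
    if info != 0:
        raise RuntimeError("CG failed to converge")
    return x

def heat_step(u, dt, dx, solver):
    """One backward-Euler step of u_t = u_xx; the linear solver is injected."""
    n = u.size
    r = dt / dx**2
    A = diags([-r, 1.0 + 2.0 * r, -r], [-1, 0, 1], shape=(n, n), format="csr")
    return solver(A, u)

u = np.sin(np.linspace(0.0, np.pi, 101))
u_direct = heat_step(u, dt=1e-3, dx=0.01, solver=solve_direct)
u_iter = heat_step(u, dt=1e-3, dx=0.01, solver=solve_cg)
print("max difference between solvers:", np.max(np.abs(u_direct - u_iter)))
```

The design choice is the point: because the solver sits behind an interface, a faster or more scalable method can be adopted later without disturbing the approximations to the physics or the calibrations built on them.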

In the minds of many of its users, the legacy code often provides the archetype of what a solution should look like.  This is especially true if the code is used to do useful programmatic work and to analyze or engineer important systems.  This mental picture provides an anchor for their computational picture of reality.  Should that picture become too entrenched, the users of the code begin to lose objectivity and the anchor becomes a bias.  This bias can be exceedingly dangerous in that the legacy code’s solutions, errors, imperfections and all, become their view of reality.  This view becomes an outright impediment to improving on the legacy code’s results.  It should be a maxim that results can always be improved; the model and method in the code are imperfect reflections of nature and should always be subject to improvement.  These improvements can happen via direct, focused research or the serendipitous application of research from other sources.  Far too often the legacy code acts to suffocate research and stifle creativity because of the assumptions, both implicit and explicit, made in its creation.

One key concept with legacy codes is technical debt.  Technical debt is an accumulation of issues that have been solved in a quick and dirty manner rather than systematically.  If a legacy code is full of methods that are not well understood, technical debt will accumulate and begin to dominate the development.  A related concept is technical inflation, where the basic technology advances past what is implemented in a code.  Most often this term is applied to aspects of computer science.  In reality, technical inflation may also apply to the basic numerical methods in the legacy code.  If the code has insufficient flexibility, the numerical methods become fixed and rapidly lose any state-of-the-art character (if they even had it to begin with!).  Time only increases the distance between the code and the best available methods.  The lack of connectivity ultimately short-circuits the ability of the methods in the legacy code to influence the development of better methods.  All of these factors conspire to accelerate the rate of technical inflation.

In circumstances where the legacy “code” is replaced but the legacy methodology is retained (i.e., a fresh code base), the presence of the intellectual legacy can strangle innovation.  If the fresh code is a starting point for real extensions beyond the foundational methods and is not overly constrained by the past, progress can be had.  This sort of endeavor must be entered into carefully, with a well-thought-through plan.  Too often this is not the approach, and legacy methods are promulgated forward without genuine change.  With each passing year the intellectual basis that the methodology was grounded upon ages, and understanding is lost.  Technical inflation sets in and the ability to close the gap recedes.  In many cases the code developers lose sight of what is going on in the research community as it becomes increasingly irrelevant to them.  Eventually, the technical inflation becomes a cultural barrier that threatens the code.  The results obtained with the code cease to be scientific, and the code developers become curators or priests.  They pay homage to the achievements of the past and sacrifice their careers at the altar of expediency.  The original developers of the methodology move from legendary to mythic status, and all perspective is lost.  The users of the code become a cult.

Believe me, I’ve seen this in action.  It isn’t pretty.  Solving the inherent problems at this stage requires the sorts of interventions that technical people suck at.

Depending on the underlying culture of the organization using and/or developing the code, the cult can revolve around different things.  At Los Alamos, it is a cult of physicists, with numerical methods, software, and engineering slighted in importance.  At Sandia, it is engineering that defines the cult.  Engineers are better at software engineering too, so that gets more priority; the numerical methods and the underlying models are slighted.  In the nuclear industry, legacy codes and methods are rampant, with all things bowing to the cult of nuclear regulation.  This regulation is supposed to provide safety, but I fear its actual impact is to squash debate and attention to any details other than the regulatory demands.  This might be the most troubling cult I’ve seen.  It avoids any real deep thought and enshrines legacy code at the core of a legally mandated cult of calibration.  This calibration papers over a deep lack of understanding and leads to over-confidence or over-spending, probably both.  The calibration is so deeply entrenched in their problem-solving approach that they have no real idea how well the actual systems are being modeled.  Understanding is not even on the radar.  I’ve seen talented and thoughtful engineers self-limit their approach to problem solving because of the fear the regulatory environment brings.  Instead of bringing their “A” game, the regulation induces a thought-paralyzing fear.

The way to avoid these issues is to avoid using legacy code and/or methods that are poorly understood.  Important application results should not depend on things you do not understand.  Codes are holistic things.  The quality of results depends on many factors, and people tend to focus on single aspects of the code, usually in a completely self-absorbed manner.  Code users think that their efforts are the core of quality, which lends itself to justifying crude calibrations.  People developing closure models tend to focus on their own efforts and believe that their impact is paramount.  Method developers focus on the impact of the methods.  The code developer thinks about the issues related to the quality of the code and its impact.  With regulatory factors in play, all independent thought is destroyed.  The fact is that all of these things are intertwined.  It is by nature a problem that is not separable and must be solved in a unified fashion.  Every single aspect of the code, from its core methods, to the models it contains, to the manner of its use, must be considered in providing quality results.
