Your assumptions are your windows on the world. Scrub them off every once in a while, or the light won’t come in.

― Isaac Asimov

If someone gives you some data and asks you to fit a function that “models” the data, many of you know the intuitive answer: least squares. This is the obvious, simple choice and, perhaps not surprisingly, not the best answer. How bad this choice may be depends on the situation. One way to do better is to recognize the situations where the least squares solution may be problematic and exert an undue influence on the results.

Most of our assumptions have outlived their uselessness.

― Marshall McLuhan

To say that this problem is important to the conduct of science is a vast understatement. The reduction of data is quite often posed in terms of a simple model (linear in the important parameters) and solved via least squares. The data is often precious, or very expensive to measure. Given the importance of data in science, it is ironic that we should so often take the final hurdle so cavalierly and analyze it in a manner as crude as least squares. More to the point, we don’t consider the consequences of such an important choice; usually it isn’t even thought of as a choice.

That’s the way progress works: the more we build up these vast repertoires of scientific and technological understanding, the more we conceal them.

― Steven Johnson

The key to this is awakening to the assumptions made in least squares. The central assumption is the nature of the errors in the fit: least squares assumes they follow normally distributed (or Gaussian) statistics. If you know this to be true, then least squares is the right choice. If it is not true, then you might be introducing a rather significant assumption (a known unknown, if you will) into your fit. In other words, your results will rest on an assumption you don’t even know you made.

If your data and model match quite well and the deviations are small, it also may not matter (much). This doesn’t make least squares a good choice, just not a damaging one. If the deviations are large or some of your data is corrupt (i.e., outliers), the choice of least squares can be catastrophic. The corrupt data may have a completely overwhelming impact on the fit. There are a number of methods for dealing with outliers in least squares, and in my opinion none of them are good.
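The pull a single outlier exerts is easy to demonstrate. The sketch below (my own illustration, not from the original text; the synthetic line y = 2x + 1 and the corrupt value are assumptions) fits a line by least squares twice, once with clean data and once after corrupting one point:

```python
import numpy as np

# Synthetic data: a line y = 2x + 1 with small Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)

# np.polyfit minimizes the sum of squared residuals (least squares).
clean = np.polyfit(x, y, 1)

y_bad = y.copy()
y_bad[-1] = -50.0                   # one corrupt measurement (an outlier)
corrupt = np.polyfit(x, y_bad, 1)

print("clean slope:  ", clean[0])   # close to the true slope of 2
print("corrupt slope:", corrupt[0]) # dragged far from 2 by a single point
```

Because the penalty grows with the *square* of the residual, the one bad point dominates the objective and drags the whole fit toward itself.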

The difficulty lies not so much in developing new ideas as in escaping from old ones.

― John Maynard Keynes

Fortunately there are existing methods that are free from these pathologies. For example, the least absolute deviation fit can deal with corrupt data easily. It naturally excludes outliers from the fit because of a different underlying error model. Where least squares is the solution of a minimization problem in the energy or L2 norm, least absolute deviation uses the L1 norm. The problem is that the fitting algorithm is inherently nonlinear and is generally not included in most software.
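One standard way to attack the nonlinear L1 problem is iteratively reweighted least squares (IRLS). This is a sketch under my own assumptions, not the author's specific method: each pass solves a weighted least squares problem that down-weights points with large residuals, so outliers lose their grip on the fit. The function name, iteration count, and test line are all hypothetical.

```python
import numpy as np

def l1_line_fit(x, y, iters=50, eps=1e-8):
    """L1 (least absolute deviation) line fit via IRLS.

    Minimizing sum(|r_i|) is approximated by repeatedly minimizing
    sum(w_i * r_i**2) with weights w_i = 1/|r_i|; each inner problem
    is an ordinary weighted least squares solve.
    """
    A = np.column_stack([x, np.ones_like(x)])
    p = np.linalg.lstsq(A, y, rcond=None)[0]     # least-squares start
    for _ in range(iters):
        r = y - A @ p
        # sqrt of the weights, floored by eps to avoid division blow-up
        s = 1.0 / np.sqrt(np.maximum(np.abs(r), eps))
        p = np.linalg.lstsq(A * s[:, None], y * s, rcond=None)[0]
    return p

# Same kind of corrupted data as before: y = 2x + 1 plus one outlier.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.size)
y[-1] = -50.0
slope, intercept = l1_line_fit(x, y)
print("L1 slope:", slope)   # stays near 2 despite the outlier
```

Note that the robustness comes at a price the text warns about: the L1 fit is an iterative, nonlinear procedure rather than a single linear solve.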

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.

― Abraham Maslow

One of the problems is that least squares is virtually knee-jerk in its application. It is built into standard software such as Microsoft Excel and can be applied with almost no thought. If you have to write your own curve-fitting program, by far the simplest approach is to use least squares: for a model linear in its parameters it produces a linear system of equations to solve, where the alternatives are invariably nonlinear. The key point is to realize that this convenience has a consequence. If your data reduction is important, it might be a good idea to think a bit more about what you ought to do.
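That convenience is concrete: for a model linear in its parameters, minimizing the squared error reduces to one solve of the normal equations (AᵀA)p = Aᵀy. A minimal sketch (the small data set is an invented example):

```python
import numpy as np

# Fit y = a*x + b to five points that lie roughly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

A = np.column_stack([x, np.ones_like(x)])  # design matrix for a*x + b
p = np.linalg.solve(A.T @ A, A.T @ y)      # normal equations: one linear
                                           # solve, no iteration needed
print("slope, intercept:", p)
```

(In production code `np.linalg.lstsq` is preferred over forming AᵀA explicitly, since it is better conditioned, but the point stands: the whole fit is a single direct linear solve.)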

Duh.

The explanation requiring the fewest assumptions is most likely to be correct.

― William of Ockham