Tuesday, 14 July 2015

Why worry about all sources of errors?

There are many effects that act as sources of errors in a climate data record. No measured value is perfect, whether taken in the laboratory or inferred from radiances measured in low Earth orbit.

The most obvious characteristic to establish about a source of error is the magnitude of its effect. The (standard) uncertainty is a measure of the typical size of errors, and is generally what is represented by "error bars" on a plot.

Less obvious is the need to know whether the error is correlated between different measured values. This becomes very important for climate data records: when looking at climate change, highly correlated errors are the ones we need to worry about, because they do not average out when data are combined.

To illustrate this, consider a climate data record (CDR) for sea surface temperature (SST). Full-resolution satellite data typically give an instantaneous SST across a pixel of about 1 km. For a typical case, we might have three categories of effect, causing uncertainties of different sizes.
  • Noise, which is uncorrelated between different pixels. 
  • Ambiguity in obtaining the SST from the measured radiances, which tends to be correlated "locally", where the state of the atmosphere is similar.
  • Systematic errors ("biases") including sensor calibration degradation over time, which tend to affect measured values in a highly correlated (non-random) way.
Data producers tend to put effort into correcting any "biases", but nonetheless, residual uncertainty remains after such corrections are applied.

For a single instantaneously measured SST from a well-designed instrument, the first two effects are the biggest, and may be comparable. 

However, for climate change analyses, we may be more interested in how SST over one area and period of time compares with SST for that area at a different time. The larger the spatial scale and the longer the period of time considered, the less important noise becomes (the uncorrelated SST errors tend to average down when data are aggregated) and the more important calibration effects become in relative terms.
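
The averaging-down behaviour can be made concrete with a minimal sketch. It assumes the simplest possible case: n values, each with the same standard uncertainty u, and the same correlation coefficient r between the errors of every pair of values. The function name and the numbers in the usage loop are illustrative choices, not taken from any SST product.

```python
import math


def uncertainty_of_mean(u, n, r):
    """Standard uncertainty of the mean of n values.

    Assumes every value has standard uncertainty u and that the errors of
    every pair of values share the same correlation coefficient r.
    The variance of the mean is then u**2 * (1 + (n - 1) * r) / n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1 in this sketch")
    return u * math.sqrt((1.0 + (n - 1) * r) / n)


if __name__ == "__main__":
    # Illustrative per-value uncertainty of 0.2 K (hypothetical number).
    for n in (1, 100, 10_000):
        uncorrelated = uncertainty_of_mean(0.2, n, 0.0)
        fully_correlated = uncertainty_of_mean(0.2, n, 1.0)
        print(f"n={n:>6}: r=0 gives {uncorrelated:.4f} K, "
              f"r=1 gives {fully_correlated:.4f} K")
```

With r = 0 the uncertainty of the mean falls as one over the square root of n, while with r = 1 it stays at u however many values are averaged. That is the sense in which noise averages down and fully correlated calibration errors do not.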

This figure illustrates this effect for a reasonable set of assumptions for the case of SST. In a single 1 km SST retrieval, the random effects may dominate the uncertainty, followed by the locally systematic effects associated with retrieval ambiguity. However, as the scale of analysis of the SST data becomes larger in space and longer in time, first noise and then locally systematic effects become less important. If we are using the CDR across several years, more than one sensor is involved. Then it is not only the systematic calibration effects of a single sensor that matter: uncertainty in the (corrected) systematic differences between instruments in the sensor series also begins to matter.
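
As a rough sketch of how such a budget shifts with scale, the three categories of effect can be combined in quadrature under some strong simplifying assumptions. The per-pixel uncertainties below are hypothetical round numbers chosen only to show the qualitative behaviour, not values from the figure or from any real sensor. Noise is assumed independent between pixels; the locally systematic error is assumed fully correlated within an atmospheric "regime" and independent between equally sized regimes; the calibration error is assumed common to every pixel. Differences between sensors in a series are not modelled.

```python
import math

# Hypothetical per-pixel standard uncertainties in kelvin (illustrative only).
U_NOISE = 0.20   # uncorrelated between pixels
U_LOCAL = 0.15   # correlated where the atmospheric state is similar
U_CALIB = 0.03   # residual calibration effect, shared by all pixels


def budget(n_pixels, n_regimes):
    """Uncertainty components of an average over n_pixels pixels.

    n_regimes is the assumed number of independent, equally sized
    atmospheric regimes sampled by the average (1 <= n_regimes <= n_pixels).
    """
    if not 1 <= n_regimes <= n_pixels:
        raise ValueError("need 1 <= n_regimes <= n_pixels")
    noise = U_NOISE / math.sqrt(n_pixels)    # averages down with pixel count
    local = U_LOCAL / math.sqrt(n_regimes)   # averages down with regime count
    calib = U_CALIB                          # does not average down
    total = math.sqrt(noise**2 + local**2 + calib**2)
    return {"noise": noise, "local": local, "calibration": calib, "total": total}


def dominant(components):
    """Name of the largest individual component (ignoring the total)."""
    names = ("noise", "local", "calibration")
    return max(names, key=lambda name: components[name])


if __name__ == "__main__":
    # Hypothetical scales: a single pixel, a regional average, a large-scale average.
    for n_pixels, n_regimes in ((1, 1), (10_000, 4), (100_000_000, 400)):
        b = budget(n_pixels, n_regimes)
        print(n_pixels, n_regimes, dominant(b), round(b["total"], 4))
```

Under these assumptions the dominant term moves from noise, to the locally systematic effect, to calibration as the averaging scale grows, which is the qualitative pattern described above.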

Analyses of uncertainty budgets in SST products and for instrument design often focus on the regime at the left side of this diagram, where calibration effects (after bias correction) may be thought negligible. But CDRs also get used for applications at the right side of this diagram, where the systematic effects matter most. 

When it comes to creating climate data records, all categories of effect causing uncertainty need to be considered and characterised, as far as practicable. All types of error source are relevant to the applications of at least some CDR users.