Friday, 31 October 2014

OSTIA water/land mask and new inland water bodies dataset

Laura Carrea and I have been looking at the OSTIA water/land mask in comparison with the new higher resolution dataset from the Landcover CCI project. The OSTIA mask is at 0.05 deg lat-lon resolution, and is used for the SST CCI Analysis product (gap filled daily SSTs); its precise heritage (since being created several years ago) hasn't been re-traceable, so it is useful to check its nature.

The plot below shows for a small area (Baltic Sea) the prevalence of water (according to the LC CCI product) within each 0.05 deg cell which is labelled in OSTIA as land. Red colours indicate the presence of a small fraction of water in an OSTIA "land" cell, and rivers and the many lakes on land are obvious. Dark blues indicate total or near-total water in an OSTIA "land" cell. Some of these are inland lakes not resolved in OSTIA, which is no surprise. There are a few cells around the Baltic coasts that are apparently water-filled (dark blue) yet are "land" cells in the OSTIA mask. Globally, however, such cases are exceptions. In general the coasts don't have a fringe, which suggests that the OSTIA mask is designed such that a cell tends to be labelled "water" if there is any significant fraction of sea within the cell. (If the design were such that a cell was labelled "water" only if it contains >50% of sea, then there would be a fringe of intermediate values of %water-in-land-cell along all coastlines.) So OSTIA must use a "fat" water mask, rather than a mask that is neutral with respect to land and water.
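The fringe argument can be illustrated with a toy example. This is only a sketch: the water fractions are made-up numbers for a row of cells crossing a coast, and the 0.05 threshold standing in for "any significant fraction" is an assumption, not the actual OSTIA rule (which we have not been able to trace).

```python
def label_cells(water_fraction, threshold):
    """Label a cell "water" when its water fraction exceeds the threshold."""
    return ["water" if f > threshold else "land" for f in water_fraction]


def water_in_land_cells(mask, water_fraction):
    """Return the non-zero water fractions found in cells labelled "land"."""
    return [f for label, f in zip(mask, water_fraction)
            if label == "land" and f > 0.0]


# Hypothetical water fractions for a row of 0.05 deg cells running from
# open sea (1.0) across a coast to fully land (0.0).
water_fraction = [1.0, 1.0, 0.8, 0.4, 0.1, 0.0, 0.0]

# "Fat" water mask: water if any significant fraction of sea (assumed > 5%).
fat_mask = label_cells(water_fraction, threshold=0.05)
# Neutral mask: water only if more than half the cell is sea.
neutral_mask = label_cells(water_fraction, threshold=0.5)

fat_fringe = water_in_land_cells(fat_mask, water_fraction)
neutral_fringe = water_in_land_cells(neutral_mask, water_fraction)

print("fat mask fringe:    ", fat_fringe)
print("neutral mask fringe:", neutral_fringe)
```

With the fat mask no "land" cell contains water, so there is no coastal fringe in a water-in-land-cell plot; with the neutral mask the mixed cells at 40% and 10% water show up as a fringe.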




This next plot shows the prevalence of land in cells labelled water in OSTIA. Here, blue colours indicate a small amount of land, and red a lot of land in the cell. The fact that the whole coastline tends to be fringed with colour confirms that the OSTIA mask is "fat" with respect to water. (That is, truly mixed cells tend to be labelled as sea, so there is a fringe of land-in-water-cell cases along coasts.) However, it is also clear that there are many "water" cells that are in LC CCI completely land (dark red in this picture). These appear preferentially on northern coasts. This is consistent globally, and not just around the UK. This suggests that there is an offset in the N-S direction, roughly half an OSTIA cell in size, in the OSTIA mask relative to the Landcover CCI data set.



Where an OSTIA cell is flagged as sea and is in fact filled with land, there will be (or should be) no satellite SST retrievals ever available for that cell -- the data will always be provided by the gap-filling procedures associated with creating the L4 SST CCI analysis.


Friday, 10 October 2014

How to represent different SSTs in the products

As previously discussed we aim in future products to include not only the skin SST (the primary geophysical retrieval) and, as before, a 20 cm estimate at a fixed local time (to minimise aliasing of the diurnal cycle in the long term trends), but also a UTC-day mean estimate. This combination hits a good fraction of the diverse range of user requirements we collected for depths and time.

We also need to provide an adjustment to the most consistent possible retrieval (the default in the product is the best available type of retrieval, but one might also want to analyse a single type of retrieval through the whole record).

To deliver this information in a GHRSST-compatible form requires some thought, since the retrieval and the various model-derived adjustments all have up to three components of uncertainty as well as their values.

Project team discussions have concluded on the following: to store the best available skin SST as the primary variable, and give a set of adjustments that can be added to this, each with N (1 <= N <= 3) uncertainty components. This is a much smaller data volume than adding all the different SSTs each with three uncertainty components. Data volume is a concern for a significant set of users.

For the convenience of users faced with the complexity of adding adjustments to the primary data, we will need to provide a reader programme that configures the calculation of a desired SST type and its uncertainty. Even nicer would be configuration of the desired fields on the fly on download -- that is a technological solution we aim to discuss with those who will do the CCI programme data portal (invitation to tender currently published).
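To give a feel for what such a reader would have to do, here is a minimal sketch in Python. Everything in it is illustrative: the function name, the dictionary layout and the component names ("random", "systematic") are hypothetical, and combining uncertainties in quadrature assumes that errors in the retrieval and in each adjustment are independent, which would need to be checked for the real products.

```python
import math


def derive_sst(skin_sst, skin_unc, adjustments):
    """Add adjustments to the skin SST and combine uncertainty components.

    skin_unc and each adjustment's "unc" map a component name to a standard
    uncertainty in K. Components an adjustment does not carry count as zero.
    ASSUMPTION: errors are independent, so each component adds in quadrature.
    """
    value = skin_sst
    variance = {name: u ** 2 for name, u in skin_unc.items()}
    for adj in adjustments:
        value += adj["value"]
        for name, u in adj["unc"].items():
            variance[name] = variance.get(name, 0.0) + u ** 2
    components = {name: math.sqrt(v) for name, v in variance.items()}
    total = math.sqrt(sum(variance.values()))
    return value, components, total


# Made-up numbers: a skin SST with two uncertainty components, plus one
# adjustment (to 20 cm depth at a standard local time) with one component.
skin_sst = 290.00
skin_unc = {"random": 0.3, "systematic": 0.4}
depth_time_adjustment = {"value": 0.15, "unc": {"random": 0.4}}

sst_20cm, unc_components, unc_total = derive_sst(
    skin_sst, skin_unc, [depth_time_adjustment]
)
print(round(sst_20cm, 2), unc_components, round(unc_total, 3))
```

The point of the design is visible even in this toy: the file stores one SST plus small adjustment fields, and the user (or the reader programme) only builds the SST type they actually want.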

Friday, 3 October 2014

Uncertainty information in Climate Data Records

We are well into our planning for a User Consultation workshop on representing uncertainty information in our SST CCI products in the most useful way. (Registration is still open.)

Venue: Met Office Hadley Centre, Exeter, UK
Date: 18-20th November 2014.

This workshop will be a two-way discussion between data providers and users. We aim to create a common understanding of: where uncertainty comes from (in this case, uncertainty in satellite sea surface temperature); how to talk about uncertainty unambiguously; how well the uncertainty information that is provided addresses users' needs; and how to practically use such uncertainty information. It will achieve this through a mixture of oral and poster presentations, activities and group discussions.

We as data producers need to provide uncertainty information that users have confidence in, and have confidence in using. That is, they need to know it is realistic information, and what they can validly do with the information. Achieving this definitely involves increased mutual understanding, so it should be a very stimulating and lively meeting.

Friday, 26 September 2014

Learning Python

After years of relying on IDL for interacting with data, I am taking the plunge and switching to Python. The tipping point was deciding that iPython notebooks are a good way of maintaining the links between results, figures and the code used to generate them. My first plot is based on SST CCI data, of course!

Wednesday, 17 September 2014

Geoscience Data Journal paper

An open-access journal article describing the SST CCI phase 1 datasets was published today.

It is published in Geoscience Data Journal. I think the advent of 'data journals' over the past few years is a good development. The traditional recourse of trying to shoe-horn a detailed data description into a paper with science results was not ideal, particularly for large complex datasets such as those created by reprocessing EO archives for climate.

The new paper is:


Merchant, C. J., Embury, O., Roberts-Jones, J., Fiedler, E., Bulgin, C. E., Corlett, G. K., Good, S., McLaren, A., Rayner, N., Morak-Bozzo, S. and Donlon, C. (2014), Sea surface temperature datasets for climate applications from Phase 1 of the European Space Agency Climate Change Initiative (SST CCI). Geoscience Data Journal. doi: 10.1002/gdj3.20
  1. European Space Agency, ESRIN/Contract No. 4000101570/10/I-AM ‘Phase 1 of the ESA Climate Change Initiative SST_cci’

Wednesday, 3 September 2014

What differences to use in validation?

Validation is the comparison of (in this case) our satellite SSTs with temperature measured in situ, from buoys, ships, etc. Validation gives assurance that the satellite SSTs are, in a general sense, accurate. However, the comparison is complicated by the fact that different SSTs are genuinely different (geophysical differences), so that the difference between any two data points is a mix of error contributions and true differences. In addition, the SST CCI products include a number of SST estimates, each of which requires validation.

We therefore need to be clear about which in-situ/satellite comparisons will be made, and why. This post records the results of a review of our options, following discussions between myself and Gary Corlett (Leicester).

"Raw differences"

Here, we will compare the skin SST from the satellite to the nearest-in-time depth SST of the in situ measurement. In this case we expect certain systematic differences. (1) There is a geophysical difference based on the ocean thermal skin effect, which is typically of order -0.2 K, but also has a wind-speed dependence which should be clear in night-time differences. (2) There should be a trend in the raw difference with respect to the time separation of in situ and satellite: for example, in mid morning the ocean is typically warming, so in situ measurements after the satellite time will tend to be warmer. However, with respect to things that might affect the satellite retrieval adversely (but not directly the skin-depth SST difference) the systematic dependencies should be small; for example, a systematic effect in the raw differences with latitude should be no larger than we might be able to account for by the fact that mean wind speed (and therefore skin effect) varies with latitude.

"Skin-skin differences"

The idea here is to estimate the skin and depth effects at the time of the satellite observation and add these to the in situ observation. The in situ history is first interpolated to the satellite observation time, giving an estimate of SST-20cm at the location and time of a satellite SST. Any 20-cm-to-subskin stratification (usually small, and only ever large for day-time cases) is estimated using a model, and the skin effect is also estimated using a skin-effect model. There should ideally be no systematic effects with respect to latitude, wind, satellite-buoy time difference, etc., because the models are meant to account correctly (on average) for all the geophysical differences (other than those from comparing a point to a pixel, which are assumed to add zero-mean noise). This measure therefore tests the combination of "retrieval + adjustment for skin effect + adjustment in depth".
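The steps in a skin-skin comparison can be sketched as follows. This is an illustration under stated assumptions, not our validation code: the interpolation is simple linear interpolation, the stratification and skin-effect values are made-up constants standing in for model output, and the sign convention (satellite minus in situ) is my choice here.

```python
def interpolate_linear(times, values, t):
    """Linearly interpolate a series to time t (no extrapolation).

    ASSUMPTION: times are strictly increasing, with at least two points.
    """
    if len(times) < 2:
        raise ValueError("need at least two in situ observations")
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the in situ history")
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            v0, v1 = values[i - 1], values[i]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def skin_skin_difference(sat_skin_sst, sat_time, buoy_times, buoy_ssts,
                         stratification, skin_effect):
    """Satellite skin SST minus an in situ based estimate of skin SST."""
    # Step 1: in situ SST-20cm at the satellite observation time.
    buoy_at_sat_time = interpolate_linear(buoy_times, buoy_ssts, sat_time)
    # Step 2: add modelled 20-cm-to-subskin stratification and skin effect.
    buoy_skin_estimate = buoy_at_sat_time + stratification + skin_effect
    # Step 3: difference (satellite minus in situ).
    return sat_skin_sst - buoy_skin_estimate


# Made-up example: buoy history in hours and K, satellite overpass at 10.5 h.
buoy_times = [9.0, 10.0, 11.0]
buoy_ssts = [290.0, 290.2, 290.6]
diff = skin_skin_difference(
    sat_skin_sst=290.30, sat_time=10.5,
    buoy_times=buoy_times, buoy_ssts=buoy_ssts,
    stratification=0.05,   # hypothetical model output, K
    skin_effect=-0.17,     # hypothetical model output, K
)
print(round(diff, 3))
```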

"Depth-depth differences"

In the SST products there is an adjustment provided that can be added to the fundamental satellite SST retrieval (of skin SST at the time of the satellite) to give an estimate of SST at typical drifting buoy depth (~20 cm) at standard local times of day (10.30 h or 22.30 h). To explore this, the satellite SST-20cm estimate for a standard local time will be differenced with the spatially-matched in situ SST history interpolated in time to the same local time. There should ideally be no systematic effects with respect to latitude, wind, satellite-buoy-time-difference, etc, because the adjustment is meant to account correctly (on average) for all the geophysical effects (other than those from comparing a point to a pixel, which are assumed to be zero-mean noise). This measure therefore tests the combination of "retrieval + adjustment for skin effect + adjustment in depth + adjustment in time". Compared to the skin-skin difference, this tests in addition the time-adjustment of the SST for the diurnal cycle.

"Daily mean depth-depth differences"

Although no such adjustment is generated in existing datasets, we are considering a requirement to estimate an adjustment that, added to the skin SST from the satellite, would give an estimate of the daily-mean SST at the location of the satellite observation. The day over which the mean is to be estimated is the UTC (i.e., GMT, not local) day which includes the time of the satellite observation. The in situ data would therefore consist of the average of the history of the in situ measurements over that 24 hour period. This comparison therefore tests "retrieval + adjustment for skin effect + adjustment in depth + adjustment to daily mean", and the spread of the results will include the uncertainty from estimating the daily mean SST from a single observation. Systematic effects in the differences should be small.
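As a sketch of the in situ side of this comparison, the snippet below averages a buoy history over the UTC day containing the satellite time. It assumes a simple unweighted mean of whatever observations fall in that day, which is only reasonable for evenly sampled histories; the function name and all the numbers, including the daily-mean adjustment, are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def utc_day_mean(obs_times, obs_ssts, sat_time):
    """Mean in situ SST over the UTC day that contains sat_time.

    ASSUMPTION: unweighted mean; all datetimes are timezone-aware.
    """
    day_start = sat_time.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    day_end = day_start + timedelta(days=1)
    in_day = [sst for t, sst in zip(obs_times, obs_ssts)
              if day_start <= t < day_end]
    if not in_day:
        raise ValueError("no in situ observations in the UTC day")
    return sum(in_day) / len(in_day)


def utc(*args):
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Made-up buoy history; the first and last points fall outside the UTC day.
obs_times = [utc(2014, 9, 2, 23), utc(2014, 9, 3, 0), utc(2014, 9, 3, 6),
             utc(2014, 9, 3, 12), utc(2014, 9, 3, 18), utc(2014, 9, 4, 0)]
obs_ssts = [290.0, 289.0, 289.5, 290.5, 290.0, 295.0]

sat_time = utc(2014, 9, 3, 10, 30)
in_situ_daily_mean = utc_day_mean(obs_times, obs_ssts, sat_time)

# Satellite daily-mean estimate: skin SST plus a hypothetical adjustment.
sat_daily_mean = 289.6 + 0.2
daily_mean_difference = sat_daily_mean - in_situ_daily_mean
print(in_situ_daily_mean, round(daily_mean_difference, 3))
```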


Friday, 18 July 2014

"System Maturity" CORE-CLIMAX style

The SST CCI project is both to create new SST data and to prototype a system for how this SST climate data record can continue to be routinely provided in the future. The "system" is a processing chain that takes inputs (satellite radiance data, auxiliary data, etc.) and transforms these into SST products. It consists of something like 100,000 lines of code, installed at the facility for Climate and Environmental Modelling from Space (CEMS) in Harwell.

CORE-CLIMAX [no kidding] is a European project. Within its scope is development of a means of encapsulating how "mature" systems for delivering climate data records are. It is quite instructive for a team like ours to evaluate itself against the various criteria in the "System Maturity Matrix" they propose. We just did a self evaluation, for AVHRR and analysis products, and find that in its current state, the project straddles a "research capability" and an "initial operations capability" in most areas. That seems right -- it is exactly where we would expect to be at this stage, working on science and also towards a functioning, sustainable system.

Here are our self assessment results. The green shaded boxes show the range of "maturity" of different aspects of the project within each of the metric categories (software readiness, metadata, user documentation, uncertainty characterisation, feedback/access and usage).