Two new papers that discuss uncertainty in surface temperature measurements.
The issue of uncertainty in surface temperature measurements is getting some much-needed attention, particularly in the context of the HadCRUT datasets. For context, some previous Climate Etc. posts on this topic:
- On adjustments to the HadSST3 data set
- Critique of the HadSST3 uncertainty analysis
- Unknown and uncertain sea surface temperatures
The first paper, by John Kennedy of the UK Met Office, provides a comprehensive and much-needed uncertainty analysis of sea surface temperature measurements and analyses:
A review of uncertainty in in situ measurements and data sets of sea-surface temperature
Abstract. Archives of in situ sea-surface temperature (SST) measurements extend back more than 160 years. Quality of the measurements is variable and the area of the oceans they sample is limited, especially early in the record and during the two World Wars. Measurements of SST and the gridded data sets that are based on them are used in many applications so understanding and estimating the uncertainties are vital. The aim of this review is to give an overview of the various components that contribute to the overall uncertainty of SST measurements made in situ and of the data sets that are derived from them. In doing so, it also aims to identify current gaps in understanding. Uncertainties arise at the level of individual measurements with both systematic and random effects and, although these have been extensively studied, refinement of the error models continues. Recent improvements have been made in the understanding of the pervasive systematic errors that affect the assessment of long-term trends and variability. However, the adjustments applied to minimize these systematic errors are uncertain and these uncertainties are higher before the 1970s and particularly large in the period surrounding the Second World War owing to a lack of reliable metadata. The uncertainties associated with the choice of statistical methods used to create globally complete SST data sets have been explored using different analysis techniques but they do not incorporate the latest understanding of measurement errors and they want for a fair benchmark against which their skill can be objectively assessed. These problems can be addressed by the creation of new end-to-end SST analyses and by the recovery and digitization of data and metadata from ship log books and other contemporary literature.
In using SST observations and the analyses that are based on them, it is important to understand the uncertainties inherent in them and the assumptions and statistical methods that have gone into their creation. In this review I aim to give an overview of the various components that contribute to the overall uncertainty of SST measurements made in situ and of the data sets that are derived from them. In doing so, I also aim to identify current gaps in understanding.
Section 2 provides a classification of uncertainties. The classifications are not definitive, nor are they completely distinct. They do, however, reflect the way in which uncertainties have been approached in the literature and provide a useful framework for thinking about the uncertainties in SST data sets. The uncertainties have been tackled in ascending order of abstraction from the random errors associated with individual observations to the generic problem of unknown unknowns.
Throughout this review the distinction will be made between an error and an uncertainty. The error in a measurement is the difference between some idealized “true value” and the measured value and is unknowable. The uncertainty of a measurement [is defined] as the “parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand”. This is the sense in which uncertainty is generally meant in the following discussion. This is not necessarily the same usage as is found in the cited papers. It is common to see the word error used as a synonym for uncertainty such as in the commonly used phrases standard error and analysis error.
Broadly speaking, errors in individual SST observations have been split into two groupings: random observational errors and systematic observational errors. Although this is a convenient way to deal with the uncertainties, errors in SST measurements will generally share a little of the characteristics of each.
Random observational errors occur for many reasons: misreading of the thermometer, rounding errors, the difficulty of reading the thermometer to a precision higher than the smallest marked gradation, incorrectly recorded values, errors in transcription from written to digital sources and sensor noise among others. Although they might confound a single measurement, the independence of the individual errors means they tend to cancel out when large numbers are averaged together. Therefore, the contribution of random independent errors to the uncertainty on the global average SST is much smaller than the contribution of random error to the uncertainty on a single observation even in the most sparsely observed years. Nonetheless, where observations are few, random observational errors can be an important component of the total uncertainty.
Systematic observational errors are much more problematic because their effects become relatively more pronounced as greater numbers of observations are aggregated. Systematic errors might occur because a particular thermometer is mis-calibrated, or poorly sited. No amount of averaging of observations from a thermometer that is mis-calibrated such that it reads 1 K too high will reduce the error in the aggregate below this level save by chance. However, in many cases the systematic error will depend on the particular environment of the thermometer and will therefore be independent from ship to ship. In this case, averaging together observations from many different ships or buoys will tend to reduce the contribution of systematic observational errors to the uncertainty of the average.
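The contrast drawn above between random errors, ship-specific systematic errors, and a bias shared by every instrument can be illustrated with a small simulation. This is only a sketch under assumptions that are not in Kennedy's paper: errors are Gaussian, observations are split evenly among ships, and the function names and default magnitudes (`sigma_random`, `sigma_ship`, `shared_bias`) are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def expected_spread(n_obs, n_ships, sigma_random=1.0, sigma_ship=0.5):
    """Theoretical std of the error in the average (assumes n_obs is a
    multiple of n_ships and all error terms are independent)."""
    return math.sqrt(sigma_ship ** 2 / n_ships + sigma_random ** 2 / n_obs)


def simulated_error(n_obs, n_ships, sigma_random=1.0, sigma_ship=0.5,
                    shared_bias=0.0, trials=1000, seed=0):
    """Return (mean, std) of the error in an average of n_obs observations.

    Each observation error = shared_bias (same for every instrument)
                           + a per-ship bias (fixed for that ship)
                           + independent random noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # One systematic offset per ship, redrawn for each trial.
        ship_bias = [rng.gauss(0.0, sigma_ship) for _ in range(n_ships)]
        total = 0.0
        for i in range(n_obs):
            total += (shared_bias
                      + ship_bias[i % n_ships]
                      + rng.gauss(0.0, sigma_random))
        errors.append(total / n_obs)
    return statistics.fmean(errors), statistics.pstdev(errors)


if __name__ == "__main__":
    # 100 observations from one ship: the ship's bias never averages out.
    print("one ship:   ", simulated_error(100, 1))
    # 100 observations from 100 ships: biases are independent and shrink.
    print("many ships: ", simulated_error(100, 100))
    # A 1 K bias common to all instruments survives any amount of averaging.
    print("shared bias:", simulated_error(100, 100, shared_bias=1.0))
```

In this toy setup the spread of the average follows sqrt(sigma_ship²/n_ships + sigma_random²/n_obs): adding observations from the same ship only shrinks the second term, adding ships shrinks both, and a bias common to all instruments is not reduced at all.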
In Kennedy et al., [2011b] two forms of this uncertainty were considered: grid-box sampling uncertainty and large-scale sampling uncertainty (which they referred to as coverage uncertainty). Grid-box sampling uncertainty refers to the uncertainty accruing from the estimation of an area-average SST anomaly within a grid box from a finite, and often small, number of observations. Large-scale sampling uncertainty refers to the uncertainty arising from estimating an area-average for a larger area that encompasses many grid boxes that do not contain observations. Although these two uncertainties are closely related, it is often easier to estimate the grid-box sampling uncertainty, where one is dealing with variability within a grid box, than the large-scale sampling uncertainty, where one must take into consideration the rich spectrum of variability at a global scale.
In the context of SST uncertainty, unknown unknowns are those things that have been overlooked. By their nature, unknown unknowns are unquantifiable; they represent the deeper uncertainties that beset all scientific endeavors. By deep, I do not mean to imply that they are necessarily large. In this review I hope to show that the scope for revolutions in our understanding is limited. Nevertheless, refinement through the continual evolution of our understanding can only come if we accept that our understanding is incomplete. Unknown unknowns will only come to light with continued, diligent and sometimes imaginative investigation of the data and metadata.
JC comment: Uncertain T. Monster is VERY pleased by this comprehensive discussion of the uncertainties. The greatest challenges (discussed at length in the paper) are how to assess structural uncertainties in the analysis methods and how to combine all the uncertainties. Any application of these data (including trend analysis) needs to consider these issues.
The second paper attempts to slay the uncertainty monster.
Coverage bias in the HadCRUT4 temperature series and its impact on recent temperature trends
Kevin Cowtan and Robert Way
Abstract. Incomplete global coverage is a potential source of bias in global temperature reconstructions if the unsampled regions are not uniformly distributed over the planet’s surface. The widely used HadCRUT4 dataset covers on average about 84% of the globe over recent decades, with the unsampled regions being concentrated at the poles and over Africa. Three existing reconstructions with near-global coverage are examined, each suggesting that HadCRUT4 is subject to bias due to its treatment of unobserved regions. Two alternative approaches for reconstructing global temperatures are explored, one based on an optimal interpolation algorithm and the other a hybrid method incorporating additional information from the satellite temperature record. The methods are validated on the basis of their skill at reconstructing omitted sets of observations. Both methods provide superior results than excluding the unsampled regions, with the hybrid method showing particular skill around the regions where no observations are available. Temperature trends are compared for the hybrid global temperature reconstruction and the raw HadCRUT4 data. The widely quoted trend since 1997 in the hybrid global reconstruction is two and a half times greater than the corresponding trend in the coverage-biased HadCRUT4 data. Coverage bias causes a cool bias in recent temperatures relative to the late 1990s which increases from around 1998 to the present. Trends starting in 1997 or 1998 are particularly biased with respect to the global trend. The issue is exacerbated by the strong El Niño event of 1997-1998, which also tends to suppress trends starting during those years.
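To make the idea of coverage bias concrete, here is a toy calculation. It is a sketch only, not the method of the paper: the field is a made-up set of 5° latitude bands in which the Arctic is assumed to be warmer than everywhere else, the "unsampled" region is assumed to be everything poleward of 70°, and all function names and numbers are hypothetical. Averaging only the sampled bands is equivalent to assuming the unsampled regions sit at the sampled mean.

```python
import math


def band_centers(width_deg=5.0):
    """Centers of equal-width latitude bands from pole to pole."""
    n = int(round(180.0 / width_deg))
    return [-90.0 + width_deg * (i + 0.5) for i in range(n)]


def area_weighted_mean(anomalies):
    """Mean of band anomalies weighted by cos(latitude).

    `anomalies` maps band-center latitude (degrees) to an anomaly,
    or to None where the band is unsampled; unsampled bands are skipped.
    """
    num = 0.0
    den = 0.0
    for lat, value in anomalies.items():
        if value is None:
            continue
        weight = math.cos(math.radians(lat))  # proportional to band area
        num += weight * value
        den += weight
    return num / den


def toy_field(arctic_anomaly=2.0, background=0.5, arctic_edge=70.0):
    """Hypothetical anomalies: a warm Arctic on a uniform background."""
    return {lat: (arctic_anomaly if lat > arctic_edge else background)
            for lat in band_centers()}


def mask_poles(field, edge=70.0):
    """Mark every band poleward of `edge` degrees as unsampled."""
    return {lat: (None if abs(lat) > edge else value)
            for lat, value in field.items()}


full = toy_field()
sampled = mask_poles(full)
global_mean = area_weighted_mean(full)
sampled_mean = area_weighted_mean(sampled)
coverage_bias = sampled_mean - global_mean  # negative: sampled mean is too cool

if __name__ == "__main__":
    print(f"global mean:   {global_mean:.3f}")
    print(f"sampled mean:  {sampled_mean:.3f}")
    print(f"coverage bias: {coverage_bias:.3f}")
```

The size of the bias in this sketch depends entirely on the assumed Arctic anomaly, which is exactly the quantity that is poorly observed; the example shows the mechanism, not its magnitude in the real data.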
Published by the Royal Meteorological Society, link to abstract.
The Guardian has an extensive article, excerpts:
There are large gaps in its coverage, mainly in the Arctic, Antarctica, and Africa, where temperature monitoring stations are relatively scarce.
NASA’s GISTEMP surface temperature record tries to address the coverage gap by extrapolating temperatures in unmeasured regions based on the nearest measurements. However, the NASA data fails to include corrections for a change in the way sea surface temperatures are measured – a challenging problem that has so far only been addressed by the Met Office.
In their paper, Cowtan & Way apply a kriging approach to fill in the gaps between surface measurements, but they do so for both land and oceans. In a second approach, they also take advantage of the near-global coverage of satellite observations, combining the University of Alabama at Huntsville (UAH) satellite temperature measurements with the available surface data to fill in the gaps with a ‘hybrid’ temperature data set. They found that the kriging method works best to estimate temperatures over the oceans, while the hybrid method works best over land and most importantly sea ice, which accounts for much of the unobserved region.
Cowtan & Way investigate the claim of a global surface warming ‘pause’ over the past 16 years by examining the trends from 1997 through 2012. While HadCRUT4 only estimates the surface warming trend at 0.046°C per decade during that time, and NASA puts it at 0.080°C per decade, the new kriging and hybrid data sets estimate the trend during this time at 0.11 and 0.12°C per decade, respectively.
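The per-decade figures quoted above are linear trends. A minimal sketch of how such a number can be computed with ordinary least squares is given below; the annual series is synthetic (a straight line chosen to reproduce a 0.12°C per decade slope), the function name is hypothetical, and the calculation ignores autocorrelation and says nothing about the uncertainty of the trend, which matters a great deal over a 16-year window.

```python
def trend_per_decade(years, temps):
    """Ordinary least-squares slope of temps against years, in units per decade."""
    n = len(years)
    if n < 2 or n != len(temps):
        raise ValueError("need two or more points and equal-length inputs")
    year_mean = sum(years) / n
    temp_mean = sum(temps) / n
    sxx = sum((y - year_mean) ** 2 for y in years)
    sxy = sum((y - year_mean) * (t - temp_mean) for y, t in zip(years, temps))
    return 10.0 * sxy / sxx  # slope per year times 10 years


# Synthetic annual anomalies for 1997-2012 with a slope of 0.012 degC per year.
years = list(range(1997, 2013))
temps = [0.40 + 0.012 * (y - 1997) for y in years]

hybrid_like_trend = trend_per_decade(years, temps)

# Ratio of the two trends quoted in the text (hybrid vs HadCRUT4).
ratio = 0.12 / 0.046

if __name__ == "__main__":
    print(f"trend: {hybrid_like_trend:.3f} degC per decade")
    print(f"hybrid / HadCRUT4 trend ratio: {ratio:.2f}")
```

The ratio of the quoted hybrid and HadCRUT4 trends, 0.12 / 0.046, is roughly 2.6, which is consistent with the abstract's "two and a half times greater".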
These results indicate that the slowed warming of average global surface temperature is not as significant as previously believed. Surface warming has slowed somewhat, in large part due to more overall global warming being transferred to the oceans over the past decade. However, these sorts of temporary surface warming slowdowns (and speed-ups) occur on a regular basis due to short-term natural influences.
The results of this study also have bearing on some recent research. For example, correcting for the recent cool bias indicates that global surface temperatures are not as far from the average of climate model projections as we previously thought, and certainly fall within the range of individual climate model temperature simulations. Recent studies that concluded the global climate is a bit less sensitive to the increased greenhouse effect than previously believed may also have somewhat underestimated the actual climate sensitivity.
This is of course just one study, as Dr. Cowtan is quick to note.
“No difficult scientific problem is ever solved in a single paper. I don’t expect our paper to be the last word on this, but I hope we have advanced the discussion.”
To give a flavor of the Twitter discussion:
John Kennedy: The irony is that the study being used to bash HadCRUT4 assumes that HadCRUT4 is correct where we have data.
The paper is getting plenty of media attention, and I’m also getting queries from reporters.
Let’s take a look at the 3 methods they use to fill in missing data, primarily in Africa, Arctic, and Antarctic.
- 1. Kriging
- 2. UAH satellite analyses of surface air temperature
- 3. NCAR NCEP reanalysis
They state that most of the difference in their reconstructed global average comes from the Arctic, so I focus on the Arctic (which is where I have special expertise in any event).
First, Kriging. Kriging across land/ocean/sea ice boundaries makes no physical sense. While the paper cites Rigor et al. (2000) that shows ‘some’ correlation in winter between land and sea ice temps at up to 1000 km, I would expect no correlation in other seasons.
Second, UAH satellite analyses. Not useful at high latitudes in the presence of temperature inversions and not useful over sea ice (which has a very complex spatially varying microwave emission signature). Hopefully John Christy will chime in on this.
Third, re reanalyses in the Arctic. See Fig 1 from this paper, which gives you a sense of the magnitude of grid point errors for one point over an annual cycle. Some potential utility here, but reanalyses are not useful for trends owing to temporal inhomogeneities in the datasets that are assimilated.
So I don’t think Cowtan and Way’s analysis adds anything to our understanding of the global surface temperature field and the ‘pause.’
The bottom line remains Ed Hawkins’ figure that compares climate model simulations for regions where the surface observations exist. This is the appropriate way to compare climate models to surface observations, and the outstanding issue is that the climate models and observations disagree.