Why Karl et al 2015 Doesn’t Eliminate The ‘Hiatus’
Dr David Whitehouse, Global Warming Policy Forum
Even accepting the statistical approach taken by Karl et al it is clear that their errors are larger than they realise, and that the trends they obtain depend upon cherry-picked start and end points that include abnormal conditions, i.e. the 1998-2000 El Nino/La Nina and the 2014 northeast Pacific Ocean “hot spot.”
When estimating trends, especially for such short periods in a noisy data set such as global surface temperatures, care must be taken with start and end points as they can affect the trend obtained.
Fig 1 shows the difference between the new NOAA data and the currently used NOAA data.
The differences between the two datasets are small. Prior to 2008 the new data was cooler than the existing set; after 2008 it was warmer. The variations are much smaller than the errors, which NOAA says are +/- 0.09°C.
Fig 2 compares the new and current NOAA annual data with the NASA GISS and HadCRUT4 global surface datasets. An offset of +0.1°C has been added to HadCRUT4 to make it more easily comparable to the others (in this analysis we are interested in gradients not absolute values). HadCRUT4 errors are +/- 0.1°C and NASA GISS errors are +/- 0.05°C, as quoted by them.
Does the inclusion of the new NOAA data make a difference to the “hiatus” reported in the other three datasets?
I follow the approach adopted by Karl et al in considering only data between 1998 – 2014 in this particular analysis. To quantify the range of trends that would be expected to be due to chance with the new NOAA data I considered a time series of 17-years with the statistical properties of NOAA data. I performed a Monte Carlo analysis involving 10,000 simulations of random data. My result indicates that the trends reported by Karl et al 2015 – which were only ever marginally significant at the 10% level – are much less significant. Comparing their trends – 0.086°C per decade for 1998-2012, 0.106°C per decade for 1998-2014 and 0.116°C per decade for 2000-2014 – with the outcome of the Monte Carlo simulation revealed positive trends between 0.08-0.12°C per decade 1,133 times out of the 10,000 simulations. We conclude that, irrespective of their quoted small errors in their trends, none of them are robust or provide evidence that the “hiatus” does not exist.
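The kind of Monte Carlo exercise described above can be sketched as follows. This is a minimal illustration, not the calculation used here: it assumes trendless first-order autoregressive noise, and the values of `sigma` and `rho` are placeholders rather than the statistical properties of the NOAA data, which are not given in the text. The function names are hypothetical.

```python
import random

def ols_slope(y):
    """Least-squares slope of y against 0, 1, ..., len(y)-1 (units of y per step)."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

def simulate_trends(n_years=17, n_sims=10_000, sigma=0.1, rho=0.5, seed=0):
    """Trends (degC per decade) fitted to trendless AR(1) noise.

    sigma and rho are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    trends = []
    for _ in range(n_sims):
        # Draw the first value from the stationary distribution of the AR(1) process.
        e = rng.gauss(0, sigma / (1 - rho ** 2) ** 0.5)
        series = []
        for _ in range(n_years):
            series.append(e)
            e = rho * e + rng.gauss(0, sigma)
        trends.append(10 * ols_slope(series))  # per year -> per decade
    return trends

trends = simulate_trends()
in_band = sum(0.08 <= t <= 0.12 for t in trends)
print(f"{in_band} of {len(trends)} simulated trends fall in 0.08-0.12 degC/decade")
```

The count printed depends entirely on the assumed noise parameters, so it should not be expected to match the 1,133 reported above.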
Even if the errors of the trends quoted by Karl et al 2015 are accepted their conclusion that they remove the “hiatus” is incorrect for another reason. The effect on their trends of their start and end points explains the differences they obtain. Their highest trend was between 2000-2014 when the start point was a cool La Nina year and the endpoint influenced by the recent anomalously warm temperature of the northeast Pacific. Terminating the data two years earlier in 2012 reduces the influence of the northeast Pacific and consequently reduces the trend. Commencing their analysis in 1998 (a warm El Nino year) produces a trend that is smaller than 2000-2014 because of the warmer starting point, as expected. Similarly the 1998-2012 trend is significantly smaller than the 1998-2014 trend, again due to the influence of recent warm seas on the northeast Pacific.
This table gives my analysis of the new NOAA temperature data set compiled by Karl et al with my error estimates.
I conclude that the elimination of the hiatus claimed by Karl et al 2015 is unsafe because of bias due to the choice of start and end points that are extremes of natural fluctuations in the global surface temperature record, as well as an overemphasis on statistically poor results.
What is meant by a “significant” trend?
How should we interpret evidence on whether there has been a hiatus in global surface warming?
Professor Gordon Hughes, University of Edinburgh
Studies in medicine, social sciences and other disciplines tend to be full of claims that some observation is “statistically significant” with associated statements about certainty expressed as probabilities or p-values. These claims are based on testing procedures derived from classical statistics when applied to experimental data that has been collected and analysed in a particular way. Unfortunately, all too often the key assumptions bear little resemblance to the analysis that has actually been carried out.
The classical framework of testing may be illustrated by considering an experiment to determine how, say, wheat responds to the application of a herbicide designed to kill weeds. Multiple small plots in a variety of locations are assigned randomly to different treatments including no application of the herbicide and applications varying from, say, 0.2 to 5 times a standard dose. At the end of the experiment the weight of wheat grains collected from each plot is recorded. Some pre-defined statistical tests are carried out to determine whether there is a linear or S-shaped relationship between the amount of herbicide applied and the plot yield. Due to the experimental design, factors which affect wheat yield – rainfall, temperatures, soil fertility, insect pests – are assumed to vary randomly across plots, while the measurement of the outcome is specified in advance and cannot be altered.
Using the data that has been collected we estimate a parameter β that defines the shape or slope of the response of wheat yield to the amount of herbicide applied, where the value zero means no response and values greater than zero mean that wheat yield increases with herbicide application, though not necessarily in a linear fashion. Taking account of the variability in wheat yields across plots our statistical analysis concludes that the central estimate of β is 0.5 with a 90% confidence range of 0.3 to 0.7. The idea is that if the same experiment were to be repeated independently 100 times then we would expect to obtain a value of β that lies outside this range in only 10 experiments.
Few experiments correspond to this idealised description but even if they do, the claim about confidence intervals that is made may be quite wrong. The problem is that from one experiment we do not know what the “true” level of variability in herbicide response is across the full range of locations where wheat might be grown in the UK or Europe. We assume that the variability – as measured by the standard error of the parameter β – is an unbiased estimate of the “true” variability, but without actually doing 100 or more experiments we cannot be sure of that. Indeed there are good reasons why the assumption may be wrong. The experiment may have been carried out in a season with low average rainfall or late frosts – i.e. we may have failed to randomise over all variables that affect the outcome. So, any conclusion about the statistical significance of a parameter depends critically on whether the study has genuinely identified all of the sources of variability that might affect the observations.
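The repeated-experiment idea can be checked by simulation. The sketch below assumes a linear response, normally distributed plot-level noise and a normal approximation for the interval, none of which is specified in the text. It repeats a hypothetical trial many times and counts how often the 90% interval computed from each trial contains the value of β used to generate the data. All names and parameter values are illustrative.

```python
import random
from statistics import NormalDist

def run_experiment(rng, true_beta=0.5, n_plots=40, noise_sd=1.0):
    """Simulate one trial; return (estimate of beta, half-width of its 90% interval)."""
    doses = [rng.uniform(0.0, 5.0) for _ in range(n_plots)]
    yields = [true_beta * d + rng.gauss(0, noise_sd) for d in doses]
    d_mean = sum(doses) / n_plots
    y_mean = sum(yields) / n_plots
    sxx = sum((d - d_mean) ** 2 for d in doses)
    beta_hat = sum((d - d_mean) * (y - y_mean) for d, y in zip(doses, yields)) / sxx
    alpha_hat = y_mean - beta_hat * d_mean
    rss = sum((y - alpha_hat - beta_hat * d) ** 2 for d, y in zip(doses, yields))
    se = (rss / (n_plots - 2) / sxx) ** 0.5  # standard error of the slope
    z = NormalDist().inv_cdf(0.95)           # normal approximation to the t quantile
    return beta_hat, z * se

rng = random.Random(0)
true_beta = 0.5
n_trials = 2000
covered = 0
for _ in range(n_trials):
    beta_hat, half_width = run_experiment(rng, true_beta=true_beta)
    if abs(beta_hat - true_beta) <= half_width:
        covered += 1
coverage = covered / n_trials
print(f"share of 90% intervals containing the true beta: {coverage:.3f}")
```

The coverage comes out close to 90% only because every simulated trial draws its noise from the same distribution. That is the assumption that a single real experiment cannot verify.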
The study by Karl et al (Science Express, 4 June 2015) appears in a completely different light when scrutinised in this way. It claims that for the 17 years from 1998 to 2014 their new data produces a trend increase in global temperature of 0.106°C per decade with a 90% confidence range of 0.048 to 0.164°C per decade. Cross-checks show that the confidence range is calculated solely by using the variability in the period from 1998-2014. But this does not accurately reflect the variability in their data for the full period from 1880 to 2014. To demonstrate the point, the trend increase in global temperature can be computed for every 17-year period between 1880 and 2014 using the method followed by Karl et al. This gives us the actual variability over all 17-year periods in the data, not an estimate based on a single period. It turns out that the actual variability is more than 3 times the Karl et al estimate. This analysis also shows that the distribution of 17-year trends is negatively skewed (the mean is much lower than the median), so that the empirical confidence range goes further into negative values for the trend than conventional calculations would suggest.
It is possible that the underlying variability of temperature has changed since the middle of the 20th century. Karl et al report trends for periods from 1950 and 1951, so the same exercise was repeated for all 17-year periods from 1950. The variability of these trends is, indeed, lower than for the full period but it is still 2.4 times the estimate of variability based on the single period 1998-2014. In fact, based on an analysis of 17-year periods since 1950 one cannot rule out the possibility of no trend in temperatures since the mean trend is 0.126°C per decade with a 90% confidence range of -0.017 to +0.217°C per decade.
Figure 1 – Estimates and confidence intervals of trend increases in temperature
The results of using the historical variability of the temperature data rather than variability estimated for relatively short periods are shown in Figure 1. The estimated values are shown as hatched bars while the 90% confidence intervals are given by the vertical lines. This demonstrates that the claim that the trend increase from 1998 to 2014 was “significant” rests on an erroneous estimate of the actual variability in estimates of the trend. Indeed, even the large trend increase from 1951 to 2012 has a much wider confidence range when based on the variability of all 62-year periods since 1880.
It is important to be clear about the limitations of this kind of analysis. The global temperature has increased since 1880. It is probable but far from certain that the trend rate of increase accelerated after 1950. However, given the variability in the trends estimated for relatively short periods, the hypothesis that there was a hiatus after 1998 cannot be rejected using the Karl et al data. In fact, based on the full data series one would expect that a trend increase of at least 0.1 °C per decade would be observed in about 15% of all 17-year periods examined, even if the underlying trend in global temperatures is zero.
The lesson is that no study should rely upon trends over selected short periods of time to make claims about a series with as much variability over time as global temperatures. That is as true for the relatively large increase from 1976 to 1998 as for the more recent period. Even that trend has been exceeded in 10% of all 23-year periods since 1880.
Even if the study had not drastically underestimated the amount of variability in 17-year trends in the historical data, there is another problem that is not addressed. This is: what is or was the starting point of the trend? In the spirit of the classic warning to all statisticians – Darrell Huff’s book titled ‘How to Lie with Statistics’ – it is possible to use a particular set of data to generate a wide range of trends simply by choosing a suitable starting point.
Think of an interested observer at the end of 1998. She is told by Karl et al that the trend increase in temperatures from 1998 to 2014 will be 0.106 +/- 0.058 °C per decade. So, in Figure 2, she draws the solid black line as her forecast of the average global temperatures and draws the two dashed blue lines to define the range of outcomes that would fall within the 90% confidence range for the trends. In 2015 she comes back and plots the actual temperatures over the period, the solid red line. She notes that the actual temperatures were, in almost all years, below the bottom end of the trend range stated by Karl et al. When it is pointed out that, on historical evidence, the Karl confidence range should have been much larger, she sees that the actual temperatures mostly fell within the adjusted trend range as shown by the orange dashed lines in the figure.
Figure 2 – Trends starting from the actual temperature in 1998
What is going on? The key point is that any linear trend is defined by the combination of a slope and a starting point (or midpoint). Karl et al focus exclusively on the slope. If she is only given the slope of the trend our observer has no choice other than to apply the slope to the most recent observed value – the one for 1998. Hence, any discussion of whether trends for different periods are significant will be misleading unless we have applied a consistent basis for selecting the starting point.
If, as a separate exercise, we estimate a trend that is constrained to start from the actual temperature in 1998, the slope is 0.016 +/- 0.068 °C per decade using the Karl et al method – clearly immaterial and not statistically significant. Again, the starting point is crucial. It is a necessary feature of the Karl et al approach that they shift the starting point in a way that relies upon hindsight. It turns out that the residual for 1998 – i.e. the difference between the actual and the predicted values for 1998 – is the largest of all of the residuals for the 17-year period. There is no surprise about this: we know now that 1998 was an extreme outlier in the global temperature series. But was this highlighted in 1998?
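The difference between a freely fitted trend and one constrained to start from the first observation can be illustrated with a small sketch. It uses plain least squares without the serial-correlation adjustment applied by Karl et al, and the series is a synthetic placeholder with an artificially warm first year, not the NOAA data. The function names are hypothetical.

```python
def free_slope(y):
    """Ordinary least-squares slope with a freely estimated intercept."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

def anchored_slope(y):
    """Least-squares slope of a line forced through the first observation."""
    y0 = y[0]
    num = sum(i * (v - y0) for i, v in enumerate(y))
    den = sum(i * i for i in range(len(y)))
    return num / den

# Synthetic placeholder: steady rise of 0.01 per year over 17 years,
# with the first year pushed up to mimic an outlier at the start.
series = [0.01 * i for i in range(17)]
series[0] += 0.15

print(f"free slope:     {10 * free_slope(series):+.3f} per decade")
print(f"anchored slope: {10 * anchored_slope(series):+.3f} per decade")
```

In the free fit the intercept absorbs most of the unusually warm first year. The anchored fit cannot do so, and its slope is pulled down accordingly.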
It may be argued that these points are all well-known and would be understood by scientists reading the paper. I am not convinced by this claim. The final sentences of the paper’s abstract say “… the central estimate for the rate of warming during the first 15 years of the 21st century is at least as great as the last half of the 20th century. These results do not support the notion of a ‘slowdown’ in the increase in global surface temperatures.” That claim was not tested, neither can it easily be tested and certainly not with any degree of statistical power. It rests on a fundamental misunderstanding about what results can properly be inferred from an analysis of time trends over short periods in a time series that displays large variability. Since its revisions to other global temperature series have been challenged, the paper does not provide well-founded statistical evidence to draw any reliable conclusions about the rate at which global temperatures have been increasing, whether over the last 15 years or 65 years.
There is a further, more technical, defect in the analysis that affects the reported confidence intervals (or error bars) and, thus, the interpretation of the results. As explained in the Supplementary Material the standard errors are adjusted to allow for first order serial correlation in the errors. In simple terms this means that the random error in this year’s global temperature is correlated with last year’s error, so high temperatures in one year are likely to be followed by high temperatures in the following year.
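First-order serial correlation can be made concrete with a short sketch. It simulates errors in which each year's value carries over a fraction `rho` of the previous year's, estimates the lag-1 autocorrelation, and applies a common textbook inflation factor for standard errors, sqrt((1 + r) / (1 - r)). That factor is shown only as an illustration; it is not necessarily the adjustment made by Karl et al, and `rho` and `sigma` are assumed values.

```python
import random

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i - 1] for i in range(1, n))
    den = sum(d * d for d in dev)
    return num / den

def simulate_ar1(n, rho, sigma, rng):
    """AR(1) errors: each value is rho times the previous one plus fresh noise."""
    e = rng.gauss(0, sigma / (1 - rho ** 2) ** 0.5)  # stationary starting value
    out = []
    for _ in range(n):
        out.append(e)
        e = rho * e + rng.gauss(0, sigma)
    return out

rng = random.Random(1)
errors = simulate_ar1(n=5000, rho=0.6, sigma=0.1, rng=rng)  # illustrative values
r1 = lag1_autocorrelation(errors)
inflation = ((1 + r1) / (1 - r1)) ** 0.5
print(f"estimated lag-1 autocorrelation: {r1:.2f}")
print(f"standard-error inflation factor: {inflation:.2f}")
```

With only 17 annual observations the estimate of the autocorrelation itself is very imprecise, which is one reason the choice of adjustment matters.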
There is no doubt that some adjustment to the standard errors should be made to take account of the time profile of errors, but have Karl et al made the right adjustment? The answer is no and the reason is important in relation to how we understand and use time series of temperatures.
Statisticians have devoted a lot of effort over the last 30 years to studying the properties of time series that follow random walks – as in a random walk down Wall Street – or something very similar. An example of a simple random walk is a process in which there are three possible outcomes – no change, an increase of one unit, or a decrease of one unit – with equal probabilities. Identifying whether a time series follows some kind of random walk has large implications for how it should be analysed and/or used for forecasting future outcomes. The statistical procedures adopted by Karl et al are incorrect if the time series of global temperatures incorporates a random walk.
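The three-outcome random walk described above is easy to simulate. The sketch below generates many such walks and measures the spread of their end points. The number of steps and the number of walks are arbitrary choices made for illustration.

```python
import random
from statistics import pvariance

def random_walk(n_steps, rng):
    """Path of a walk that moves -1, 0 or +1 with equal probability at each step."""
    position = 0
    path = [position]
    for _ in range(n_steps):
        position += rng.choice((-1, 0, 1))
        path.append(position)
    return path

rng = random.Random(42)
n_steps = 100
finals = [random_walk(n_steps, rng)[-1] for _ in range(2000)]
spread = pvariance(finals)

# Each step has variance 2/3, so the end point has variance 2 * n_steps / 3.
print(f"variance of end points: {spread:.1f} (theory: {2 * n_steps / 3:.1f})")
```

Because the variance grows in proportion to the number of steps, a random walk does not settle around a fixed level. An apparent trend over a short window can therefore arise purely by chance.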
Statistical tests of whether a time series is a random walk are not very powerful, so it is not possible to state definitively that different global temperature series are random walks. Even so, applying the standard tests to the annual data published by Karl et al the hypothesis that global temperatures follow a random walk cannot be rejected. The series can be described as a random walk with drift – i.e. in common parlance a trend because the average change from one year to the next is greater than zero – and with lagged responses to shocks. This is a technical way of describing the clear visual pattern that global temperatures tend to follow a cyclical pattern around an upward trend.
Recognising that global temperatures may follow a random walk substantially alters the strength of inferences about the underlying trend (the drift term in the random walk). The core of the empirical argument about climate change is that the rate of increase in temperatures accelerated after about 1950. The claim is that there was what statisticians call a structural break in 1950 with a higher trend rate of increase after 1950 than the one up to 1950. It is easy to test this using all of Karl et al’s data from 1880 to 2014 while maintaining the assumption that there was a change in the trend but not in the other characteristics of the random walk. The results including the 90% confidence intervals are:
Trend increase up to 1950 -0.001 (+/- 0.094) °C per decade
Trend increase after 1950 +0.125 (+/- 0.144) °C per decade
The central estimate of the post-1950 trend increase is very similar to Karl et al’s estimate but note how much wider the confidence intervals are. According to conventional statistical criteria we cannot conclude that the post-1950 trend is significantly greater than zero. In addition, there is no evidence at all of a significant trend in temperatures prior to 1950.
This sheds an entirely different light on the Karl et al results. Their claims about trends in global temperatures are the product of faulty statistical analysis. We must be clear about what this means. There is no dispute that temperatures have increased since 1950. Subject to any qualifications about the ways in which global temperatures are measured, that is a matter of fact. On the other hand, there is no reasonable statistical certainty that the increase signifies an underlying trend which will lead to an increase in global temperatures of 2°C or more by the middle or end of the 21st century unless drastic policies are adopted. Climate models may support such a conclusion but the statistical evidence on global temperatures does not.
The Karl et al estimates of the trend increase in temperatures and associated confidence intervals for various periods can be reproduced by using Prais-Winsten estimation with asymptotic standard errors. Since the errors are unlikely to be homoscedastic, most econometricians would use a robust or semi-robust estimator of the standard errors. For the period 1998-2014 the semi-robust standard error is about 50% higher than the asymptotic estimate.
The estimates of the full period and post-1950 confidence intervals were constructed using a rolling window method to estimate trends for all N-year continuous periods and then using the 5th and 95th percentiles of the distribution of estimated trends.
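The rolling window calculation can be sketched as follows. It uses plain least-squares trends rather than Prais-Winsten estimation, and it is run on a synthetic placeholder series, not the NOAA data. The drift and noise values are assumptions, and the function names are hypothetical.

```python
import random
from statistics import quantiles

def ols_slope(y):
    """Least-squares slope of y against 0, 1, ..., len(y)-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

def rolling_trends(series, window):
    """Trend per decade for every run of `window` consecutive annual values."""
    return [10 * ols_slope(series[i:i + window])
            for i in range(len(series) - window + 1)]

# Synthetic placeholder with 135 annual values (the length of 1880-2014).
rng = random.Random(7)
level, series = 0.0, []
for _ in range(135):
    level += 0.005 + rng.gauss(0, 0.1)  # assumed drift and noise
    series.append(level)

trends = rolling_trends(series, window=17)
cuts = quantiles(trends, n=20)  # 19 cut points at 5%, 10%, ..., 95%
p05, p95 = cuts[0], cuts[-1]
print(f"{len(trends)} windows; 5th to 95th percentile: {p05:+.3f} to {p95:+.3f}")
```

Overlapping windows share most of their observations, so the resulting trends are not independent draws. The percentiles describe the spread of trends in the record rather than a formal sampling distribution.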
The presence of a unit root in the annual series of global temperatures was tested using the standard ADF test with drift and between 1 & 6 lags based on the partial autocorrelation functions of the original and the first-differenced series. The optimal lag is of order 3, but the ADF test does not reject the hypothesis of a unit root at even the 10% level of significance for any of 1 to 6 lags. In addition, the KPSS test rejects the hypothesis of level stationarity at the 1% level of significance for any of 0 to 4 lags.
The random walk specification can be examined using an ARIMA(3,1,0) process – i.e. AR(3) for the first-differenced series – with a semi-robust estimator for standard errors. This lag structure yields the highest value of the log-likelihoods for the alternative specifications estimated using the full data series. The estimation of the model with a structural break was carried out using year and d*(year-1950) as independent variables, where d takes the value 0 for years up to 1950 and the value 1 for years from 1951 onwards.
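One way to write out this specification is given below. The symbols are introduced here for illustration and are not taken from the original analysis: y_t is the temperature in year t, d_t is the break dummy, and u_t is an error whose first difference follows an AR(3) process.

```latex
\begin{align*}
y_t &= \alpha + \beta_1\,\mathrm{year}_t + \beta_2\, d_t\,(\mathrm{year}_t - 1950) + u_t \\
\Delta u_t &= \phi_1 \Delta u_{t-1} + \phi_2 \Delta u_{t-2} + \phi_3 \Delta u_{t-3} + \varepsilon_t \\
\Delta y_t &= \beta_1 + \beta_2\, d_t + \Delta u_t
\end{align*}
```

The third line follows from the first because the year variable increases by one each year, and d_t(year_t - 1950) is zero up to 1950 and increases by one each year from 1951. Under this reading the drift is β1 per year up to 1950 and β1 + β2 per year afterwards.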