

Rich Muller sent me the BEST papers about 10 days ago so that I would have an opportunity to look at them prior to their public release. Unfortunately, I’ve been very busy on other matters in the past week, wasn’t able to get to them right away and still haven’t had an opportunity to digest the methods paper. (Nor will I for a week or two.)

As a disclaimer, Rich Muller is one of the few people in this field whom I regard as a friend. In 2004, he wrote an article for MIT Review that drew attention to our work in a favorable way. I got in touch with him at the time and he was very encouraging to me. I started attending AGU at his suggestion and we’ve kept in touch from time to time ever since. While people can legitimately describe Phil Jones as not being a “nuclear physicist”, the same comment cannot be made of Rich Muller in either sense of the turn of phrase.

The Value of Independent Analysis
The purpose of audits in business is not to overturn the accounts prepared by management, but to provide reassurance to the public. 99% of all audits support management accounts. I’ve never contested the idea that it is warmer now than in the 19th century. If nothing else, the recession of glaciers provides plenty of evidence of warming in the last century.

Nonetheless, it is easy to dislike the craftsmanship of the major indices (GISS, CRU and NOAA) and the underlying GHCN and USHCN datasets. GISS, for example, purports to adjust for UHI through a “two legged adjustment” that seems entirely ad hoc and which yields counterintuitive adjustments in most areas of the world other than the US. GISS methodology also unfortunately rewrites its entire history whenever it is updated. CRU notoriously failed to retain its original separate data sets, merging different stations (ostensibly due to lack of “storage” space, though file cabinets have long provided a low-technology method of data storage). GHCN seems to have stopped collecting many stations in the early 1990s for no good reason (the “great dying of thermometers”), though the dead thermometers can be readily located on the internet.

Even small changes in station history can introduce discontinuities. Over the years, USHCN has introduced a series of adjustments for metadata changes (changes in observation times, instrumentation), all of which have had the effect of increasing trends. Even in the US where metadata is good, the record is still plagued by undocumented discontinuities. As a result, USHCN recently introduced a new method that supposedly adjusts for these discontinuities. But this new method has not been subjected to thorough scrutiny by external statisticians.

The US has attempted to maintain a network of “rural” sites, but, as Anthony Watts and his volunteers have documented, these stations all too often do not adhere to formal standards of station quality.

The degree to which increased UHI has contributed to observed trends has been a longstanding dispute. UHI is an effect that can be observed by a high school student. As originally formulated by Oke, UHI was postulated to be more or less a function of log(population) and to affect villages and towns as well as large cities. Given the location of a large proportion of stations in urban/town settings, Hansen, for example, has taken the position that an adjustment for UHI was necessary while Jones has argued that it isn’t.
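
A small sketch of what a log(population) relationship implies. The slope and intercept below are placeholder values chosen for illustration, not Oke's fitted coefficients, and the function name is hypothetical. The point is only the shape of the relationship: every tenfold increase in population adds the same increment, so growth from village to town matters as much as growth from city to metropolis.

```python
import math

def uhi_log_population(population, slope=2.0, intercept=-4.0):
    """Illustrative UHI magnitude (deg C) as a linear function of log10(population).

    slope and intercept are placeholders for illustration, not fitted values.
    """
    return slope * math.log10(population) + intercept

# Increment from a tenfold population increase at two very different scales.
village_to_town = uhi_log_population(10_000) - uhi_log_population(1_000)
city_to_metropolis = uhi_log_population(1_000_000) - uhi_log_population(100_000)

print(f"village -> town:      {village_to_town:.2f} deg C")
print(f"city -> metropolis:   {city_to_metropolis:.2f} deg C")
```

Under this functional form, a station in a growing small town is not exempt from the effect merely because the town is small.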

Unlike the statistical agencies that maintain other important indices (e.g. Consumer Price Index), the leaders of the temperature units (Hansen, Jones, Karl) have taken strong personal positions on anthropogenic global warming. These strong advocacy and even activist positions are a conflict of interest that has done much to deter acceptance of these indices by critics.

This has been exacerbated by CRU’s refusal to disclose station data to critics, while readily providing the same information to fellow travellers. Nonetheless, as I reminded CA readers during CRU’s refusal of even FOI requests, the fact that they were acting like jerks didn’t mean that the indices themselves were in major error. Donna Laframboise’s “spoiled child” metaphor is apt.

The entry of the BEST team into this milieu is therefore welcome on any number of counts. An independent re-examination of the temperature record is welcome and long overdue, particularly when they have ensured that their team included not merely qualified, but eminent, statistical competence (Brillinger).

They introduced a new method to achieve homogeneity. I have not examined this method or this paper and have no comment on it.

A commenter at Judy Curry’s rather sarcastically observed that, with my experience in mineral exploration, I would undoubtedly endorse their use of kriging, a technique used in mineral exploration to interpolate ore grades between drill holes.

His surmise is correct.

Indeed, the analogies between interpolating ore grades between drill holes and spatial interpolation of temperatures or temperature trends have been quite evident to me since I first started looking at climate data.

Kriging is a technique that exists in conventional statistics. While I haven’t had an opportunity to examine the details of the BEST implementation, in principle, it seems far more logical to interpolate through kriging rather than through principal components or RegEM (TTLS).
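
A minimal sketch of ordinary kriging, for illustration only. It is not the BEST implementation. The exponential covariance model, the 1000 km correlation length, the planar station coordinates and the temperature values are all assumptions made up for the example, and the function names are hypothetical. The sketch solves the usual ordinary kriging system (weights constrained to sum to one) and returns both the estimate and the kriging variance.

```python
import math

def exp_cov(d, sill=1.0, length=1000.0):
    """Assumed exponential covariance: sill * exp(-distance / length), distance in km."""
    return sill * math.exp(-d / length)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_krige(points, values, target):
    """Return (estimate, kriging variance, weights) at target from scattered points."""
    n = len(points)
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    # Covariance matrix bordered by the sum-to-one constraint (Lagrange multiplier).
    a = [[exp_cov(dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [exp_cov(dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    w, mu = sol[:n], sol[n]
    estimate = sum(wi * zi for wi, zi in zip(w, values))
    variance = exp_cov(0.0) - sum(wi * ci for wi, ci in zip(w, b[:n])) - mu
    return estimate, variance, w

# Made-up stations (x, y in km) and temperature anomalies (deg C).
stations = [(0.0, 0.0), (500.0, 0.0), (0.0, 500.0)]
temps = [10.0, 12.0, 11.0]

near_est, near_var, near_w = ordinary_krige(stations, temps, (100.0, 100.0))
far_est, far_var, far_w = ordinary_krige(stations, temps, (3000.0, 3000.0))
print(f"near target: estimate {near_est:.2f}, variance {near_var:.3f}")
print(f"far target:  estimate {far_est:.2f}, variance {far_var:.3f}")
```

The kriging variance grows as the target moves away from the data, which is the formal counterpart of wider error bars for locations with no nearby stations.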

Dark Areas of the Map
In the 19th century, availability of station data is much reduced. CRU methodology, for example, does not take station data outside the gridcell and thus leaves large portions of the globe dark throughout the 19th century.

BEST takes a different approach. They use available data to estimate temperatures in dark grid cells while substantially increasing the error bars of the estimates. These estimates have been roundly condemned by some commenters on threads at Judy Curry’s and Anthony Watts’.

After thinking about it a little, I think that BEST’s approach on this is more logical and that this is an important and worthwhile contribution to the field. The “dark” parts of the globe did have temperatures in the 19th century and ignoring them may impart a bias. While I haven’t examined the details of their kriging, my first instinct is in favor of the approach.

The Early Nineteenth Century
A second major innovation by BEST has been to commence their temperature estimates at the start of the 19th century, rather than CRU’s 1850/1854 or GISS’s 1880. They recognize the increased sparsity of station data with widely expanded error bars. Again, the freshness of their perspective is helpful here. (They also run noticeably cooler than CRU between 1850 and 1880.) Here is their present estimate:

The differences between BEST and CRU have an important potential knock-on impact in the world of proxy reconstructions – an area of technical interest for me. “Justification” of proxy reconstructions in Mannian style relies heavily on RE statistics in the 19th century based on CRU data. My guess is that the reconstructions have been consciously or subconsciously adapted to CRU and that RE statistics calculated with BEST will deteriorate and perhaps a lot. For now, that’s just a dig-here.
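
For readers unfamiliar with the RE (reduction of error) statistic, here is a minimal sketch of the standard definition: one minus the ratio of the reconstruction's squared error over the verification period to the squared error of simply using the calibration-period mean. All the numbers below are made up; `obs_a` and `obs_b` are hypothetical stand-ins for two different instrumental target series and are not CRU or BEST data. The example shows only that the same reconstruction can score differently against different targets, not which way the real comparison will go.

```python
def reduction_of_error(obs_verif, recon_verif, calib_mean):
    """RE = 1 - SSE(reconstruction) / SSE(calibration-mean benchmark),
    both evaluated over the verification period."""
    sse = sum((o - r) ** 2 for o, r in zip(obs_verif, recon_verif))
    ss_ref = sum((o - calib_mean) ** 2 for o in obs_verif)
    return 1.0 - sse / ss_ref

# Made-up verification-period values (deg C anomalies).
recon = [-0.3, -0.2, -0.4, -0.3]     # hypothetical proxy reconstruction
obs_a = [-0.35, -0.25, -0.3, -0.3]   # hypothetical target series A
obs_b = [-0.7, -0.6, -0.8, -0.5]     # hypothetical cooler target series B
calib_mean = 0.0                     # assumed calibration-period mean

re_a = reduction_of_error(obs_a, recon, calib_mean)
re_b = reduction_of_error(obs_b, recon, calib_mean)
print(f"RE against series A: {re_a:.3f}")
print(f"RE against series B: {re_b:.3f}")
```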

It’s also intriguing that BEST’s early 19th century is as cold as it is.

BEST’s estimate of the size of the temperature increase since the start of the 19th century is much larger than previous estimates. (Note- I’ll update this with an example.)

The decade of the 1810s is shown in their estimates as being nearly 2 degrees colder than the present. Yes, this was a short interval and yes, the error bars are large. The first half of the 19th century is about 1.5 degrees colder than at present.

At first blush, these are very dramatic changes in perspective and, if sustained, may result in some major reinterpretations. Whereas Jones, Bradley and others attempted to argue the non-existence of the Little Ice Age, BEST results point to the Little Ice Age being colder and perhaps substantially colder than “previously thought”.

It’s also interesting to interpret these results from the context of “dangerous climate change”, defined by the UN as 2 deg C. Under BEST’s calculations, we’ve already experienced nearly 2 deg C of climate change since the early 19th century. While the authors of WG2 tell us that this experience has been entirely adverse, if not near catastrophic, it seems to me that we have, as a species, not only managed to cope with these apparently very large changes, but arguably even flourished in the last century. This is not to say that we would do equally well if faced with another 2 deg C. Only that if BEST estimates are correct, the prior 2 degrees do not appear to have been “dangerous” climate change.

Comparison to SST
They do not compare their land results to SST results. These two data sets have been said to be “independent” and mutually reinforcing, but I, for one, have had concerns that the results are not truly independent and that, for example, the SST bucket adjustments have, to some extent, been tailored, either consciously or subconsciously, so that the SST data cohere with the land data.

Here is a plot showing HadSST overlaid onto the Berkeley graphic. In the very early portion, the shape of the Berkeley series coheres a little better to HadSST than CRUTem. Since about 1980, there has been a marked divergence between HadSST and the land indices. This is even more marked with the Berkeley series than with CRUTem.

Station Quality
I have looked at some details of the Station Quality paper using a spreadsheet of station classification sent to me by Anthony in August 2011 and cannot replicate their results at all. BEST reported a trend of 0.039 deg C/decade from “bad” stations (CRN 4/5) and 0.0509 deg C/decade from “good” stations (CRN 1/2) [arrgh – this is fixed to reflect their units of deg C/century]. Using my archive of USHCN raw data (saved prior to their recent adjustments), I got much higher trends [arrggh - corrected, since they reported in deg C/century, not, as I assumed, deg C/decade], with trends at good stations being lower than at bad stations in a coarse average. The station counts for good and bad stations don’t match the information provided to me. [Perhaps they’ve applied their algorithm to USHCN stations. Dunno.]
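
Since the unit confusion above is an easy one to fall into, here is a minimal sketch of a least-squares trend calculation with the units made explicit. The temperature series is synthetic (a pure 0.02 deg C/year ramp) and the function names are hypothetical. The only figure taken from the text is the 0.0509 deg C/decade value, used to check the decade/century conversion.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def decade_to_century(trend_per_decade):
    """Convert a trend in deg C/decade to deg C/century."""
    return trend_per_decade * 10.0

# Synthetic annual series: exactly 0.02 deg C/year from 1979 onward.
years = list(range(1979, 2009))
temps = [0.02 * (yr - 1979) for yr in years]

slope_per_year = ols_slope(years, temps)
slope_per_decade = slope_per_year * 10.0
slope_per_century = slope_per_year * 100.0
print(f"{slope_per_decade:.3f} deg C/decade = {slope_per_century:.3f} deg C/century")

# The "good station" figure quoted above, expressed in both units.
print(f"0.0509 deg C/decade = {decade_to_century(0.0509):.3f} deg C/century")
```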

[Table columns: Watts Count, Rohde Count, Rohde Trend, Trend (1979)]

As was observed in very early commentary on surface stations results, there is a strong gradient in US temperature trends (more negative in the southeast US and more positive in the west). The location of good and bad stations is not spatially random, so some care has to be taken in stratification.

In my own quick exercises on the topic, I’ve experimented with a random effects model, allowing a grid cell effect. I’ve also experimented with further stratification for rural-urban (using the coarse USHCN classification) and for instrumentation.
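
To show why the stratification matters, here is a simplified sketch. It is not the random effects model described above: it swaps in a plain within-grid-cell comparison (difference of good and bad station means inside each cell, then averaged over cells), which is the simplest way to remove a cell-level effect. The station records, cell names and trends are entirely synthetic and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def pooled_difference(stations):
    """Naive comparison: mean trend of all bad stations minus all good stations."""
    good = [s["trend"] for s in stations if s["quality"] == "good"]
    bad = [s["trend"] for s in stations if s["quality"] == "bad"]
    return mean(bad) - mean(good)

def within_cell_difference(stations):
    """Bad-minus-good difference computed inside each grid cell, then averaged
    over the cells that contain both classes."""
    cells = defaultdict(lambda: {"good": [], "bad": []})
    for s in stations:
        cells[s["cell"]][s["quality"]].append(s["trend"])
    diffs = [mean(c["bad"]) - mean(c["good"])
             for c in cells.values() if c["good"] and c["bad"]]
    return mean(diffs)

# Synthetic trends in deg C/decade: a warm-trending "west" cell with mostly good
# stations and a flat "southeast" cell with mostly bad stations. Within each
# cell, bad stations are constructed to run 0.05 warmer than good ones.
stations = (
    [{"cell": "west", "quality": "good", "trend": 0.30}] * 3
    + [{"cell": "west", "quality": "bad", "trend": 0.35}]
    + [{"cell": "southeast", "quality": "good", "trend": 0.05}]
    + [{"cell": "southeast", "quality": "bad", "trend": 0.10}] * 3
)

pooled = pooled_difference(stations)
within = within_cell_difference(stations)
print(f"pooled bad-minus-good:      {pooled:+.3f} deg C/decade")
print(f"within-cell bad-minus-good: {within:+.3f} deg C/decade")
```

In this constructed case the pooled comparison gets the sign wrong, because station quality is tangled up with geography, while the within-cell comparison recovers the difference that was built in.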

On this basis, for post-1979 trends, “rural bad” had a trend 0.08 deg C/decade greater than “rural good”; “small town bad” was 0.07 deg C/decade greater than “small town good”; and “urban bad” was of the opposite sign relative to “urban good”: 0.01 deg C/decade cooler.

Stratifying by measurement type, “CRS bad” was 0.05 deg C/decade warmer than “CRS good” while “MMTS bad” was 0.15 deg C/decade warmer than “MMTS good”.

Combining both stratifications, “MMTS rural good” had a post-1979 trend of 0.11 deg C/decade while “CRS urban bad” had a corresponding trend of 0.42 deg C/decade.

Details of the BEST calculation on these points are not yet available, though they’ve made a commendable effort to be transparent and I’m sure that this lacuna will be remedied. I’ve placed my script for the above results online here. (The script is not turnkey as it relies on a spreadsheet on station quality that has not been released yet, but the script shows the structure of the analysis.)

As some readers have noticed, I was interviewed by Nature and New Scientist for their reports on BEST. In each case, perhaps unsurprisingly, the reporters chose to emphasize criticisms. For example, my nuanced criticism of the analysis of the effect of station quality was broadened by one reporter into a sweeping claim about overall replicability that I didn’t make.

Whatever the outcome of the BEST analysis, they have brought welcome and fresh ideas to a topic which, despite its importance, has had virtually no intellectual investment in the past 25 years. I am particularly interested in their 19th century conclusions.