
There has been some discussion about a paper in Nature Climate Change by Gleckler et al that says they detect “a positive identification (at the 1% level) of an anthropogenic fingerprint in the observed upper-ocean temperature changes, thereby substantially strengthening existing detection and attribution evidence.” What they’ve done is collect datasets on volume-averaged temperatures for the upper 700 metres of the ocean.

But Yeager and Large, writing in the Journal of Climate and looking at the same layer of ocean, come to a different view. They conclude that it is natural variability, rather than long-term climate change, that dominates the sea surface temperature and heat flux changes over the 23-year period (1984 – 2006). They say the increase in sea surface temperatures is not driven by radiative forcing. It’s a good example of how two groups of scientists can look at the same data and come to differing conclusions. Guess which paper the media picked up?

Whilst the IPCC AR4 report says that between 1961 and 2003 the upper 700 metres increased in temperature by 0.1 deg C, some researchers think that this estimate is an artifact of too much interpolation of sparse data (Harrison and Carson 2007). Their analysis found no significant temperature trends over the past 50 years at the 90% level, although this is a minority opinion.

The interesting thing about Gleckler et al is that their unambiguous detection of a human fingerprint in ocean warming comes from what they say are “results from a large multimodel archive of externally forced and unforced simulations.” To you and me this means simulations run with and without anthropogenic carbon dioxide. What they have done is to look at the average of a variety of computer models.

What Does A Multimodel Mean?

But what is meant by a multimodel mean, and how is one to know when the ensemble of models used to calculate the mean is large enough to provide meaningful results? Another pertinent question is whether averaging multiple models is a safe thing to do in the first place.

Tweak this or that parameter, change a numerical calculation and a different output from a computer model will be obtained. In some quarters these are described as experiments, which is technically true given the definition of the word experiment. But in my view they are not on a par with physical experiments. Experiments in the real world are questions asked of nature with a direct reply. Experiments in computer models are internal questions about a man-made world, not the natural one. That is not to say there is not useful insight here. One just has to be careful not to get carried away.

For some there is insight in diversity. For example, CMIP3 is an ensemble of twenty major climate models, and while many of them are related in terms of code and philosophy, many are not. Advocates of the multi-model approach say this is a good thing: if models produced in different ways agree, that provides confidence that we have in some way understood what is going on.

But the key point, philosophically and statistically, is that the various outputs of computer models are not independent samples in the same way that repeated measurements of a physical parameter could be. They are not independent measurements centred on what is the “truth” or reality.

Given this, does the addition of more models and “experiments” force the mean of a multimodel ensemble to converge on reality? Some, such as Professor Reto Knutti, believe it doesn’t. I agree, and think it is a precarious step to decide that reality and models are drawn from the same population. How can uncertainty in the parameterisation of climatic variables and in numerical calculations reproduce uncertainty in the climate system? The spread of models is not necessarily related to uncertainty in climate predictions.
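The statistical point can be illustrated with a toy calculation. The sketch below is not taken from any of the papers mentioned, and the "models" in it are just random numbers. It assumes each model output equals the truth plus a bias shared by every model plus its own independent error; the names `TRUTH`, `shared_bias` and `spread` are hypothetical. If the shared bias is zero, the error of the ensemble mean shrinks as models are added. If it is not zero, the error levels off at the size of the bias however many models are averaged.

```python
import random
import statistics

TRUTH = 0.0  # the "real" value the toy models are trying to reproduce


def ensemble_mean_error(n_models, shared_bias, spread, rng):
    """Absolute error of the mean of n_models toy model outputs.

    Each output = TRUTH + shared_bias + an independent Gaussian error.
    This is an illustrative assumption, not a description of real climate models.
    """
    outputs = [TRUTH + shared_bias + rng.gauss(0.0, spread) for _ in range(n_models)]
    return abs(statistics.fmean(outputs) - TRUTH)


def average_error(n_models, shared_bias, spread=1.0, trials=2000, seed=0):
    """Average the ensemble-mean error over many repeated toy ensembles."""
    rng = random.Random(seed)
    errors = [ensemble_mean_error(n_models, shared_bias, spread, rng) for _ in range(trials)]
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (5, 20, 100):
        unbiased = average_error(n, shared_bias=0.0)
        biased = average_error(n, shared_bias=0.5)
        print(f"{n:4d} models: no shared bias {unbiased:.3f}, shared bias 0.5 {biased:.3f}")
```

Whether real models behave more like the first case or the second is exactly what is in dispute; the sketch only shows that adding models cannot settle it.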

When one averages climate models one has to be clear about what the averaging process actually does. It does not legitimize the spread of climate model output, compensating for each model’s errors and biases, as if an average of widely different predictions is somehow the ‘correct’ representation of reality. Averaging computer models does not necessarily make things clearer. Indeed it results in a loss of signal and throws away what most models are saying.
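A deliberately extreme toy case shows what averaging can do to variability. In the sketch below, every hypothetical "model" has the same slow trend plus an oscillation of the same size, but the oscillations are out of step with one another. All the numbers (`TREND`, `PERIOD`, eight runs with evenly spaced phases) are assumptions chosen to make the cancellation exact. The mean keeps the shared trend and loses the oscillation that every single run contains.

```python
import math

N_STEPS = 120   # number of time steps (hypothetical)
PERIOD = 12.0   # length of each run's oscillation, in steps (hypothetical)
TREND = 0.01    # trend per step shared by all runs (hypothetical)


def toy_model(phase):
    """One toy run: shared linear trend plus a unit oscillation with its own phase."""
    return [TREND * t + math.sin(2 * math.pi * t / PERIOD + phase) for t in range(N_STEPS)]


def ensemble_mean(runs):
    """Average the runs step by step."""
    return [sum(values) / len(values) for values in zip(*runs)]


def detrended_range(series):
    """Peak-to-peak size of what is left after removing the shared trend."""
    residual = [x - TREND * t for t, x in enumerate(series)]
    return max(residual) - min(residual)


# Eight runs whose phases are spread evenly around a full cycle.
phases = [2 * math.pi * k / 8 for k in range(8)]
runs = [toy_model(p) for p in phases]
mean_run = ensemble_mean(runs)

if __name__ == "__main__":
    print("single run variability:", round(detrended_range(runs[0]), 3))
    print("ensemble mean variability:", round(detrended_range(mean_run), 3))
```

Each run swings by about two units around the trend, while the mean barely moves off it. Real ensembles will not cancel this neatly, but the direction of the effect is the same.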

There are researchers who point out that the average of an ensemble of models actually reproduces real-world climate data better than any individual model. To my mind this is not a great achievement in their favour, but something about which we should be suspicious. It smacks of selection effects, bias and begging the question.

When climate models are used to make predictions some scientists refer to the past as a “training period” meaning that if the model reproduces the past it will reproduce the future. Perhaps it will, but it is not certain that it will and we cannot prove it, especially when the “training period” is shorter than long-term semi-cyclic climatic influences.
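The worry about short training periods can be shown with a toy example. Real climate models are not straight-line fits, so this is only a sketch under stated assumptions: a hypothetical 60-year natural cycle (`CYCLE_YEARS`) and a "model" that is nothing more than a least-squares line fitted to the first 15 years. The line matches the training window reasonably well, and then extrapolates in the wrong direction once the cycle turns.

```python
import math

CYCLE_YEARS = 60.0  # hypothetical length of a slow natural cycle


def climate_signal(year):
    """Toy 'reality': a single slow oscillation with no long-term trend."""
    return math.sin(2 * math.pi * year / CYCLE_YEARS)


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Training period": the first 15 years, a quarter of the cycle.
train_years = list(range(15))
slope, intercept = fit_line(train_years, [climate_signal(y) for y in train_years])


def predict(year):
    """Extrapolate the fitted line to any year."""
    return slope * year + intercept


if __name__ == "__main__":
    for year in (7, 14, 30, 45):
        print(year, round(climate_signal(year), 2), round(predict(year), 2))
```

By year 45 the fitted line is well above the largest value the toy signal ever reaches, while the signal itself is at its minimum. Agreement over the training window says nothing here about agreement outside it.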

My overall impression is that computer climate models, useful as they can be, have been oversold, and that they have often been used without their results being interpreted in terms of known processes and linked to observations. The recent standstill in annual average global temperatures is an example.

Modeling the climate is not like modeling a pendulum in which all relevant information is available to forecast its future movement until chaos theory takes over. General climate models are an approximation of the complex physical, chemical and biological processes that happen on Earth. We have incomplete knowledge of what goes on, we have limited computational abilities and sparse real-world observations of many parameters. All these are reasons to be wary of individual models, let alone an average of an ensemble of them.