
Increasing Replication Of Un-Reproducibility In Science


Best science picture of the year, courtesy of the Wall Street Journal (and thanks to reader Dan Hughes):

Replication in Science

In just the last decade, ten mere years, the number of “peer-reviewed” journal articles nearly doubled, from just over a million a year to about two million. That’s roughly 5,500 papers per day. Even more telling is the right-hand graph: there were only eight thousand journals in 1970, swelling to nearly 32,000 last year, with no end in sight.

I have lost track of the number of journals relevant to my own field, statistics and probability. Well over 100. And if you consider that statistics is used in a majority of the 32,000 journals, there is no way to sample more than a fraction of what’s published. There are so many papers, the flood is so huge, that it gives me the contradictory feeling that I shouldn’t read anything. I figure there’s so much that I’ll miss that I might as well miss the lot. (I stick to books now, and there are too many of them, too.)

Nor should most people fear missing a subscription, since most of what makes its way into print isn’t worth reading. This follows from the historically well-substantiated truism that most creative works are not lasting. You can nearly always find what’s valuable through the grapevine anyway (or by going to arxiv.org).

One consequence of the increasing number of papers is that the mistakes, cheats, and non-replicable results are finally starting to hurt. My own field is responsible for more than its share of the calumny.

The WSJ quotes Glenn Begley, vice president of research at Amgen: “‘More often than not, we are unable to reproduce findings’ published by researchers in journals.” The paper opines, “This is one of medicine’s dirty secrets: Most results, including those that appear in top-flight peer-reviewed journals, can’t be reproduced.” It isn’t just medicine, of course. And the secret isn’t so secret.

Several reasons for the degradation are adduced; in the lead is journals’ preference for positive results. (Also see this post: Are Academic Papers Growing Worse?)

[A]cademic researchers rarely conduct experiments in a “blinded” manner. This makes it easier to cherry-pick statistical findings that support a positive result. In the quest for jobs and funding, especially in an era of economic malaise, the growing army of scientists need more successful experiments to their name, not failed ones.

As I have long maintained, and by now I hope also proved, as long as you’re willing to put in the labor, “statistical significance” can be yours. Any set of data can be tickled into producing wee p-values, thus guaranteeing you a paper, i.e. a positive result showing what you had hoped to be true was true. Let the other guy worry about reproducing what you’ve done. And worry they’re doing.
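To make the claim concrete, here is a minimal sketch, not a description of any particular study. It assumes a hypothetical researcher who measures 20 unrelated outcomes on two groups drawn from the same distribution (so no real effect exists anywhere), tests each with a simple z-test that assumes known unit variance, and reports only the smallest p-value. The function names, the number of outcomes, and the group sizes are all invented for illustration.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_p(sample_a, sample_b):
    """p-value for a difference in means, assuming known unit variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    standard_error = math.sqrt(1 / n_a + 1 / n_b)
    return two_sided_p(diff / standard_error)


def smallest_p(rng, n_outcomes=20, n_per_group=30):
    """Test many outcomes on pure noise and keep only the best-looking one."""
    p_values = []
    for _ in range(n_outcomes):
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(z_test_p(treated, control))
    return min(p_values)


def share_of_studies_with_a_hit(n_studies=1000, alpha=0.05, seed=1):
    """Fraction of all-noise studies that still yield some p below alpha."""
    rng = random.Random(seed)
    hits = sum(smallest_p(rng) < alpha for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("share of noise-only studies with a 'significant' result:",
          share_of_studies_with_a_hit())
```

With 20 independent tests at the 0.05 level, the chance of at least one wee p-value is 1 - 0.95^20, or about 64 percent, and the simulated share should land near that figure. The labor required is nothing more than measuring enough things.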

Take, for example, Pfizer’s and Medivation’s stab at turning a “25-year-old Russian cold medicine into an effective drug for Alzheimer’s disease.” Medivation boss David Hung said of the published, peer-reviewed work on the drug: “Statistically, the studies were very robust.” Alas, after much effort and even more money, reproduction of the academic work could not be had. Which is by now a common story.

There are plenty of mistakes, too:

[T]he journal Science partially retracted a 2009 paper linking a virus to chronic fatigue syndrome because several labs couldn’t replicate the published results. The partial retraction came after two of the 13 study authors went back to the blood samples they analyzed from chronic-fatigue patients and found they were contaminated.

And don’t forget plain old cheating. Andrew Ferguson reports on the case of social psychologist Diederik Stapel of Tilburg University in the Netherlands, a very naughty beguiler. He was a rising star in academia (150 papers and counting!), mostly because his work confirmed the biases of his colleagues. Advertising makes women feel bad about themselves. White men are homophobic. Messiness induces racism. And on and on. Stapel had plenty of publishable p-values, but only because he made up his data out of whole cloth (a way of cheating not yet covered on this blog).

Ferguson’s main point was the credulity of reporters, who repeat every “scientific” press release as gospel. We’ve seen it here when reviewing reports that seeing briefly a small picture of the American flag turns people into raving Republicans, or that attending a 4th of July parade while young turns one into a—can you guess?

Confirmation bias is one of those diseases the other guy gets.

Update: Also see this issue of Science. Ben Santer has a paper which argues that the twentieth century temperature increase (starting when? ending when? how much? was the increase the same every year?) has been reproduced, but that any reports of cooling have been greatly exaggerated.

Update: See this: Questionable research practices are rife in psychology, survey suggests. Quote: “Questionable research practices, including testing increasing numbers of participants until a result is found, are the ‘steroids of scientific competition, artificially enhancing performance’.” You’ll remember we outlined this way of cheating a few weeks back. “[A]larming results suggest that one in ten psychologists has falsified research data.”
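The practice of "testing increasing numbers of participants until a result is found" can be sketched in a few lines. This is an illustration under stated assumptions, not the survey's method: two groups drawn from the same distribution (no true effect), a z-test with known unit variance, a first look at 10 participants per group, and a fresh look after every additional 5, up to 100. The function names and batch sizes are hypothetical. Setting start equal to max_n gives the honest design with a single test at the end.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def study_is_significant(rng, start=10, step=5, max_n=100, alpha=0.05):
    """Add participants in batches, test after each batch, stop at the first hit."""
    sum_a = sum_b = 0.0
    n = 0  # participants per group so far
    while n < max_n:
        batch = start if n == 0 else step
        sum_a += sum(rng.gauss(0, 1) for _ in range(batch))
        sum_b += sum(rng.gauss(0, 1) for _ in range(batch))
        n += batch
        # Difference in group means divided by its standard error, sqrt(2 / n).
        z = (sum_a - sum_b) / n / math.sqrt(2 / n)
        if two_sided_p(z) < alpha:
            return True  # stop collecting and write the paper
    return False


def false_positive_rate(n_studies=2000, seed=1, **design):
    """Share of no-effect studies that end with p below alpha."""
    rng = random.Random(seed)
    hits = sum(study_is_significant(rng, **design) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("one test at n=100:   ", false_positive_rate(start=100))
    print("peek every 5 people: ", false_positive_rate())
```

The single-test design should produce false positives at close to the nominal 5 percent rate, while the peeking design should come out several times higher, even though no number was ever altered.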

William M Briggs, 6 December 2011