The replication crisis
Scientific progress depends on results that can be repeated – yet across disciplines, too many experiments fail that test. From medicine to economics, the replication crisis reveals not just flawed data, but a deeper problem: the power of authority over evidence.
The replication crisis in science refers to the finding that many, perhaps most, experimental results don’t hold up when scientists try to repeat them. Unlike those other findings, this one seems to have legs – the issue became highly visible in areas such as medicine, psychology and biology in the mid-2000s, but other fields, including economics, are not immune.
Since verification is a key step in the scientific method, this calls the whole scientific project into question. The crisis is often blamed on the ‘publish or perish’ ethos, which pressures academics to come up with novel findings that make for interesting papers; but another factor is a (rather unscientific) respect for authority. As Jay Bhattacharya, director of the US National Institutes of Health, told the New York Times, “You have, in field after field after field, a kind of set of dogmatic ideas held by the people who are at the top of the field. And if you don’t share those ideas, you have no chance of advancing within those fields.”
In other words, the problem is not just that experiments do not replicate. It is that theories endorsed by leaders in the field replicate without end.
Respect my authority
An early example of the phenomenon occurred in 1923, when an eminent scientist called Theophilus Painter published a paper announcing that, according to his microscopic observations, human cells contained 24 pairs of chromosomes. Other scientists repeated his observations and came up with the same number.
However, in the 1950s new methods for preparing cells on microscope slides gave a much clearer view, and it soon became obvious that there were in fact only 23 pairs. Still, Painter’s influence was such that many scientists preferred to stick with his count. Indeed, textbooks from the time showed photographs in which there were clearly 23 pairs of chromosomes, yet the captions said there were 24. A variation on this theme occurs when new results are simply ignored because they don’t agree with current theories.
New results are simply ignored because they don’t agree with current theories
A cornerstone of modern economics is the random walk hypothesis, which states that price changes in the stock market are due to random fluctuations. In their 1999 book A Non-Random Walk Down Wall Street, Andrew Lo and Craig MacKinlay recounted that “when we first presented our rejection of the random walk hypothesis at an academic conference in 1986, our discussant – a distinguished economist and senior member of the profession – asserted with great confidence that we had made a programming error, for if our results were correct, this would imply tremendous profit opportunities in the stock market. Being too timid (and too junior) at the time, we responded weakly that our programming was quite solid thank you, and the ensuing debate quickly degenerated thereafter. Fortunately, others were able to replicate our findings exactly.” The random walk hypothesis was thus falsified and never spoken of again (not).
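For the curious, the Lo–MacKinlay rejection rested on variance ratio tests: under a random walk, the variance of q-period returns should be q times the variance of one-period returns. A minimal sketch in Python – synthetic data, and none of the drift corrections or asymptotic test statistics from their actual work – shows the idea:

    import numpy as np

    def variance_ratio(prices, q):
        # Under the random walk hypothesis, the variance of q-period
        # log returns is q times the variance of one-period returns,
        # so VR(q) should be close to 1.
        log_p = np.log(prices)
        r1 = np.diff(log_p)              # one-period returns
        rq = log_p[q:] - log_p[:-q]      # overlapping q-period returns
        return np.var(rq, ddof=1) / (q * np.var(r1, ddof=1))

    # Synthetic example: a pure random walk gives VR(q) near 1, while
    # positively autocorrelated returns push it above 1.
    rng = np.random.default_rng(1)
    walk = np.exp(np.cumsum(rng.normal(0.0, 0.01, 2000)))
    print(variance_ratio(walk, 5))       # close to 1

A variance ratio significantly different from 1 is evidence against the random walk – which is what Lo and MacKinlay found in US stock data.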
Don’t blame the butterfly
I had first-hand experience of something similar while doing my doctorate on model error in weather forecasting. The general view at the time (around 2000) was that forecast error was primarily due to chaos, aka the ‘butterfly effect’, rather than to the model itself. It followed that, by making multiple model runs from slightly perturbed initial conditions, it should still be possible to produce probabilistic forecasts – a technique known as ensemble forecasting. My thesis, though, showed there was a simple test: if forecast errors grow exponentially in time (the line curves up), they are probably due to chaos, but if they grow with the square-root of time (the line curves down), they are due to the model. During a talk at a major European weather centre, when I showed a plot of forecast errors growing almost perfectly with the square-root of time, I was interrupted by the institution’s research head, who said confidently that the plot must be wrong, since error growth has positive curvature, not negative.
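The two growth laws are easy to tell apart in code. Chaotic error growth looks exponential, E(t) ≈ E₀ exp(λt), which curves upwards on a plot, while accumulated model error grows like E(t) ≈ c√t, which curves downwards. Here is a toy illustration in Python – the data and parameters below are made up, and this is not the thesis code:

    import numpy as np
    from scipy.optimize import curve_fit

    # Hypothetical verification data: forecast lead time in days versus
    # RMS error, generated here to follow square-root growth plus noise.
    t = np.arange(1.0, 11.0)
    rng = np.random.default_rng(0)
    errors = 0.8 * np.sqrt(t) + rng.normal(0.0, 0.02, t.size)

    def chaos(t, e0, lam):          # exponential growth: curves up
        return e0 * np.exp(lam * t)

    def model_error(t, c):          # square-root growth: curves down
        return c * np.sqrt(t)

    for name, f, p0 in [('chaos', chaos, (0.5, 0.1)),
                        ('model error', model_error, (1.0,))]:
        params, _ = curve_fit(f, t, errors, p0=p0)
        rss = np.sum((f(t, *params) - errors) ** 2)
        print(f'{name}: residual sum of squares = {rss:.4f}')

The square-root fit wins easily on data like this; on real forecast errors, whichever curve fits better points to the dominant source of error.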
After the talk, we agreed that someone should replicate my results. When that was done, the results were identical to the ones I had shown – and it made absolutely no difference. The consensus remained that the errors were primarily due to chaos, so the expensive ensemble forecasting systems were not at risk (though not everyone was convinced, including New Scientist magazine, which ran the cover story ‘Don’t blame the butterfly’).
Replicate this
Of course, you might think that replication should be less of a problem in finance, if only because of the amounts of money that are often at stake. You can’t just invent a wacky theory about the stock market, back it with made-up data, and hope that no one will notice. Or say that a line curves up when it obviously curves down. However, in another sense the opposite may be true.
Biology has progressed remarkably since the textbooks of the 1950s. Not only can biologists correctly count chromosomes, they can also engineer what goes on inside them. Economics and finance, in contrast, seem stuck (the random walk hypothesis dates back to 1900, and people are still arguing about it). Instead of replicating tired ideas, maybe it is time to look at data in a new way. But that is a topic for another column.


