Everyone wants to feel that their work has contributed something – that they added some order to a seemingly random world, or that their significance in the scheme of things is more than zero. In economics, and science in general, this measure of success is captured by the idea of statistical significance.
Suppose, for example, that a scientist has a theory that coffee causes a particular type of cancer because some data shows incidences of the disease rising with coffee consumption. Is the finding – and by implication, the scientist’s work – significant, or could the effect be down to chance alone?
Or suppose a pharmaceutical company is developing a new drug for heart disease. Human trials appear to show that the drug reduces mortality. But again, how can you tell whether the improvement is due to the drug, or the particular mix of patients used in the trial?
The trouble with significance
In 1925, Ronald Fisher suggested a standard way of answering this question: assume the effect is due to chance alone – the ‘null hypothesis’ – and ask whether the data could reasonably have occurred under that assumption. If not, the result is announced to be ‘significant’. The usual threshold for significance is that there is no more than a five percent chance of getting the results observed, or even more extreme results, if they are assumed to be due to random effects.
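As a concrete sketch of that calculation (the numbers are hypothetical, not from any real trial): suppose 100 patients receive a drug and 60 recover, when the recovery rate without treatment is known to be 50 percent. The p-value is the probability of seeing a result at least that extreme by chance alone:

```python
from math import comb

# Null hypothesis: the drug does nothing, so each of the 100
# patients recovers with probability 0.5, like a fair coin toss.
n, observed, p = 100, 60, 0.5

# p-value: probability of 60 or more recoveries under the null
# hypothesis -- the observed result "or even more extreme results".
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k)
              for k in range(observed, n + 1))

print(round(p_value, 4))  # about 0.028 -- below 0.05, so 'significant'
```

Since the p-value falls under the five percent threshold, this trial would pass Fisher's test – though, as the rest of the article argues, that is not the same as saying the drug is 95 percent likely to work.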
This sounds reasonable, and has the advantage of being easy to compute, which is perhaps why statistical significance has been adopted as the default test in most fields of science, including economics, to the point where it has become a kind of proxy for relevance or importance. Economists Deirdre McCloskey and Stephen Ziliak addressed this in their book, The Cult of Statistical Significance: “In economics departments, almost all of the teachers of probability, statistics and econometrics claim that statistical significance is the same thing as scientific significance.”
However, there is something a little confusing about this approach; it says that a theory is likely to be true if adopting the opposite of the theory – the null hypothesis – would mean the data is unlikely. (If you had to read that a few times to get it straight, that’s how I felt when I wrote it.) But we are not asking whether the data is unlikely (whatever that means) – we are asking whether a theory is true. And they aren’t the same thing.
For example, suppose you develop some strange symptom, go on the internet and discover that said symptom is associated with a rare but highly virulent disease. Does that mean you only have a week to live? No. For one thing, it may be associated with a number of less worrying things, and because that disease is rare it is probably rather low on the list of possible causes. In other words, probability of the data (i.e. the symptom) given the theory (i.e. that you have the disease) is not the same as probability of the theory given the data.
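The gap between those two probabilities can be made precise with Bayes’ theorem. A minimal sketch, with made-up numbers: suppose the disease affects 1 in 10,000 people, 90 percent of sufferers show the symptom, but so do 5 percent of healthy people.

```python
# Hypothetical numbers for the rare-disease example.
p_disease = 1 / 10_000                # the disease is rare
p_symptom_given_disease = 0.90        # P(data | theory): high
p_symptom_given_healthy = 0.05        # healthy people get the symptom too

# Total probability of showing the symptom at all.
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' theorem gives P(theory | data): the chance you actually
# have the disease, given that you have the symptom.
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom

print(f"{p_disease_given_symptom:.4f}")  # about 0.0018 -- under 0.2%
```

Even though the symptom is strong evidence in one direction – 90 percent of sufferers have it – the rarity of the disease means the probability you actually have it remains tiny.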
The approach therefore rewards us for seeking out anomalous findings, instead of treating them with the suspicion anomalies deserve. For example, suppose we have lots of data and, after extensive testing of various theories, we discover one that passes the five percent significance test. Is it really 95 percent likely to be true? Not necessarily, because if we are trying out lots of ideas then it is likely that we will find one that matches purely by chance.
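A quick calculation shows how fast this problem grows. If each false theory has a 5 percent chance of passing the significance test by luck alone, the chance that at least one of several such theories passes rises steeply with the number tried:

```python
# Chance of at least one false theory passing the 5% significance
# test, as a function of how many theories are tried.
alpha = 0.05
for n_theories in (1, 5, 20, 100):
    p_at_least_one = 1 - (1 - alpha) ** n_theories
    print(n_theories, round(p_at_least_one, 2))
```

With 20 candidate theories, the chance of a spurious ‘significant’ result is already around 64 percent; with 100, it is a near certainty.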
Now, there are ways of working around this within the framework of standard statistics. Unfortunately, the problem gets glossed over in the vast majority of textbooks and articles – especially, it seems, in the social sciences. And the effect is magnified by publication bias. The way to get work published is to find interesting, significant results – there is no mileage in saying that no pattern exists – so there is a tendency to try out multiple theories until one works for the particular data set. This is why, according to a number of studies, much scientific work proves impossible to replicate. The media jump on exciting results, only to report their reversal when a later study fails to replicate them. Given the amount of time and money spent on research, this represents what scientist Robert Matthews calls a “scandal of stunning proportions” in his book Chancing It: The Laws of Chance – And What They Mean For You.
Fortunately, there exists an alternative approach, known as ‘Bayesian statistics’, which has been around for some 200 years. Instead of starting with the assumption that data is random and then looking for deviations, Bayesians start with a model, but view it as inherently uncertain and flexible, and adjust it as new data either corroborates or undermines it. They interpret probabilities not as expected frequencies of observations, but as degrees of belief. The Bayesian approach is easy to understand because, instead of weird significance tests on null hypotheses, it just tries to estimate the probability that a model (i.e. the thing we want to know) is right, given the complete context. But it is harder to calculate for two reasons.
One is that, because it sees new data as updating our confidence in a theory, it asks that we supply some prior estimate of that confidence. This may be highly subjective, which is frowned upon in respectable scientific circles – although the problem goes away as more data comes in and the prior becomes more informed.
Another problem is that the approach does not treat the theory as given, which means that we may have to evaluate probabilities over whole families of theories, or at least a range of parameter values. However, this is less of an issue today since simulations can be performed automatically using fast computers and specialised software.
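Both points can be illustrated in a few lines. A minimal Bayesian sketch (the numbers are hypothetical): to estimate a drug’s recovery rate, we lay out a grid of candidate values for the rate, assign each a prior probability, and let Bayes’ theorem update those probabilities in the light of trial data.

```python
from math import comb

# Candidate recovery rates 0.00, 0.01, ..., 1.00 -- a simple stand-in
# for evaluating "a range of parameter values".
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)   # flat prior: no initial preference

n, recovered = 100, 60                # hypothetical trial: 60 of 100 recover

# Likelihood of the data under each candidate rate, times its prior...
unnormalised = [comb(n, recovered) * r**recovered * (1 - r)**(n - recovered) * p
                for r, p in zip(grid, prior)]
# ...normalised so the posterior probabilities sum to one (Bayes' theorem).
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

best = grid[posterior.index(max(posterior))]
print(best)  # the posterior peaks at 0.6, the observed recovery rate
```

With a flat prior the posterior simply tracks the data, but the same code accepts an informed prior – say, one concentrated near the recovery rate of existing drugs – and the influence of that choice fades as more trial data comes in.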
Perhaps the biggest impediment – and the reason the approach is still controversial after two centuries – is that when results are passed through the Bayesian filter, they often just don’t seem all that significant. But while that may be bad for publications and media stories, it is surely good for science.