Reproducibility and the ease of fooling ourselves
The new study from the Reproducibility Project, published this week in Science, has been getting a great deal of attention. In short, out of 100 attempted well-powered replications of findings from three psychology journals, fewer than half were found to replicate. Ed Yong has a particularly nice piece at the Atlantic that discusses the results and the many difficulties in determining exactly what they mean. I’d like to highlight the first paragraph of the conclusion section for those who haven’t read the paper, as I think it is one of the clearest and most honest discussion sections I have ever seen in a published paper:
After this intensive effort to reproduce a sample of published psychological findings, how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science, even if it is not appreciated in daily practice. Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation. The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.
This last point may be the most important. It’s very easy for scientists to become enamored of their own findings and theories, rather than remaining their own biggest critics. There is a great quote from Richard Feynman’s talk on “cargo cult science” (essential reading) that expresses this well:
The first principle is that you must not fool yourself—and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that.
I generally trust that my scientific colleagues are “honest in a conventional way”, but many widely accepted practices make it very easy for us all to fool ourselves. As one example, take the practice of increasing the sample size of a study after testing and finding no significant effects. As Simmons et al. showed using simulations in their “False Positive Psychology” paper, this practice can seriously inflate the false positive rate. However, in a survey of more than 2,000 academic psychologists by John, Loewenstein, and Prelec, more than half of the researchers admitted to having done this. Perhaps even more problematic, when asked to rate the acceptability of this practice on a scale from 0 (unacceptable) to 2 (acceptable), the researchers who had done this gave it an average acceptability rating of 1.79. Similar results were found for optional stopping (i.e., stopping the study prior to obtaining the planned sample size because the desired effect was found), as well as for many other problematic practices.
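The inflation is easy to demonstrate for yourself. Here is a minimal simulation in the spirit of the Simmons et al. paper (not their actual code): both groups are drawn from the same distribution, so any “effect” is a false positive, and whenever the first test comes up nonsignificant we “collect more data” and test again. The group sizes, number of simulations, and the approximate critical value are all illustrative assumptions.

```python
# Sketch of the "add more subjects after a nonsignificant test" practice.
# Parameter choices (n=20 per group, +10 more on a second look) are
# illustrative assumptions, not taken from Simmons et al.
import math
import random

def two_sample_t(xs, ys):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    sp2 = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def significant(xs, ys, crit=2.02):
    # crit is an approximate two-tailed .05 cutoff for the
    # degrees of freedom used here
    return abs(two_sample_t(xs, ys)) > crit

random.seed(1)
n_sims, hits = 5000, 0
for _ in range(n_sims):
    # The null hypothesis is true: both groups come from N(0, 1)
    xs = [random.gauss(0, 1) for _ in range(20)]
    ys = [random.gauss(0, 1) for _ in range(20)]
    if significant(xs, ys):
        hits += 1
        continue
    # Not significant, so add 10 subjects per group and test again
    xs += [random.gauss(0, 1) for _ in range(10)]
    ys += [random.gauss(0, 1) for _ in range(10)]
    if significant(xs, ys):
        hits += 1

rate = hits / n_sims
print(f"False positive rate with data peeking: {rate:.3f} (nominal .05)")
```

Even with a single extra look at the data, the realized false positive rate climbs noticeably above the nominal 5%, and it only gets worse with more rounds of peeking.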
It is a true testament to the field of psychology that it has taken the reproducibility bull by the horns and seriously examined how well it is doing as a field. In particular, Brian Nosek and the Center for Open Science deserve great credit for leading the way in this endeavor. We hope that other fields will engage in similar self-examination; it is likely that any field that uses the standard null-hypothesis statistical testing approach will also have high rates of non-replication, for all the reasons that John Ioannidis pointed out a decade ago. If we want to do better, we need to take seriously Feynman’s admonition and do everything in our power to stop fooling ourselves.