Back in 2001, a group of researchers investigated whether paroxetine, an antidepressant of the selective serotonin reuptake inhibitor class, could be an effective treatment for adolescents with major depression. The authors concluded that paroxetine did indeed reduce depressive symptoms in adolescents and that the drug was well tolerated. This finding generated a fair amount of excitement in the field of adolescent psychiatry. The study was published in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP) and was highly influential among other researchers. In 2013, however, an independent international group of researchers re-analyzed the data of the original study and concluded that paroxetine did not improve the condition of adolescents with symptoms of depression. Worse still, the re-analysis associated the treatment with serious adverse events. This is an example of a scientific claim that failed to be reproduced and confirmed by other researchers. Unfortunately, this happens more often than one might expect and, as discussed in a previous article in SciFact, it is one of the main problems that can jeopardize scientific advancement.
While reproducibility is inherently connected to the scientific method, there is growing concern about findings that cannot be reproduced. Are you wondering what the reason behind this is? In this post, I will discuss some of the research practices that lead to irreproducible results and ways to combat this phenomenon.
To start from the basics, science is “the systematic enterprise that brings together knowledge about the world in the form of testable laws and principles”. Findings from individual studies need to be reviewed, appraised, and ultimately verified by independent investigators before they can be accorded the status of scientific theory. In other words, in order to solidify scientific claims, other researchers need to be able to repeat the experiment, observe the same phenomenon, and verify the interpretation of the previous experimenters. Poor research practice at any point of the scientific inquiry process can undermine the quality of the reported conclusions and hinder reproducibility. Scientific inquiry can be roughly separated into the conception and design of the study, the execution of the experiments, and the analysis and dissemination of the results (Figure 1).

Conception and design of the study
Any scientific inquiry starts with a research question that aims to generate new knowledge and address uncertainties in an area of concern. In medical research, research questions are aimed at understanding, diagnosing, preventing, and curing pathological conditions that threaten the well-being of the population. It is essential to develop a strong research question that is specific enough to keep the work focused, feasible enough to be answered within practical constraints, and relevant to society or to the academic debate.
The conception of the research question is followed by the design of the required experiments, known as the study design. One of the main threats to reproducible science at this stage is low statistical power. The statistical power of a study is low when the sample size (the number of experimental subjects, e.g. the participants in a clinical trial) is not large enough to detect an effect, given that the effect is actually there. To avoid conducting under-powered studies, researchers usually estimate the required sample size in advance, based on the existing literature on the topic under investigation when available, and design the study accordingly. This is the so-called power analysis, illustrated in the sketch below. Recruiting a large sample is not always feasible, though, due to high costs. In the case of clinical trials, for example, the recruitment of a large number of participants comes with a considerable increase in the required budget. A way to combat low-powered studies due to limited resources is collaboration among researchers at different study sites and the design of multi-center studies. Collaboration across medical centers not only facilitates the conduct of well-powered studies but also improves the generalizability of the findings and conclusions to a more diverse population.
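To make the idea of a power analysis concrete, here is a minimal sketch in Python using the statsmodels library; the effect size, significance level, and target power below are hypothetical values chosen purely for illustration.

    # Minimal power-analysis sketch (hypothetical numbers): how many participants
    # per group does a two-sample t-test need to detect a given effect size?
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    n_per_group = analysis.solve_power(
        effect_size=0.5,          # standardized effect size (Cohen's d), e.g. from prior literature
        alpha=0.05,               # significance level
        power=0.8,                # desired probability of detecting a true effect
        alternative="two-sided",
    )
    print(f"Required sample size per group: {n_per_group:.0f}")  # roughly 64

Run the other way around (fixing the sample size and solving for power), the same calculation shows how quickly power drops when recruitment falls short of the estimate.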

Execution of the experiments
Another threat to reproducible science is poor methodological training of researchers in the experimental techniques, or a lack of standardization of experimental methodologies among research groups. Even if some experimental procedures might seem simple and easy to perform, a high level of expertise is needed to produce trustworthy results. Experimenters working in animal behavior research, for example, need to follow the experimental protocol designed by the principal investigators; it is equally important, however, that they have the theoretical and practical background to monitor, understand, and interpret subtle changes in animal behavior. Moreover, it is important to know how external parameters (e.g. scents, external sounds, the health state of the animal) affect the behavior of the animal and to conceptualize how these can interfere with the quality of the results. In addition to ensuring standardization of experimental methodologies within a research group, standardization of research methodologies among different groups is equally important in order to render the results reproducible. In biomarker discovery research, for example, inconsistencies between studies are partly due to a lack of standardization in blood collection and storage methods and in the specificity of analytical techniques. These procedures become even more complicated when one considers that blood levels of biomarkers fluctuate widely depending on the time of day, the menstrual cycle, and the season. It is therefore necessary to minimize the variability in blood collection, storage, and analysis techniques as much as possible, in order to identify robust, widely applicable biomarkers.

Analysis of the results
Following the collection of experimental data, a number of pitfalls in the analysis and interpretation of the results have been recognized as potent threats to reproducibility. Starting with the data analysis, rigorous statistical training of junior researchers is fundamental for producing robust research findings. Statistical methodology is subject to rapid improvement and revision, so senior researchers also need continuing training in order to produce high-quality findings. Another point that needs attention, especially with multi-dimensional, large datasets, is that there may be a number of reasonable ways to analyze the same data. It is therefore essential that, when publishing the results of a study, the rationale behind the statistical analysis is presented in detail so that it can be appraised and/or replicated. Moreover, the ever-increasing automation of statistical analysis comes with multiple risks, such as over-interpretation of noise or data dredging. Last but not least, researchers need to be aware of possible cognitive biases that might interfere with the objective analysis and interpretation of results. Examples of cognitive bias are the proclivity to see patterns in random data and the tendency to focus on data that confirm the favored explanation of a phenomenon. Blinding is a technique that can be used to minimize the effects of cognitive biases during the data analysis process. In particular, in the process of cleaning and preparing the results (e.g. the identification of outliers), the labels of the variables can be masked so that the analyst cannot make decisions based on the research hypothesis, as in the sketch below.
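Here is one minimal way such blinding could be implemented in Python with pandas; the dataset, column names, and outlier rule are all hypothetical and serve only to illustrate the principle.

    # Blinded data-cleaning sketch (hypothetical data): the analyst screens for
    # outliers without knowing which group is which.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(seed=1)
    df = pd.DataFrame({
        "group": rng.choice(["treatment", "control"], size=100),
        "outcome": rng.normal(loc=10.0, scale=2.0, size=100),
    })

    # Blind the analyst: replace the informative group labels with neutral codes.
    labels = list(df["group"].unique())
    codes = {label: f"group_{i}" for i, label in enumerate(rng.permutation(labels))}
    df["group"] = df["group"].map(codes)

    # Outlier screening happens under the coded labels, so decisions cannot be
    # steered (consciously or not) by the research hypothesis.
    z_scores = (df["outcome"] - df["outcome"].mean()) / df["outcome"].std()
    df_clean = df[z_scores.abs() < 3]

    # The code-to-label mapping is revealed only after the cleaning rules are fixed.
    unblinding_key = {code: label for label, code in codes.items()}
    print(unblinding_key)

The key design choice is that the unblinding step comes last: all decisions about what counts as an outlier are committed before the analyst learns which coded group corresponds to which condition.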

Dissemination of results
Last but not least, reproducibility of scientific evidence requires that the process of producing the findings and claims is transparently reported and accessible to other researchers. However, even though transparency and openness are vital features of reproducible science, they are not practiced as widely as one would expect. The academic publishing system plays an important role in this. Journals have long prioritized clean narratives over detailed reports of experimental and analysis workflows. The focus is clearly taken away from the technical details of the methodology when the methods section assumes an ‘add-on’ status and is reported only after the results have been presented and the conclusions discussed. In recent years, owing to the increasing concern about the lack of reproducibility, efforts have been made towards better reporting standards. As of 2015, a considerable number of journals have committed to adopting the Transparency and Openness Promotion (TOP) Guidelines, a set of eight standards that move journals’ publication procedures and policies towards greater transparency and openness.

Concluding remarks
In summary, lack of reproducibility is for the most part not due to deliberate misconduct or fabrication of experimental data, but to a number of factors, including but not limited to insufficient statistical power, poor research practices, lack of standardization among laboratories, substandard statistical training of researchers, failure to account for cognitive biases, and lack of transparency and detailed reporting of technical details in research papers. Moving towards more replicable and robust scientific results requires a joint effort from researchers, journals, and funding bodies. Firstly, researchers need to ensure that experiments are performed following the best possible practices. Secondly, journals need to put more emphasis on the technical aspects of experimentation in their reporting guidelines and reviewing processes. Finally, funding bodies should encourage applicants to consider and report ways of ensuring that the proposed research can be readily reproduced. All in all, higher reproducibility is a necessity, as it will boost scientific advancement and reduce research waste.
