As a lab, we’ve made it a priority to raise the standards of our code to align with best practices for reproducibility and repeatability of our science. In keeping with this goal, this week in reading group Saras Windecker and Hannah Fraser led a discussion on the British Ecological Society’s Guide to Reproducible Code in Ecology and Evolution, how we can implement its guidelines in our research, and the barriers that limit their uptake within the QAEco group. Here is a brief summary of that discussion.
The BES guide covers workflow and project structure, version control, and techniques for defensive coding. While many of us already use some or all of these suggestions in our workflows, this is by no means universal. Moreover, the QAEco group has a steady intake of new students, for whom these skills are, for the most part, completely new.
Whether we were beginners or veterans at maintaining ‘good’ workflows, we found over the course of the discussion that there are two major barriers to the uptake of reproducibility recommendations: a lack of time and resources, and a sense of intimidation about taking the plunge into the great unknown of workflows, version control and GitHub!
For many new students in particular, the mountain of new skills they are expected to climb gets steeper each year. Ecology increasingly demands top-quality experimental design, fieldwork and data analysis skills. Add a healthy dose of programming to that mix, along with the lack of structured training (university courses or workshops) on reproducibility in scientific practice, whether in experiments or analysis, and striving to be a researcher in ecology can quickly become overwhelming.
For established researchers, on the other hand, time is the major barrier to the uptake and implementation of reproducible workflows, as it can be difficult for academics to justify, or budget for, the time spent learning the relevant skills. However, failing to invest in reproducible workflows accumulates a debt that must be paid either in time spent debugging and repeating work, or in low-quality science. These have been called ‘technical debt’ and the often associated ‘scientific debt’, respectively. We must commit, as a group, to treating time spent teaching, learning, and implementing these practices not as a bonus or a luxury for those so inclined, but as a necessary component of our science.
For new students and established researchers alike, we realised that the other major barrier is a sense of intimidation about learning some of these methods. And with reason… learning git is scary! The command line interface and unintuitive commands significantly limit the uptake of these skills in research. In addition, many of the debugging and defensive coding recommendations require transitioning from simply writing scripts in R to writing functions.
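To make the script-to-function step a little less abstract, here is a minimal sketch of what it can look like. It is written in Python purely for illustration (the same pattern carries over directly to R), and the function name `mean_abundance` and the site counts are made up for this example rather than taken from the BES guide. The idea is that a calculation otherwise copied and pasted for each site becomes one function that checks its inputs and fails loudly on bad data.

```python
import math


def mean_abundance(counts):
    """Return the mean of a non-empty sequence of non-negative counts.

    Hypothetical helper for illustration only.
    """
    # Defensive checks: stop early with a clear message rather than
    # silently returning a misleading number.
    if len(counts) == 0:
        raise ValueError("counts must not be empty")
    for value in counts:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"counts must be numeric, got {value!r}")
        if math.isnan(value) or value < 0:
            raise ValueError(f"counts must be non-negative, got {value!r}")
    return sum(counts) / len(counts)


# Made-up example data: the same checked calculation is reused per site.
site_a = [12, 7, 0, 3]
site_b = [5, 5, 8]
print(mean_abundance(site_a))  # 5.5
print(mean_abundance(site_b))  # 6.0
```

Calling `mean_abundance([])` or `mean_abundance([4, -1])` raises a `ValueError` instead of producing a result, which is the kind of early failure that defensive coding aims for.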
At QAEco we’re determined to turn over a new leaf and promote the idea of a programming evolution… starting slow, and incrementally adding reproducible workflow components in a manner that both keeps the time burden to a minimum and avoids introducing mistakes in the process. To this end, our next step is to run a series of technical tutorials on the main aspects of the BES guide over the next four coding club meetings. These tutorials will cover:
- how to set up a project
- version control
- reproducible reports
- debugging and defensive coding
Some useful resources for further reading:
- A guide to using an R package to set up a reproducible project, termed a research compendium (the README also contains many other useful links): https://github.com/ropensci/rrrpkg
- A gentle introduction to git: https://allendowney.github.io/amgit/
— Saras (& Hannah & August)