In the Open Access unit, we discussed making publications freely accessible. In this module, we will explore why and how to use open research workflows, which are the steps and processes undertaken to conduct the research.
Having more information available in the public record about how research will be, is, or was conducted can help other researchers validate and reproduce published research results, understand how a scholar developed an idea, increase evaluation from the research community, and allow for increased public engagement. It also helps prevent loss of scholarship due to a lack of ownership, digital decay, software deprecation, and missing files.
Small decisions, such as using consistent file-naming conventions or making open backups of data stored in a proprietary tool (e.g., saving your files in .xlsx format, the Microsoft Excel Open XML Spreadsheet format, which is based on an open standard and can therefore be opened in other applications more reliably than the older binary .xls format), make it possible to continue work a few years or even decades down the line. These small but intentional decisions can make all the difference to future scholars. This module will recommend steps to organize your work in a way that is reproducible.
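As an illustration of a consistent file-naming convention, the following is a minimal sketch in Python. The pattern it uses (date, project, description, version) and the function names `slugify` and `build_filename` are assumptions made for this example, not a standard recommended by this module; adapt the pattern to whatever your discipline or team already uses.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase text and collapse each run of non-alphanumeric characters into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, version: int,
                   extension: str, created: date) -> str:
    """Build a name of the form YYYY-MM-DD_project_description_vNN.ext.

    This pattern is a hypothetical convention chosen for illustration.
    """
    ext = extension.lstrip(".").lower()
    return (
        f"{created.isoformat()}_{slugify(project)}_"
        f"{slugify(description)}_v{version:02d}.{ext}"
    )
```

For example, `build_filename("Soil Survey", "Field Notes (raw)", 2, ".CSV", date(2024, 3, 5))` returns `2024-03-05_soil-survey_field-notes-raw_v02.csv`. Putting the ISO date first means files sort chronologically in any file browser, and avoiding spaces and special characters keeps names portable across operating systems.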
Defining a research workflow
A research workflow is the series of steps undertaken to do the research, which allows the process to be repeated or understood by others. Research workflows may have rigid standards for reproducibility in some disciplines (e.g., STEM fields) and may be less reliant on reproducibility standards in other disciplines (e.g., humanities and social sciences). In either case, the process taken to reach an argument, and the sources that were consulted, should be made transparent.
Workflows and “Open”
An open research workflow is one in which each step of the research process is openly shared through documentation that makes all stages of the research project transparent and reproducible. Clear documentation includes following best practices for file-naming conventions, project metadata, file formats, etc. Openness in workflows does not mean that all your data is “exposed” and available for anyone to take. Rather, it means documenting how you got from an idea to a conclusion or an output, and the tools you used to get there, so that research outputs can be meaningfully preserved over the long term.
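One way to document how you got from an idea to an output is to keep a small, machine-readable record of each step alongside your files. The sketch below is only an illustration: the class names `WorkflowStep` and `ProjectRecord` and their fields are hypothetical, not part of any metadata standard, and many disciplines have established metadata schemas that should be preferred where they exist.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class WorkflowStep:
    """One documented step of a project (all field names are illustrative)."""
    stage: str                 # e.g. "discovery", "analysis", "writing"
    description: str           # what was done, in plain language
    tool: str                  # software or instrument used
    tool_version: str          # version, so the step can be revisited later
    outputs: List[str] = field(default_factory=list)  # files this step produced


@dataclass
class ProjectRecord:
    """A project-level record holding the ordered list of workflow steps."""
    title: str
    steps: List[WorkflowStep] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record as plain-text JSON, an open format."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

A record could be built with `ProjectRecord(title="Example project")`, extended with `record.steps.append(WorkflowStep(...))`, and saved by writing `record.to_json()` to a text file stored next to the data it describes.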
Workflow stages and tools
A research workflow has many stages from ideation to output, and many tools might be used throughout this process. The 101 Innovations in Scholarly Communication Project surveyed tools used by researchers for different aspects of their work and has sorted academic workflows into six higher-level stages: discovery, analysis, writing, publication, outreach, and assessment. Their circle of tools showcases the wide variety of workflow tools available for each stage of this process.
Sometimes the decision about which tools to use is determined for you in advance by best practices in your discipline. Ultimately, whether a tool makes sense for you is context specific. But beyond disciplinary requirements, you can go a long way toward shaping an open workflow by ensuring that all aspects of your project are carefully documented, that the tools you use are accessible to your team and your community, and that your files are predominantly stored in open file formats (or can be exported to open file formats) so that your work won’t be lost if a tool or format stops being available in the future. Using open tools that are more accessible to your research community makes cross-collaboration easier and smooths the transition to future open publications.
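To check whether a project's files are predominantly in open formats, a simple audit by file extension can be a starting point. The sketch below is a rough illustration only: the `OPEN_EXTENSIONS` set and the function name `needs_open_export` are assumptions made for this example, and the list of formats that count as "open" should be adjusted to your own discipline and tools.

```python
from pathlib import PurePath
from typing import Iterable, List

# Illustrative, incomplete list of extensions treated as open formats here.
# This set is an assumption for the example, not an authoritative registry.
OPEN_EXTENSIONS = {".csv", ".txt", ".md", ".json", ".xml", ".odt", ".ods", ".png", ".xlsx"}


def needs_open_export(filenames: Iterable[str]) -> List[str]:
    """Return, sorted, the file names whose extension is not in OPEN_EXTENSIONS.

    The comparison is case-insensitive. Files with no extension are also
    returned, because their format cannot be inferred from the name alone.
    """
    return sorted(
        name for name in filenames
        if PurePath(name).suffix.lower() not in OPEN_EXTENSIONS
    )
```

Calling `needs_open_export(["data.csv", "notes.DOCX", "scan.psd", "README.md"])` returns `["notes.DOCX", "scan.psd"]`, which gives a short list of files to consider exporting to an open format as a backup.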
Why Does it Matter?
Traditional non-open research workflows are constantly impacted by the introduction of new tools, approaches, and software. What stays the same, across disciplines, is a need to communicate ideas and conclusions so that they may be built on by future scholars. Sharing the complete context of your work helps future scholars understand why a conclusion was reached and contribute their own ideas more meaningfully.
Imagine reading a paper but never being able to open the related data or see the working manuscript because the original materials that the author worked with no longer exist. What if they do exist, but you’re not sure exactly which version the author worked with, what tools they used, or whether something else affected their conclusions, because their workflow is unclear? Worse yet, what if all of the work happened in proprietary software that, while necessary and relevant at the time, is no longer available?
Increasingly, research communities are embracing the importance of open, reproducible, and replicable work across disciplines. Closed workflows that make use of proprietary tools and data formats may limit the number of people we can communicate our work to and make it more difficult for a future researcher to understand why we did things in the way that we did them. While a proprietary tool might be the best tool for a job (for instance, a specialty scanner or a writing tool that your collaborators are familiar with), considering what happens if that tool ceases to exist can help you decide which tools to use and how to buffer against future losses. Avoiding a worst-case scenario in which work simply disappears can be as simple as a bit of planning around exporting backups into open formats or as complex as shifting your workflow to new, more sustainable tools.
In the Open Software and Open Data modules in this unit, you will learn best practices for the use of open software to help ensure that core research artifacts are recorded, maintained and shared in meaningful ways. Next, we will consider the importance of reproducibility and replicability in research.
Dig Deeper
To learn more about open workflows, read the following:
- A blog article from 2015 on the 101 Innovations in Scholarly Communication project: https://blogs.lse.ac.uk/impactofsocialsciences/2015/11/11/101-innovations-in-scholarly-communication/
- Arguments for open research workflows from “Why Open Research?”
- Explore the Open Science Training Handbook, CC-0
- Armeni, K., Brinkman, L., et al. (2021). Towards wide-scale adoption of open science practices: The role of open science communities. 10.31222/osf.io/7gct9
- Fenlon, K. (2019). Interactivity, Distributed Workflows, and Thick Provenance: A Review of Challenges Confronting Digital Humanities Research Objects. In 2019 15th International Conference on eScience (pp. 510-513). IEEE. 10.5281/zenodo.345977
- Leek, J. T. (2016). How to be a modern scientist. https://leanpub.com/modernscientist
Some material in this module was adapted from the Open Science Training Handbook, CC-0.