In the Open Access unit we discussed making publications freely accessible. In this module, we will explore why and how to use open research workflows, which are the steps undertaken to conduct the research.
Defining a research workflow
A research workflow is the series of steps and decisions taken to carry out the research. Documenting that workflow allows the process to be repeated or understood by others: it details how you got from an idea to a conclusion or an output, and the tools that you used to get there. What a documented workflow must include varies by discipline: some (e.g., STEM fields) have rigid standards for reproducibility, while others (e.g., humanities and social sciences) may rely less on such standards.
An open research workflow is one in which each step of the workflow is openly shared, making all stages of the research project transparent and reproducible. Clear documentation of your workflow includes following best practices for file-naming conventions, project metadata, and file formats. Openness in a workflow does not mean all your data is “exposed” and available for anyone to take; it means making sure that all documentation can be accessed regardless of the tools and formats used during the research.
Workflow stages and tools
A research workflow has many stages between ideation and output, and many tools might be used throughout this process. The 101 Innovations in Scholarly Communication Project surveyed tools used by researchers for different aspects of their work and has sorted workflows into six higher-level stages: discovery, analysis, writing, publication, outreach, and assessment. The image below showcases the wide variety of workflow tools available for each stage of the process.
Sometimes the choice of tools is determined by best practices in your discipline. But beyond disciplinary requirements, you can go a long way by documenting your process using an open workflow. Small decisions, such as using consistent file-naming conventions or making open backups of data stored in a proprietary tool (e.g., saving spreadsheets in .xlsx format, the Microsoft Excel Open XML Spreadsheet format, which other applications can open more readily than the older binary .xls format), make it possible to continue work years or even decades down the line. These small but intentional decisions make all the difference to future scholars.
![](https://101innovations.files.wordpress.com/2015/06/innoscholcomm-logo-cc-by-1024x1024.png)
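As an illustration of the file-naming advice above, here is a minimal Python sketch that builds names in one consistent pattern. The pattern itself (`YYYY-MM-DD_project_description_vNN.ext`) and the helper names `slugify` and `build_filename` are assumptions chosen for this example, not a standard; your discipline or research group may prescribe a different convention.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, version: int,
                   created: date, extension: str) -> str:
    """Return a name of the form YYYY-MM-DD_project_description_vNN.ext.

    The pattern is a hypothetical convention used only for illustration.
    """
    ext = extension.lower().lstrip(".")
    return (f"{created.isoformat()}_{slugify(project)}_"
            f"{slugify(description)}_v{version:02d}.{ext}")


if __name__ == "__main__":
    # Example: the second version of a raw survey export.
    print(build_filename("Open Workflows", "Survey responses (raw)",
                         2, date(2020, 6, 15), ".CSV"))
```

Putting the ISO date first means files sort chronologically in any file browser, and the zero-padded version number keeps `v02` ahead of `v10`. Whatever pattern you choose, the value comes from applying it consistently and writing it down alongside the project.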
Why does it matter?
Imagine reading a paper but never being able to open the related data or see the working manuscript because the original materials no longer exist. What if they do exist, but you’re not sure exactly which version the author worked with, what tools they used, or whether something else affected their conclusions because their workflow is unclear or undocumented? Worse yet, what if all of the work happened in proprietary software which, while necessary and relevant to use at the time, is no longer available?
Information in the public record about how research will be, is being, or was conducted can help with validating and reproducing research results, understanding how a scholar developed an idea, and increasing public engagement. It also helps prevent loss of work due to a lack of ownership, digital decay, software deprecation, and missing files.
Non-open workflows that rely on proprietary tools and data formats can limit communication of our work and make it more difficult for a future researcher to understand why and how we did things the way we did. A proprietary tool might be the best tool for a job (for instance, a specialty scanner or a writing tool that your collaborators are familiar with), but considering what happens when that tool becomes obsolete can help you make decisions that buffer against future losses. Avoiding a worst-case scenario in which work simply disappears can be as simple as a bit of planning around exporting backups into open formats or as complex as shifting your workflow to new, more sustainable tools.
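The simpler end of that range, exporting a backup into an open format, might look like the following Python sketch. It assumes the data has already been read out of the proprietary tool as plain rows; the function name `export_open_backup`, the sidecar layout, and the example tool name are all hypothetical and chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def export_open_backup(header, rows, csv_path, source_tool, exported_on):
    """Write rows to a UTF-8 CSV file plus a JSON sidecar describing the export.

    The sidecar fields are an illustrative assumption, not a metadata standard.
    """
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)

    # Record where the data came from next to the data itself.
    sidecar_path = csv_path.with_suffix(".json")
    metadata = {
        "data_file": csv_path.name,
        "source_tool": source_tool,
        "exported_on": exported_on,
        "columns": list(header),
        "row_count": len(rows),
    }
    sidecar_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar_path


if __name__ == "__main__":
    export_open_backup(["id", "score"], [[1, 0.5], [2, 0.75]],
                       "backup.csv", "ExampleStats 3.1", "2020-06-15")
```

Plain-text formats such as CSV and JSON can be read by many tools and by a person with a text editor, which is why they are a common choice for backups of this kind. The sidecar file keeps a minimal record of which tool produced the data and when.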
In the Open Software and Open Data modules in this unit, you will learn best practices for open workflows to help ensure that core research artifacts are recorded, maintained and shared in meaningful ways. Next, we will consider the importance of reproducibility and replicability in research.
![](https://ctlt-openprogram-2020.sites.olt.ubc.ca/files/2020/07/Dig-Deeper-2.png)
Dig Deeper
To learn more about open workflows, read the following:
- A blog article from 2015 on the 101 Innovations in Scholarly Communication project: https://blogs.lse.ac.uk/impactofsocialsciences/2015/11/11/101-innovations-in-scholarly-communication/
- Fenlon, K. (2019). Interactivity, Distributed Workflows, and Thick Provenance: A Review of Challenges Confronting Digital Humanities Research Objects. In 2019 15th International Conference on eScience (pp. 510-513). IEEE. 10.5281/zenodo.345977
- Leek, J. T. (2016). How to be a modern scientist. https://leanpub.com/modernscientist
Some material in this module was adapted from the Open Science Training Handbook, CC-0.