Best Practices for Open Workflows | Program for Open Scholarship and Education

Working in a reproducible and replicable manner requires examining the research workflow and finding ways to improve the organization and shareability of the materials, data, instruments, and decision-making process. Storing all materials related to the research study in one place, where they are organized in a meaningful way, helps make sure all materials are accounted for when building workflow documentation.

Files that you might work with on a research project include:

Raw data files
Processed data files
Code and scripts for manipulation and analyses of the data
Outputs like figures and tables
A record of the decisions about the specific tools and methods used
Writing associated with your project
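Keeping these files in one predictable folder structure is one practical way to store all materials in one place. The sketch below builds such a layout with Python's standard `pathlib` module. The folder names and the `create_project_skeleton` helper are hypothetical choices that mirror the file types above; they are not a required standard.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical folder names, one per kind of file listed above.
PROJECT_FOLDERS = [
    "data/raw",         # raw data files, kept unedited
    "data/processed",   # cleaned or transformed data produced by code
    "code",             # scripts for data manipulation and analyses
    "outputs/figures",  # generated figures
    "outputs/tables",   # generated tables
    "docs",             # README, codebooks, record of decisions
    "writing",          # manuscripts, reports, presentations
]


def create_project_skeleton(root: str | Path) -> list[Path]:
    """Create the folder layout under `root` and return the folder paths."""
    root = Path(root)
    created = []
    for folder in PROJECT_FOLDERS:
        path = root / folder
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_skeleton("ir-study")` once at the start of a project gives every collaborator the same layout to document and share.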

Documentation is a necessary part of research that allows others (or even the same research team, later on) to reproduce or replicate the results of a study. Useful documentation includes:

Version control documentation – Version control is a system for recording changes made to a file or set of files over time so that they can be reviewed later. It allows researchers to track when changes were made and to document the reasoning behind those changes. The next module, Open Software, offers more guidance on version control.
Metadata – Metadata describes the content and structure of a project, its data, and its related files. README files provide important information about a data file and project, and can be as simple as a plain text file (Example: README 4) or include more formatting and design elements (Example: Sulu). Codebooks, by contrast, serve as a guide to the variables in a data file. We’ll talk more about metadata in the “Open Data” module.
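A codebook can itself be an open, plain-text file. As a minimal sketch, assuming hypothetical variable names (`participant_id`, `interview_date`, `first_gen`) and a four-column layout chosen only for illustration, the Python below writes a codebook as CSV text and lists any data-file columns that the codebook does not yet describe.

```python
import csv
import io

# Hypothetical codebook entries; a real codebook describes every column in the data file.
FIELDS = ["variable", "description", "type", "allowed_values"]
CODEBOOK = [
    {"variable": "participant_id", "description": "Anonymized participant label",
     "type": "integer", "allowed_values": "1-100"},
    {"variable": "interview_date", "description": "Date of data collection (YYYYMMDD)",
     "type": "string", "allowed_values": "ISO 8601 basic date"},
    {"variable": "first_gen", "description": "First-generation undergraduate student",
     "type": "integer", "allowed_values": "0 = no; 1 = yes"},
]


def codebook_to_csv(entries):
    """Return the codebook as CSV text, ready to save next to the data file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_columns(data_header, entries):
    """List the data-file columns that have no codebook entry."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in data_header if column not in documented]
```

Running `undocumented_columns` against a data file's header row before sharing is one simple way to catch variables that were added without being documented.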

Best Practices for File Names

When creating new computer files, use naming conventions to make it easier for anyone to find files and understand what each file does or contains. Using a good naming convention when structuring a project directory also supports reproducibility by helping others who are not familiar with your project quickly understand your directory and file structure. The most important part of a naming convention is that it’s consistent.

Human-readable: use expressive names that clearly describe what the directory or file contains (e.g., code, data, outputs, figures).
- Example Naming Elements and Structure: DescriptiveTitle_DocumentType_Date_Version
- Sample Named File: SkyLabExperiment_ContractNegotiations_20170104_Rev0.pdf
Machine-readable: avoid special characters (e.g., symbols or accents) and spaces. Instead of spaces, use - or _ to separate words, so that names are easy to read and easy to parse with code or scripts.
Sortable: sorting allows you to quickly see what is there and find what you need. For example, numbering a list of related directories or files with leading zeros (e.g., 01-max.jpg, 02-terry.jpg, etc.) keeps them in order when sorted; without the leading zero, a file starting with 10 would be listed before one starting with 2.
Clear dates: use the ISO 8601 date format of YearMonthDay (e.g., 20241106), so that files sort chronologically.
Consistent letter case: mixing lowercase and uppercase can cause coding issues when files move between operating systems, because the default file systems on macOS and Windows treat Data.csv and data.csv as the same name, while Linux treats them as two different files. To avoid case issues, use lowercase naming or CamelCase consistently. CamelCase is the practice of writing phrases without spaces or punctuation, indicating the separation of words with a single capitalized letter (e.g., instead of survey data, you would write SurveyData).
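These conventions can also be applied and checked in code. In the minimal Python sketch below, `build_file_name` assembles a name in the DescriptiveTitle_DocumentType_Date_Version pattern (the `Rev` prefix mirrors the sample file name) and rejects anything outside an assumed safe character set of letters, digits, hyphens, and underscores, while `numbered_names` adds zero-padded prefixes so names sort in order. Both function names are hypothetical.

```python
import re
from datetime import date

# Assumed safe set: letters, digits, hyphens, underscores, then one lowercase extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[a-z0-9]+")


def build_file_name(title, doc_type, day, version, extension):
    """Build DescriptiveTitle_DocumentType_Date_Version.ext with an ISO 8601 date."""
    name = f"{title}_{doc_type}_{day.strftime('%Y%m%d')}_Rev{version}.{extension}"
    if not SAFE_NAME.fullmatch(name):
        raise ValueError(f"not machine-readable: {name!r}")
    return name


def numbered_names(stems, extension, width=2):
    """Prefix each stem with a zero-padded index so names sort in numeric order."""
    return [f"{i:0{width}d}-{stem}.{extension}" for i, stem in enumerate(stems, start=1)]


print(build_file_name("SkyLabExperiment", "ContractNegotiations", date(2017, 1, 4), 0, "pdf"))
# SkyLabExperiment_ContractNegotiations_20170104_Rev0.pdf
print(numbered_names(["max", "terry"], "jpg"))
# ['01-max.jpg', '02-terry.jpg']
```

A name such as "Sky Lab_Notes_20241106_Rev1.txt" is rejected because of the space, which is the kind of problem that is easy to miss by eye.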

For practical examples of naming conventions, watch the following video:

Records Management 101: Document naming conventions by the University of British Columbia’s Records Management Office

File Formats

In keeping with the FAIR principles, files should be in an open format, unencrypted, and uncompressed. An open format is non-proprietary, so the file can be opened with software that is not owned by a specific company. For example, instead of saving data as an Excel file (.xls), which requires licensed software to open, save it as CSV or XML. Unencrypted data is accessible to anyone, whereas encrypted data is locked and requires a password or key to read. An uncompressed file is stored in its original format rather than being compressed into another format (e.g., a .zip archive). The UK Data Archive provides a list of recommended data formats to make your data shareable, reusable, and preserved.

Choosing formats that are independent of operating systems and tools, such as .csv and .txt, is one way to avoid issues related to software licenses and to increase the potential for sharing fully usable, open content. The Open Software module provides more information on open source tools.
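As a rough illustration, the Python sketch below does two things: `flag_closed_formats` reports files whose extensions appear in a small, hypothetical mapping of software-specific formats to open alternatives, and `write_csv` saves tabular data as plain UTF-8 CSV, which is open, unencrypted, and uncompressed. The mapping and both function names are assumptions for illustration; the UK Data Archive's list of recommended formats is the better reference for a real project.

```python
import csv
from pathlib import Path

# Example mapping only: a few software-specific formats and a plain-text alternative.
OPEN_ALTERNATIVES = {
    ".xls": ".csv",  # Excel workbook (binary)
    ".sav": ".csv",  # SPSS data file
    ".dta": ".csv",  # Stata data file
    ".doc": ".txt",  # Word document (binary)
}


def flag_closed_formats(root):
    """Map each file under `root` whose extension is listed to a suggested open format."""
    flagged = {}
    for path in sorted(Path(root).rglob("*")):
        suggestion = OPEN_ALTERNATIVES.get(path.suffix.lower())
        if path.is_file() and suggestion:
            flagged[path] = suggestion
    return flagged


def write_csv(path, header, rows):
    """Save tabular data as uncompressed, unencrypted UTF-8 CSV."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

Lower-casing the extension means a file saved as survey.XLS is flagged just like survey.xls, which matters given the letter-case differences between operating systems.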

Scenario – Video Interview File Naming Conventions

Professor Sam Meyers is conducting over 100 interviews with first-generation undergraduate students for research on information retrieval. Professor Meyers is developing a file naming convention to ensure the data are easy to retrieve from their shared drive for future use. The study is currently titled “IR Study.” The interviewees have been labelled 1 to 100 for anonymity. The first video recording took place on November 5, 2020, and was conducted by graduate research assistant Karina Cassidy.

Professor Meyers and the research team decided to include the following information in the file naming convention:

Project Name
Date of Data Collection
Content of the File
Researcher’s Initials
Interviewee Label
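Assuming the elements appear in the order listed and are separated by underscores, one possible convention can be sketched in Python. The `interview_file_name` helper, the content label `VideoInterview`, the `.mp4` extension, and the three-digit padding of the interviewee label are illustrative assumptions, not decisions stated in the scenario.

```python
from datetime import date


def interview_file_name(project, collected_on, content, initials, interviewee,
                        extension="mp4", width=3):
    """Build Project_Date_Content_Initials_Interviewee.ext (hypothetical helper)."""
    # Labels are zero-padded so 001..100 sort numerically; a label that does not
    # fit in `width` digits is rejected rather than silently widened.
    if not 1 <= interviewee < 10 ** width:
        raise ValueError(f"interviewee label {interviewee} does not fit in {width} digits")
    parts = [
        project.replace(" ", ""),         # "IR Study" -> "IRStudy" (no spaces)
        collected_on.strftime("%Y%m%d"),  # ISO 8601 basic date, e.g. 20201105
        content,                          # e.g. "VideoInterview"
        initials.upper(),                 # researcher's initials, e.g. "KC"
        f"{interviewee:0{width}d}",       # e.g. 1 -> "001"
    ]
    return "_".join(parts) + "." + extension


print(interview_file_name("IR Study", date(2020, 11, 5), "VideoInterview", "KC", 1))
# IRStudy_20201105_VideoInterview_KC_001.mp4
```

Under these assumptions, the zero padding keeps interviews 001 to 100 in numeric order in any file browser, and the date and initials make it easy to filter recordings by when and by whom they were collected.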

Dig Deeper

Learn more about organizing information via the best practice guides from UBC Library Research Data Management.
Review examples of, and guidelines for, file naming conventions.
Stodden, V. C. (2011). Trust your science? Open your data and code. https://academiccommons.columbia.edu/doi/10.7916/D8KD27BK/download

Adapted from Lesson 3. How To Organize Your Project: Best Practices for Open Reproducible Science by Jenny Palomino, Leah Wasser, Max Joseph (Earth Lab) licensed under a CC BY-NC-ND 4.0 LICENSE.
