Reproducibility and replicability depend on more thorough documentation and on research outputs organized in a way that makes them easy to identify, collate, and package for open sharing.
Files you might work with on a research project include:
- Raw data files
- Processed data files: versions of the raw data that you have cleaned, filtered, or otherwise transformed
- Code and scripts
- Outputs like figures and tables
- Writing associated with your project
Next, we suggest some best practices for organizing your files to maximize the usefulness of open workflows.
Best Practices for File Names
When creating new computer files, use naming conventions to make it easier for anyone to find files and understand what each file does or contains. Using a good naming convention when structuring a project directory also supports reproducibility by helping others who are not familiar with your project quickly understand your directory and file structure. The most important part of a naming convention is that it’s consistent.
- Human-readable: use expressive names that clearly and meaningfully describe what the directory or file contains (e.g., code, data, outputs, figures).
- Example Naming Elements and Structure: DescriptiveTitle_DocumentType_Date_Version
- Sample Named File: SkyLabExperiment_ContractNegotiations_20170104_Rev0.pdf
- Machine-readable: avoid special characters (e.g., symbols or accents) and spaces. Instead of spaces, you can use hyphens (-) or underscores (_) to separate words within the name, which makes names easy to read and to parse with scientific programming languages or other forms of scripting.
- Sortable: Sorting allows you to quickly see what is there and find what you need. For example, you can give a list of related directories or files a numbered prefix with leading zeros (e.g., 01-max.jpg, 02-terry.jpg, etc.) so that they sort in order. Without the leading zero, a file starting with 10- would sort before one starting with 2-.
- Clear dates: Use the ISO 8601 date standard of YearMonthDay (e.g., 20201106).
- Consistent letter case: Switching between lower case and upper case can cause problems in code that moves between operating systems (Mac vs. Linux vs. Windows). By default, macOS and Windows treat Data.csv and data.csv as the same file name, while Linux treats them as two different files. To avoid case issues, consistently use either all-lowercase names or CamelCase. CamelCase is the practice of writing phrases without spaces or punctuation, indicating the separation of words with a single capitalized letter (e.g., instead of survey data, you would write SurveyData).
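The naming conventions above can be combined into a small helper so that every file in a project is named the same way. The following is a minimal Python sketch, assuming the DescriptiveTitle_DocumentType_Date_Version pattern from the example; the function names (to_camel_case, make_file_name, is_machine_readable) and the reuse of the "Rev" version prefix from the sample file name are illustrative choices, not part of any standard.

```python
import re
import unicodedata
from datetime import date


def to_camel_case(text: str) -> str:
    """Convert free text such as 'survey data' to 'SurveyData'.

    Accents are stripped (e.g., 'é' becomes 'e') and any character that is
    not an ASCII letter or digit is treated as a word separator.
    """
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    words = re.split(r"[^A-Za-z0-9]+", ascii_text)
    # Capitalize the first letter of each word but keep the rest as typed,
    # so existing capitals (e.g., "SkyLab", "IR") are preserved.
    return "".join(w[0].upper() + w[1:] for w in words if w)


def make_file_name(title: str, doc_type: str, when: date, version: int, ext: str) -> str:
    """Build DescriptiveTitle_DocumentType_Date_Version.ext."""
    parts = [
        to_camel_case(title),
        to_camel_case(doc_type),
        when.strftime("%Y%m%d"),  # ISO 8601 basic date: YearMonthDay
        f"Rev{version}",          # version prefix copied from the sample name
    ]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


# ASCII letters, digits, hyphens, and underscores, followed by one extension.
MACHINE_READABLE = re.compile(r"[A-Za-z0-9_-]+\.[a-z0-9]+")


def is_machine_readable(name: str) -> bool:
    """Return True if the whole name contains no spaces or special characters."""
    return MACHINE_READABLE.fullmatch(name) is not None
```

For example, make_file_name("SkyLab experiment", "contract negotiations", date(2017, 1, 4), 0, "pdf") returns SkyLabExperiment_ContractNegotiations_20170104_Rev0.pdf, matching the sample named file above, while is_machine_readable("survey data.csv") returns False because of the space.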
For practical examples of naming conventions, watch the following video:
Dig Deeper
- Additional information about file naming and how it relates to open data can be found in the Open Data module.
- Learn more about organizing information via the best practice guides from UBC Library Research Data Management.
- Review examples of and guidelines for file naming conventions.
Create a README File
Placing a README file at the top level of your project is a standard convention. The README describes the data, software packages, and tools used in your project. It should also describe the project's files, their naming conventions, and any other details needed to understand them. A README can be as simple as a plain text file (Example: README 4) or can involve more formatting and design elements (Example: Sulu).
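The README itself is written by hand, but a short script can give you a starting outline. The sketch below (Python, standard library only) lists the top-level files and directories of a project and leaves placeholders for their descriptions; the function name readme_skeleton, the section labels, and the README.txt file name are illustrative assumptions, not a required template.

```python
from pathlib import Path


def readme_skeleton(project_dir: str, title: str) -> str:
    """Return a plain-text README outline listing the project's top-level contents."""
    root = Path(project_dir)
    lines = [
        title,
        "=" * len(title),
        "",
        "Description: <what this project does and why>",
        "Software and tools: <packages, versions, and tools used>",
        "File naming convention: <e.g., DescriptiveTitle_DocumentType_Date_Version>",
        "",
        "Contents:",
    ]
    # Sort by name so the listing is the same on every operating system.
    for path in sorted(root.iterdir(), key=lambda p: p.name):
        if path.name.startswith(".") or path.name == "README.txt":
            continue  # skip hidden files and the README itself
        kind = "directory" if path.is_dir() else "file"
        lines.append(f"- {path.name} ({kind}): <describe contents>")
    return "\n".join(lines) + "\n"
```

To save the outline, you might run Path("README.txt").write_text(readme_skeleton(".", "IR Study"), encoding="utf-8") from the project's top-level directory and then replace each placeholder with a real description.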
Proprietary File Formats
Proprietary formats are formats that require a specific tool or license to open, for example Excel (.xls) or Word (.doc). These formats may change over time as new versions come out (for example, .xls was superseded by .xlsx). When choosing file formats for a project, it’s important to consider whether you will have ongoing access to the tool's license and whether others have access as well.
Choosing formats that are operating-system- and tool-agnostic, such as .csv and .txt, is one way to avoid issues related to tool licenses and to increase the potential for sharing fully usable content.
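As a minimal illustration of working with a tool-agnostic format, the sketch below uses Python's built-in csv module to save tabular data as a plain-text .csv file and read it back. The function names and the example columns are hypothetical; note that CSV stores everything as text, so numbers come back as strings unless you convert them.

```python
import csv


def save_table_as_csv(rows: list[dict], path: str) -> None:
    """Write a list of records to a plain-text CSV file that any tool can open."""
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module control line endings itself,
    # as the module's documentation recommends.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_table_from_csv(path: str) -> list[dict]:
    """Read the CSV back; every value is returned as a string."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

If your data currently lives in an .xlsx workbook, reading it with pandas' read_excel (which relies on the openpyxl package for .xlsx files) and writing it out with DataFrame.to_csv is one common way to export it to CSV.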
The Open Software module provides insight into open source tools.
Scenario – Video Interview File Naming Conventions
Professor Sam Meyers is conducting over 100 interviews with first-generation undergraduate students for research on information retrieval. Professor Meyers is developing a file naming convention to ensure the data is easily retrievable on their shared drive for future use. Their study is currently titled “IR Study.” The interviewees have been labelled 1 to 100 for anonymity. The first video recording occurred on November 5, 2020, and was conducted by graduate research assistant Karina Cassidy.
Professor Meyers and the research team decided to include the following information in the file naming conventions:
- Project Name
- Date of Data Collection
- Content of the File
- Researcher’s Initials
- Interviewee Label
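Applied to this scenario, the team's five elements could be assembled automatically. The Python sketch below assumes a few details the scenario does not specify: the project name is written in CamelCase as IRStudy, the content label is VideoInterview, recordings are .mp4 files, and interviewee labels are zero-padded to three digits so that 100 sorts after 099. The function name interview_file_name is hypothetical.

```python
from datetime import date

PROJECT = "IRStudy"  # assumption: "IR Study" written in CamelCase


def interview_file_name(collected: date, content: str, researcher: str,
                        interviewee: int, ext: str = "mp4") -> str:
    """Build ProjectName_Date_Content_Initials_Interviewee.ext,
    following the order of elements chosen by the team."""
    if not 1 <= interviewee <= 999:
        raise ValueError("interviewee label must be between 1 and 999")
    # "Karina Cassidy" -> "KC"
    initials = "".join(part[0].upper() for part in researcher.split())
    return "_".join([
        PROJECT,
        collected.strftime("%Y%m%d"),  # date of data collection, YearMonthDay
        content,                       # content of the file, e.g. "VideoInterview"
        initials,                      # researcher's initials
        f"{interviewee:03d}",          # zero-padded so 010 sorts after 009
    ]) + "." + ext
```

For the first recording, interview_file_name(date(2020, 11, 5), "VideoInterview", "Karina Cassidy", 1) returns IRStudy_20201105_VideoInterview_KC_001.mp4.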
Adapted from Lesson 3. How To Organize Your Project: Best Practices for Open Reproducible Science by Jenny Palomino, Leah Wasser, Max Joseph (Earth Lab) licensed under a CC BY-NC-ND 4.0 LICENSE.