Next, we’ll guide you through the steps to make your research data open: planning, describing, and preserving and sharing.
Before collecting your data, you need to create a plan for how you will manage and share it. It is relatively easy to put a data set on a website or upload it to a project in OSF, but messy or poorly managed data is hard for others to reuse. Consider each stage of the data life cycle when planning how you will manage your data and make it accessible. A Data Management Plan (DMP) helps you articulate how you will manage your data at each stage, from collection to analysis to preservation, and encourages you to work through these decisions before you start collecting data. This up-front work will save you many headaches in the long run.
In addition to open access and open data requirements, some research funders may require a DMP to be submitted with the funding application, demonstrating how the data will be handled at each stage of the data life cycle.
- Best Practice: Plan – Learn more about the planning stage from DataONE.
- Data Management Plan Exemplars
- Tidy-ing Your Data: Simple Steps for Reproducible Research
As introduced in the Open Workflows module, metadata are a requirement of open data and include important details, such as:
- Who created the data
- The content of the data
- When the data was created
- Where the data was collected, or what geographic area it covers
- Why and how the data was developed
There are three types of metadata:
- Descriptive: Provides information that helps people find the data or understand its context. Project title, authors, keywords, and collection methods are all types of descriptive metadata.
- Administrative: Provides technical, preservation, and rights information. It answers questions such as what software is required to use the data and what license is attached to it. File type, file size, copyright status, and license terms are all examples of administrative metadata.
- Structural: Provides information about how the data files relate to one another. For example, you can add links to additional metadata files and to related publications.
Metadata standards vary depending on the repository or discipline, but each standard uses consistent terminology, definitions, language, and structure.
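To make the three metadata types more concrete, here is a minimal sketch of a metadata record grouped into descriptive, administrative, and structural sections, plus a small check for missing fields. All field names and values are hypothetical placeholders; they do not follow any particular metadata standard, so use the schema your repository or discipline requires.

```python
import json

# Hypothetical metadata record; field names are illustrative, not a formal standard.
metadata = {
    "descriptive": {
        "title": "Example survey of stream temperatures",
        "creators": ["A. Researcher", "B. Researcher"],          # who
        "keywords": ["hydrology", "water temperature"],          # what
        "date_created": "2020-06-01",                           # when
        "geographic_location": "Example watershed",             # where
        "collection_method": "Hourly readings from data loggers",  # why and how
    },
    "administrative": {
        "file_type": "text/csv",
        "software_required": "Any text editor or spreadsheet program",
        "license": "CC-BY-4.0",
    },
    "structural": {
        "files": ["stream_temps.csv", "README.txt"],
        "related_publication": "DOI of the associated article (placeholder)",
    },
}


def to_json(record: dict) -> str:
    """Serialize the record as indented, human- and machine-readable JSON."""
    return json.dumps(record, indent=2, ensure_ascii=False)


def missing_fields(record: dict, required: dict) -> list:
    """Return 'section.field' names from `required` that are absent or empty in `record`."""
    missing = []
    for section, fields in required.items():
        for field in fields:
            if not record.get(section, {}).get(field):
                missing.append(f"{section}.{field}")
    return missing
```

In practice you might save the output of `to_json(metadata)` next to the data in a file such as `metadata.json` (a hypothetical name), and call `missing_fields` with whatever fields your repository marks as mandatory.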
File Format and Structure
In keeping with the FAIR principles, data should be saved in open, unencrypted, and uncompressed formats. An open format is non-proprietary, so the file can be opened with software that is not owned by a specific company. For example, instead of saving a spreadsheet as an Excel file, save it as CSV or XML. Unencrypted data is accessible to anyone, whereas encrypted data is locked and requires a passcode or key to open. An uncompressed file is stored in its original format rather than compressed into another format, such as a ZIP archive. The UK Data Archive provides a list of recommended data formats to make your data shareable, reusable, and preservable. Best practices for file naming can be found on the “Best Practices for Organizing Work” page.
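The following sketch shows one way to save tabular data in an open, plain-text, uncompressed format using only Python’s standard library `csv` module, writing UTF-8 text that any text editor or spreadsheet program can open. The column names and values are hypothetical. If your data currently lives in an Excel workbook, a library such as pandas can read it (`pandas.read_excel`) and write CSV (`DataFrame.to_csv`); this sketch assumes the data is already loaded as Python lists.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and rows as plain-text CSV (no encryption, no compression)."""
    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas or quotes (QUOTE_MINIMAL by default).
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def save_csv(header, rows, path):
    """Write the CSV text to `path` as UTF-8; newline="" prevents extra blank lines on Windows."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(rows_to_csv(header, rows))


# Hypothetical example data.
header = ["site", "temp_c"]
rows = [["A", 12.5], ["B, east", 13.0]]
print(rows_to_csv(header, rows))
```

Note that the second site name contains a comma, so the CSV writer wraps it in quotes; this is why a dedicated CSV library is safer than joining values with commas by hand.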
A plain-text readme file is often included alongside data files to help others understand your data. The readme file is typically the first entry point to understanding a dataset and should include all the necessary metadata, or indicate which other files contain important metadata. A readme should describe your dataset’s relationships with other files (e.g., other data files, code, a manuscript). Because it describes the contents and structure of the dataset, it helps other researchers interpret the data. A readme file might also be where you record your variable-level metadata: for example, what does the column labeled var 1 contain? This information may alternatively be recorded in a separate data dictionary, codebook, or other documentation format. How variable-level metadata are captured is often discipline-specific; you can review a Disciplinary Metadata repository to learn more about the standards in your own discipline or others.
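As a rough illustration of the readme and data dictionary described above, this sketch generates both from a single list of variable descriptions, so the two documents stay consistent. The variable names, descriptions, units, and the layout of the readme are hypothetical choices, not a required template; follow your discipline’s or repository’s conventions where they exist.

```python
import csv
import io

# Hypothetical variable-level metadata for an illustrative dataset.
VARIABLES = [
    {"name": "site", "description": "Sampling site identifier", "units": "", "type": "string"},
    {"name": "temp_c", "description": "Water temperature", "units": "degrees Celsius", "type": "float"},
]


def build_readme(title, data_file, variables, related=()):
    """Return plain-text readme content describing one data file and its variables."""
    lines = [title, "=" * len(title), "", f"Data file: {data_file}"]
    if related:
        # Relationships to other files (code, other data, manuscript).
        lines.append("Related files: " + ", ".join(related))
    lines += ["", "Variables", "---------"]
    for var in variables:
        units = f" ({var['units']})" if var["units"] else ""
        lines.append(f"{var['name']}: {var['description']}{units} [{var['type']}]")
    return "\n".join(lines) + "\n"


def build_data_dictionary(variables):
    """Return the same variable metadata as a CSV data dictionary."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "description", "units", "type"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(variables)
    return buffer.getvalue()


print(build_readme("Stream temperatures", "stream_temps.csv", VARIABLES, related=["analysis.py"]))
```

Keeping the variable descriptions in one place and generating both files from it is a design choice that avoids the readme and the data dictionary drifting apart as the dataset changes.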
Review this example readme file: https://doi.org/10.5061/dryad.j0t179b. Open the drop-down to reveal the files associated with the data. Review the readme files to see how the authors have provided guidance for interpreting their data set. Note that the file structure lets you associate each .csv file with its related readme, and each readme helps you interpret its .csv file.
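When your own dataset pairs each data file with its own readme, a quick check can confirm that no data file is left undocumented. The sketch below assumes a hypothetical naming convention in which `results.csv` is documented by `results_README.txt` in the same folder; it is not the convention used in the Dryad example, so adjust the `suffix` parameter to match the names you actually use.

```python
from pathlib import PurePosixPath


def pair_readmes(filenames, suffix="_README.txt"):
    """Map each .csv file to its readme under the hypothetical convention
    <stem>.csv -> <stem>_README.txt (same folder); None means no readme was found."""
    names = set(filenames)
    pairs = {}
    for name in sorted(names):
        path = PurePosixPath(name)
        if path.suffix.lower() == ".csv":
            readme = str(path.with_name(path.stem + suffix))
            pairs[name] = readme if readme in names else None
    return pairs


# Hypothetical file listing for a dataset.
files = ["site_a.csv", "site_a_README.txt", "site_b.csv", "notes.txt"]
undocumented = [f for f, r in pair_readmes(files).items() if r is None]
print(undocumented)
```

Here `site_b.csv` would be reported because it has no matching readme; you could then add one before publishing the dataset.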