Data repositories provide a persistent location where the data file(s) and associated documentation (i.e., metadata) can be archived and preserved.
Data can be published in either an institutional or a disciplinary repository. An example of an institutional repository is UBC’s Scholars Portal Dataverse, where any researcher can create an account and deposit data. Its required metadata template helps make the data easier to discover. The Registry of Research Data Repositories includes many disciplinary repositories, and you can search by subject to find a resource relevant to your field.
Your choice of repository will depend in part on why you’re depositing your data. Is it open to support a publication, open to encourage its re-use as a standalone dataset, or open to contribute to a larger dataset like a gene databank or a databank of biological observations?
Regardless of the route you choose, make sure the repository will provide a persistent identifier such as a DOI (Digital Object Identifier), which uniquely identifies your dataset and helps others to easily locate your files. Unlike a URL, which can be unstable and may lead to broken links over time, a DOI will never change.
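As a small illustration of how a DOI works as a stable pointer, the sketch below turns a DOI written in one of several common forms into a link through the https://doi.org/ resolver. The function name `doi_to_url`, the simplified validation pattern, and the sample DOI `10.1234/example.5678` are assumptions made for this example; they are not part of any repository's tooling, and the sample DOI does not refer to a real dataset.

```python
import re

# Simplified pattern (an assumption for this sketch): "10.", a 4-9 digit
# registrant prefix, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes people commonly paste in front of the bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver link for a DOI string."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"Not a recognizable DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the output format.
    print(doi_to_url("doi:10.1234/example.5678"))
```

Citing the resolver link rather than the repository's own page address means the reference keeps working even if the repository later reorganizes its website.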
Scenario – Data persistence
Marija published an article in the journal Conservation Biology that referenced related data sources and a database with citations, all of which could be downloaded from her institutional faculty page. This worked until Marija left that university and the faculty page was disabled. After that, readers who looked for the data related to the article were directed to an “Access forbidden” page, so the data and related sources could no longer be found.
If the institution had had its own data repository, Marija could have deposited the data and related files there. This would have provided a persistent DOI, meaning Marija’s data should remain accessible for a long time no matter where she is based.
Scenario adapted from Case study: Data persistence with permission from Stanford Libraries.
Generalist data repositories
Choosing a generalist data repository is an option if there is no discipline-specific repository in your field. Below are some common generalist repositories:
- Scholars Portal Dataverse: A publicly accessible data repository, open to affiliated researchers (primarily from Canadian universities) to deposit and share research data. It is hosted by the University of Toronto libraries. If you have questions about this repository, contact email@example.com.
- Dryad: A curated resource that provides a home for a wide variety of data types.
- Figshare: Cloud-based and features the ability to preview data.
- Zenodo: Does not impose any requirements on the format, access restrictions, or license of deposited data. Metadata records are licensed CC0.
What about OSF?
As an open workflow tool, OSF is a good option when working with raw data or temporarily storing analyzed data, but it is not ideal for long-term preservation and storage. You could include a data component in OSF by adding your repository DOI to your readme. The publishing option in OSF also allows for flexibility in how you provide access to your project’s data. However, repositories are a better option because they provide more granular, data-specific options, the ability to assign user access at the file level, and long-term preservation.
Licensing Open Data
Applying a license when depositing into a repository is an often-neglected step. But without a license, no one knows how they can use the data or how they are expected to give credit for your hard work. Depending on the data repository, there may be a default license. Be sure to check the license options and select an open license that suits how you want to share your data.
Choosing a License
Different types of open licenses make sense for different aspects of open scholarship and data. Funding agencies and journals may require that your data be made open, so check whether they specify a license to apply to your data. If you are choosing your own licensing option, select the appropriate one based on how you want others to reuse your data. It is also good practice to include a rights statement within your dataset or in your readme file. Here are two resources that can help you compare and select the appropriate data license for your work:
- How to choose a license for open scientific data and code: The first 4 minutes of this video cover applying a license to data.
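To make the rights statement suggestion concrete, here is a minimal sketch that produces a short plain-text rights section you could paste into a readme file. The function name `rights_statement`, the wording of the statement, and the sample title, creator, and DOI are all hypothetical and chosen only for illustration. The two licenses listed are common open options, not a recommendation, so check what your repository, funder, or journal expects before choosing.

```python
# Two widely used open licenses, keyed by short identifier.
LICENSES = {
    "CC0-1.0": (
        "Creative Commons Zero 1.0 Universal",
        "https://creativecommons.org/publicdomain/zero/1.0/",
    ),
    "CC-BY-4.0": (
        "Creative Commons Attribution 4.0 International",
        "https://creativecommons.org/licenses/by/4.0/",
    ),
}


def rights_statement(title, creator, year, license_id, doi=None):
    """Build a short plain-text rights section for a dataset readme."""
    if license_id not in LICENSES:
        raise KeyError(f"Unknown license identifier: {license_id!r}")
    name, url = LICENSES[license_id]
    lines = [
        "Rights statement",
        f'"{title}" by {creator} ({year}) is made available under '
        f"{name} ({license_id}).",
        f"License terms: {url}",
    ]
    if doi:
        # Point readers at the repository copy through its DOI.
        lines.append(f"Persistent identifier: https://doi.org/{doi}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical dataset details, used only to show the output format.
    print(
        rights_statement(
            title="Example survey data",
            creator="A. Researcher",
            year=2024,
            license_id="CC-BY-4.0",
            doi="10.1234/example.5678",
        )
    )
```

Keeping the license name, its identifier, and a link to the full terms together lets a reader who finds the readme on its own see exactly how the data may be reused and credited.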