What is Open Data?

In the digital age, data is the raw material on which discoveries are built. Data plays a central role in our ability to predict and counter natural disasters, understand human biology, and develop advances in computing technology. Whether in the Life Sciences or the Social Sciences, unfettered access to research data is crucial to accelerating progress in research.

Despite its tremendous importance, research data remains largely fragmented: isolated across millions of individual computers and blocked by technical, legal, and financial restrictions.

The amount of scientific and scholarly data grows exponentially each year, yet we still lack the infrastructure, policies, and practices to harness this vital resource. While some high-profile projects, such as the Human Genome Project and the Large Hadron Collider, make their data openly accessible, too often data isn’t shared beyond those who generate it. The Internet was built by researchers to share data, but data sharing isn’t yet the norm in research.

Text adapted from Setting the Default to Open by SPARC Europe, used under a CC-BY license.

Open Data: an Introduction

To understand what is meant by open data, start by watching this quick video introduction.

“Open Data” by simpleshow foundation

Let’s start with a simple definition and then break it down further.

“Open data is data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.”

From the Open Data Handbook https://opendefinition.org/

The Open Data Handbook further highlights these critical points:

  • Availability and Access: The data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form.
  • Reuse and Redistribution: The data must be provided under terms that permit reuse and redistribution, including the intermixing with other datasets.
  • Universal Participation: Everyone must be able to use, reuse, and redistribute. There should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions or restrictions of use for certain purposes (e.g., only in education) are not allowed.
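To make the three points above concrete, here is a minimal sketch in Python of how they could be turned into a checklist for a single dataset. Everything in it is an assumption made for illustration: the `DatasetTerms` class, its field names, and the `openness_problems` function are hypothetical and do not come from the Open Data Handbook or any existing tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetTerms:
    """Hypothetical summary of how a dataset is published."""
    available_as_whole: bool       # the full dataset can be obtained, e.g. by download
    modifiable_format: bool        # e.g. a CSV file rather than a scanned image
    permits_reuse: bool
    permits_redistribution: bool
    permits_intermixing: bool      # may be combined with other datasets
    use_restrictions: List[str] = field(default_factory=list)


def openness_problems(terms: DatasetTerms) -> List[str]:
    """Return the names of the criteria that the dataset does not meet."""
    problems = []
    if not (terms.available_as_whole and terms.modifiable_format):
        problems.append("availability and access")
    if not (terms.permits_reuse
            and terms.permits_redistribution
            and terms.permits_intermixing):
        problems.append("reuse and redistribution")
    # Any restriction on who may use the data, or for what purpose,
    # conflicts with universal participation.
    if terms.use_restrictions:
        problems.append("universal participation")
    return problems


# An invented example: freely downloadable, but limited to non-commercial use.
survey = DatasetTerms(True, True, True, True, True, ["non-commercial"])
print(openness_problems(survey))  # ['universal participation']
```

In this sketch a dataset counts as open only when the returned list is empty, which mirrors the point that a ‘non-commercial’ restriction alone is enough to make data not open.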

As you can see in the definition above, open data is much more than making data available. It requires an established workflow for collecting, managing, and storing that data, with standards for making it open. Open data ensures public access to data and should include sufficient detail so that others know how the data can be reused or repurposed. By using the FAIR principles, described next, to manage your data at all stages of the research process, you will help ensure your data is both future-proof and open.

The FAIR Principles

The FAIR principles for sharing data state that data should be findable, accessible, interoperable, and reusable (Wilkinson et al., 2016). They provide guidance for scientific data management and stewardship.

FAIR principles by CSC — Tieteen tietotekniikan keskus / CSC — IT Center for Science
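
As a rough illustration of what managing data with FAIR in mind can look like, the sketch below checks a dataset's metadata record for one field per principle. This is a deliberate simplification and not an official FAIR assessment: the field names (`identifier`, `access_url`, `format`, `license`), the mapping between fields and principles, and the example record with its placeholder identifier are all assumptions made for this example.

```python
# Hypothetical mapping from each FAIR principle to one metadata field
# that supports it. Real FAIR assessments look at far more than this.
FAIR_FIELDS = {
    "findable": "identifier",      # e.g. a persistent identifier such as a DOI
    "accessible": "access_url",    # where the data can be retrieved
    "interoperable": "format",     # a standard, widely readable format
    "reusable": "license",         # clear terms of reuse
}


def missing_fair_fields(record: dict) -> dict:
    """Return the principles whose supporting field is absent or empty."""
    return {
        principle: key
        for principle, key in FAIR_FIELDS.items()
        if not record.get(key)
    }


# An invented record with placeholder values and no license information.
record = {
    "identifier": "doi:10.1234/example",
    "access_url": "https://example.org/data.csv",
    "format": "text/csv",
}
print(missing_fair_fields(record))  # {'reusable': 'license'}
```

Here the record can be found, retrieved, and read, but without a stated license others cannot tell how they are allowed to reuse it.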

The Value of Open Data

Watch the following TEDx talk by Kristin Briney, a Data Services Librarian at the University of Wisconsin-Milwaukee, who discusses some of the reasons open data can be so valuable in research and society.

“Rethinking Research Data” from TEDx Talks.


Test Your Knowledge