Reusing existing data
Before starting your project, check whether the data you need already exist. In recent years, the volume of data generated by researchers, governmental and non-profit organizations, and industry has grown rapidly. Consider reusing existing data rather than producing new research data yourself.
Why reuse existing data?
Reusing existing data has several advantages. For example, it:
- Avoids unnecessary duplication of efforts to generate research data.
- Allows you to integrate data from different studies, sites, labs, disciplines etc., and thus to open up important new avenues of research.
- Eases the burden on over-researched populations.
How to find existing research data?
Finding research data can prove challenging given the amount of data that is currently available, and the diversity of possible data sources and formats.
Research data that are shared following the FAIR principles (with a persistent identifier and rich metadata online in a searchable resource) are more easily findable than data published on personal websites, or data included as supplementary material to a journal article.
Places where you can look for data include:
- Research data repositories listed in the registries re3data.org or FAIRsharing.org
- The index provided by DataCite, which collects metadata for DOIs assigned to research data
- The explore portal of OpenAIRE, which collects metadata about all kinds of objects of the research lifecycle, including research data
- Data papers: these provide peer-reviewed descriptions of publicly available datasets or databases with significant reuse potential, and include links to the associated data records in the repositories hosting the data. Data papers are published in dedicated data journals (e.g. Scientific Data, Journal of Open Psychology Data), or as a particular article type in more conventional journals.
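As an illustration of searching one of these sources programmatically, the sketch below queries the DataCite index through its public REST API. It is a minimal example under stated assumptions, not an official client: the function names are made up for this illustration, and the `resource-type-id` filter and the response fields read here (`doi`, `titles`, `publisher`, `publicationYear`) reflect the API as commonly documented, so check the current DataCite API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public DataCite REST API endpoint for DOI metadata
DATACITE_API = "https://api.datacite.org/dois"


def build_search_url(keywords, page_size=5):
    """Build a search URL restricted to records typed as datasets.

    Assumption: 'resource-type-id' and 'page[size]' are accepted query
    parameters; verify against the current API documentation.
    """
    params = {
        "query": keywords,
        "resource-type-id": "dataset",
        "page[size]": page_size,
    }
    return f"{DATACITE_API}?{urlencode(params)}"


def summarize(payload):
    """Extract DOI, first title, publisher and year from an API response."""
    results = []
    for record in payload.get("data", []):
        attrs = record.get("attributes", {})
        titles = attrs.get("titles") or [{}]
        results.append({
            "doi": attrs.get("doi"),
            "title": titles[0].get("title"),
            "publisher": attrs.get("publisher"),
            "year": attrs.get("publicationYear"),
        })
    return results


def search_datasets(keywords, page_size=5, timeout=30):
    """Query the live API (requires network access)."""
    url = build_search_url(keywords, page_size)
    with urlopen(url, timeout=timeout) as response:
        return summarize(json.load(response))
```

Calling `search_datasets("soil moisture")` would return a short list of dictionaries, one per matching record. A hit only tells you that a dataset exists: you still need to check its access and use conditions before reusing it.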
The legal aspects of reusing data: access and use conditions
When reusing existing data, you have to check and comply with the conditions of access and use. You may not be allowed to do whatever you want with the data, for example because they are confidential or protected by intellectual property rights.
- Some data will be available as open research data, meaning that they can be freely accessed and used.
- Some data will fall under the category of restricted data: in this case, specific access and/or use conditions are in place (also see degrees of data sharing).
Terms and conditions of access and use depend on the nature of the data. Ideally, these terms are made explicit in advance by means of a clearly specified access category and a license or user agreement. If not, you will have to find out for yourself whether and how you can access and use the data, and obtain any permissions needed.
The access and use conditions often require you to cite the data in your publications. Acknowledging the work of others is also simply part of research integrity, so always cite any existing research data you reuse!
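To make the citation step concrete, the sketch below assembles a data citation from basic metadata, loosely following the commonly recommended pattern "Creator (Year). Title. Publisher. Identifier". The function name and the record shown are hypothetical and for illustration only. The repository hosting the data or the journal you publish in may prescribe a different style, which takes precedence.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Return 'Creators (Year). Title. [Version X.] Publisher. https://doi.org/DOI'."""
    parts = [f"{'; '.join(creators)} ({year})", title.rstrip(".")]
    if version:
        # The version is optional; include it when the dataset is versioned
        parts.append(f"Version {version}")
    parts.append(publisher.rstrip("."))
    # Express the DOI as a resolvable link so readers can reach the data
    return ". ".join(parts) + f". https://doi.org/{doi}"


# Hypothetical record, for illustration only
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2020,
    title="Example survey dataset",
    publisher="Example Repository",
    doi="10.1234/example",
)
print(citation)
# Doe, J.; Roe, R. (2020). Example survey dataset. Example Repository. https://doi.org/10.1234/example
```

Including the persistent identifier in the citation lets readers locate exactly the data you reused.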
More information
- K. Gregory et al. (2018), Eleven quick tips for finding research data, PLoS Comput Biol 14(4)
- CESSDA ERIC, Access, use and cite data
- OpenAIRE, Can I reuse someone else’s research data?
- OpenAIRE, Data reuse stories & use cases
- SURF's report on The legal status of raw data contains a brief guide to determining to what extent you need permission from the holder(s) of any copyright or database right in the data (section 1.5, pp. 7-8). Keep in mind, however, that it is based on Dutch law.