Using a data repository
The preferred option for data preservation and sharing is to deposit data in an established, trustworthy research data repository (sometimes also called a data center, data archive or scientific database), whenever appropriate.
Benefits of using a data repository
Using data repositories for data preservation and sharing has several advantages, such as:
- They take away the burden of handling data reuse queries and managing data access.
- They offer stronger guarantees of sustainable access to, and secure storage of, your data.
- They make your data discoverable and citable.
- They can go a long way towards making your data FAIR (Findable, Accessible, Interoperable and Reusable), thereby increasing the chance that your data can be discovered and reused.
What is a data repository?
A data repository is an online platform used to deposit completed datasets for the purpose of publishing, sharing and/or preserving them. It is a database infrastructure that compiles, manages and gives access to data and the associated metadata and documentation.
Personal websites and databases, as well as cloud storage services (e.g. Dropbox, Google Drive), are not considered repositories.
Data repository types
There are different kinds of data repositories, including:
- General-purpose repositories: accept a wide range of data types (and sometimes other research outputs as well) from all disciplines. Examples are:
  - Zenodo
  - Open Science Framework (single sign-on with Ghent University credentials possible)
  - Dryad
- Domain-specific repositories: focus on specific data types or data from specific research domains.
- Institutional repositories: hold research data outputs from a particular research institution.
Whenever available, domain-specific repositories are preferred: they make the data easier for other researchers in the field to find, and they facilitate the sharing of standardized data and metadata tailored to the scientific domain.
Trusted data repositories
Generally speaking, sharing research data via a data repository is preferable to a do-it-yourself (DIY) approach. Better still is to share via a trusted (trustworthy) data repository, if one is available for your research area.
Trusted data repositories have the following characteristics:
- Provide broad, unbiased and ideally open access to the repository’s content, respecting legal and ethical limitations.
- Assign persistent identifiers to the content for referencing and citing.
- Manage metadata to enable discovery, reuse and citation and provide information about provenance and licensing. Metadata are machine-actionable and standardized.
- Ensure preservation of the repository’s content, also in the long term.
- Offer expert curation, guidance and/or quality assurance services for accuracy and integrity of datasets and metadata.
- Provide explicit information about policies.
- Run services, mechanisms and/or provisions to secure the integrity and authenticity of the repository’s content and to prevent unauthorized access and release of content.
According to the European Commission's Horizon Europe Annotated Model Grant Agreement, trustworthy repositories fall into one of the following categories:
- They have received certification (e.g. CoreTrustSeal or ISO 16363).
- They are domain-specific repositories that are internationally recognized, commonly used and endorsed by the scientific community.
- They are general-purpose repositories or institutional repositories that present the essential characteristics of trusted repositories (as summarized above).
Data repositories in practice
How to select a suitable repository?
There are hundreds of data repositories and archives to choose from. Keep in mind, however, that not all repositories are equivalent. Some focus more on disseminating your data and making them visible than on ensuring their long-term preservation. Some are domain-agnostic, while others are tailored to specific research disciplines and/or data types.
Check our research tip on how to preserve and share data in data repositories for guidance on the selection of an appropriate data repository.
Can all data be deposited externally?
Data repositories are mostly suitable for research data that can be publicly shared, although this does not necessarily mean sharing in a fully open way (see degrees of data sharing). Some data repositories can cater to data that cannot be made (immediately) available under full open access, for example by allowing temporary embargoes or by offering restricted or controlled levels of access.
However, it is sometimes impossible or inappropriate to deposit data in an external repository, e.g. because of legal, ethical, contractual, practical or other restrictions on data sharing. In such cases, research data selected for preservation will need to be kept in-house.
Preparing data for preservation and sharing
Keeping data findable, understandable and effectively reusable, either for preservation or sharing purposes, requires some preparation and effort on your part. Typically, these efforts involve the following actions:
- Keep your data organized and provide documentation (e.g. a readme file) on your folder structure and file naming.
- Where needed, convert data to file formats suitable for sustainable access.
- Prepare a data package: accompany your organized data files, in the appropriate formats, with sufficient documentation and metadata.
- Set up the necessary rights and/or permissions for accessing the data being preserved and/or shared.
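Part of the documentation step in the list above can be automated. The following is a minimal sketch in Python (3.9 or later assumed) that walks a dataset folder and writes a tab-separated inventory of every file with its size and SHA-256 checksum, which can be kept alongside a readme file. The function names and the `MANIFEST.txt` file name are hypothetical choices for illustration, not a format required by any particular repository or institution; check your chosen repository's own documentation requirements.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[tuple[str, int, str]]:
    """List every file under root as (relative path, size in bytes, SHA-256).

    Directories are skipped; entries are sorted by their relative path string
    so the output order is the same on every run.
    """
    entries = [
        (path.relative_to(root).as_posix(), path.stat().st_size, sha256_of(path))
        for path in root.rglob("*")
        if path.is_file()
    ]
    return sorted(entries)


def write_inventory(root: Path, out_name: str = "MANIFEST.txt") -> Path:
    """Write a tab-separated inventory of root to root/out_name and return its path.

    A previously written inventory with the same name is excluded, so the
    manifest never lists itself.
    """
    out_path = root / out_name
    lines = ["path\tsize_bytes\tsha256"]
    for rel_path, size, checksum in build_inventory(root):
        if rel_path == out_name:
            continue
        lines.append(f"{rel_path}\t{size}\t{checksum}")
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path
```

For example, `write_inventory(Path("my_dataset"))` would create `my_dataset/MANIFEST.txt`. Recording checksums makes it possible to verify later that files were not altered or corrupted, which relates to the integrity and authenticity concerns listed for trusted repositories.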
More information
- Whyte, A. (2015), 'Where to keep research data: DCC checklist for evaluating data repositories'.
- Science Europe (2018), 'Practical Guide to the International Alignment of Research Data Management'. This guide contains a section on 'Criteria for the selection of trustworthy repositories'.
- CESSDA, 'Data publishing routes'.