Sharing data
In the context of RDM, data sharing refers to the practice of making data from completed research (or completed parts of it) publicly available, i.e. beyond your project or research team. It is different from exchanging data with collaborators while your research is still ongoing.
Why share data?
Making your finalised data (or snapshots of your data) available to others has a number of benefits, including:
- Increasing transparency of your research
- Accelerating scientific discovery by enabling new (types of) research
- Enhancing the visibility and impact of your research
- Creating new opportunities for collaboration
A growing number of publishers, journals, funders and institutions (including Ghent University) expect research data, especially data resulting from publicly funded research, to be shared where possible.
Ghent University's RDM policy framework urges researchers to make relevant research data openly available in a timely manner, unless there are legitimate reasons for (temporarily) restricting data sharing.
Degrees of data sharing
Sharing research data is not an all-or-nothing choice, but a spectrum. It ranges from making data fully open at one end to keeping them fully closed at the other, with various forms of restricted/controlled access in between.
Open research data
Data that can be 'freely used, modified and shared by anyone for any purpose' (opendefinition.org).
This is the preferred option when there are no legitimate reasons for restricting data sharing.
Closed data
Data that are temporarily under embargo, or that cannot be shared at all.
In this case, it is often still possible to share the metadata describing the data. This makes it transparent which data were created and helps to avoid duplication of research.
Restricted/controlled data
Data that are not shared in a fully open way, but made available under more restricted access and use conditions. This means that there are limits on who can access and use the data, how, and/or for what purpose.
Some data repositories allow you to deposit your data under restricted/controlled access. Some even handle data access requests and/or data access agreements on your behalf (e.g. the European Genome-phenome Archive (EGA)).
'As open as possible, as closed as necessary'
Which level of sharing you should choose largely depends on what is appropriate given the nature of your data, and on how well you have planned for data sharing (e.g. whether the necessary permissions or consent are in place, where applicable).
In any case, there is a growing consensus among research funders, institutions and other stakeholders that access to research data should be 'as open as possible, as closed as necessary'. This principle is included in the European Code of Conduct for Research Integrity (2017), for example, which Ghent University subscribes to.
Restrictions on data sharing
Research data cannot always be shared (immediately) in a fully open way. Sometimes they can only be made available under more restricted conditions and/or after an embargo period, or, in some circumstances, not at all.
Possible reasons for restricting the sharing of data are:
Personal data
The data constitute or contain personal data, i.e. any information about a (directly or indirectly) identified or identifiable living natural person.
Before sharing personal data, check the conditions under which they may be reused.
Otherwise confidential data
You have some other duty to keep the data confidential, or have agreed to do so (e.g. by signing a non-disclosure agreement, or an agreement containing a confidentiality clause).
Otherwise sensitive data
The data could potentially cause harm (e.g. to endangered species, vulnerable sites or groups, public health, national security...) if made public.
Third-party data
The data are not generated in the course of your own research project, but are supplied to you by another party (e.g. a commercial provider, government agency...).
Data protected by copyright and/or database right of which you are not the (sole) owner
Research data, or rather the form in which they are expressed, may in certain circumstances be protected by copyright and/or database right. For example, data captured in an original textual or audio-visual form, or data that have been creatively selected (from a larger whole), processed and structured, can be protected by copyright. In principle, copying and sharing protected research data requires permission from all rights owners.
Data with commercial/economic valorization potential
The research data may constitute a patentable invention, or contain commercially valuable know-how. Sharing them (prematurely) could jeopardize your valorization efforts.
Research funders, institutions and reputable journals/publishers with data sharing mandates will normally allow you to opt out of their open data requirements for legitimate reasons such as the above. If you do so, you will often be expected to provide proper justification (e.g. in your Data Management Plan, or in a data availability statement included in your published article).
Ways of data sharing
In principle, there are various ways of sharing data beyond your project or research team, each with its own pros and cons. For example, you can:
- Email data upon request
- Make them available via a personal or project website
- Add them as supplementary materials to a journal article
- Share data via a data repository/data archive
Sharing data via a data repository is the preferred option, as it offers many benefits for you as a researcher, for the scientific community and for society at large. It is the best way to ensure that your data remain accessible in a sustainable manner.
Preparing data for sharing
Making data findable, understandable and effectively reusable requires some preparation on your part: keeping your files organised, migrating files to sustainable formats, preparing a data package with data, documentation and metadata, and having the necessary access rights and reuse permissions in place.
Many repositories have their own requirements or instructions for depositing data. Check them in advance so you can prepare your data adequately for deposit.
For further guidance on the use of data repositories check our research tip.
Licensing data
When making research data publicly available, it is important to let potential users know in advance what they are allowed to do with those data. Licensing is an effective way to communicate such permissions.
A trusted data repository will normally apply a license to any dataset it holds, which you typically select (from a list of options) when depositing data.
Open research data
It is good practice to apply a standard, open license to open research data, as this ensures legal interoperability and the widest possible reuse.
Among the standard licenses commonly used for research data is the suite of Creative Commons (CC) licenses, which offer different levels of permission. CC licenses conformant with the “Open Definition” are:
- Public Domain Dedication (CC0 1.0): waives copyright and related rights (e.g. database rights).
- Attribution (CC-BY-4.0): gives others maximum freedom to reuse (i.e. copy, redistribute, adapt) your work, provided they give appropriate credit.
- Attribution Share-Alike (CC-BY-SA-4.0): same as CC-BY-4.0, but requires redistribution of derivative works under this same license.
Need help selecting an appropriate standard license? Check out this EUDAT license selector tool.
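The choice between the three Open Definition-conformant CC licenses listed above comes down to two questions: must reusers give credit, and must derivative works stay under the same license? The sketch below encodes only that decision. It is a simplified illustration, not a substitute for a selector tool or for checking your funder's or repository's requirements. The `LicenseNeeds` class and `suggest_open_license` function are hypothetical names; the license keys are the SPDX identifiers of these licenses.

```python
from dataclasses import dataclass


# Hypothetical description of what a depositor wants; these field names are
# illustrative assumptions, not part of any repository's or tool's API.
@dataclass
class LicenseNeeds:
    require_attribution: bool
    require_share_alike: bool


# The three Open Definition-conformant CC licenses, keyed by SPDX identifier.
OPEN_CC_LICENSES = {
    "CC0-1.0": "Public Domain Dedication",
    "CC-BY-4.0": "Attribution",
    "CC-BY-SA-4.0": "Attribution Share-Alike",
}


def suggest_open_license(needs: LicenseNeeds) -> str:
    """Return the SPDX identifier of the least restrictive listed license meeting the needs."""
    if needs.require_share_alike:
        # Among these three licenses, share-alike only exists together with attribution.
        return "CC-BY-SA-4.0"
    if needs.require_attribution:
        return "CC-BY-4.0"
    # No conditions required: waive rights as far as legally possible.
    return "CC0-1.0"
```

For example, `suggest_open_license(LicenseNeeds(require_attribution=True, require_share_alike=False))` returns `"CC-BY-4.0"`. The function always picks the least restrictive option that satisfies the stated needs, in line with the aim of enabling the widest possible reuse.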
Restricted data
For data requiring access restrictions, a standard license is usually not appropriate. In such cases a bespoke license will be needed instead (e.g. an ‘end user license’ or ‘user agreement’ as implemented by a trusted data repository) to make the data available.
Citing data
Research data can be cited in the same way as publications. In fact, the European Code of Conduct for Research Integrity (2017) stipulates that research data should be acknowledged as legitimate and citable products of research.
Why cite data?
Making your data citable enables you to claim and receive credit for producing high-quality datasets, and enhances the potential impact of your research. When you reuse data from someone else, you should in turn cite them in your publications, to contribute to a culture of data citation.
Data citation requires data to have a persistent identifier (PID), such as a DOI, PURL, or Handle.
How to cite data?
A data citation should contain the following minimum elements:
- Author (creator of the dataset)
- Publication date
- Title
- Version (if applicable)
- Publisher (the organisation hosting/distributing the dataset, i.e. the repository)
- Identifier
Citations can contain additional elements such as the resource type and location (a persistent URL for the dataset, e.g. the DOI prefixed with the https://doi.org/ resolver). Data repositories will usually suggest the appropriate citation format for the datasets they hold.
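To make the minimum elements above concrete, the sketch below assembles them into a citation string. It assumes the element order and punctuation of the Dryad-style example that follows (author list, year, title, optional version, publisher, resource type, resolvable DOI), assumes the identifier is a DOI, and reduces the publication date to a year. The `DataCitation` class is a hypothetical helper; in practice, use the format your repository suggests.

```python
from dataclasses import dataclass
from typing import Optional

# DOIs become resolvable URLs when prefixed with the doi.org resolver.
DOI_RESOLVER = "https://doi.org/"


@dataclass
class DataCitation:
    authors: list[str]              # creators, each written as "Family, Given"
    year: int                       # publication date (year only in this sketch)
    title: str
    publisher: str                  # the repository hosting/distributing the dataset
    doi: str                        # persistent identifier, assumed to be a DOI
    version: Optional[str] = None   # only included when applicable
    resource_type: Optional[str] = "Dataset"

    def format(self) -> str:
        """Join the citation elements in the order used by the example below."""
        parts = [f"{'; '.join(self.authors)} ({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        if self.resource_type:
            parts.append(self.resource_type)
        parts.append(DOI_RESOLVER + self.doi)
        return ", ".join(parts) + "."
```

Filling in the authors, year, title, publisher ("Dryad") and DOI ("10.5061/dryad.4h16331") of the example below reproduces that citation exactly; adding a `version` inserts a "Version …" element before the publisher.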
Example
Kavelaars, Marwa M.; Lens, Luc; Müller, Wendt (2019), Data from: Sharing the burden: on the division of parental care and vocalizations during incubation, Dryad, Dataset, https://doi.org/10.5061/dryad.4h16331.
More information
On the reasons for sharing data
- Benefits of managing and sharing data
- A.S. Barnard (2018), Ten reasons to share your data
- M. Astell (2017), Ask not what you can do for open data; ask what open data can do for you
On restrictions on data sharing
- OpenAIRE, How to deal with sensitive data?
- OpenAIRE, How do I know if my research data is protected?
On licensing data
- A. Ball (2014), How to License Research Data
- Open Definition, Conformant Licenses
- ARDC (2019), Research Data Rights Management Guide
- OpenAIRE, How do I license my research data?
On citing data
- ANDS, Data citation
- A. Ball & M. Duke (2015), How to Cite Datasets and Link to Publications