Open Data and Data Sharing
Formal sharing covers what is typically called archiving and publishing data (including publishing data as part of an article). Depositing data in established, preferably certified, data repositories ensures good practice for long-term preservation, metadata management, and findability, and enables transparent citation, reuse, and validation.
Within formal channels, degrees of accessibility and reusability vary and are largely governed by legal (e.g., privacy legislation and data security, copyright and ownership) and research-ethical considerations (e.g., vulnerable groups, missing consent, misuse). In other words, not all research data can be made freely available to the public. Some data can never be shared; others must be anonymized before sharing, or can only be shared under specific conditions, such as restricted access, data processing agreements, purpose-of-use restrictions, or embargo.
The counterpart to restricted data is open data—data made available with the technical and legal characteristics needed for anyone to use and redistribute them, at any time and from anywhere.
Informal sharing refers to other methods where data are exchanged directly between researchers or research groups outside established archiving solutions. Examples include sharing via email, personal communication, networks, or collaboration platforms that do not guarantee long-term preservation, searchability, or access control. As with repository deposit, it is essential to follow legal and research-ethical requirements even when sharing in a less formal capacity.
Solutions for sending data (informal sharing)
Data shared informally, whether within a project or with colleagues, should be transferred securely and in compliance with legislation, ethical guidelines, and general data security requirements, just as with storage and archiving. This concerns not only how and where files are shared, but also whether they should be shared at all.
FileSender
Sikt has developed a secure and reliable solution for sharing both small and large files with other researchers. FileSender lets you control who can download a file, set your own download timeframe, and obtain reports with download statistics. The service is free of charge.
Archiving and Publishing Research Data
Sharing research data offers several important benefits for the researcher, the research community, and society at large:
- Increased visibility
- Better use of resources and the enabling of re-analyses
- Strengthened reliability and verifiability of research results and, with that, the quality of the publication
- Opportunities to find new collaborators
- Long-term preservation that ensures the data do not disappear
In addition to the benefits mentioned above, there are now increasingly stringent requirements and expectations from national authorities, research funders, and journals that research data be made openly available under the principle “as open as possible, as closed as necessary.”
- National strategy for making research data available and for data sharing (Government of Norway)
- Making research data available: Policy of the Research Council of Norway (Research Council of Norway)
- Data and software availability (F1000)
- Reporting standards and availability of data, materials, code, and protocols (Nature)
Check that Your Data can be Shared
Before you publish or archive your research data, it is important to assess whether you are allowed to share the data, and whether doing so is responsible. There may be legal, research-ethical, security-related, contractual, or commercial considerations that limit the possibility of sharing.
Personal data and sensitive data
Research data that contain personal data or sensitive data, for example confidential information or special categories of personal data, cannot be published openly. However, some of these data may be shared with restricted access, provided that the necessary safeguards are in place.
Sharing anonymized data
Anonymized data, i.e. data that contain neither directly nor indirectly identifying information, are not covered by the EU’s privacy regulation (GDPR) or the Norwegian Personal Data Act. Fully anonymized data can be shared openly, provided other legal and research-ethical issues are cleared. Anonymization must not be confused with de-identification and pseudonymization.
Read more about anonymizing data in research projects in Kristiania’s privacy guide.
Ownership and copyright
If you use pre-existing data from a third party, you must comply with the licensing and terms of use under which the data were made available. These will determine whether the data can be published openly or shared further with colleagues and students. The same applies to other copyrighted material included in the data (e.g. images).
Research-ethical considerations
Even if the research data can be shared legally, there will be cases where publishing the data would be considered ethically unjustifiable. This depends on many factors; one example is research on vulnerable or marginalized groups.
For more information on ethical sharing of research data, see the CARE Principles.
Does your Project Contain Personal Data?
For qualitative data, we recommend obtaining participants’ explicit consent for the limited publication of indirectly identifying personal data. This is because full anonymization of qualitative data can be difficult, time-consuming, and may reduce the dataset’s value for future research. If participants have consented to the sharing of indirectly identifiable data under specific conditions, the dataset may be published without complete anonymization.
For research data that contain directly identifying information, we recommend depositing the data with Sikt. Please note that Sikt does not accept already anonymized qualitative data, as it performs its own anonymization.
Read more at Sikt's archiving services (Sikts arkivtjenester).
Note! At Kristiania, Sikt serves as an advisor on privacy in research and assesses the project’s legality under the EU General Data Protection Regulation (GDPR) and the Norwegian Personal Data Act. If you plan to archive research data that contain personal data (directly or indirectly personally identifiable), you must always disclose this in the notification form (meldeskjemaet).
Choosing a Data Repository
When selecting a data repository, there are several questions you should consider:
General data repositories
Below is a selection of commonly used, more general data repositories, along with search resources that can help you find and identify data repositories relevant to your research data.
Search resources for finding data repositories
- Re3data – Re3data is a global registry of research data repositories funded by the DFG (German Research Foundation). It was established to make it easier for researchers to find relevant data repositories, both generic and discipline-specific.
- FAIRsharing – FAIRsharing is an information portal that provides an overview of data repositories, metadata standards, and guidelines for sharing and reusing research data. The platform is developed by the University of Oxford to support the FAIR principles.
Preparing for Archiving and Publication
Before data can be deposited in a data repository and published, they must be quality-assured, structured, and documented. This is a prerequisite for others to find, understand, and reuse the data you share. Below is an overview of steps you should take to prepare the data. Remember to check the requirements set by the data repository.
Note! Most of these steps should be implemented early in the research project to avoid extra work and downstream errors, and to ensure the best possible data quality.
Write documentation for the data
For your data to be understood and reused by other researchers, they must be clearly documented. This is done by writing a README file.
A well-crafted README file contains up-to-date, detailed information about the data, described concisely and unambiguously. The information should be self-explanatory. Like the rest of the materials to be deposited, the README should be written in an open file format (either .txt or .md).
Best practice is to create the README file at the start of the project and keep it updated as changes are made and new files are created. In this way, the file can serve as an overview for project members during and after the project, and it will be complete ahead of deposit in the repository. Place the README file at the top level of the project folder.
Below is a list of information that will be important to include in a README file:
Essential information
- General background information (dataset title, DOI, contact details, date, location, ownership, funder)
- Method description (protocol, instruments, software)
- Data and file overview
- File-specific information
- Terms for reuse
Other important information
- Descriptions, instructions, and protocols for collection, processing, and analysis steps
- Configuration files and log files
- Glossaries, codebooks
- Variable lists
- Participant information sheets and consent forms
- Notification form and preliminary assessment from Sikt, any ethical approvals
- Questionnaires and interview guides
- Permissions and licenses from any rights holders
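The essential items listed above can serve as a starting template. The following Python sketch writes a README.md skeleton at the top level of a project folder. The section headings mirror the "Essential information" list; the function names, the placeholder text, and the choice of Markdown headings are assumptions made for illustration, not a format required by any particular repository.

```python
from pathlib import Path

# Section headings taken from the "Essential information" list above.
ESSENTIAL_SECTIONS = [
    "General background information",
    "Method description",
    "Data and file overview",
    "File-specific information",
    "Terms for reuse",
]


def build_readme(title: str) -> str:
    """Return README text with one placeholder section per essential item."""
    lines = [f"# {title}", ""]
    for section in ESSENTIAL_SECTIONS:
        lines += [f"## {section}", "", "TODO: fill in.", ""]
    return "\n".join(lines)


def write_readme_skeleton(project_dir: Path, title: str) -> Path:
    """Write README.md at the top level of project_dir unless one exists."""
    project_dir.mkdir(parents=True, exist_ok=True)
    readme = project_dir / "README.md"
    if not readme.exists():  # never overwrite a README that is already there
        readme.write_text(build_readme(title), encoding="utf-8")
    return readme
```

Calling `write_readme_skeleton(Path("my_project"), "My dataset")` at the start of a project (the folder name and title are placeholders) gives project members a file to keep updated as new files are created.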
Ensure that the Data are FAIR
FAIR stands for Findable, Accessible, Interoperable, and Reusable, and is primarily about managing and describing data in ways that enable others to understand and use them in the future. In short, the data should be:
- Findable – Data and/or metadata should be easy to locate, for both humans and machines.
- Accessible – Once found, humans and machines must be able to access the data and/or metadata.
- Interoperable – Data must interoperate with applications or workflows for analysis, storage, and processing.
- Reusable – Data and metadata should be well documented so they can be replicated and/or combined in different contexts.
Note! Each of the four categories above has an associated set of principles. In FAIR there are 15 principles in total.
Although the goal is to share data openly and freely, and thereby align with the FAIR principles, not all data can be shared equally openly (or shared at all). The European Commission and the Research Council of Norway use the mantra “as open as possible, as closed as necessary” in their data-sharing requirements. The same applies to the FAIR principles; we therefore talk about the degree of FAIRness in terms of how many principles are met.
How FAIR your data can be depends on the content of the data in relation to applicable laws and ethical guidelines, the characteristics of the data and how they are described (e.g., metadata and the README file), as well as the infrastructure the data are placed in. The latter means that choosing high-quality, feature-rich repositories is essential to making your data as FAIR as possible.
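The idea of a degree of FAIRness, counted as the number of principles met, can be illustrated with a small self-assessment helper. The Python sketch below groups the 15 principle identifiers by category and tallies which ones a dataset satisfies. The function name and the example input are hypothetical, and a plain count is only a rough indicator, not an official FAIR metric.

```python
# Identifiers of the 15 FAIR principles, grouped by category.
FAIR_PRINCIPLES = {
    "Findable": ["F1", "F2", "F3", "F4"],
    "Accessible": ["A1", "A1.1", "A1.2", "A2"],
    "Interoperable": ["I1", "I2", "I3"],
    "Reusable": ["R1", "R1.1", "R1.2", "R1.3"],
}


def fairness_summary(met: set[str]) -> dict[str, tuple[int, int]]:
    """Return (principles met, principles in total) per category and overall."""
    known = {p for ids in FAIR_PRINCIPLES.values() for p in ids}
    unknown = met - known
    if unknown:
        raise ValueError(f"Unknown principle identifiers: {sorted(unknown)}")
    summary = {
        category: (sum(p in met for p in ids), len(ids))
        for category, ids in FAIR_PRINCIPLES.items()
    }
    summary["Total"] = (len(met), len(known))
    return summary


if __name__ == "__main__":
    # Hypothetical self-assessment: which principles the dataset meets.
    for category, (n_met, n_all) in fairness_summary({"F1", "F2", "A1", "R1.1"}).items():
        print(f"{category}: {n_met}/{n_all}")
```

Reporting the tally per category makes it easier to see where restricted data fall short for legitimate legal or ethical reasons and where better description or a better repository could help.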
Read more about the FAIR principles and each of the subordinate elements.
Learn more about FAIR and key concepts through the FAIR Aware learning resource.
Depositing Data
As part of the process of depositing in a data repository, the data must be described by registering metadata. If the data are to be made openly available, you must also decide which license the data will be published under.
Description of the data / metadata
Put simply, metadata are “data about the data,” i.e., structured information that describes your data. Detailed, high-quality metadata are a key part of making your data FAIR. Examples of metadata include information such as:
- Who produced the data and their affiliation
- Keywords and subject area
- What types of data the files contain
- File types
- License and terms of use
Registering this kind of information in machine-readable metadata schemas enables search and discovery and gives researchers and systems the contextual information needed to reuse the data. To ensure that metadata are understandable and interoperable across systems, various metadata standards have been developed: generic ones that suit most disciplines (e.g., Dublin Core) and discipline-specific ones tailored to particular fields (e.g., DDI, which originates in the social sciences). Most data repositories specify which metadata must be registered.
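To make "machine-readable" concrete, the Python sketch below collects the kinds of metadata listed above into a structured record and serializes it as JSON. Most keys loosely follow Dublin Core element names (creator, subject, type, format, rights); the `affiliation` key, the function name, and all example values are assumptions for illustration. The schema required by your chosen repository takes precedence.

```python
import json


def build_metadata_record(creator, affiliation, keywords,
                          data_types, file_formats, license_name):
    """Assemble a simple metadata record and check that no field is empty."""
    record = {
        "creator": creator,
        "affiliation": affiliation,  # illustrative key, not a Dublin Core element
        "subject": sorted(keywords),
        "type": list(data_types),
        "format": list(file_formats),
        "rights": license_name,
    }
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError(f"Missing metadata fields: {missing}")
    return record


if __name__ == "__main__":
    # All values below are placeholders.
    record = build_metadata_record(
        creator="Doe, Jane",
        affiliation="Example University",
        keywords=["survey", "education"],
        data_types=["Dataset"],
        file_formats=["text/csv"],
        license_name="CC BY 4.0",
    )
    print(json.dumps(record, indent=2, ensure_ascii=False))
```

Checking for empty fields before deposit is a small safeguard: incomplete metadata is one of the easiest ways to reduce findability.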
Below is a selection of overviews listing discipline-specific metadata standards:
In addition to metadata standards, there are controlled vocabularies (and ontologies) that provide standardized definitions for key concepts within one or more disciplines. A wide range of vocabularies can be used to describe data; some are part of a metadata standard, while others are standalone. Wherever possible (and if the data repository supports it), standardized terms should be used and the corresponding URLs should be provided.
Examples of vocabularies:
Choosing a license
When publishing data openly in a data repository, you must decide which license and conditions will apply. There are several options, but the most commonly used for datasets are Creative Commons (CC) and Open Data Commons (ODC). Each framework offers a set of licenses that differ in the degree of freedom and restrictions imposed. Some repositories and institutions also use custom licenses.
If the data are to be archived with restricted access, you must check whether the data repository is sufficiently secure and has the required functionality. For research data containing personally identifying information, we recommend depositing the data with Sikt.
Read more about sensitive and personal data and whether these can be archived.
Link the Data to the Publication
When you have deposited the research data underlying a scholarly publication, it is common to include a statement in the text, a “data availability statement” or “data access statement”. In this statement, you provide, among other things, the full data citation with a persistent and unique identifier (e.g. a DOI). In this way, a link is created between the publication and the data.
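As an illustration of how the data citation and the statement fit together, the Python sketch below formats a citation with a DOI link and wraps it in a one-sentence data availability statement. The citation layout, the wording of the statement, the function names, and the example DOI are all assumptions; journals and repositories often prescribe their own format, which should be followed instead.

```python
def format_data_citation(creators, year, title, repository, doi):
    """Format a data citation that ends with a resolvable DOI link."""
    if not doi.startswith("10."):
        raise ValueError("A DOI name starts with the prefix '10.'")
    authors = "; ".join(creators)
    return (f"{authors} ({year}). {title} [Data set]. "
            f"{repository}. https://doi.org/{doi}")


def data_availability_statement(citation):
    """Wrap a data citation in a short availability statement."""
    return f"The data underlying this article are available at: {citation}"


if __name__ == "__main__":
    # Placeholder values; the DOI below is not a real identifier.
    citation = format_data_citation(
        creators=["Doe, J.", "Roe, R."],
        year=2024,
        title="Example survey data",
        repository="Example Repository",
        doi="10.1234/example",
    )
    print(data_availability_statement(citation))
```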