5. Archiving and Sharing

Contents on this page:

Open Data and Data Sharing
Archiving and Publishing Research Data
Check that Your Data can be Shared
Choosing a Data Repository
Preparing for Archiving and Publication
Ensure that the Data are FAIR
Depositing Data
Link the Data to the Publication

Open Data and Data Sharing

Formal sharing covers what is typically called archiving and publishing data (including publishing data as part of an article). Depositing data in established, preferably certified, data repositories ensures good practice for long-term preservation, metadata management, and findability, and enables transparent citation, reuse, and validation.

Within formal channels, degrees of accessibility and reusability vary and are largely governed by legal (e.g., privacy legislation and data security, copyright and ownership) and research-ethical considerations (e.g., vulnerable groups, missing consent, misuse). In other words, not all research data can be made freely available to the public. Some data can never be shared; others must be anonymized before sharing, or can only be shared under specific conditions, such as restricted access, data processing agreements, purpose-of-use restrictions, or embargo.

The counterpart to restricted data is open data—data made available with the technical and legal characteristics needed for anyone to use and redistribute them, at any time and from anywhere.

Informal sharing refers to other methods where data are exchanged directly between researchers or research groups outside established archiving solutions. Examples include sharing via email, personal communication, networks, or collaboration platforms that do not guarantee long-term preservation, searchability, or access control. As with repository deposit, it is essential to follow legal and research-ethical requirements even when sharing in a less formal capacity.

Solutions for sending data (informal sharing)

Data shared informally, whether within a project or with colleagues, must be transferred securely and in compliance with legislation, ethical guidelines, and general data security requirements, just like data that are stored or archived. This concerns not only how and where files are shared, but also whether they should be shared at all.

FileSender

Sikt has developed a secure and reliable solution for sharing both small and large files with other researchers. FileSender lets you control who can download a file, set your own download timeframe, and obtain reports with download statistics. The service is free of charge.

Get access to and read more about FileSender.

Archiving and Publishing Research Data

Sharing research data offers several important benefits for the researcher, the research community, and society at large:

Increased visibility
Better use of resources and the enabling of re-analyses
Strengthened reliability and verifiability of research results and, with that, the quality of the publication
Opportunities to find new collaborators
Long-term preservation that ensures the data do not disappear

In addition to the benefits mentioned above, there are now increasingly stringent requirements and expectations from national authorities, research funders, and journals that research data be made openly available under the principle “as open as possible, as closed as necessary.”

Check that Your Data can be Shared

Before you publish or archive your research data, assess whether you are allowed to share them and whether doing so is responsible. Legal, research-ethical, security-related, contractual, or commercial considerations may limit the possibility of sharing.

Personal data and sensitive data

Research data that contain personal data or sensitive data, for example confidential information or special categories of personal data, cannot be published openly. However, some of these data may be shared with restricted access, provided that the necessary safeguards are in place.

Sharing anonymized data

Anonymized data, i.e. data that contain neither directly nor indirectly identifying information, are not covered by the EU’s privacy regulation (GDPR) or the Norwegian Personal Data Act. Fully anonymized data can be shared openly, provided other legal and research-ethical issues are cleared. Anonymization must not be confused with de-identification and pseudonymization.
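
One way to explore whether a dataset may still contain indirectly identifying information is to count how many records share the same combination of background variables. The sketch below is only an illustration under stated assumptions: the column names (`age_group`, `municipality`, `occupation`), the example records, and the threshold of 3 are all hypothetical, and passing such a check does not by itself make a dataset anonymous.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, min_size=3):
    """Return value combinations shared by fewer than min_size records."""
    counts = Counter(
        tuple(record[key] for key in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_size}


# Hypothetical example records; not real data.
records = [
    {"age_group": "30-39", "municipality": "A", "occupation": "teacher"},
    {"age_group": "30-39", "municipality": "A", "occupation": "teacher"},
    {"age_group": "30-39", "municipality": "A", "occupation": "teacher"},
    {"age_group": "60-69", "municipality": "B", "occupation": "pilot"},
]

# Combinations that occur rarely may point to individuals who could be re-identified.
risky = small_groups(records, ["age_group", "municipality", "occupation"])
for combo, n in risky.items():
    print(f"{combo}: only {n} record(s)")
```

Rare combinations found this way are candidates for coarsening or removal, but the final assessment still depends on the legal and research-ethical context described above.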

Ownership and copyright

If you use pre-existing data from a third party, you must comply with the licensing and terms of use under which the data were made available. These will determine whether the data can be published openly or shared further with colleagues and students. The same applies to other copyrighted material included in the data (e.g. images).

Research-ethical considerations

Even if the research data can be shared legally, there will be cases where publishing them would be ethically unjustifiable. This depends on many factors; one example is research on vulnerable or marginalized groups.

For more information on ethical sharing of research data, see the CARE Principles.

Does your Project Contain Personal Data?

As a general rule, data containing personal data must be deleted or anonymized at the end of the project.

For qualitative data, we recommend obtaining participants’ explicit consent for the limited publication of indirectly identifying personal data. This is because full anonymization of qualitative data can be difficult, time-consuming, and may reduce the dataset’s value for future research. If participants have consented to the sharing of indirectly identifiable data under specific conditions, the dataset may be published without complete anonymization.

For research data that contain directly identifying information, we recommend depositing the data with Sikt. Please note that Sikt does not accept qualitative data that have already been anonymized, as it performs its own anonymization.

Choosing a Data Repository

When selecting a data repository, there are several questions you should consider:


In some cases, the research funder will specify which repository the data must be deposited in. For example, certain projects funded by the Research Council of Norway may be required to archive research data in a specific repository; this will be included as part of the project contract.
When choosing a data repository, you should first check whether there is a domain-specific repository within your discipline. Such repositories are often better adapted to the distinctive characteristics of the data typically collected in the field. At the same time, think about your target audience—if you aim to reach researchers in your own field, a domain-specific data repository will likely have greater visibility, be relevant to more people, and therefore have more impact than depositing the data in a general, often more extensive repository.

If no domain-specific option exists for your field, or you want to reach a broader audience, choose a more general, cross-disciplinary data repository.
When assessing the reliability and quality of a data repository, the simplest approach is to look for certification. We recommend choosing repositories with a CoreTrustSeal certification. Other certifications also exist, such as the Data Seal of Approval (DSA), the Nestor Seal, and ISO 16363. Repositories with such certifications are often referred to as “Trusted Digital Repositories” (TDR).

Be aware that there are several reasons why a data repository may not hold a formal certification, and this is not necessarily problematic. There are several examples of repositories with a strong reputation and standing in the research community that are not certified.

If the repository is not certified, you will have to rely more on your own assessment, based, for example, on reputation, alignment with the FAIR principles (covered below), ownership and operation of the repository, funding, and so on.

See DCC’s guide for self-assessment of a data repository’s trustworthiness.
The data repository should meet as many of the FAIR principles as possible, though some are more critical than others.

Read more about the FAIR principles.
Not all repositories allow every type of license or restrictions such as limited access. If the repository does not support the license you intend to use, or a license with equivalent attributes, you should choose another one.
Does the repository specify how long the research data will be preserved, and is funding secured for long-term maintenance and operations?
You must check in advance whether there are costs for publishing/depositing data in the repository, as well as the underlying terms of use. If costs apply, plan how these will be financed as part of the project budget.

General data repositories

Below is a selection of commonly used, more general data repositories, along with search resources that can help you find and identify data repositories relevant to your research data.


DataverseNO is a national, general data repository for open research data. The repository is managed by UiT The Arctic University of Norway on behalf of a national consortium consisting of DataverseNO’s partner institutions. The repository supports the FAIR principles for managing and stewarding research data and is certified with CoreTrustSeal.

Read more about how to archive in DataverseNO.
NIRD RDA is operated by SIGMA2 and aims to make research data from Norwegian institutions searchable, accessible, and reusable for at least 10 years.

Read more about depositing data to NIRD RDA.
Sikt archives all types of digital research data on people and society, including data that require special handling or permission, such as personal data. Sikt’s data repository is CoreTrustSeal certified. Data can be published openly or with restricted access.

Read more about Sikt’s services for data archiving.
Open Science Framework is a free, open platform developed by the Center for Open Science (COS) to support open and transparent research. OSF has a dedicated data repository module where you can deposit data from all disciplines.

Read more about Open Science Framework.
Zenodo is an international, general data repository for open research data, developed and operated by CERN through the OpenAIRE project. The repository supports the FAIR principles and makes research available across all disciplines.

Read more about how to archive in Zenodo.

Search resources for finding data repositories

Re3data – Re3data is a global registry of research data repositories funded by the DFG (German Research Foundation). It was established to make it easier for researchers to find relevant data repositories, both generic and discipline-specific.

FAIRsharing – FAIRsharing is an information portal that provides an overview of data repositories, metadata standards, and guidelines for sharing and reusing research data. The platform is developed by the University of Oxford to support the FAIR principles.

Preparing for Archiving and Publication

Before data can be deposited in a data repository and published, they must be quality-assured, structured, and documented. This is a prerequisite for others to find, understand, and reuse the data you share. Below is an overview of steps you should take to prepare the data. Remember to check the requirements set by the data repository.

Note! Most of these steps should be implemented early in the research project to avoid extra work and downstream errors, and to ensure the best possible data quality.


Files must be organized and structured in a meaningful way with a consistent logic.

Read more about best practices for structuring files.
Before files can be deposited in a data repository, they must be converted or copied into an open, archival file format. This increases the likelihood that the data can be preserved for the future and that as many people as possible can open and reuse the files regardless of software or operating system.

Read more about open file formats.
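
As a rough aid when preparing files, a project folder can be scanned for files that are not yet in an open format. The following is a minimal sketch, not a definitive check: the set of extensions treated as open is an assumption made for illustration, and the demo folder and file names are hypothetical. Always compare against the formats your chosen repository actually prefers.

```python
import tempfile
from pathlib import Path

# Assumed set of open formats, for illustration only.
OPEN_EXTENSIONS = {".txt", ".md", ".csv", ".json", ".xml"}


def files_needing_conversion(folder, open_extensions=OPEN_EXTENSIONS):
    """List files under folder whose extension is not in the assumed open set."""
    root = Path(folder)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() not in open_extensions
    )


# Demo on a temporary, hypothetical project folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    (Path(tmp) / "README.txt").write_text("demo", encoding="utf-8")
    (Path(tmp) / "data" / "survey.csv").write_text("id,answer\n1,yes\n", encoding="utf-8")
    (Path(tmp) / "data" / "survey.xlsx").write_bytes(b"")
    flagged = files_needing_conversion(tmp)

print(flagged)
```

Files flagged this way are only candidates for conversion; whether and how to convert them depends on the data type and on what information might be lost in the process.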

Write documentation for the data

For your data to be understood and reused by other researchers, they must be clearly documented. This is done by writing a README file.

A well-crafted README file contains up-to-date, detailed information about the data, described concisely and unambiguously. The information should be self-explanatory and, like the rest of the materials to be deposited, should be written in an open file format (either .txt or .md).

Best practice is to create the README file at the start of the project and keep it updated as changes are made and new files are created. In this way, the file can serve as an overview for project members during and after the project, and it will be complete ahead of deposit in the repository. Place the README file at the top level of the project folder.

DataverseNO has developed a template for README files. You may use this template if the data repository you have chosen does not have its own documentation guidelines.

Below is a list of information that will be important to include in a README file:

Essential information

General background information (dataset title, DOI, contact details, date, location, ownership, funder)
Method description (protocol, instruments, software)
Data and file overview
File-specific information
Terms for reuse

Other important information

Descriptions, instructions, and protocols for collection, processing, and analysis steps
Configuration files and log files
Glossaries, codebooks
Variable lists
Participant information sheets and consent forms
Notification form and preliminary assessment from Sikt, any ethical approvals
Questionnaires and interview guides
Permissions and licenses from any rights holders
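
To get started early in a project, a README skeleton with the essential headings listed above can be generated and then filled in by hand. This is a minimal sketch under assumptions: the layout, the placeholder text, and the dataset title are hypothetical and do not reproduce the DataverseNO template or any repository's required structure.

```python
# Section names taken from the "Essential information" list above.
ESSENTIAL_SECTIONS = [
    "General background information",
    "Method description",
    "Data and file overview",
    "File-specific information",
    "Terms for reuse",
]


def readme_skeleton(title, sections=ESSENTIAL_SECTIONS):
    """Build a plain-text README skeleton with one placeholder per section."""
    lines = [f"README for dataset: {title}", ""]
    for section in sections:
        lines += [section.upper(), "[to be completed]", ""]
    return "\n".join(lines)


text = readme_skeleton("Example dataset")  # hypothetical title
print(text)

# To save it at the top level of a project folder, one could write, for example:
# from pathlib import Path
# Path("README.txt").write_text(text, encoding="utf-8")
```

Keeping the skeleton as plain text means it stays in an open file format and can be updated throughout the project.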

Ensure that the Data are FAIR

FAIR stands for Findable, Accessible, Interoperable, and Reusable, and is primarily about managing and describing data in ways that enable others to understand and use them in the future. In short, the data should be:

Findable – Data and/or metadata should be easy to locate, for both humans and machines.
Accessible – Once found, humans and machines must be able to access the data and/or metadata.
Interoperable – Data must interoperate with applications or workflows for analysis, storage, and processing.
Reusable – Data and metadata should be well documented so they can be replicated and/or combined in different contexts.

Note! Each of the four categories above has an associated set of principles. In FAIR there are 15 principles in total.

Although the goal is to share data openly and freely, and thereby align with the FAIR principles, not all data can be shared equally openly (or shared at all). The European Commission and the Research Council use the mantra “as open as possible, as closed as necessary” in their data-sharing requirements. The same applies to the FAIR principles; we therefore talk about the degree of FAIRness in terms of how many principles are met.

How FAIR your data can be depends on the content of the data in relation to applicable laws and ethical guidelines, the characteristics of the data and how they are described (e.g., metadata and the README file), as well as the infrastructure the data are placed in. The latter means that choosing high-quality, feature-rich repositories is essential to making your data as FAIR as possible.

Learn more about FAIR and key concepts through the FAIR Aware learning resource.

Depositing Data

As part of the process of depositing in a data repository, the data must be described by registering metadata. If the data are to be made openly available, you must also decide which license the data will be published under.

Description of the data / metadata

Put simply, metadata are “data about the data,” i.e., structured information that describes your data. Detailed, high-quality metadata are a key part of making your data FAIR. Examples of metadata include information such as:

Who produced the data and their affiliation
Keywords and subject area
What types of data the files contain
File types
License and terms of use

By registering this kind of information in machine-readable metadata schemas, search and discovery are enabled, and researchers and systems receive the contextual information needed to reuse the data. To ensure that metadata are understandable and interoperable across systems, various metadata standards have been developed—generic ones that suit all disciplines (e.g., Dublin Core and DDI) and discipline-specific ones tailored to particular fields. Most data repositories specify which metadata must be registered.
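
To make the idea of a machine-readable metadata record concrete, the sketch below expresses the kinds of information listed above using element names from the Dublin Core element set and serializes them as JSON. It is an illustration only: all values are hypothetical placeholders, and the list of fields treated as required is an assumption, since each repository defines its own mandatory metadata.

```python
import json

# Keys are Dublin Core element names; all values are hypothetical placeholders.
record = {
    "title": "Example survey dataset",
    "creator": ["Nordmann, Kari (Example University)"],
    "subject": ["example keyword", "social sciences"],
    "description": "Hypothetical dataset used to illustrate a metadata record.",
    "type": "Dataset",
    "format": ["text/csv", "text/plain"],
    "rights": "CC BY 4.0",
}

# Assumed set of required fields, for illustration only.
REQUIRED = ["title", "creator", "subject", "type", "format", "rights"]


def missing_fields(record, required=REQUIRED):
    """Return required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


print(json.dumps(record, indent=2, ensure_ascii=False))
print("Missing fields:", missing_fields(record))
```

In practice the repository's deposit form or API usually collects these fields for you; a local record like this mainly helps to gather the information before deposit.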

Below is a selection of overviews listing discipline-specific metadata standards:

In addition to metadata standards, there are controlled vocabularies (and ontologies) that provide standardized definitions for key concepts within one or more disciplines. There is a wide range of vocabularies that can be used to describe data; some are part of a metadata standard, while others are stand-alone. Wherever possible (and if the data repository supports it), standardized terms should be used and the corresponding URLs should be provided.

Examples of vocabularies:

Choosing a license

When publishing data openly in a data repository, you must decide which license and conditions will apply. There are several options, but the most commonly used for datasets are Creative Commons (CC) and Open Data Commons (ODC). Each framework offers a set of licenses that differ in the degree of freedom and restrictions imposed. It may also occur that repositories/institutions use custom licenses.

If the data are to be archived with restricted access, you must check whether the data repository is sufficiently secure and has the required functionality. For research data containing personally identifying information, we recommend depositing the data with Sikt.

Link the Data to the Publication

When you have deposited the research data underlying a scholarly publication, it is common to include a statement in the text, a “data availability statement” or “data access statement”. In this statement, you provide, among other things, the full data citation with a persistent and unique identifier (e.g. a DOI). In this way, a link is created between the publication and the data.
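
A data citation with a DOI can be assembled from a few metadata fields. The sketch below shows one possible pattern and should be read as an assumption, not a prescribed style: the citation layout, the statement wording, the author, the repository name, and the DOI `10.1234/example` are all hypothetical. Follow the format required by your journal and repository.

```python
def doi_url(doi):
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    return "https://doi.org/" + doi.strip()


def data_citation(creators, year, title, repository, doi):
    """Format a simple data citation: creators (year). title. repository. DOI link."""
    return f"{'; '.join(creators)} ({year}). {title}. {repository}. {doi_url(doi)}"


def availability_statement(citation):
    """Wrap a data citation in a short data availability statement."""
    return f"The data supporting this article are available at: {citation}"


# Hypothetical values for illustration only.
citation = data_citation(
    ["Nordmann, K."], 2024, "Example survey dataset", "Example Repository", "10.1234/example"
)
print(availability_statement(citation))
```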
