2. Storing

Content on this page

Introduction
Storage Services
Organisation and Documentation
- Version Control
- File Naming
Documentation of Data during the Project

Introduction

Ensuring that research data are stored securely is critical for maintaining sound research ethics and legal compliance. Depending on the nature of the data, different requirements apply to the security of storage solutions. Sensitive data or large data volumes require specialized solutions for secure intermediate and long-term storage. Personal data must never be stored on personal devices, and a security assessment of the storage solution must always be carried out.

As a researcher, you must therefore familiarize yourself with your institution’s information security guidelines and know which storage services are approved for the data your project handles, as well as whether the service provides sufficient functionality. It is important to plan storage and file organisation before data collection begins. A valuable aid at this stage is to prepare a data management plan (DMP) for the project.

On this page you will find information about a selection of relevant services for storage, along with tips and advice on organisation and documentation.

Storage Services

Below we have compiled an overview of storage services that address specific needs, such as secure storage or storage for large data volumes.


TSD meets the strictest legal requirements for the processing and storage of sensitive research data. The service provides researchers with a dedicated virtual project area (a closed environment), functionality for integration with Nettskjema and the ability to work remotely from anywhere in the world. Upon request, it is also possible to get access to high-performance computing resources within TSD for processing large datasets. The service is intended exclusively for the collection, analysis, and storage of sensitive data.

TSD is provided by UiO. Access and use involve costs that must be covered by project funds. Pricing is modular and depends on the specific needs of each project.

Read more about TSD on UiO’s help pages.

NIRD is a cloud-based research platform for projects that need secure storage for larger datasets (1 TB or more). Access to NIRD is limited, and projects wishing to use the platform must therefore apply for allocations of storage and computing resources (high-performance computing) through the regular calls. If capacity is available, it is also possible to apply between calls. NIRD is provided by SIGMA2, which is part of SIKT.

Read more about NIRD and options for storage space.

Organization and Documentation

Proper routines for organizing and documenting data are fundamental to running a research project efficiently and in a well-structured way, and to ensuring that the data can be reused in the future, whether for your own purposes or because the data are to be shared openly. Verifiability, replication, and other forms of reuse require that the file contents can be read and understood by people outside the project who were not involved in the data collection.

Postponing structure and documentation until the need arises is often time-consuming and inconvenient. Below you will find a set of points that should therefore be addressed early in the project.

Folder Structure

Finding a logical and efficient structure for organizing files can be challenging, especially in projects with multiple contributors. At the same time, the retrieval principles on which you base the organization will depend on the project’s specific needs. In the initial phase of each research project, you should therefore establish, and agree on, one common folder structure. Here we have collected some advice for organizing files and information:

- Plan and agree on a shared structure for organizing files and folders early in the project, and at the latest before you begin data collection.
- Create a folder hierarchy: the key is that the hierarchy has a logical division and that this division is applied consistently. It is usually a good idea to go from general to specific, i.e. the top level in the hierarchy is the project folder itself, followed by top-level categories that each contain related subfolders.
- Use meaningful and unambiguous folder names: all folder names should describe the files they contain, while avoiding ambiguous and overlapping categories. Generic folder names or multiple folders serving the same function should always be avoided.
- Avoid a hierarchy that is too shallow or too deep: the former leads to folders containing unwieldy numbers of files without a logical, guiding subdivision; the latter creates overly detailed subdivisions that make it hard to navigate and find the correct file. There is no single correct number of levels, but limiting yourself to 3–4 is a good rule of thumb.
- Document the folder hierarchy, folder contents, and file-naming conventions in a README.txt file.
- Never store the same file in more than one folder: as an exception, you may create a shortcut that points to the original file if needed. Copies of a file kept within the same folder should have different version numbers (see Version Control for more information).
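
The advice above can be turned into a small setup script that every project member runs once. The following is a minimal sketch in Python, not a prescribed layout: the folder names in `FOLDERS` and the function name `create_structure` are hypothetical examples, and the depth limit simply encodes the 3–4 level rule of thumb, counting the project folder as the top level.

```python
from pathlib import Path

# Hypothetical layout: every folder name here is an example, not a standard.
FOLDERS = [
    "01_documentation",
    "02_data/raw",
    "02_data/processed",
    "03_analysis/scripts",
    "03_analysis/output",
]

MAX_DEPTH = 4  # rule of thumb from the text: limit the hierarchy to 3-4 levels


def create_structure(project_root: Path, folders: list[str] = FOLDERS) -> list[Path]:
    """Create the agreed folder hierarchy under project_root and return the paths."""
    created = []
    for relative in folders:
        # The +1 counts the project folder itself as the top level.
        depth = len(Path(relative).parts) + 1
        if depth > MAX_DEPTH:
            raise ValueError(f"{relative} is {depth} levels deep (limit {MAX_DEPTH})")
        target = project_root / relative
        # exist_ok=True makes the script safe to run again later.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Calling `create_structure(Path("my_project"))` would create the example folders below `my_project`. Keeping the list in one shared script means the agreed structure is written down in a single place and is identical for all contributors.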

Version Control

During a research project, you will often, for various reasons, create new versions of files, and several of those files may have nearly identical names. To prevent you or others in the project from working on the wrong files or overwriting important data, you should establish a system for keeping track.

Here are two simple measures you can take to practice good version control:


As a general rule, files should be numbered. In principle, the numbering can consist of any combination of letters and numbers, but a common scheme is v1, v2, v2-1, where the first number represents major changes and the number after the hyphen represents minor changes. In collaborative projects, it can also be wise to add initials after the version number to indicate who last made a change, for example v2-1MH.
In addition to a version number in the file name, it can be useful to create a change log table for files that are likely to undergo revisions during the project. A change log table can include information such as:

- File name
- Date the file was created
- Person responsible for the file
- Version number
- Description of the change made
- Date of the change
- Who made the change
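
Both measures can be supported by a few lines of code. The sketch below is illustrative only and rests on assumptions: it reads the scheme as `v<major>`, optionally followed by `-<minor>` and uppercase initials, and it keeps the change log as a CSV file. The function names and the column names in `LOG_FIELDS` are hypothetical; the columns mirror the fields listed above.

```python
import csv
import re
from pathlib import Path

# Assumed reading of the scheme: v<major>, optional -<minor>, optional initials.
VERSION_PATTERN = re.compile(r"^v(\d+)(?:-(\d+))?([A-Z]*)$")

# Hypothetical column names, one per field in the change log table.
LOG_FIELDS = ["file_name", "created", "responsible", "version",
              "change", "changed_on", "changed_by"]


def next_version(label: str, major: bool = False, initials: str = "") -> str:
    """Return the label that follows `label`, e.g. v2 to v2-1, or v2-1 to v3."""
    match = VERSION_PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a version label: {label!r}")
    major_no = int(match.group(1))
    minor_no = int(match.group(2) or 0)  # a missing minor part counts as 0
    if major:
        return f"v{major_no + 1}{initials}"
    return f"v{major_no}-{minor_no + 1}{initials}"


def log_change(log_path: Path, entry: dict) -> None:
    """Append one row to the change log, writing the header if the file is new."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
```

For example, `next_version("v2-1MH", initials="AB")` yields `v2-2AB`, and each call to `log_change` adds one line to the table. A spreadsheet maintained by hand serves the same purpose; the point is that every revision leaves a dated, attributed record.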

File Naming

File naming within a project should be clear and consistent, so it is worth establishing a dedicated naming convention. The aim is to make it easier to structure, search, and distinguish between files, and to understand a file’s contents from its name. A consistent convention also helps keep file names compatible with most software, archives, and storage services.

- All files in the project should follow the same naming convention.
- The file name should be concise and no longer than 32 characters.
- Do not use spaces; use underscores “_” instead (e.g. my_file.txt).
- The name should be lowercase. Alternatively, you may use camel case or Pascal case. In camel case, the file name begins with a lowercase letter, and each subsequent word begins with a capital letter (e.g. myFile.txt). Pascal case is the same, except that the first word also begins with a capital letter (e.g. MyFile.txt).
- Do not use special characters (e.g. «!&%$#@/).
- A period should be used only once, at the end of the file name before the extension (e.g. “.txt”).
- To enable sorting by date, include the date in reversed form (year, month, day) without spaces (e.g. 20250227). For the sorting to work, the date must be placed at the beginning of the file name or immediately after the first shared part of the file name used across the project’s files.
- If you number files, add leading zeros according to the project’s total number of files (e.g. 01 or 001). This ensures that files appear in the correct order when sorted by name. Names are compared character by character, so 10 is read as 1 followed by 0 and is therefore sorted before 2.
- You may also consider including the change made to the file in the file name, e.g. “sorted” or “cleaned”.
- If the project files go through many iterations, it is wise to include the version number in the file name. See the Version Control section above.
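
A convention is easier to follow when names are generated and checked automatically. The Python sketch below is one possible way to do this, under stated assumptions: the 32-character limit is counted including the extension, hyphens are permitted because the version scheme above uses them, and the case rule is not checked since the text allows three alternatives. The names `check_file_name` and `build_file_name` and the example label are hypothetical.

```python
import re
from datetime import date

MAX_LENGTH = 32  # assumption: the limit includes the extension
# Anything other than letters, digits, underscore, hyphen and period is "special".
SPECIAL = re.compile(r"[^A-Za-z0-9_.-]")


def check_file_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    if name.count(".") != 1:
        problems.append("must contain exactly one period, before the extension")
    # Spaces are reported separately above, so they are ignored here.
    if SPECIAL.search(name.replace(" ", "")):
        problems.append("contains special characters")
    return problems


def build_file_name(day: date, label: str, sequence: int, total: int, extension: str) -> str:
    """Build <YYYYMMDD>_<label>_<zero-padded number>.<extension>."""
    width = len(str(total))  # e.g. 120 files in total gives three digits
    return f"{day:%Y%m%d}_{label}_{sequence:0{width}d}.{extension}"


# Character-by-character sorting puts 10 before 2 unless numbers are padded.
unpadded = sorted(["file_2.txt", "file_10.txt"])
padded = sorted(["file_02.txt", "file_10.txt"])
```

With these definitions, `build_file_name(date(2025, 2, 27), "interview", 7, 120, "txt")` gives `20250227_interview_007.txt`, and `check_file_name("my file.txt")` reports the space. In the sorting example, `unpadded` lists `file_10.txt` first, while `padded` keeps the numeric order.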

File Formats

To ensure that files can be opened in the future, you should choose open and non-proprietary file formats whenever possible. Although there is no guarantee that a specific format will last indefinitely, open formats are far more likely to remain usable even after the company or software they were created for ceases to exist. Open file formats are maintained and documented through freely available international standards.

As far as possible, you should also avoid relatively unknown or niche file formats. In cases where non-proprietary formats do not exist or cannot be used, choose the most widely adopted alternatives.

Using open and non-proprietary formats is especially relevant when data will be stored for a long time—whether for your own research or when the data will be shared or archived. Note that not all proprietary software necessarily supports exporting to open formats. In such cases, it may be worthwhile to look for open-source software with equivalent functionality as an alternative.

To ensure long-term accessibility and reusability, you should choose file formats that are:

- Open and non-proprietary.
- Easy to share and reuse, now and in the future. This means the files can be used by others who may rely on different software or operating systems.
- Able to preserve as much information from the original files as possible, i.e. formats that avoid compression wherever feasible.
- Not vulnerable to becoming obsolete when new versions are introduced.
- Supported by the data repository in which you plan to deposit the data.

See the complete overview of recommended open/archival file formats.
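
To see which files in a project may need converting before archiving, you can scan the project folder for file extensions. The sketch below is an assumption-laden illustration: `OPEN_EXTENSIONS` is a short example allowlist chosen for this sketch, not the recommended overview referred to above, and should be replaced with the formats your repository actually accepts. The function name `formats_to_review` is hypothetical.

```python
from collections import Counter
from pathlib import Path

# Example allowlist only (an assumption); replace it with the formats
# recommended by your institution or data repository.
OPEN_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".png", ".tif", ".tiff"}


def formats_to_review(project_root: Path, allowed: set[str] = OPEN_EXTENSIONS) -> Counter:
    """Count files under project_root whose extension is not on the allowlist."""
    counts = Counter()
    for path in project_root.rglob("*"):
        suffix = path.suffix.lower()  # compare case-insensitively
        if path.is_file() and suffix not in allowed:
            counts[suffix or "(no extension)"] += 1
    return counts
```

The result maps each extension that is not on the list to the number of files using it, which gives a quick overview of where conversion or further documentation may be needed. An extension alone says nothing about a file's internal encoding or quality, so treat the output as a starting point for review.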

Documentation of Data during the Project

To maintain an ongoing overview of the data and files in the research project, you should create a README text file at the outset.

The README file is a central part of the accompanying documentation for research data deposited in a data repository. By creating the file early in the project, you ensure your own orderly management of the data (regardless of whether the data will be shared) and reduce additional work if the data are later archived/published.
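
A README skeleton can be generated from the project folder itself, so that the documented hierarchy always matches what is on disk. The following Python sketch is one possible starting point: the section headings follow the three items this page asks you to document (folder hierarchy, folder contents, and file-naming conventions), while the function names and the exact layout are assumptions rather than a repository requirement.

```python
from datetime import date
from pathlib import Path


def folder_tree(root: Path) -> list[str]:
    """Return one indented line per subfolder of root, parents before children."""
    folders = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: p.relative_to(root).parts,
    )
    lines = []
    for folder in folders:
        depth = len(folder.relative_to(root).parts) - 1
        lines.append("  " * depth + folder.name + "/")
    return lines


def write_readme(root: Path, title: str, naming_convention: str) -> Path:
    """Write README.txt in root with the sections to fill in during the project."""
    lines = [
        title,
        f"README last updated: {date.today().isoformat()}",
        "",
        "FOLDER HIERARCHY",
        *folder_tree(root),
        "",
        "FOLDER CONTENTS",
        "(describe what each folder holds)",
        "",
        "FILE-NAMING CONVENTION",
        naming_convention,
        "",
    ]
    readme = root / "README.txt"
    readme.write_text("\n".join(lines), encoding="utf-8")
    return readme
```

Running `write_readme` again after the folder structure changes overwrites the file, including any text entered by hand, so in practice you may prefer to generate the skeleton once and maintain it manually afterwards.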
