Our data stewardship service provides comprehensive and tailored support to FBM UNIL-CHUV researchers, helping you manage, share, and preserve your research data effectively and in accordance with best practices and policies.
Data Management Plan preparation and review
Our service assists researchers in creating effective Data Management Plans (DMPs) for SNSF and European grant applications as well as for data storage purposes. This helps you outline how you will collect, manage, and share your data throughout the research process.
Details
A DMP is a crucial document in research projects that outlines how data will be managed throughout its entire life cycle. The aim of a DMP is to provide a structured approach to ensure that data is effectively collected, processed, stored, shared, and preserved in a way that promotes data quality, accessibility, and long-term usability. By creating and following a well-structured Data Management Plan, researchers can enhance the quality of their research, facilitate collaboration, comply with funding agency requirements, and ensure the long-term value and accessibility of their data.
Key components of a Data Management Plan typically include:
Data Description: A detailed description of the data to be collected or generated, including its format, structure, and potential volume.
Data Collection: Information about how the data will be collected, including methodologies, instruments, and tools.
Data Documentation: Plans for documenting the data, such as metadata standards, data dictionaries, and annotations, to ensure that others can understand and use the data.
Data Organization and Storage: Details about how the data will be organized, named, and stored during the project. This may involve considerations of file formats, folder structures, and storage locations.
Data Sharing and Access: Plans for making the data accessible to others, which might involve repositories, embargo periods, access controls, and licensing arrangements.
Data Preservation and Archiving: Strategies for preserving the data beyond the project’s completion, including considerations of data formats, storage options, and potential repositories or archives.
Data Security and Ethics: Measures to ensure data security and ethical handling, such as anonymization, encryption, and compliance with relevant regulations or standards.
Roles and Responsibilities: Clearly defined roles and responsibilities for individuals involved in data management, including researchers, collaborators, and data stewards.
Budget and Resources: Allocation of resources, both financial and human, needed for effective data management throughout the project.
Data Disposal: Plans for the secure disposal or retention of data, taking into account legal and ethical considerations.
Data Management Training: Details about any training that will be provided to researchers to ensure they understand and follow proper data management practices.
Metadata Standards
Our service is knowledgeable about metadata standards for datasets. This helps you organize your data for effective sharing and future use.
Details
Metadata (data documentation) are absolutely necessary for a complete understanding of the research data content and to allow other researchers to find and re-use your data.
Metadata should be as complete as possible, using the standards and conventions of a discipline, and should be machine readable. Metadata should always accompany a dataset, no matter where it is stored.
Practical courses about these aspects are provided by our service on a regular basis.
Tool
For help with documenting your data before depositing it in a data repository, have a look at:
Formalized, specific metadata standards, available for particular file formats and disciplines.
DataCite Metadata Schema, useful for describing general research data and based on European recommendations.
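As a rough illustration of what a general-purpose description involves, the sketch below checks a record against the six properties that the DataCite Metadata Schema marks as mandatory (identifier, creator, title, publisher, publication year, resource type). The key names, the function `missing_properties`, and the example record are hypothetical and chosen for readability; they are not an official DataCite or repository API.

```python
# Illustrative labels for the six mandatory DataCite properties.
# The exact key names are an assumption made for this sketch.
REQUIRED_PROPERTIES = (
    "identifier",
    "creators",
    "titles",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_properties(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in `record`."""
    return [name for name in REQUIRED_PROPERTIES if not record.get(name)]


# Hypothetical record; every value is a placeholder.
example_record = {
    "identifier": "10.1234/example-doi",
    "creators": ["Doe, Jane"],
    "titles": ["Example imaging dataset"],
    "publisher": "Zenodo",
    "publication_year": 2024,
    "resource_type": "Dataset",
}

print(missing_properties(example_record))
```

A check of this kind only confirms that the minimum fields are filled in; discipline-specific standards usually ask for considerably more detail.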
Standard File Formats
To ensure long-term access and reusability of your data, the DSBU encourages you to deposit and share your files using standard preservation and Open file formats most likely to be accessible in the future.
Details
As technology evolves, it is important to consider which file formats you will use for preserving files in the long run.
File formats most likely to be accessible in the future have the following characteristics:
- Non-proprietary
- Open, documented standard
- Popular format
- Standard representation
- Unencrypted
- Uncompressed
We can provide you with guidance on which format to use for long-term preservation and sharing of your data.
Tool
For help with standard formats for long-term preservation, have a look at our DSB Recommended Files format (link).
FAIR Data Compatibility
Our service guides researchers on making their data FAIR (Findable, Accessible, Interoperable, Reusable) compatible. This involves ensuring that metadata is comprehensive and standardized, using open formats and standards, and enhancing data accessibility.
Details
FAIR data principles
One of the grand challenges of data-intensive science is to facilitate knowledge discovery by assisting humans and machines in their discovery of, access to, integration and analysis of, task-appropriate scientific data and their associated algorithms and workflows. Force11 describes FAIR as a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. The term FAIR was launched at a Lorentz workshop in 2014; the resulting FAIR principles were published in 2016 (link).
To be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To be Re-usable:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
FAIR Data Sharing
Research data and metadata are made available in a format that adheres to standards, making them both human and machine-readable, in line with principles of good data governance and management, following FAIR principles (Findable, Accessible, Interoperable, and Reusable). It’s important to note that FAIR does not necessarily imply open accessibility, and sharing can occur in restricted or contractual forms if needed. However, metadata should be made as openly available as possible.
SNSF Explanation of the FAIR Data Principles (PDF) (link)
Wilkinson et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship, Scientific Data 3, doi:10.1038/sdata.2016.18 (link)
Open Research Data Sharing
Our unit actively supports researchers in sharing their data openly on selected FAIR repositories. This helps increase the visibility of your work within the research community. Regarding Open Research Data (ORD), our guidance on preparing and documenting datasets has facilitated the sharing of over 80 FBM-UNIL and CHUV datasets on the Zenodo repository within the FBM community space (link).
Identifying Suitable Repositories for Open Data
Our service assists researchers in finding appropriate FAIR data repositories that align with the requirements of funding agencies and journals. This ensures that research data can be published and accessed according to established policies.
Identifying Suitable Catalogues for Restricted-Access Data
For sensitive research data that cannot be shared openly, we offer guidance, in collaboration with the CHUV IT (DSI) service, on making datasets visible by publishing metadata that describes the data’s characteristics in the Horus CHUV dataset catalogue. This allows international researchers to understand the dataset that has been published and request access if necessary, following proper legal procedures.
Details
The externally accessible HORUS CHUV dataset catalog is the product of a close collaboration of the DSBU with the CHUV IT Department (DSI-CHUV). Developed and managed by the DSI-CHUV, this catalog focuses on showcasing metadata (documentation) that describes the content of sensitive clinical datasets generated at CHUV, which cannot be shared due to legal restrictions.
Sensitive CHUV datasets are secured with controlled access through the Datasets Catalog Horus. The CHUV catalog-Zenodo transfer ensures interoperability, where FAIR metadata is automatically transmitted and made publicly accessible via the FBM-Zenodo community.
Process Overview:
- Users upload their datasets and complete the required metadata fields (dataset information).
- A data steward (curator) from the DSBU reviews the metadata to confirm that the dataset has been correctly deposited.
- Once verified, the curator approves the dataset for final submission.
Data Copyright and Licensing
We assist researchers in understanding data copyright, licensing, and self-archiving rules. This ensures that you are aware of your rights and responsibilities when sharing your data.
Data preservation (Long Term Storage)
Our service, in partnership with the Computing and Research Support Division of UNIL (DCSR), has been actively engaged in the implementation of Long-Term Storage (LTS) for FBM-UNIL research data (link). This initiative is of crucial importance in the face of the exponential growth in data storage costs generated by research.
As part of data life cycle management, our team provides information, advice, and help to FBM UNIL researchers for the long-term storage and preservation of their data, free of charge, at the DCSR UNIL.
We can provide you with guidance on how to prepare a Readme file and reorganize your data in order to preserve your work.
- Informing and guiding you through the process of reorganizing and describing your research data in explanatory documents called “readme files”.
- Final validation of your readme file before data migration from the DCSR NAS to the LTS platform.
Details
Characteristics
- Free of charge (with security and backup)
- Time limit for storage duration (at minimum 10 years for published data) with possible extension (1x).
- Restricted access (limited number of accesses to data on LTS, and on request only).
- Effective organization of your data by project (matched with a funding source, a specific theme, a publication, etc.).
- Production of a Readme file (document describing the dataset content) for each individual future TAR subdirectory
- Naming rules for (TAR) subdirectories in the LTS directory
Procedure for Long-term storage of “cold” data
Details
- Contact us via the NAS DCSR dashboard
- Connect and sign in to your NAS DCSR dashboard via your SWITCH UNIL account using this link
- Via the interface on the homepage, you will be able to request the LTS tape transfer for part of your data by clicking on the button “Request for long-term storage (LTS)” in the list of actions.
- Select «making a long term storage request» via your DCSR dashboard interface to ask for Long Term Storage (LTS)
- To send your request for long-term storage of part of your research data on magnetic tape, click on the submit button. Mention which data you would like to transfer for long-term storage.
- An e-mail will be sent automatically to Cécile Lebrand via the generic email address researchdata@unil.ch. C. Lebrand will contact you to make an appointment within 15 days and will provide personalized assistance.
- FBM data stewards help you organize your data for long-term storage
Data organization
- You will be informed and guided through the process of reorganizing and describing your research data in explanatory documents called readme files.
- You will be helped to determine the best organizational strategy for your data.
- During this process, you will be able to give temporary access to your data stored on the DCSR NAS to Cécile Lebrand via the Ci interface on the homepage, using the following username: clebrand.
- You will need to reorganize your data according to the projects you will have defined and clean up/eliminate obsolete data.
- Research data must be organized into the LTS directory around a given project (matched with a funding, a specific theme, a publication etc.).
TAR (Tape Archive) file
- TAR archive file: size of the data volume in each individual subdirectory to be archived is free up to 5 TB per subdirectory in the LTS directory. No need to compress your data since this step will be done during the creation of the TAR subdirectory.
- The TAR files to be archived must remain at the root of the LTS directory.
- Each TAR subdirectory has to comply with the naming rules (see below) and should be accompanied by an independent readme file. We suggest creating a general Readme Template that can be quickly adapted to each individual TAR subdirectory if possible.
- The naming rule only applies to the first TAR sub-folders in the LTS directory. The TAR archive files created from these directories will have the same name. Within these directories the names of the files and data directories are free.
- Naming rules for the readme file in each TAR directory: The name and file format of the readme file must not be changed, “LongTermStorage_Data_Description_EN.docx”.
- Naming rules for subdirectories in the LTS directory (TAR archive file): the folder name must not exceed 40 characters and must follow the rules below:
- Numbers from 0-9
- Letters a-z
- Letters A-Z
- Hyphen ( - ) OK but not at the beginning or end of the directory name
- Underscore ( _ ) OK but not at the beginning or end of the directory name
- No white spaces
- No accented characters or symbols
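The naming rules above can be checked automatically before a transfer request is submitted. The following is a minimal sketch, assuming the rules apply exactly as listed (ASCII letters and digits, hyphen and underscore allowed only inside the name, at most 40 characters); the function name `is_valid_lts_name` is hypothetical and not part of any DCSR tool.

```python
import re

MAX_LENGTH = 40

# Starts and ends with an ASCII letter or digit; hyphens and underscores
# are allowed only in between. Spaces, accents, and symbols never match.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9_-]*[A-Za-z0-9])?")


def is_valid_lts_name(name: str) -> bool:
    """Return True if `name` follows the LTS subdirectory naming rules."""
    return len(name) <= MAX_LENGTH and _NAME_PATTERN.fullmatch(name) is not None


# Hypothetical directory names used only to illustrate the check.
for candidate in ["Project_2024-RNAseq", "_draft", "my data", "donnée"]:
    print(candidate, is_valid_lts_name(candidate))
```

Only the first candidate passes: the others start with an underscore, contain a space, or contain an accented character.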
Readme file
Complete the readme file for each distinct subdirectory to be archived as an individual TAR archive file, following the established guidelines, and send it to Cécile Lebrand for final revision. If your readme files are not considered complete enough to understand the nature of the dataset, C. Lebrand will send you a request for additional information.
Validation
Once your readme files have been approved by C. Lebrand, each one should be placed at the root of its data subdirectory, and the subdirectories (the future TAR files) should then be moved from the D2C (or D1C) directory to the LTS directory.
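Before moving data, it can be useful to confirm that every subdirectory carries its readme file at the root. The sketch below assumes a local or mounted directory whose first-level subdirectories are the future TAR archives; the function name `subdirectories_missing_readme` and the example path are hypothetical, while the readme file name is the one required by the naming rules.

```python
from pathlib import Path

# Fixed readme file name required for each TAR subdirectory.
README_NAME = "LongTermStorage_Data_Description_EN.docx"


def subdirectories_missing_readme(lts_dir):
    """List first-level subdirectories of `lts_dir` that lack the readme file."""
    root = Path(lts_dir)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and not (child / README_NAME).is_file()
    )


# Example usage with a placeholder path (adapt to your own storage layout):
# print(subdirectories_missing_readme("/path/to/LTS"))
```

An empty list means every first-level subdirectory contains the readme file; the check says nothing about whether the readme content itself is complete, which remains part of the manual validation.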
The DCSR will create the TAR archive files and transfer your data to magnetic tape, with each readme file included at the root of its TAR archive file.
After the magnetic tape transfer of a TAR directory is complete, the LTS directory will include the following elements:
- The TAR file will be renamed by adding the prefix “ARCHIVED*,” and the underlying information will only be accessible in read-only mode.
- A copy of the readme file “LongTermStorage_Data_Description_EN.docx” will be accessible in read-only mode.
- A file named “INVENTORY_OF_ARCHIVED_FILES.txt” within the TAR directory will list the archived files with their full paths and provide information on the recovery procedure from the tapes.
This procedure ensures that the archived data remains accessible and easily identifiable. The addition of the “ARCHIVED*” prefix allows UNIRIS and the researcher to quickly confirm that archiving has been completed by simply examining the TAR directory names at a given moment.
Readme file template for LTS
1 – UNIL Template
2 – FBM automated readme file
Our unit is deploying a user-friendly, automated methodology for generating readme files for big datasets, adapted from Professor Aleksandar Vještica at CIG. This approach will help you convey the crucial details about your datasets through these readme documents.
Data recovery of TAR subdirectory from the LTS platform
Details
Connect and sign in to your NAS DCSR dashboard via your SWITCH UNIL account using this link
- Select «Retrieving data from Long term Storage» via your DCSR dashboard interface to ask for recovery of your data from the Long Term Storage (LTS)
- To validate your request to retrieve some or all of your research data stored on magnetic tape (LTS), please click on the “Submit” button below. If you already have a specific request or question, you may leave your comment(s) below.
- An e-mail will be sent automatically to the DCSR, which will process your request, and you will then be able to access and work with your data on your DCSR-NAS space.
Once the request has been made to the DCSR, the time taken to process the request will depend on the complexity of the data to be processed.