Readme file tools for Long Term Storage (LTS)

1 – UNIL Template

2 – DataSquid

We have developed an advanced documentation solution, Datasquid, designed to enhance reproducibility in research by automating the process of data documentation for each experiment. The tool seamlessly integrates with local or institutional equipment database and internal laboratory protocol to generate detailed README files for individual experiments. By capturing all relevant metadata and procedural information, DataSquid ensures that each experiment is thoroughly documented, promoting data preservation.

To ensure proper implementation and optimal use of DataSquid, PIs must first contact our team. We will assist in setting up and customizing the platform to meet the specific needs of your laboratory. Additionally, we offer personalized training sessions for lab members, ensuring that everyone is well-prepared to utilize DataSquid effectively and efficiently in their research workflows.

Key Functionalities:

Automatic Documentation Generation:

global README file is produced for TARs documentation before their archiving to tapes for the LTS, adhering to the prerequisites set by the Rectorate (UNIRIS) and DCSR. For more details, see archiving prerequisites.

Characteristics of a readme file

A README file is a crucial component of dataset documentation, providing essential information to ensure that the data can be accurately interpreted and utilized by others, as well as by yourself in the future. It serves to enhance the usability, reproducibility, and transparency of your dataset.
Key Elements to Include in a Data README File:
General Information:
Dataset Title: Provide a clear and descriptive title for your dataset.
Author Information: List the names, affiliations, and contact details of the principal investigator and any co-investigators.
Date of Data Collection: Specify the dates when the data was collected.
Geographical Location: If applicable, mention where the data was collected.
Keywords: Include relevant keywords to describe the data’s subject matter.
Data and File Overview:
File Descriptions: Provide a brief description of each file, including its format and purpose.
File Structure: Explain the organization of the files and any relationships between them.
File Naming Conventions: Describe the naming conventions used for files and directories.
Methodological Information:
Data Collection Methods: Detail the procedures and instruments used to collect the data.
Data Processing: Explain any processing or transformation applied to the data.
Quality Assurance: Describe steps taken to ensure data quality and integrity.
Data-Specific Information:
Variable Definitions: Define all variables, including units of measurement and possible codes.
Missing Data: Specify how missing data is represented in the dataset.
Data Formats: Indicate any specialized formats or abbreviations used.
Sharing and Access Information:
Licenses or Restrictions: State any licenses or
restrictions associated with the data.
Related Publications: Provide references to publications that use or are related to the data.
Citation Information: Offer a recommended citation for the dataset.
Best Practices:
File Format: Write the README as a plain text file (e.g., README.txt) to ensure accessibility and longevity.
Standardized Formatting: Use consistent formatting and terminology throughout the README.
Clarity and Detail: Provide sufficient detail to allow others to understand and use the data without additional assistance.
For comprehensive guidance and templates, consider consulting DSBU website.

Proactive and Retrospective Readme file Creation:

DataSquid is user-friendly, enabling:

  • Proactive documentation: README files and metadata automatically created for new acquisitions or analyses ar ecombined to create the readme file for LTS documentation.
  • Retrospective documentation: If experiments were conducted before information was added to DataSquid, the tool generates a simplified README file to document the past experiments efficiently.

With its intuitive design and automation features, DataSquid significantly reduces administrative tasks, ensures adherence to data standards, and facilitates compliance with FAIR principles for data preservation.