{"id":2345,"date":"2026-06-10T17:06:47","date_gmt":"2026-06-10T15:06:47","guid":{"rendered":"https:\/\/wp.unil.ch\/dsbu\/?page_id=2345"},"modified":"2026-06-25T18:11:10","modified_gmt":"2026-06-25T16:11:10","slug":"biometaxtractdsbu-multimodal-instrument-metadata-extraction-for-fair-data-sharing","status":"publish","type":"page","link":"https:\/\/wp.unil.ch\/dsbu\/biometaxtractdsbu-multimodal-instrument-metadata-extraction-for-fair-data-sharing\/","title":{"rendered":"BioMetaXtract\u00a9DSBU: Multimodal Instrument Metadata Extraction for FAIR Data Sharing"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Developed <strong>by St\u00e9phanie Battini<\/strong>, Ph.D. in Medical Sciences from the University of Strasbourg<\/h3>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>From native research files to standards-aligned, repository-ready metadata<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>BioMetaXtract\u00a9DSBU<\/strong> is a tool developed <strong>by St\u00e9phanie Battini<\/strong> at the Data Stewardship Biomed Unit to help researchers extract, organize and standardize metadata directly from native research files.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Many scientific instruments automatically record rich technical information during data acquisition. This information is often stored inside the files themselves or in associated sidecar files, but it is not always easy to access, interpret or reuse. 
BioMetaXtract makes these internal metadata visible, structured and reusable.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The objective is to reduce the manual burden of dataset documentation and to support researchers in preparing their data for FAIR sharing, repository submission and long-term preservation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract extracts metadata from acquisition files, maps them to relevant community standards, and reorganizes them into repository-ready formats adapted to the data type and target repository.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What BioMetaXtract does<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract supports the documentation workflow from raw research files to FAIR metadata outputs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>read native metadata embedded in research files;<\/li>\n\n\n\n<li>detect key acquisition parameters produced by instruments or acquisition software;<\/li>\n\n\n\n<li>extract information from associated sidecar files when available;<\/li>\n\n\n\n<li>reorganize metadata according to discipline-specific standards;<\/li>\n\n\n\n<li>prepare structured metadata outputs that can support deposition in recognized repositories;<\/li>\n\n\n\n<li>facilitate dataset documentation for FAIR data sharing and Open Research Data requirements.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">In practice, BioMetaXtract helps researchers transform instrument-generated metadata into understandable, standardized and reusable information.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Why this matters<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Research datasets are often difficult to reuse because important contextual information is missing, scattered or stored in technical formats that are hard to interpret.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For example, a microscopy file may contain information 
about the microscope, objective, detector, laser wavelengths, channels and acquisition settings. A flow cytometry file may contain parameters related to the cytometer configuration and measured channels. A DICOM file may contain information about imaging modality, acquisition date, scanner settings and image structure.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract helps capture this information automatically, so that datasets can be better described, checked, shared and preserved.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This supports:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>improved dataset documentation;<\/li>\n\n\n\n<li>better traceability of data acquisition;<\/li>\n\n\n\n<li>easier preparation of repository submissions;<\/li>\n\n\n\n<li>alignment with FAIR principles;<\/li>\n\n\n\n<li>reduced manual metadata entry;<\/li>\n\n\n\n<li>better long-term reuse of research data.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Supported data modalities and standards<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract is designed as a multimodal metadata extraction tool. 
It supports several types of research data commonly produced within biomedical and life science research environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Flow cytometry<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For flow cytometry data, BioMetaXtract extracts metadata from <code>.fcs<\/code> files and aligns them with the <strong>MIFlowCyt<\/strong> standard.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The extracted metadata can support documentation and deposition workflows, for example in <strong>Zenodo<\/strong> or other appropriate repositories depending on the sharing strategy.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Current support includes extraction of key metadata from FCS files and from FlowJo <code>.wsp<\/code> sidecar files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Microscopy<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For light microscopy data, BioMetaXtract supports formats such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>.czi<\/code><\/li>\n\n\n\n<li><code>.nd2<\/code><\/li>\n\n\n\n<li><code>.oir<\/code><\/li>\n\n\n\n<li class=\"has-normal-font-size\"><code>.lsm<\/code><\/li>\n\n\n\n<li class=\"has-normal-font-size\"><code>.lif<\/code><\/li>\n\n\n\n<li class=\"has-normal-font-size\"><code>.tif<\/code><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The extracted metadata are mapped to the <strong>REMBI<\/strong> standard used by the <strong>BioImage Archive<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract extracts and structures metadata related to image acquisition, including instrument information and, when available, detailed acquisition parameters such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>microscope information;<\/li>\n\n\n\n<li>objective information;<\/li>\n\n\n\n<li>detector information;<\/li>\n\n\n\n<li>laser wavelengths;<\/li>\n\n\n\n<li>filter information;<\/li>\n\n\n\n<li>channel-level metadata;<\/li>\n\n\n\n<li>pinhole 
size;<\/li>\n\n\n\n<li>detector gain and offset.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">These metadata are useful for preparing submissions to the <strong>BioImage Archive<\/strong> and for improving the documentation of bioimaging datasets.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Current support includes extraction of key metadata from image files, from metadata sidecar files and from Imaris files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Electron microscopy<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For electron microscopy data, BioMetaXtract supports formats such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>.mrc<\/code><\/li>\n\n\n\n<li><code>.tif<\/code><\/li>\n\n\n\n<li><code>.mdoc<\/code> sidecar files<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The extracted metadata are aligned with <strong>EMPIAR<\/strong> repository guidelines.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This supports documentation workflows for electron microscopy datasets, including tomograms, snapshots, montage data and associated acquisition metadata.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Target repositories may include:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>BioImage Archive<\/strong><\/li>\n\n\n\n<li><strong>EMPIAR<\/strong><\/li>\n\n\n\n<li><strong>EMDB<\/strong><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">depending on the nature of the dataset and the scientific domain.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Metabolomics<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For metabolomics data, BioMetaXtract supports formats such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>.mzML<\/code><\/li>\n\n\n\n<li>Agilent <code>.d<\/code><\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The metadata are aligned with <strong>MSI reporting recommendations<\/strong> and <strong>ISA-Tab<\/strong> structures used for metabolomics data deposition.<\/p>\n\n\n\n<p 
class=\"wp-block-paragraph\">The tool supports metadata preparation for repositories such as <strong>MetaboLights<\/strong>, and can also support FAIR documentation workflows for datasets deposited in <strong>Zenodo<\/strong>, depending on the publication and sharing requirements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Clinical DICOM imaging<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For clinical imaging data, BioMetaXtract supports DICOM files such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>.dcm<\/code><\/li>\n\n\n\n<li>CT<\/li>\n\n\n\n<li>MR<\/li>\n\n\n\n<li>PET<\/li>\n\n\n\n<li>NM<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The extracted metadata are based on native DICOM information and can support mapping toward <strong>BIDS-compatible<\/strong> documentation structures.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Depending on the research context and the sensitivity of the data, outputs may support deposition or metadata sharing through appropriate repositories or catalogues.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For clinical datasets, additional attention must always be given to:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>de-identification;<\/li>\n\n\n\n<li>consent;<\/li>\n\n\n\n<li>access restrictions;<\/li>\n\n\n\n<li>sensitive metadata pathways;<\/li>\n\n\n\n<li>institutional and legal requirements.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>MRI, PET-CT and preclinical imaging<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For MRI and PET-CT data, including preclinical imaging workflows, BioMetaXtract supports metadata extraction from imaging files and aims to organize them according to <strong>BIDS-compatible documentation structures<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For neuroscience-related imaging datasets, <strong>E-BRAINS<\/strong> may be an appropriate target repository. 
For other disciplines, <strong>Zenodo<\/strong> may be more appropriate, depending on the dataset type, sensitivity and publication requirements.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Current BIDS-related outputs should be considered as structured support for documentation and future repository preparation. Full BIDS compliance may require additional curation depending on the dataset and repository requirements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>microCT, IVIS and TIFF stacks<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract can also support metadata extraction from modalities such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>microCT;<\/li>\n\n\n\n<li>IVIS;<\/li>\n\n\n\n<li>TIFF image stacks;<\/li>\n\n\n\n<li>derived imaging outputs.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">For these modalities, no single mature community minimum-information standard may be available. BioMetaXtract therefore applies a best-effort structured metadata approach, capturing available native metadata and organizing them according to FAIR documentation principles.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For neuroscience-related imaging datasets, <strong>E-BRAINS<\/strong> may be an appropriate target repository. For other disciplines, <strong>Zenodo<\/strong> may be more appropriate, depending on the dataset type and publication requirements.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>From extracted metadata to repository-ready outputs<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract is not only a metadata extraction tool. 
Its goal is also to help researchers move toward repository-ready documentation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The workflow can be summarized as follows:<\/p>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Native datasets<\/strong><br>Researchers provide native acquisition files generated by instruments or acquisition software.<\/li>\n\n\n\n<li><strong>Metadata extraction<\/strong><br>BioMetaXtract reads embedded metadata and available sidecar files.<\/li>\n\n\n\n<li><strong>Standards mapping<\/strong><br>The extracted metadata are reorganized according to relevant community standards, such as MIFlowCyt, REMBI, EMPIAR guidelines, MSI \/ ISA-Tab, DICOM or BIDS-inspired structures.<\/li>\n\n\n\n<li><strong>Repository-ready exports<\/strong><br>The output can support dataset documentation and preparation for deposition in appropriate FAIR repositories.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What researchers gain<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract helps researchers save time and improve the quality of dataset documentation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It supports:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>automatic extraction of technical metadata;<\/li>\n\n\n\n<li>more complete dataset descriptions;<\/li>\n\n\n\n<li>better alignment with FAIR principles;<\/li>\n\n\n\n<li>easier preparation for repository submission;<\/li>\n\n\n\n<li>improved reproducibility and reuse;<\/li>\n\n\n\n<li>better traceability from acquisition files to published datasets.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">By extracting and structuring metadata early, BioMetaXtract helps ensure that important acquisition information is not lost when data are prepared for sharing or long-term preservation.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">BioMetaXtract is therefore part of an evolving DSBU ecosystem designed to support researchers throughout the FAIR data lifecycle: from 
acquisition metadata to dataset documentation, repository submission and long-term preservation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Developed by St\u00e9phanie Battini, Ph.D. in Medical Sciences from the University of Strasbourg From native research files to standards-aligned, repository-ready metadata BioMetaXtract\u00a9DSBU is a tool developed by&hellip;<\/p>\n","protected":false},"author":1002839,"featured_media":2352,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"template-full-width.php","meta":{"_seopress_titles_title":"","_seopress_titles_desc":"","_seopress_robots_index":"","_seopress_robots_follow":"","_seopress_robots_imageindex":"","_seopress_robots_snippet":"","_seopress_robots_primary_cat":"","_seopress_robots_breadcrumbs":"","_seopress_robots_freeze_modified_date":"","_seopress_robots_custom_modified_date":"","_seopress_robots_canonical":"","_seopress_social_fb_title":"","_seopress_social_fb_desc":"","_seopress_social_fb_img":"","_seopress_social_fb_img_attachment_id":0,"_seopress_social_fb_img_width":0,"_seopress_social_fb_img_height":0,"_seopress_social_twitter_title":"","_seopress_social_twitter_desc":"","_seopress_social_twitter_img":"","_seopress_social_twitter_img_attachment_id":0,"_seopress_social_twitter_img_width":0,"_seopress_social_twitter_img_height":0,"_seopress_redirections_value":"","_seopress_redirections_enabled":"","_seopress_redirections_enabled_regex":"","_seopress_redirections_logged_status":"","_seopress_redirections_param":"","_seopress_redirections_type":0,"_seopress_analysis_target_kw":"","_seopress_news_disabled":"","_seopress_video_disabled":"","_seopress_video":[],"_seopress_pro_schemas_manual":[],"_seopress_pro_rich_snippets_disable_all":"","_seopress_pro_rich_snippets_disable":[],"_seopress_pro_schemas":[],"footnotes":""},"class_list":["post-2345","page","type-page","status-publish","has-post-thumbnail"],"_links":{"self":[{"href":"https:\/\/wp.unil.ch\/dsbu\/wp-j
son\/wp\/v2\/pages\/2345","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/users\/1002839"}],"replies":[{"embeddable":true,"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/comments?post=2345"}],"version-history":[{"count":5,"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/pages\/2345\/revisions"}],"predecessor-version":[{"id":2369,"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/pages\/2345\/revisions\/2369"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/media\/2352"}],"wp:attachment":[{"href":"https:\/\/wp.unil.ch\/dsbu\/wp-json\/wp\/v2\/media?parent=2345"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}