Metadata Collector: An Open-Source Platform for Standardized Metadata Management in Multi Centre Sequencing Projects

Key Points

Background: Next-generation sequencing (NGS) projects generate increasingly complex metadata that are critical for reproducibility, interoperability, and compliance with FAIR principles. Nevertheless, metadata curation in multi-institutional settings often still relies on spreadsheets, manual data entry and curation, as well as non-standardized terminology. These practices frequently result in incomplete or inconsistent annotations, hinder metadata sharing, and delay submission to public repositories.

Results: We developed Metadata Collector as a React/API/PostgreSQL web platform and deployed it on a Kubernetes cluster within a large German research consortium. The platform implements a flexible, machine-readable metadata model for experimental data and integrates customizable templates, controlled vocabularies designed to support future ontology integration, and a complete event-based versioning model. Since deployment, Metadata Collector has been used across 32 projects involving RNA-seq, scRNA-seq, ATAC-seq and multi-omics datasets, representing over 700 annotated samples contributed by multiple consortium partners. The platform is designed for use by non-computational researchers as well as centralized facilities and can be integrated into existing research data management infrastructures.

Conclusions: Metadata Collector embeds standardization early in the metadata lifecycle, ensuring consistent, FAIR-aligned, and reproducible metadata across distributed research groups. Its modular, open-source architecture supports both local and consortium-scale deployments and provides a foundation for future extensions, including multi-omics support and integration with laboratory information management systems and automated submission pipelines.
Originally published by bioRxiv