Technology
Metadata Collector: An Open-Source Platform for Standardized Metadata Management in Multi-Centre Sequencing Projects
Key Points
Background: Next-generation sequencing (NGS) projects generate increasingly complex metadata that are critical for reproducibility, interoperability, and compliance with FAIR principles. Nevertheless, metadata curation in multi-institutional settings often still relies on spreadsheets, manual data entry, and non-standardized terminology. These practices frequently result in incomplete or inconsistent annotations, hinder metadata sharing, and delay submission to public repositories.

Results: We developed Metadata Collector as a web platform with a React front end, an API layer, and a PostgreSQL database, and deployed it on a Kubernetes cluster within a large German research consortium. The platform implements a flexible, machine-readable metadata model for experimental data and integrates customizable templates, controlled vocabularies designed to support future ontology integration, and a complete event-based versioning model. Since deployment, Metadata Collector has been used across 32 projects involving RNA-seq, scRNA-seq, ATAC-seq, and multi-omics datasets, representing over 700 annotated samples contributed by multiple consortium partners. The platform is designed for use by non-computational researchers as well as centralized facilities and can be integrated into existing research data management infrastructures.

Conclusions: Metadata Collector embeds standardization early in the metadata lifecycle, ensuring consistent, FAIR-aligned, and reproducible metadata across distributed research groups. Its modular, open-source architecture supports both local and consortium-scale deployments and provides a foundation for future extensions, including multi-omics support and integration with laboratory information management systems and automated submission pipelines.
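To make the combination of controlled vocabularies and event-based versioning described in the Results more concrete, the following is a minimal Python sketch of the general idea. It is not taken from the Metadata Collector code base: the names `ASSAY_TERMS`, `FieldChanged`, `EventLog`, and `state_at`, as well as the example sample and field values, are hypothetical and chosen only for illustration. The sketch assumes that every metadata change is stored as an immutable event and that any earlier version of a record can be rebuilt by replaying the events up to that point.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical controlled vocabulary; the terms are illustrative only.
ASSAY_TERMS = {"RNA-seq", "scRNA-seq", "ATAC-seq"}


@dataclass(frozen=True)
class FieldChanged:
    """One immutable event: a field of a sample record was set to a value."""
    sample_id: str
    field_name: str
    value: Any
    author: str


@dataclass
class EventLog:
    """Append-only log; the log length after each append serves as the version."""
    events: list = field(default_factory=list)

    def append(self, event: FieldChanged) -> int:
        # Reject assay values that are not in the controlled vocabulary.
        if event.field_name == "assay" and event.value not in ASSAY_TERMS:
            raise ValueError(f"unknown assay term: {event.value!r}")
        self.events.append(event)
        return len(self.events)

    def state_at(self, sample_id: str, version: Optional[int] = None) -> dict:
        """Rebuild a sample record by replaying events up to `version`."""
        replay = self.events if version is None else self.events[:version]
        state: dict = {}
        for event in replay:
            if event.sample_id == sample_id:
                state[event.field_name] = event.value
        return state


# Example usage with made-up sample data.
log = EventLog()
v1 = log.append(FieldChanged("S1", "assay", "RNA-seq", "alice"))
v2 = log.append(FieldChanged("S1", "tissue", "liver", "alice"))
v3 = log.append(FieldChanged("S1", "assay", "scRNA-seq", "bob"))

current = log.state_at("S1")       # latest state of sample S1
earlier = log.state_at("S1", v2)   # state as it was at version 2
```

Because events are never modified or deleted in this sketch, the full history of who changed which field remains available, and a vocabulary check at append time keeps free-text variants out of the record.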