GLOF: A large-scale expert-curated benchmark dataset of gain-of-function and loss-of-function missense variants

Distinguishing loss-of-function (LOF) from gain-of-function (GOF) effects of missense variants is fundamental to understanding disease mechanisms and guiding therapeutic strategy, yet no large-scale, expert-curated benchmark has been publicly available for this task. Here we present GLOF (Gain and Loss Of Function), a dataset of 112,399 missense variants across 2,809 human genes, each classified as LOF, GOF, or neutral by board-certified clinical geneticists following ACMG guidelines. Pathogenic variants were sourced from ClinVar and annotated with their functional mechanism based on published functional studies, phenotype correlations, and established gene-disease relationships. Neutral variants were drawn from gnomAD v3.1 and validated against v4.1 using stringent population frequency filters. The dataset spans diverse protein families, includes 97 genes with bidirectional mechanisms (containing both LOF and GOF variants), and has been validated against well-characterized variants in the literature. GLOF is publicly available on Kaggle (https://www.kaggle.com/datasets/maricatovictor/loss-and-gain-of-function-variants) and Hugging Face (https://huggingface.co/datasets/victormaricato/glof), and provides a standardized resource for developing and benchmarking computational methods that predict variant functional mechanisms.
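
To illustrate what "bidirectional" means in practice, the following is a minimal sketch of how genes containing both LOF and GOF variants could be identified from a variant table. It assumes one record per variant with hypothetical `gene` and `label` fields; the column names in the released files may differ. The records shown are made-up placeholders, not GLOF data.

```python
from collections import defaultdict

# Toy placeholder records (NOT real GLOF data).
# The field names "gene" and "label" are assumptions for illustration.
variants = [
    {"gene": "GENE_A", "label": "LOF"},
    {"gene": "GENE_A", "label": "GOF"},
    {"gene": "GENE_B", "label": "LOF"},
    {"gene": "GENE_B", "label": "neutral"},
    {"gene": "GENE_C", "label": "GOF"},
]


def bidirectional_genes(records):
    """Return the sorted genes that have at least one LOF and one GOF variant."""
    labels_by_gene = defaultdict(set)
    for rec in records:
        labels_by_gene[rec["gene"]].add(rec["label"])
    # A gene is bidirectional when its label set contains both mechanisms.
    return sorted(
        gene
        for gene, labels in labels_by_gene.items()
        if {"LOF", "GOF"} <= labels
    )


print(bidirectional_genes(variants))  # ['GENE_A']
```

Applied to the full dataset with the correct column names, the same grouping would be expected to recover the 97 bidirectional genes reported above.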
Originally published by bioRxiv.