Replicate-anchored calibration of within-host single nucleotide variant detection in Mycobacterium tuberculosis whole genome sequencing
Abstract

Background. Intra-host genetic heterogeneity in Mycobacterium tuberculosis is biologically and clinically informative, but its detection from short-read whole genome sequencing depends on thresholds over read depth (DP), alternate allele support (AD), and minor allele frequency (MAF) that are rarely empirically anchored.

Methods. We developed a biological replicate-anchored, lexicographic calibration framework for per-specimen intra-host single nucleotide variant (iSNV) detection. Within-patient replicate sputum pairs from a pre-treatment tuberculosis (TB) cohort were scored across the joint (DP, AD, MAF) threshold grid on six concordance metrics. The selector penalized no-signal regimes and ranked grid cells by reproducibility, using sensitivity as a tiebreaker. Selection stability was quantified with B = 1,000 nonparametric bootstrap resamples of replicate pairs, which defined a Looser/Primary/Tighter sensitivity ladder. The calibrated rule was then applied to 97 patients contributing 282 cultured sputum specimens.

Results. Calibration on 169 replicate pairs from 67 patients identified a Primary cell at DP ≥ 60×, AD ≥ 3, and MAF ∈ [0.02, 0.50]. The bootstrap modal cell coincided with the Primary cell in 45.7% of the 703 successful bootstrap replications (MAF = 0.02 in 100%; AD = 3 in 90.2%). At the Primary tier, the per-patient prevalence of detectable within-host diversity was 16.5% (16/97; Wilson 95% CI: 10.4% to 25.1%); the Looser and Tighter tiers yielded comparable rates.

Conclusion. Within-patient replicate concordance provides a reproducible empirical anchor for iSNV detection thresholds in TB whole genome sequencing. The framework is internally calibrated and reported with explicit sensitivity tiers, and the calibrated rule can be applied to external cohorts as a direct test of transferability.
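A minimal Python sketch of how the Primary rule (DP ≥ 60, AD ≥ 3, MAF ∈ [0.02, 0.50]) and the Wilson interval reported in the Results could be computed. The `SiteCall` and `Rule` records, the computation of MAF as AD/DP (assuming the alternate allele is the minor allele), and the definition of a patient as positive when any site in any of their specimens passes the rule are illustrative assumptions, not the authors' pipeline. The Wilson score interval itself is the standard formula, and with 16/97 it reproduces the interval quoted above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    """One candidate site in one specimen (hypothetical record layout)."""
    patient: str
    specimen: str
    dp: int  # total read depth at the site (DP)
    ad: int  # reads supporting the alternate allele (AD)


@dataclass(frozen=True)
class Rule:
    """Threshold cell; defaults are the Primary cell reported in the abstract."""
    min_dp: int = 60
    min_ad: int = 3
    maf_lo: float = 0.02
    maf_hi: float = 0.50

    def passes(self, c: SiteCall) -> bool:
        if c.dp <= 0 or c.dp < self.min_dp or c.ad < self.min_ad:
            return False
        # Assumption: MAF is taken as AD / DP, i.e. the alternate allele is the minor one.
        maf = c.ad / c.dp
        return self.maf_lo <= maf <= self.maf_hi


def patients_with_isnv(calls, rule):
    """Assumed definition: a patient is positive if any site in any specimen passes."""
    return {c.patient for c in calls if rule.passes(c)}


def wilson_ci(k, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion k / n (z ~ 1.96 for 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Toy calls, invented for illustration.
calls = [
    SiteCall("P1", "S1", dp=80, ad=4),    # MAF 0.05: passes
    SiteCall("P2", "S2", dp=50, ad=10),   # DP below 60: fails
    SiteCall("P3", "S3", dp=120, ad=2),   # AD below 3: fails
]
print(sorted(patients_with_isnv(calls, Rule())))  # ['P1']

lo, hi = wilson_ci(16, 97)
print(f"{lo:.1%} to {hi:.1%}")  # 10.4% to 25.1%
```

The Wilson interval is used here because it stays well behaved for proportions near the edges and for moderate sample sizes such as n = 97.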
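The lexicographic selector described in the Methods can be sketched as a tuple-valued sort key: cells with no calls in any replicate pair rank last, then cells are ordered by reproducibility, then by sensitivity. This is a hypothetical simplification. A single metric (mean Jaccard index between the two replicates' iSNV sets, skipping pairs where both replicates are empty) stands in for the six concordance metrics, and mean calls per pair stands in for sensitivity. The toy positions and (DP, AD) counts are invented and constructed so that the selection is unambiguous; they are not study data.

```python
from itertools import product
from statistics import mean


def call_set(specimen, dp_min, ad_min, maf_lo, maf_hi=0.50):
    """Positions in one specimen passing a (DP, AD, MAF) cell; MAF taken as AD / DP."""
    return {
        pos
        for pos, (dp, ad) in specimen.items()
        if dp > 0 and dp >= dp_min and ad >= ad_min and maf_lo <= ad / dp <= maf_hi
    }


def score_cell(pairs, cell):
    """Lexicographic key (has_signal, reproducibility, sensitivity) for one grid cell.

    Hypothetical stand-ins: reproducibility is the mean Jaccard index over pairs
    where at least one replicate has a call; sensitivity is mean calls per pair.
    """
    jaccards, n_calls = [], []
    for rep_a, rep_b in pairs:
        sa, sb = call_set(rep_a, *cell), call_set(rep_b, *cell)
        n_calls.append(len(sa) + len(sb))
        union = sa | sb
        if union:
            jaccards.append(len(sa & sb) / len(union))
    has_signal = bool(jaccards)  # False marks a no-signal cell, ranked below all others
    reproducibility = mean(jaccards) if jaccards else 0.0
    return (has_signal, reproducibility, mean(n_calls))


def select_cell(pairs, dp_grid, ad_grid, maf_grid):
    """Cell with the lexicographically largest score (the first cell wins exact ties)."""
    return max(product(dp_grid, ad_grid, maf_grid), key=lambda c: score_cell(pairs, c))


# Toy replicate pairs: {position: (DP, AD)} per replicate; invented, not study data.
pairs = [
    ({1: (100, 10), 2: (40, 3)}, {1: (100, 8), 3: (100, 3), 5: (80, 2)}),
    ({4: (80, 6)}, {4: (90, 3)}),
]
best = select_cell(pairs, dp_grid=[30, 60], ad_grid=[2, 3], maf_grid=[0.02, 0.05])
print(best)  # (60, 3, 0.02)
```

Python compares tuples element by element, so the ordering "signal first, then reproducibility, then sensitivity" falls out of the key without any weighting between the criteria.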
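Selection stability under bootstrap resampling of replicate pairs can be sketched generically: resample the pairs with replacement B times, rerun whatever selector is in use, and tally how often the Primary cell and its individual coordinates are chosen. The `selector` callable, the convention that it returns `None` for a failed replication, and the summary keys are assumptions. The abstract does not state what made a replication unsuccessful or how the Looser and Tighter tiers were read off the bootstrap distribution, so neither is modeled here.

```python
import random
from collections import Counter


def bootstrap_selection(pairs, selector, n_boot=1000, seed=0):
    """Resample replicate pairs with replacement and rerun the selector.

    Assumption: selector(resampled_pairs) returns a cell tuple
    (dp_min, ad_min, maf_lo), or None for a failed replication.
    """
    rng = random.Random(seed)
    picks = []
    for _ in range(n_boot):
        resample = [rng.choice(pairs) for _ in pairs]  # same size as the original
        cell = selector(resample)
        if cell is not None:
            picks.append(cell)
    return picks


def summarise(picks, primary):
    """Share of successful replications picking the Primary cell and its coordinates."""
    n = len(picks)
    if n == 0:
        return {"successful": 0}
    cells = Counter(picks)
    modal, _ = cells.most_common(1)[0]
    return {
        "successful": n,
        "modal_cell": modal,
        "primary_share": cells[primary] / n,
        "ad_share": Counter(c[1] for c in picks)[primary[1]] / n,
        "maf_share": Counter(c[2] for c in picks)[primary[2]] / n,
    }


# Toy usage: a stand-in selector that fails when the resample lacks pair "p1".
toy_pairs = ["p1", "p2", "p3"]


def toy_selector(resample):
    return (60, 3, 0.02) if "p1" in resample else None


picks = bootstrap_selection(toy_pairs, toy_selector, n_boot=1000, seed=1)
print(summarise(picks, primary=(60, 3, 0.02)))
```

Reporting the marginal shares alongside the joint share separates coordinates that are pinned down (here, by construction, all of them) from coordinates that drift between replications.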