Science
Correcting for Global Synonymous Selection Improves the Accuracy of Episodic Positive Selection Inference
Key Points
The ratio of nonsynonymous to synonymous substitution rates (ω) constitutes a fundamental parameter for inferring adaptive protein evolution, predicated upon the assumption that synonymous substitutions are selectively inert. This premise, however, is increasingly untenable given evidence of selection acting on synonymous substitutions, driven by biological processes such as translational efficiency and mRNA stability. In this study, we demonstrate that unmodelled synonymous selection introduces substantial bias into ω estimation, resulting in elevated false positive rates in tests for positive selection. To rectify this, we present BUSTED+S+MSS, a statistical framework incorporating Multiclass Synonymous Substitution (MSS) models into BUSTED, a method for detecting episodic selection. By partitioning synonymous codons into empirically derived rate classes, this approach accounts for global synonymous constraints. Application to five diverse clades (Drosophila, Caenorhabditis, Enterobacteria, Saccharomyces, and Primates) reveals that the inclusion of MSS components consistently improves model fit and reduces the proportion of genes inferred to be under positive selection. In Enterobacteria, genes retaining significance under the corrected model exhibit weaker constraint on synonymous substitutions (dSs), consistent with the hypothesis that unmodelled purifying selection drives spurious signals of adaptation. Furthermore, an information-theoretic analysis indicates that whilst site-specific synonymous rate variation (SRV) provides the primary correction, global synonymous rate variation (MSS) contributes a distinct second-order correction. In highly divergent alignments, these signals act in concert to improve model fit.
The BUSTED+S+MSS framework, especially when coupled with an "error-sink" to absorb alignment artifacts, thus offers a computationally feasible means to disentangle adaptive nonsynonymous substitution from the confounding effects of synonymous constraint.
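The direction of the bias described above can be illustrated with a toy calculation. The sketch below is not the BUSTED+S+MSS model: it assumes a single gene whose synonymous sites evolve at a fraction of the neutral rate, and the function name `apparent_omega` and all numerical values are hypothetical choices made for illustration only. It shows how purifying selection on synonymous sites can push a naive dN/dS ratio above 1 even when nonsynonymous substitutions accumulate more slowly than neutral ones.

```python
def apparent_omega(dn: float, neutral_rate: float, syn_constraint: float) -> float:
    """Naive dN/dS when synonymous sites evolve at syn_constraint * neutral_rate.

    syn_constraint = 1.0 means synonymous sites are truly neutral;
    values below 1.0 mimic purifying selection on synonymous substitutions.
    """
    if not 0.0 < syn_constraint <= 1.0:
        raise ValueError("syn_constraint must lie in (0, 1]")
    ds = neutral_rate * syn_constraint  # observed synonymous rate
    return dn / ds


# Hypothetical gene: nonsynonymous rate is 80% of the neutral rate,
# so the omega measured against a truly neutral benchmark would be 0.8.
neutral_rate = 1.0
dn = 0.8

for constraint in (1.0, 0.75, 0.5):
    omega = apparent_omega(dn, neutral_rate, constraint)
    print(f"synonymous constraint {constraint:.2f} -> apparent omega {omega:.2f}")
```

In this toy setting the apparent ratio exceeds 1 once synonymous sites evolve at less than 80% of the neutral rate, which is the kind of spurious signal that a synonymous-selection-aware model is intended to absorb.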