Science
Exploring diverse routes to high-affinity-antibody variable domains through deep-sequencing-informed machine learning
Key Points
The integration of in vitro selection, deep sequencing, and machine learning (ML) has recently emerged as a powerful strategy for discovering functional antibodies. However, how training data composition and ML search space design influence the identification of high-affinity variants remains unclear. Here, we aimed to optimize ML-integrated directed evolution for functional antibody discovery by selecting training data from deep sequencing analysis. Using phage display selection of camelid heavy-chain antibody variable domains (VHHs), we demonstrated that early-round data, which retain more binding-negative variants, can be superior for training models to identify high-performance VHHs. We also investigated a lead-independent ML search space design that focuses on residues conserved in the final selection rounds, successfully identifying variants with higher affinities than those from lead-based maturation (KD = 7.9 nM). These findings demonstrate that training data selection and search space design are critical for successful ML-guided antibody engineering and provide diverse pathways for discovering high-affinity VHH variants.