Overestimating zero-shot fitness prediction: Broad benchmarks mask local failures and practical limitations

Key Points

Deep learning models have emerged as promising tools for navigating mutational landscapes in protein engineering. These models can be used to predict mutation fitness without the need for task-specific training, a process known as zero-shot prediction. However, their practical utility remains only partially characterized. Here, we evaluate the zero-shot performance of a panel of protein sequence and structure models across a range of benchmarking conditions, focusing on factors that complicate the interpretation of aggregate metrics. We show that input modality (sequence vs. structure) does not dictate performance on phenotypic tasks. Instead, performance is sensitive to experimental variability and is heavily confounded by correlation between phenotype and protein abundance. While available models may act as coarse filters separating fit mutations from deleterious ones, they cannot meaningfully rank a set of fit mutations or prioritize new-to-nature functions. Ultimately, the practical utility of zero-shot prediction from protein models is narrower than aggregate benchmarks imply.
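The gap between aggregate metrics and ranking within fit mutations can be illustrated with a minimal sketch. It assumes zero-shot scores are compared to measured fitness by Spearman rank correlation, which is a common choice but is not stated in the text above. The function names, the `fit_threshold` parameter, and the toy values are all hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def aggregate_vs_fit_only(scores, fitness, fit_threshold):
    """Compare correlation over all mutations with correlation among fit ones.

    `fit_threshold` is a hypothetical cutoff: mutations with measured
    fitness at or above it are treated as "fit".
    """
    overall = spearman(scores, fitness)
    fit_pairs = [(s, f) for s, f in zip(scores, fitness) if f >= fit_threshold]
    if len(fit_pairs) < 2:
        return overall, float("nan")
    fit_scores = [s for s, _ in fit_pairs]
    fit_values = [f for _, f in fit_pairs]
    return overall, spearman(fit_scores, fit_values)


# Invented toy values: four deleterious and four fit mutations.
toy_scores = [-5.0, -4.0, -4.5, -3.5, 1.0, 0.5, 1.2, 0.4]
toy_fitness = [0.0, 0.1, 0.05, 0.15, 0.8, 0.9, 1.0, 1.1]

overall, within_fit = aggregate_vs_fit_only(toy_scores, toy_fitness, fit_threshold=0.5)
print(f"all mutations: {overall:.2f}, fit mutations only: {within_fit:.2f}")
```

In this toy case the scores cleanly separate deleterious from fit mutations, so the correlation over all mutations is high, while the correlation restricted to the fit subset is negative. That is the pattern the abstract describes: a model can work as a coarse filter and still fail to rank the fit mutations.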
Originally published by bioRxiv.