Science
Do AI Structure Predictors Capture Bound-State Disorder? A Benchmark on Fuzzy Protein Complexes
Key Points
Fuzzy protein complexes, in which an intrinsically disordered protein (IDP) retains conformational disorder upon binding, pose a fundamental challenge for structure predictors trained on ordered systems. Because crystal structures capture only the most ordered snapshot of the bound ensemble, standard benchmarking metrics are misleading for these systems. Here, we present the first systematic evaluation of AlphaFold3 (AF3), AlphaFold2-Multimer (AF2MM), Chai-1, and Boltz-2 on a curated dataset of fuzzy complexes from FuzDB. Predictions were scored with DockQ against PDB structures and with NOE violation rates against manually curated BMRB restraint files, the first comprehensive collection of this kind. Across all four predictors, approximately 30% of NOE restraints were violated, with nearly identical distributions regardless of predictor architecture or training data. DockQ scores fell uniformly in the Acceptable range; AF3 scored marginally higher but showed NOE violation rates equivalent to those of the weakest-performing model. Ensemble-level analysis using a first-principles implementation of the Hadzi thermodynamic model revealed that AF3 uniquely achieves near-zero mean helicity bias, in contrast to the systematic overconfidence of the other predictors, yet all four models show poor per-residue helicity correlation with thermodynamic expectations. DockQ rankings therefore reflect training data similarity to crystal structures rather than physical accuracy, and no current predictor captures fuzzy complex ensemble behavior. The FuzzyBench-NOE dataset, comprising NOE restraint files, predicted structures, interface hotspot annotations, and Hadzi–DSSP analysis outputs, is released on Zenodo (https://doi.org/10.5281/zenodo.20470556).
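To make the NOE violation rate concrete, the following is a minimal sketch of how such a rate could be computed for a single predicted model. It is not the authors' pipeline: the function name `noe_violation_rate`, the atom-key layout, the 0.5 Å tolerance, and the toy coordinates are all assumptions chosen for illustration. A restraint is counted as violated when the model's interatomic distance exceeds the restraint's upper bound plus the tolerance.

```python
import math


def noe_violation_rate(coords, restraints, tolerance=0.5):
    """Fraction of NOE upper-bound restraints violated by one model.

    coords: dict mapping an atom key, e.g. ("A", 12, "HA"), to (x, y, z)
        coordinates in angstroms (hypothetical layout).
    restraints: iterable of (atom_key_1, atom_key_2, upper_bound) tuples.
    tolerance: slack in angstroms added to each bound (assumed value).
    """
    violated = 0
    evaluated = 0
    for atom_a, atom_b, upper_bound in restraints:
        # Skip restraints whose atoms are missing from the model.
        if atom_a not in coords or atom_b not in coords:
            continue
        distance = math.dist(coords[atom_a], coords[atom_b])
        evaluated += 1
        if distance > upper_bound + tolerance:
            violated += 1
    if evaluated == 0:
        raise ValueError("no restraint could be evaluated against this model")
    return violated / evaluated


# Toy example with made-up coordinates and bounds (not benchmark data).
toy_coords = {
    ("A", 1, "HA"): (0.0, 0.0, 0.0),
    ("B", 5, "H"): (3.0, 4.0, 0.0),   # 5.0 angstroms from the first atom
    ("B", 6, "H"): (0.0, 0.0, 10.0),  # 10.0 angstroms from the first atom
}
toy_restraints = [
    (("A", 1, "HA"), ("B", 5, "H"), 5.0),  # satisfied: 5.0 <= 5.0 + 0.5
    (("A", 1, "HA"), ("B", 6, "H"), 6.0),  # violated: 10.0 > 6.0 + 0.5
]
toy_rate = noe_violation_rate(toy_coords, toy_restraints)
print(f"violation rate: {toy_rate:.2f}")
```

Because a fuzzy complex is an ensemble, a single-model check like this is stricter than NMR practice, where restraints are often evaluated against an ensemble average; the sketch is meant only to show what a per-model violation count measures.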