Home Science General Synthetic-Powered Inference
Science

General Synthetic-Powered Inference

Key Points

arXiv:2509.20345v3 Announce Type: replace-cross Abstract: The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around a broad class of statistical inference procedures to safely enhance sample efficiency by combining synthetic and real data. Our...

arXiv:2509.20345v3 Announce Type: replace-cross Abstract: The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. This paper introduces a GEneral Synthetic-Powered Inference (GESPI) framework that wraps around a broad class of statistical inference procedures to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard method using only real data when synthetic data are of low quality. The error rate of our method remains below a user-specified bound without any distributional assumptions on the synthetic data, and decreases as the quality of the synthetic data improves. This flexibility enables seamless integration with conformal prediction, risk control, hypothesis testing, and multiple testing procedures, all without modifying the base inference method. We demonstrate the benefits of our method on challenging tasks with limited labeled data, including AlphaFold protein structure prediction, and comparing large reasoning models on complex math problems.
General Synthetic-Powered Inference arXiv:2509.20345v3 Announce Type (ORG) AI (ORG) GEneral Synthetic-Powered Inference (ORG)
Originally published by arXiv CS Read original →