Generative Augmented Inference

arXiv CS, Thursday 04 June 2026, 04:00 UTC. By Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang.

Key Points

arXiv:2604.14575v2 (announce type: replace)

Abstract: Large language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies for true labels, an assumption that is often violated by generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs not as surrogates but as general, potentially high-dimensional informative features for learning human labels. GAI flexibly models this relationship using nonparametric methods, enabling consistent estimation and valid inference from combined human and AI data. We establish asymptotic normality and show that, under random labeling, GAI strictly improves asymptotic efficiency over human-data-only estimation whenever AI outputs are informative for true labels. Empirical studies on real-world datasets show that, across diverse generative data sources, GAI significantly reduces estimation error and improves confidence interval quality relative to human-only and PPI-based estimation.
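The abstract contrasts PPI, which uses AI outputs as stand-ins for labels, with GAI, which uses them as features from which human labels are learned. The Python sketch below illustrates that contrast for the simplest target, a population mean E[Y], assuming random labeling (the human-labeled items are a uniformly random subset). `ppi_mean` follows the standard PPI mean estimator of Angelopoulos et al. (2023a). `augmented_mean`, its k-nearest-neighbour learner, the two-fold cross-fitting, and all parameter names are hypothetical choices made for illustration; this is not the authors' GAI estimator, which the abstract describes only at a high level.

```python
import random
import statistics


def ppi_mean(y_lab, f_lab, f_unlab):
    """PPI point estimate of E[Y] (Angelopoulos et al., 2023a): the mean AI
    prediction on unlabeled items, corrected by the average prediction
    error (the "rectifier") measured on the human-labeled items."""
    rectifier = statistics.fmean(f - y for f, y in zip(f_lab, y_lab))
    return statistics.fmean(f_unlab) - rectifier


def knn_predict(train_z, train_y, z, k):
    """Hypothetical nonparametric learner: average label of the k training
    points whose scalar AI feature is closest to z."""
    order = sorted(range(len(train_z)), key=lambda i: abs(train_z[i] - z))
    return statistics.fmean(train_y[i] for i in order[:k])


def augmented_mean(y_lab, z_lab, z_unlab, k=5, folds=2, seed=0):
    """Illustrative (hypothetical) cross-fitted augmented estimator of E[Y].
    The AI output z is used only as a feature for predicting the human
    label y, never as a substitute for y. Assumes random labeling."""
    n = len(y_lab)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_of = {i: pos % folds for pos, i in enumerate(idx)}

    def fit_excluding(fold):
        # Train a learner on every labeled item outside `fold`.
        keep = [i for i in range(n) if fold_of[i] != fold]
        tz = [z_lab[i] for i in keep]
        ty = [y_lab[i] for i in keep]
        return lambda z: knn_predict(tz, ty, z, min(k, len(keep)))

    models = [fit_excluding(fold) for fold in range(folds)]

    # Out-of-fold predictions on labeled items; fold-averaged predictions
    # on unlabeled items.
    g_lab = [models[fold_of[i]](z_lab[i]) for i in range(n)]
    g_unlab = [statistics.fmean(m(z) for m in models) for z in z_unlab]

    # Mean prediction over all items plus a bias correction estimated
    # from the human labels.
    pooled = statistics.fmean(g_lab + g_unlab)
    correction = statistics.fmean(y - g for y, g in zip(y_lab, g_lab))
    return pooled + correction
```

For example, one could simulate human labels `y` and an AI score on a different scale, such as `z = 2y + 3 + noise`, label a random subset, and compare both functions with the human-only sample mean. The correction term keeps the sketch centred on E[Y] even when the learned mapping is imperfect, and cross-fitting keeps each labeled residual out of the data used to fit its own prediction.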


Originally published by arXiv CS.