Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

arXiv CS, Monday 01 June 2026, 04:00 UTC. By Grégoire Martinon, Ibrahim Merad, Mohammed Raki

Key Points

arXiv:2605.31278v1. Abstract: Reliable evaluation of agentic systems requires unbiased estimates with valid uncertainty, but standard practice navigates between costly human annotation and biased LLM-as-judge proxies. Prediction-powered inference (PPI) combines both into debiased estimates with valid confidence intervals, yet its various methods remain scattered across papers under partial implementations. We introduce GLIDE, an open-source Python library that unifies state-of-the-art PPI estimators (PPI++, Stratified PPI, Predict-Then-Debias and its stratified variants, Active Statistical Inference) and samplers (uniform, stratified, active, cost-optimal) under a scipy-style API specialized to mean estimation. GLIDE ships with a reproducible Monte Carlo validation suite, an empirically grounded decision tree for method selection, and an agentic evaluation case study showing substantial annotation savings at equivalent precision. The GLIDE package is available at https://github.com/EmertonData/glide
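To make the idea of combining human labels with LLM-judge scores concrete, here is a minimal sketch of a PPI-style mean estimator with a power-tuning weight in the spirit of PPI++. It uses only NumPy and the standard library and does not reproduce GLIDE's interface: the function name `ppi_mean` and its parameters are hypothetical, and the confidence interval relies on a normal approximation that assumes reasonably large, independent labeled and unlabeled samples.

```python
import math
from statistics import NormalDist

import numpy as np


def ppi_mean(y_labeled, f_labeled, f_unlabeled, alpha=0.05, lam=None):
    """Hypothetical helper: PPI-style estimate of a mean with a normal-approximation CI.

    y_labeled   : human annotations on the small labeled set (size n)
    f_labeled   : judge scores on the same labeled items (size n)
    f_unlabeled : judge scores on the large unlabeled set (size N)
    lam         : weight on the judge; None picks a variance-minimizing plug-in value
    """
    y = np.asarray(y_labeled, dtype=float)
    f_l = np.asarray(f_labeled, dtype=float)
    f_u = np.asarray(f_unlabeled, dtype=float)
    n, N = len(y), len(f_u)

    if lam is None:
        # Plug-in weight for the mean: cov(Y, f) / ((1 + n/N) * var(f)).
        var_f = np.var(np.concatenate([f_l, f_u]), ddof=1)
        cov_yf = np.cov(y, f_l, ddof=1)[0, 1]
        lam = cov_yf / ((1 + n / N) * var_f) if var_f > 0 else 0.0

    # Judge mean on unlabeled data, corrected by the labeled-set residual mean.
    rectifier = y - lam * f_l
    estimate = lam * f_u.mean() + rectifier.mean()

    # The two terms come from independent samples, so their variances add.
    se = math.sqrt(lam**2 * f_u.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    # Synthetic illustration only: a judge that is correlated with the truth but biased.
    rng = np.random.default_rng(0)
    truth = rng.binomial(1, 0.7, size=5000).astype(float)
    judge = np.clip(truth * 0.8 + 0.15 + rng.normal(0, 0.1, size=5000), 0, 1)
    est, ci = ppi_mean(truth[:200], judge[:200], judge[200:])
    print(est, ci)
```

Setting `lam=0` falls back to the plain mean of the human labels, while `lam=1` gives the untuned PPI estimator; the tuned weight interpolates between the two according to how informative the judge is.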


Originally published by arXiv CS.
