Formalizing Learning from Language Feedback with Provable Guarantees

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng

arXiv:2506.10341v2

Abstract: Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. Despite impressive empirical demonstrations, so far a principled framing of these decision problems remains lacking. We formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce the transfer eluder dimension as a measure to characterize the hardness of LLF. We formalize the intuition that information in the language feedback governs the learning complexity, and demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called HELiX, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension. Across several empirical domains, we show that HELiX performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark an important step towards designing principled interactive learning algorithms using generic language feedback.
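To make the "exponentially faster" claim concrete, here is a minimal sketch built on a hypothetical toy problem of our own choosing, not the paper's construction and not HELiX. The learner must find a hidden n-bit string. With reward-only feedback it is told just 1 (correct) or 0 (wrong), so in the worst case it must try all 2^n strings. With language-style feedback it is told which positions are wrong, which pins down the answer after one guess. The function names `reward_only_learner` and `language_feedback_learner` are illustrative assumptions.

```python
from itertools import product


def reward_only_learner(secret: str) -> int:
    """Hypothetical toy: enumerate bit strings; feedback is only reward 1 or 0.

    Returns the number of rounds needed to hit the secret.
    """
    n = len(secret)
    for rounds, bits in enumerate(product("01", repeat=n), start=1):
        guess = "".join(bits)
        reward = 1 if guess == secret else 0  # a single scalar, nothing else
        if reward == 1:
            return rounds
    raise ValueError("secret must be a string of '0'/'1' characters")


def language_feedback_learner(secret: str) -> int:
    """Hypothetical toy: feedback lists the positions where the guess is wrong.

    Returns the number of rounds needed to hit the secret.
    """
    if any(c not in "01" for c in secret):
        raise ValueError("secret must be a string of '0'/'1' characters")
    guess = ["0"] * len(secret)
    rounds = 0
    while True:
        rounds += 1
        # Richer feedback, e.g. "positions 1 and 3 are wrong".
        wrong = [i for i, c in enumerate(guess) if c != secret[i]]
        if not wrong:
            return rounds
        # Flip exactly the bits the feedback pointed at.
        for i in wrong:
            guess[i] = "1" if guess[i] == "0" else "0"


for n in (4, 8):
    secret = "1" * n  # worst case for lexicographic enumeration
    print(n, reward_only_learner(secret), language_feedback_learner(secret))
```

For `secret = "1111"`, the reward-only learner needs 16 rounds (2^4), while the feedback-driven learner needs 2. The gap grows as 2^n versus a constant, which mirrors, in a deliberately simplified setting, the abstract's point that the information carried by the feedback governs learning complexity.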

Originally published by arXiv CS.