DICE: Entropy-Regularized Equilibrium Selection for Stable Multi-Agent LLM Coordination

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Yi Xie, Zhanke Zhou, Chentao Cao, Bo Liu, Bo Han.


arXiv:2606.08068v1

Abstract: Multi-agent large language model (LLM) systems often fail to reliably outperform a single strong model equipped with best-of-N sampling. We argue that a core source of this instability is ill-posed equilibrium selection: current systems specify what information agents share, but not which coordination convention should be selected. We formalize a broad class of such systems as discounted incomplete-information Markov games and show that two common pathologies, oscillation between competing conventions and drift across them, can both induce unstable learning and linear Bayesian regret. To obtain a well-posed target, we introduce the Heterogeneous Quantal Response Equilibrium (HQRE), an entropy-regularized equilibrium concept with agent- and state-dependent temperatures. Under a monotonicity condition, HQRE is unique, admits linearly convergent mirror updates, and yields bounded Bayesian regret; the same condition yields rollout-measurable stability diagnostics. We instantiate this objective in two algorithms: DICE-PC, which coordinates frozen models through prompt-control actions, and DICE-FT, which performs parameter-efficient mirror fine-tuning. Across eleven benchmarks in four domains, DICE improves accuracy-cost trade-offs over strong within-class baselines; on reasoning and planning tasks, DICE-PC improves by 4.3 percentage points on average and DICE-FT by 8.5 points.
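The abstract does not spell out the HQRE update rule, so the following is only a generic illustration of the idea it describes: an entropy-regularized (logit quantal response) equilibrium with a separate temperature per agent, reached by mirror updates. It is a minimal sketch under stated assumptions, not the DICE-PC or DICE-FT algorithm. The two-action coordination game, the payoff values, the temperatures, the step size, and every function name are hypothetical choices made for this example, and the game has a single state, so the state-dependent part of the temperatures is not modeled.

```python
import math

# Hypothetical common-payoff coordination game with two competing conventions:
# both agents get 2 if they both pick action 0, 1 if they both pick action 1.
PAYOFFS = [
    [[2.0, 0.0], [0.0, 1.0]],  # agent 0: payoff[own_action][other_action]
    [[2.0, 0.0], [0.0, 1.0]],  # agent 1: payoff[own_action][other_action]
]
# Assumed per-agent temperatures, chosen large enough that the regularized
# best-response map is a contraction for this particular game.
TAUS = [1.0, 1.5]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_payoffs(payoff, other_policy):
    """Expected payoff of each own action against the other agent's policy."""
    return [
        sum(row[b] * other_policy[b] for b in range(len(other_policy)))
        for row in payoff
    ]


def mirror_step(policy, q_values, tau, eta):
    """One entropy-regularized mirror (multiplicative-weights) update.

    In log space: log pi' = (1 - eta * tau) * log pi + eta * q, then normalize.
    Its fixed point satisfies pi proportional to exp(q / tau).
    """
    logits = [
        (1.0 - eta * tau) * math.log(p) + eta * q
        for p, q in zip(policy, q_values)
    ]
    return softmax(logits)


def solve_qre(payoffs, taus, init, eta=0.2, steps=500):
    """Run simultaneous mirror updates for both agents from a positive init."""
    pi = [list(init[0]), list(init[1])]
    for _ in range(steps):
        q0 = expected_payoffs(payoffs[0], pi[1])
        q1 = expected_payoffs(payoffs[1], pi[0])
        pi = [
            mirror_step(pi[0], q0, taus[0], eta),
            mirror_step(pi[1], q1, taus[1], eta),
        ]
    return pi


def qre_residual(payoffs, taus, pi):
    """Largest gap between each policy and its own logit response."""
    gap = 0.0
    for i in (0, 1):
        q = expected_payoffs(payoffs[i], pi[1 - i])
        target = softmax([x / taus[i] for x in q])
        gap = max(gap, max(abs(a - b) for a, b in zip(pi[i], target)))
    return gap


# Start once near each convention and compare where the updates end up.
pi_from_a = solve_qre(PAYOFFS, TAUS, [[0.99, 0.01], [0.99, 0.01]])
pi_from_b = solve_qre(PAYOFFS, TAUS, [[0.01, 0.99], [0.01, 0.99]])
print(pi_from_a, pi_from_b, qre_residual(PAYOFFS, TAUS, pi_from_a))
```

With these assumed temperatures, runs started near either convention settle on the same policy pair, which is the sense in which regularization makes the selection target well-posed in this toy game. Lowering the temperatures far enough would bring back multiple fixed points, so the uniqueness seen here depends on the chosen values and does not demonstrate the paper's monotonicity condition.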


Originally published by arXiv CS.
