Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

arXiv CS Tuesday 02 June 2026, 04:00 UTC By Yun Qu, Boyuan Wang, Yuhang Jiang, Jianzhun Shao, Yixiu Mao, Heming Zou, Chang Liu, Cheems Wang, Meiqin Liu, Xiangyang Ji 1 min read

Key Points

Announce Type: replace Abstract: With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a...

arXiv:2410.02511v2 Announce Type: replace Abstract: With expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from LLM into symbolic key states, that are critical for task fulfillment, in a discriminative manner at low LLM inference costs. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. Benefiting from diminishing redundant explorations, LEMAE outperforms existing SOTA approaches on the challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.

LEMAE (ORG) Efficient Multi-Agent Exploration (ORG) LLM (ORG) Subspace (ORG) KSMT (ORG) SOTA (ORG) SMAC (ORG) MPE (ORG)

Originally published by arXiv CS Read original →

Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

Related Stories

'Voltron: Legendary Defender' turns 10 today, and we think this mecha robot reboot was just as good as 'Power Rangers' and 'Transformers'

Exclusive-GM may ditch LFP batteries for future EVs

Claude Fable won’t answer basic biology questions

Musk Stock Fans Say ‘The More, The Better’ in SpaceX IPO Frenzy