
In-Context Reinforcement Learning via Communicative World Models

Key Points

arXiv:2508.06659v2 (announce type: replace)

Abstract: Reinforcement learning (RL) agents often struggle to generalize to new tasks and contexts without updating their parameters, mainly because their learned representations and policies are overfit to the specifics of their training environments. To boost agents' in-context RL (ICRL) ability, this work formulates ICRL as a two-agent emergent communication problem and introduces CORAL (Communicative Representation for Adaptive RL), a framework that learns a transferable communicative context by functionally separating latent representation learning from control. In CORAL, an Information Agent (IA) is pre-trained as a world model on a diverse distribution of tasks. Its objective is not direct return maximization, but world modeling and distilling its understanding into concise messages. The emergent communication protocol is shaped by a novel Causal Influence Loss, which measures the effect that the message has on the next action. During deployment, the previously trained IA serves as a fixed contextualizer for a new Control Agent (CA), which learns to solve tasks by interpreting the provided communicative context. The experiments show that this approach enables the CA to achieve significant gains in sample efficiency and to perform zero-shot adaptation with the help of the pre-trained IA in diverse online and offline environments, validating the efficacy of learning a transferable communicative representation.
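The abstract describes the Causal Influence Loss only as measuring the effect a message has on the next action; it gives no formula. One plausible way to quantify such an effect, sketched below purely as an illustration, is the KL divergence between the CA's action distribution given the IA's message and its distribution given a zeroed-out (counterfactual) message. The linear policy, the zero-message baseline, and every function and parameter name here are assumptions for this sketch, not the paper's definitions.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def policy_logits(obs, message, w_obs, w_msg):
    """Hypothetical linear Control Agent: logits = W_obs @ obs + W_msg @ message."""
    return [
        sum(w * o for w, o in zip(row_o, obs))
        + sum(w * m for w, m in zip(row_m, message))
        for row_o, row_m in zip(w_obs, w_msg)
    ]


def causal_influence(obs, message, w_obs, w_msg):
    """Assumed influence measure: KL between the action distribution with the
    message and the distribution with a zeroed (counterfactual) message."""
    with_msg = softmax(policy_logits(obs, message, w_obs, w_msg))
    null_message = [0.0] * len(message)
    without_msg = softmax(policy_logits(obs, null_message, w_obs, w_msg))
    return kl_divergence(with_msg, without_msg)


# Toy example: 2-dimensional observation and message, 3 discrete actions.
obs = [1.0, 0.5]
message = [2.0, -1.0]
w_obs = [[0.2, 0.1], [0.0, 0.3], [0.1, 0.1]]
w_msg = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

print(causal_influence(obs, message, w_obs, w_msg))
```

Under these assumptions, a message that the policy ignores yields an influence of zero, while a message that shifts the action distribution yields a positive value. A training objective could, for instance, negate this quantity so that minimizing the loss rewards messages that actually change the CA's behavior; whether CORAL does so is not stated in the abstract.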
Originally published by arXiv CS.