Learning

Related Articles from SNS

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning

arXiv:2604.17551v2. Standard approaches to goal-conditioned reinforcement learning (GCRL) that rely on temporal-difference learning can be unstable and sample-inefficient due to bootstrapping. While recent work has explored contrastive and supervised formulations to improve stability, we present a probabilistic alternative, called survival value learning (SVL), that reframes GCRL as a survival learning problem by modeling the time-to-goal from each state as a...

arXiv CS 9d ago

Learning to Perceive the World Through Control: Empowerment-Based Representation Learning

arXiv:2605.30656v1. In many practical reinforcement learning environments, observations are far higher-dimensional than the variables that matter for control. In this work, we ask: can we learn representations that capture only control-relevant features of the environment? We study this question through the empowerment objective, which maximizes an agent's influence over the environment and is widely used for unsupervised skill learning.

arXiv CS 9d ago

Activation-Based Active Learning for In-Context Learning: Challenges and Insights

Deep active learning has previously been explored for LLM in-context sample selection, but not with methods that utilise recent advances in understanding of transformer activations. In this paper, we test the hypothesis that model activations could provide a fine-grained signal to optimise the selection of in-context examples. We present the most comprehensive analysis to date of MLP activation-based deep active learning methods applied to in-context learning,...

arXiv CS 6d ago

Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

arXiv:2605.18024v2. Cooperation is central to multi-agent reinforcement learning (MARL), yet learned coordination can be fragile when external perturbations disrupt inter-agent interactions. Prior robust MARL methods have primarily considered value-oriented attacks, leaving a gap in robustness when interaction structures themselves are corrupted. In this paper, we propose an interaction-breaking adversarial learning (IBAL) framework that takes an...

arXiv CS 9d ago

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on retrieving discrete skills or past experiences with static parameters during inference, which prevents them from continuously internalizing test-time feedback as human learners do. To bridge this gap, we propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a...

arXiv CS 6d ago

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

Generalist robot policies increasingly benefit from large-scale pretraining, but offline data alone is insufficient for robust real-world deployment. Deployed robots encounter distribution shifts, long-tail failures, task variations, and human correction opportunities that fixed demonstration datasets cannot fully capture. We present Learning While Deploying (LWD), a fleet-scale offline-to-online reinforcement learning framework for continual post-training of...

arXiv CS 6d ago

Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

arXiv:2606.02132v1. Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful tool-assisted exploration. We propose EAPO, an Efficient Agentic Policy Optimization framework that learns selective tool use.

arXiv CS 8d ago

Learning without training: The implicit dynamics of in-context learning

One of the most striking features of Large Language Models (LLMs) is their ability to learn in-context. Namely, at inference time an LLM can learn new patterns without any additional weight update when these patterns are presented as examples in the prompt, even if they were not seen during training. The mechanisms through which this can happen are still largely unknown.

arXiv CS 7d ago

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

Scaling reinforcement learning (RL) to diverse multitask settings remains a central challenge. While recent advances in model-based RL achieve strong performance, they rely on planning and complex training pipelines, making it unclear which components are essential for scalability. We revisit this question and argue that the primary driver of scalable multitask RL is not model-based control, but representation learning.

arXiv CS 5d ago