Skill Reuse as Compression in Agentic RL

arXiv:2605.31509v1. Abstract: Large language model agents trained with reinforcement learning (RL) often learn brittle, task-specific shortcuts. We hypothesize that agents generalize better when their successful trajectories are structurally compressible, decomposed into a small set of reusable abstract patterns. To formalize this, we introduce ReuseRL, which grounds agentic RL in the Minimum Description Length (MDL) principle. ReuseRL extracts a shared skill dictionary from successful trajectories and augments the RL objective with a segmentation cost, explicitly penalizing idiosyncratic behaviors that encode poorly. We prove a PAC-Bayes generalization bound for this compression penalty. Across ALFWorld, TextWorld-Cooking, and Countdown-Stepwise, ReuseRL improves in- and out-of-distribution success over vanilla GRPO and strong round-length baselines.
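The abstract does not spell out how the segmentation cost is computed, so the following is only an illustrative sketch of the general idea, not the paper's formulation. It assumes a trajectory is a list of action strings, a skill is a fixed tuple of actions, and each segment costs a flat amount: cheap for a dictionary skill, expensive for a literal action. The names `segmentation_cost`, `shaped_reward`, `skill_cost`, `literal_cost`, and `lam` are all hypothetical.

```python
from typing import Sequence, Set, Tuple

Skill = Tuple[str, ...]


def segmentation_cost(
    trajectory: Sequence[str],
    skills: Set[Skill],
    skill_cost: float = 1.0,    # assumed cost of one dictionary reference
    literal_cost: float = 3.0,  # assumed cost of one out-of-dictionary action
) -> float:
    """Minimum cost of covering `trajectory` with skills and literal actions."""
    n = len(trajectory)
    # best[i] is the cheapest encoding of the first i actions.
    best = [0.0] + [float("inf")] * n
    for i in range(1, n + 1):
        # Option 1: encode action i-1 on its own as a literal step.
        best[i] = best[i - 1] + literal_cost
        # Option 2: end a dictionary skill exactly at position i.
        for skill in skills:
            k = len(skill)
            if 0 < k <= i and tuple(trajectory[i - k:i]) == skill:
                best[i] = min(best[i], best[i - k] + skill_cost)
    return best[n]


def shaped_reward(
    task_reward: float,
    trajectory: Sequence[str],
    skills: Set[Skill],
    lam: float = 0.1,  # assumed penalty weight
) -> float:
    """Task reward minus a weighted compression penalty (illustrative only)."""
    return task_reward - lam * segmentation_cost(trajectory, skills)


skills = {("goto", "take"), ("open", "put")}
reusable = ["goto", "take", "open", "put"]
idiosyncratic = ["goto", "take", "look", "open", "put"]
print(segmentation_cost(reusable, skills))       # 2.0: two skill references
print(segmentation_cost(idiosyncratic, skills))  # 5.0: two skills plus one literal
```

Under these assumed costs, a trajectory built entirely from dictionary skills encodes more cheaply than one containing an off-dictionary step, so the shaped reward favors the former. How ReuseRL actually extracts the dictionary and weights the penalty is not stated in the abstract.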
Originally published by arXiv CS.