Inverse Reinforcement Learning
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning
Announce Type: replace Abstract: Diffusion models generate samples through an iterative denoising process guided by a pretrained neural network. Once the denoiser is fixed, the sampling algorithm itself (noise schedules, guidance scales, stochasticity profiles) still requires careful tuning, a process typically carried out through costly empirical grid search. In this work, we introduce an inverse reinforcement learning framework for learning sampling strategies without retraining the denoiser.
Inverse Reinforcement Learning without an Optimal Demonstrator: A Feasible Reward Set Approach
arXiv:2605.30903v1 Announce Type: new Abstract: Inverse reinforcement learning (IRL) typically assumes demonstrations from a single optimal demonstrator, but in many applications data come from multiple imperfect demonstrators with heterogeneous suboptimality levels. We study reward learning in this setting through a feasible-reward-set framework: for each demonstrator, we encode its declared suboptimality level as a linear constraint and intersect the resulting feasible sets across...
A Lecture Note on Offline RL and IRL, Part II: Foundations of Inverse Reinforcement Learning and Dynamic Discrete Choice Models
arXiv:2605.30843v1 Announce Type: new Abstract: In the forward reinforcement-learning problem, the reward is fixed and known; the learner is asked to find a good policy or value function. Here we turn the question around. Given offline data generated by an expert, can we recover the reward the expert was optimizing?
Scalable Reinforcement Learning via Adaptive Batch Scaling
arXiv:2605.21557v2 Announce Type: replace-cross Abstract: Conventional wisdom holds that large-batch training is fundamentally incompatible with Reinforcement Learning (RL) - beyond a modest threshold, increasing batch sizes typically yields diminishing returns or performance degradation due to the inherent non-stationarity of the data distribution. We challenge this view by observing that non-stationarity is not a fixed property of RL, but evolves throughout training: early stages exhibit...
ConTraIRL: Factorized Contrastive Abstractions for Transferable IRL
arXiv:2606.03017v1 Announce Type: new Abstract: Reward transfer in Inverse Reinforcement Learning (IRL) is unreliable when policies must generalize to unseen combinations of environment dynamics and task goals. We propose Factorized Contrastive Abstractions for Transferable IRL (ConTraIRL), a framework that enables compositional reward transfer by learning decoupled latent representations of these two factors. ConTraIRL uses a dual-encoder architecture that maps observations into separate...
Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design
arXiv:2606.02507v1 Announce Type: cross Abstract: Inverse materials design is shifting materials discovery from forward prediction to targeted proposal of candidates that satisfy objectives under physical constraints. Here, we review recent advances in generative crystal structure modeling, multimodal learning, and closed-loop design pipelines for crystalline solids. We survey how modern generators learn chemical-structural priors from large databases to enable controllable sampling of...
Towards Automated Discovery: A Review of Generative Models, Multimodal Learning and Closed-Loop Workflows in Inverse Materials Design
arXiv:2606.02507v1 Announce Type: cross Abstract: Inverse materials design is shifting materials discovery from forward prediction to targeted proposal of candidates that satisfy objectives under physical constraints. Here, we review recent advances in generative crystal structure modeling, multimodal learning, and closed-loop design pipelines for crystalline solids. We survey how modern generators learn chemical-structural priors from large databases to enable controllable sampling of...
Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards
Announce Type: new Abstract: Distilling expert demonstration data into large generative models using behavioral cloning is a scalable approach to learning capable policies for robotic control, particularly for dexterous manipulation. Reinforcement learning (RL) can be used as a means to finetune these policies further using additional experience. An open question is whether RL is more sample-efficient than collecting more human demonstrations.
Inverse Manipulation through Symbolic Planning and Residual Operator Learning
arXiv:2606.05248v1 Announce Type: new Abstract: Inverting a robotic task requires more than reversing symbolic state transitions or rewinding motor trajectories. In robot manipulation tasks, symbolic inverse plans often fail to fully restore the effects of forward executions under continuous interaction dynamics.
Escaping the Verifier: Learning to Reason via Demonstrations
arXiv:2511.21667v4 Announce Type: replace Abstract: Training Large Language Models (LLMs) to reason often relies on Reinforcement Learning (RL) with task-specific verifiers. However, many real-world reasoning-intensive tasks lack verifiers, despite offering abundant expert demonstrations that remain under-utilized for reasoning-focused training. We introduce RARO (Relativistic Adversarial Reasoning Optimization), which learns strong reasoning capabilities from expert demonstrations alone via...