Sample-Efficient Robotic Reinforcement Learning
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Population-Free Pareto Tracking for Sample-Efficient Multi-Policy MORL
Announce Type: replace Abstract: Multi-objective reinforcement learning (MORL) is a fundamental framework for real-world decision-making problems involving multiple conflicting criteria. Existing multi-policy (MP) methods typically rely on online evolutionary frameworks that maintain large policy populations, leading to high sample complexity and excessive agent-environment interactions. To mitigate these limitations, we present Multi-policy Pareto Front Tracking (MPFT), a framework without...
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
arXiv:2606.08513v1 Announce Type: new Abstract: Autonomous Underwater Vehicles (AUVs) traditionally rely on complex, heavily engineered pipelines for perception, path planning, and motion control. This paper explores the feasibility of an end-to-end Deep Reinforcement Learning (DRL) approach that maps raw sensor data directly to thruster commands, reducing manual engineering. We propose a hierarchical reinforcement learning (HRL) architecture splitting the problem into two Markov Decision...
Human-Like Neural Nets by Catapulting
Human-like Neural Nets by Catapulting Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...
Generalization of World Models under Environmental Variability for Vision-based Quadrotor Navigation
arXiv:2606.05015v1 Announce Type: new Abstract: World models, learned generative models that predict how an environment evolves, have become a promising tool for sample-efficient robot learning. Yet how robust they are to environmental variability remains poorly understood.
Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards
Announce Type: new Abstract: Distilling expert demonstration data into large generative models using behavioral cloning is a scalable approach to learning capable policies for robotic control, particularly for dexterous manipulation. Reinforcement learning (RL) can be used as a means to finetune these policies further using additional experience. An open question is whether RL is more sample-efficient than collecting more human demonstrations.