Coherent Off-Policy Improvement of Large Behavior Models

Related Articles from SNS

Coherent Off-Policy Improvement of Large Behavior Models with Learned Rewards

Abstract: Distilling expert demonstration data into large generative models using behavioral cloning is a scalable approach to learning capable policies for robotic control, particularly for dexterous manipulation. Reinforcement learning (RL) can be used as a means to finetune these policies further using additional experience. An open question is whether RL is more sample-efficient than collecting more human demonstrations.

arXiv CS 8d ago