RT-PG
Related Articles from SNS
Reusing Trajectories in Policy Gradients Enables Fast Convergence
Abstract: Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring $O(\epsilon^{-2})$ trajectories to reach an $\epsilon$-approximate stationary point.
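To make "fresh on-policy data" concrete, here is a minimal sketch of a standard score-function (REINFORCE) update on a toy one-step problem. It is not the method proposed in the paper. The Gaussian policy, the quadratic reward, and the names `reinforce_gradient`, `train`, `target`, and `sigma` are all assumptions chosen for illustration. The point is only that every update samples a new batch from the current policy and then discards it, which is the source of the sample inefficiency the abstract describes.

```python
import numpy as np

def reinforce_gradient(theta, rng, batch_size=64, sigma=0.5, target=2.0):
    """Score-function estimate of dJ/dtheta from a fresh on-policy batch.

    Hypothetical toy setup: actions a ~ N(theta, sigma^2), reward -(a - target)^2.
    """
    # Fresh samples from the *current* policy; nothing is reused across updates.
    actions = rng.normal(theta, sigma, size=batch_size)
    rewards = -(actions - target) ** 2
    # Mean-reward baseline to reduce the variance of the estimate.
    baseline = rewards.mean()
    # Gradient of log N(a; theta, sigma^2) with respect to theta.
    score = (actions - theta) / sigma**2
    return np.mean((rewards - baseline) * score)

def train(steps=200, lr=0.05, seed=0):
    """Plain gradient ascent on the policy mean, one new batch per step."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        theta += lr * reinforce_gradient(theta, rng)
    return theta

if __name__ == "__main__":
    # The learned mean should end up close to the assumed target of 2.0.
    print(train())
```

With these defaults the loop consumes 200 × 64 sampled actions, each used for exactly one gradient estimate. A trajectory-reuse scheme would instead keep earlier batches and reweight them, which this sketch deliberately does not attempt.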