HARP-VLA: Human-Robot Aligned Representation Learning
Related Articles from SNS
HARP-VLA: Human-Robot Aligned Representation Learning for Vision-Language-Action Model
Abstract: Learning generalizable vision-language-action (VLA) models from large-scale human videos is promising but challenging due to cross-embodiment discrepancies in both visual observations and executable actions. While latent action models reduce the action execution gap by learning action abstractions, they still rely on visual features.