ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

arXiv CS Tuesday 09 June 2026, 04:00 UTC By Haodi Hu, Chung-Ta Huang, Jing Liu, Ye Wang, Kei Suzuki, Matthew Brand, Toshiaki Koike-Akino 1 min read

Key Points

Announce Type: new Abstract: Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA -- a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a structured reward from task-relevant components. Rather than using the VLM to...

arXiv:2606.09630v1 Announce Type: new Abstract: Vision-language-action (VLA) policies provide strong priors for language-conditioned manipulation, but remain brittle in off-nominal states requiring targeted recovery. We propose ReCoVLA -- a failure-conditioned residual recovery framework that keeps a pretrained VLA policy frozen, uses an external vision-language model (VLM) to infer the failure mode and recovery stage, and compiles a structured reward from task-relevant components. Rather than using the VLM to generate actions or rewards directly, ReCoVLA uses it as a semantic reward selector: it predicts a recovery descriptor and reward mask for in-simulation residual-policy training, followed by zero-shot sim-to-real deployment of the trained recovery policies. This decouples high-level failure understanding from low-level corrective control to support different VLAs. Experiments across short-horizon, long-horizon, and contact-rich manipulation tasks show that ReCoVLA outperforms the tested baselines on average. In simulation, our reward compiler improves average success from 36.7% for the fine-tuned $\pi_{0.5}$ baseline to 66.7%. In physical zero-shot sim-to-real experiments, ReCoVLA achieves the best average performance, with 61.7% success.

VLA (ORG) VLM (ORG)

Originally published by arXiv CS Read original →

ReCoVLA: VLM-Guided Reward Compilation for Failure Recovery in Vision-Language-Action Policies

Related Stories

Exclusive-GM may ditch LFP batteries for future EVs

Musk Stock Fans Say ‘The More, The Better’ in SpaceX IPO Frenzy

Musk Stock Fans Say ‘The More, The Better’ in SpaceX IPO Frenzy

Whale graveyard dating back five million years discovered