PathPainter: Transferring the Generalization Ability of Image Generation Models to Embodied Navigation

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Yijin Wang, Yuru Tian, Xijie Huang, Weiqi Gai, Mo Zhu, Xin Zhou, Yuze Wu, Fei Gao

Key Points

arXiv:2605.07496v2

Abstract: Bird's-eye-view (BEV) images have been widely demonstrated to provide valuable prior information for navigation. Given the global information provided by such views, two key challenges remain: how to fully exploit this information and how to reliably use it during execution. In this paper, we propose a navigation system that uses BEV images as global priors and is designed for ground and near-ground robotic platforms. The system employs an image generation model to interpret human intent from natural language, identify the target destination, and generate traversability masks. During execution, we introduce cross-view localization to align the robot's odometry with the BEV map and mitigate long-term drift in conventional odometry. We conduct extensive benchmark experiments to evaluate the proposed method and further validate it on a UAV platform. Using only a conventional local motion planner, the UAV successfully completes a 160-meter outdoor long-range navigation task. This work demonstrates how the world-understanding capabilities of foundation models can be transferred to embodied navigation, enabling robots to benefit from the strong generalization ability of existing image generation models.
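To make the idea of a traversability mask concrete, here is a minimal sketch of how a downstream planner might consume one. It is an illustration only, not the authors' method: the abstract does not say how the mask is encoded or which planner is used beyond "a conventional local motion planner". The function name `shortest_path_on_mask`, the binary grid encoding (1 = traversable, 0 = blocked), and the choice of breadth-first search on a 4-connected grid are all assumptions made for this example.

```python
from collections import deque


def shortest_path_on_mask(mask, start, goal):
    """Breadth-first search over a binary traversability mask.

    Hypothetical helper for illustration. `mask` is a list of rows where
    1 marks a traversable cell and 0 a blocked one (assumed encoding).
    `start` and `goal` are (row, col) tuples. Returns the list of cells
    from start to goal on a 4-connected grid, or None if unreachable.
    """
    rows, cols = len(mask), len(mask[0])
    # Both endpoints must themselves be traversable.
    if not (mask[start[0]][start[1]] and mask[goal[0]][goal[1]]):
        return None

    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


if __name__ == "__main__":
    # Toy 3x3 mask: the middle row is blocked except for the right column.
    toy_mask = [
        [1, 1, 1],
        [0, 0, 1],
        [1, 1, 1],
    ]
    print(shortest_path_on_mask(toy_mask, (0, 0), (2, 0)))
```

In a real system the mask would come from the image generation model at BEV-map resolution, and the resulting cell path would still have to be converted to metric waypoints using the map scale, which this sketch leaves out.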

Originally published by arXiv CS.
