Robot-DIFT: Correspondence-Sensitive Diffusion Features for Contact-Rich Robot Manipulation

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Yu Deng, Yufeng Jin, Xiaogang Jia, Jiahong Xue, Gerhard Neumann, Georgia Chalvatzaki

Key Points

arXiv:2602.11934v2

Abstract: Robot manipulation often fails in the final millimeters: a policy may recognize the right object yet miss the pose offsets, boundaries, or pre-contact alignments needed for action. We argue that such failures arise when semantic invariance suppresses correspondence cues for closed-loop control, or when these cues are not exposed to the policy in a usable form. Modern visual encoders provide strong semantic abstractions, but contact-rich manipulation requires correspondence sensitivity: discriminative feature responses to action-relevant changes in pose, boundary, and contact geometry. Diffusion features provide a strong prior for dense correspondence, but direct use is impractical due to stochasticity, latency, and representation drift. We introduce Robot-DIFT, a deterministic diffusion-derived backbone for real-time control. Through Manifold Distillation, Robot-DIFT converts a noise-conditioned diffusion Teacher into a clean-input, single-pass Student while preserving the Teacher's feature manifold. A Spatial-Semantic Feature Pyramid Network (S2-FPN) fuses coarse-to-fine Student decoder features into visual tokens that expose semantic context and fine contact detail to the policy. Across RoboCasa, LIBERO-10, and real robots, Robot-DIFT outperforms vision-language, self-supervised, geometry-oriented, and diffusion baselines on contact-sensitive tasks. Controlled backbone/readout swaps show that S2-FPN unlocks, rather than replaces, the diffusion correspondence prior.
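For readers unfamiliar with feature pyramids, the coarse-to-fine fusion mentioned in the abstract can be illustrated with a generic top-down pathway. This is a minimal sketch of the standard FPN idea only, not the paper's S2-FPN, whose details the abstract does not give. The names `upsample2x` and `top_down_fuse` are hypothetical, feature maps are reduced to a single channel stored as nested lists, and each level is assumed to have exactly twice the resolution of the previous one.

```python
from typing import List

# One single-channel feature map, stored as rows x cols (illustrative only).
Grid = List[List[float]]


def upsample2x(fm: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every value becomes a 2x2 block."""
    out: Grid = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_fuse(levels: List[Grid]) -> List[Grid]:
    """Fuse feature maps ordered coarse to fine.

    Assumes each level has twice the height and width of the one before it.
    Each finer map receives the upsampled, already-fused coarser map by
    element-wise addition, so coarse context reaches the finest level.
    """
    fused = [levels[0]]
    for fine in levels[1:]:
        up = upsample2x(fused[-1])
        fused.append(
            [[a + b for a, b in zip(row_up, row_fine)]
             for row_up, row_fine in zip(up, fine)]
        )
    return fused


coarse = [[1.0]]                    # 1x1 map standing in for semantic context
fine = [[0.0, 1.0], [2.0, 3.0]]     # 2x2 map standing in for fine detail
fused = top_down_fuse([coarse, fine])
print(fused[1])
```

In a real network the levels would be multi-channel tensors and the addition would typically be preceded by learned projections; the sketch keeps only the fusion order.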


Originally published by arXiv CS.

