Textual Supervision Enhances Geospatial Representations in Vision-Language Models

arXiv CS, Monday 08 June 2026, 04:00 UTC. By Marcelo Sartori Locatelli, Fernando Tonucci, Jea Kwon, Luiz Felipe Vecchietti, Bryan Nathanael Wijaya, Cheng Yaw Low, Virgilio Almeida, Meeyoung Cha

arXiv:2606.07172v1

Abstract: Geospatial understanding is a critical yet underexplored dimension in the development of machine learning systems for tasks such as image geolocation and spatial reasoning. In this work, we analyze the geospatial representations acquired by three model families: vision-only architectures (e.g., ViT), vision-language models (e.g., CLIP), and large-scale multimodal foundation models (e.g., LLaVA, Qwen, and Gemma). By evaluating across image clusters (people, landmarks, and everyday objects) grouped by their degree of localizability, we reveal systematic gaps in spatial accuracy and show that textual supervision enhances the learning of geospatial representations. Our findings point to language as an effective complementary modality for encoding spatial context, and to multimodal learning as a key direction for advancing geospatial AI.
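The abstract does not say how spatial accuracy is measured. As a rough illustration only, the sketch below assumes that each image has a true and a predicted latitude/longitude, and that accuracy is summarized as the median great-circle (haversine) error per image cluster. The function names, the record layout, and the choice of metric are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict
from statistics import median

# Mean Earth radius in kilometres (a commonly used approximation).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_error_by_cluster(records):
    """Median geolocation error (km) per cluster.

    Hypothetical record layout (an assumption, not the paper's format):
    (cluster, true_lat, true_lon, pred_lat, pred_lon)
    """
    errors = defaultdict(list)
    for cluster, true_lat, true_lon, pred_lat, pred_lon in records:
        errors[cluster].append(haversine_km(true_lat, true_lon, pred_lat, pred_lon))
    return {cluster: median(values) for cluster, values in errors.items()}
```

Computing such a per-cluster summary for each model family would be one way to compare clusters such as landmarks and everyday objects, under the assumptions stated above.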

Originally published by arXiv CS.
