VSTAT

Related Articles from SNS

Benchmarking Visual State Tracking in Multimodal Video Understanding

arXiv:2606.03920v1. Abstract: Understanding a video requires more than recognizing isolated moments, as humans continuously track entities, states, and events over time. This capacity for visual state tracking is fundamental to video understanding, yet remains underexplored in current evaluations of Multimodal Large Language Models (MLLMs). We introduce the Visual STAte Tracking benchmark (VSTAT), a video-based benchmark designed to diagnose visual state tracking in MLLMs.

arXiv CS 7d ago