Home Knowledge Base Comprehensive Spatial-Temporal Representation Learning Framework

Comprehensive Spatial-Temporal Representation Learning Framework

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

CoSTL: Comprehensive Spatial-Temporal Representation Learning for Moment Retrieval and Highlight Detection

arXiv:2606.01149v1 Announce Type: new Abstract: Video Moment Retrieval (MR) and Highlight Detection (HD) are crucial tasks in video analysis that aim to localize specific moments and estimate clip-wise relevance based on a given text query. Recent approaches treat them as similar video grounding tasks and use the same architecture to solve them. These tasks require both fine-grained comprehension at the image level and high-level temporal understanding across the entire video.

arXiv CS 8d ago