Home Knowledge Base MSR-VTT

MSR-VTT

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

PEEK: Picking Essential frames via Efficient Knowledge distillation

arXiv:2605.31029v1 Announce Type: new Abstract: Video-language models can process only a limited number of frames, making frame selection a key bottleneck for efficient video captioning. Most captioning pipelines still rely on uniform sampling, which is computationally cheap but agnostic to visual content. Adaptive frame sampling has recently emerged as a promising approach for selecting the most informative frames from a video; however, existing methods remain computationally expensive.

arXiv CS 9d ago

MAVIS: Multi-Agent Video Retrieval via Structured Video Understanding

arXiv:2606.09641v1 Announce Type: new Abstract: The dominant paradigm in video retrieval relies on embedding-based full-corpus scanning, which suffers from inherent computational inefficiency and the semantic asymmetry between information-dense videos and sparse textual queries. To bridge this gap, we introduce \textbf{MAVIS}, a novel multi-agent framework that rethinks retrieval as cooperative reasoning rather than brute-force search. MAVIS first bridges the granularity mismatch by parsing...

arXiv CS 1d ago