MMEB

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Reconstructing Content via Collaborative Attention to Improve Multimodal Embedding Quality

arXiv:2603.01471v2 Announce Type: replace Abstract: Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and...

arXiv CS 8d ago

Reconstructing Content with Collaborative Attention for Universal Multimodal Representation Learning

arXiv:2603.01471v3 Announce Type: replace Abstract: Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and...

arXiv CS 7d ago

Conan-embedding-v3: Fusing Modality-Specific Models for Omni-Modal Embedding

arXiv:2606.09331v1 Announce Type: new Abstract: Omni-modal retrieval promises a single embedding space for text, image, video, document, and audio inputs, but building such a unified retriever is difficult since these modalities differ in data distribution, architecture, and optimization dynamics. In this work, we present Conan-embedding-v3, a decouple-fuse-recover framework for omni-modal retrieval. Conan-embedding-v3 first trains modality specialists independently and fuses their task...

arXiv CS 1d ago