MMVP

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs

arXiv:2506.01850v2. Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in instruction-following tasks by integrating pretrained visual encoders with large language models (LLMs). However, existing approaches often struggle with fine-grained visual grounding due to semantic entanglement in visual patch representations, where individual patches blend multiple distinct visual elements, making it difficult for models to focus on...

arXiv CS 2d ago

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings

Abstract: Hallucinations in Large Vision-Language Models (LVLMs) remain a persistent challenge, often stemming from inadequate integration of visual information during multimodal reasoning. A key cause is the model's over-reliance on textual priors and underutilization of visual cues, leading to outputs that are linguistically fluent but visually inaccurate. For example, given an image of an empty kitchen countertop, an LVLM might hallucinate a "bowl of fruit" or "cup...

arXiv CS 1d ago