Home Knowledge Base MME

MME

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

ACTIVE-o3: Empowering MLLMs with Active Perception via Pure Reinforcement Learning

arXiv:2505.21457v2 Announce Type: replace Abstract: Active vision, also known as active perception, refers to actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. With the rise of Multimodal Large Language Models (MLLMs) as central planners in robotic systems, the lack of methods for equipping MLLMs with active perception has become a key gap.

arXiv CS 1d ago

MACD: Model-Aware Contrastive Decoding via Counterfactual Data

arXiv:2602.01740v3 Announce Type: replace Abstract: Video language models (Video-LLMs) are prone to hallucinations, generating plausible but ungrounded content when visual evidence is weak, ambiguous, or biased. Existing methods, such as contrastive decoding (CD), rely on random perturbations to construct contrastive data for hallucination mitigation, but often fail to target the visual cues that drive hallucination or align with model weaknesses. We propose Model-Aware Counterfactual Data...

arXiv CS 2d ago

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

Announce Type: new Abstract: Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language Models (OLLMs) due to the extreme distribution heterogeneity and disparate outlier patterns across modalities. To address this, we propose MorphoQuant, a modality-aware PTQ framework engineered to preserve cross-modal morphology and mitigate outlier loss. Specifically, we introduce Distribution-Aware Bias Compensation (DABC), which selectively absorbs long-tailed...

arXiv CS 6d ago

Multilingual Training and Evaluation Resources for Vision-Language Models

arXiv:2604.18347v2 Announce Type: replace Abstract: Vision Language Models (VLMs) achieved rapid progress in the recent years. However, despite their growth, VLMs development is heavily grounded on English, leading to two main limitations: (i) the lack of multilingual and multimodal datasets for training, and (ii) the scarcity of comprehensive evaluation benchmarks across languages. In this work, we address these gaps by introducing a new comprehensive suite of resources for VLMs training...

arXiv CS 1d ago

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

arXiv:2606.04349v2 Announce Type: replace Abstract: Conventional Post-Training Quantization (PTQ) methods struggle with 4-bit Omni-modal Large Language Models (OLLMs) due to the extreme distribution heterogeneity and disparate outlier patterns across modalities. To address this, we propose MorphoQuant, a modality-aware PTQ framework engineered to preserve cross-modal morphology and mitigate outlier loss. Specifically, we introduce Distribution-Aware Bias Compensation (DABC), which...

arXiv CS 2d ago