Home › Knowledge Base › InternVL3

InternVL3

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Seeing Without Exposing: Adaptive Privacy Control for Open-World, Context-Hungry MLLMs

Announce Type: new Abstract: Multimodal large language models (MLLMs) have raised new privacy challenges. On the data side, user-provided inputs often include unpredictable sensitive information; while on the downstream task side, model reasoning depends on rich visual context that may itself be privacy-sensitive.

arXiv CS 2d ago

Perception First: A Frontier Native-Video Model with Self-Consistency for Implicit Video Question Answering

arXiv:2606.01485v1 Announce Type: new Abstract: We describe our submission to the VRR Challenge @ CVPR 2026, built on the \emph{ImplicitQA} / \emph{VRR-QA} benchmark~\cite{implicitqa}: multiple-choice video question answering in which answers are deliberately \emph{not} observable in any single frame and must be inferred from spatial layout, motion, depth, viewpoint, causality, and social context across discontinuous frames of creative video. We conduct a systematic, training-free study...

arXiv CS 8d ago

HiDe: Rethinking The Zoom-IN method in High Resolution MLLMs via Hierarchical Decoupling

arXiv:2510.00054v3 Announce Type: replace Abstract: Multimodal Large Language Models (MLLMs) have made significant strides in visual understanding tasks. However, their performance on high-resolution images remains suboptimal. While existing approaches often attribute this limitation to perceptual constraints and argue that MLLMs struggle to recognize small objects, leading them to use "zoom in" strategies for better detail, our analysis reveals a different cause: the main issue is not...

arXiv CS 5d ago

Drive-KD: Multi-Teacher Distillation for VLMs in Autonomous Driving

arXiv:2601.21288v2 Announce Type: replace Abstract: Autonomous driving is an important and safety-critical task, and recent advances in LLMs/VLMs have opened new possibilities for reasoning and planning in this domain. However, large models demand substantial GPU memory and exhibit high inference latency, while conventional supervised fine-tuning (SFT) often struggles to bridge the capability gaps of small models. To address these limitations, we propose Drive-KD, a framework that decomposes...

arXiv CS 5d ago

PInVerify: An Offline Embodied Benchmark for Active Instance Verification

arXiv:2605.30639v1 Announce Type: new Abstract: Embodied agents have made strong progress in navigating to target objects, but reaching the goal vicinity does not guarantee that the agent has found the correct instance: subtle attribute differences (e.g., "white floral" vs. "white striped") often require close-range, multi-view inspection. We address this gap with Active Instance Verification (AIV), a task in which an agent actively selects viewpoints around a candidate object to decide...

arXiv CS 9d ago

When Correct Decisions Hide Internal Stress: Decision-State Probing in Multimodal Language Models

Announce Type: new Abstract: Multimodal language models are typically evaluated through external behavior: selecting the correct image--text match, rejecting unsupported captions, or answering visual queries correctly. However, correct behavior alone does not show that the model's internal decision state remains stable under controlled semantic stress. We study this gap through S$^3$E (Structured Semantic Stress Evaluation), a framework for analyzing behavior-internal decoupling in...

arXiv CS 1d ago