Subtraction Gets You More: Gap-Aware Retrieval for Multimodal Multi-Hop QA

arXiv CS, Monday 08 June 2026, 04:00 UTC
By Sunah O, Jay-Yoon Lee

Key Points

arXiv:2605.28641v2 (Announce Type: replace)

Abstract: In multimodal multi-hop question answering, we focus on the initial retrieval stage via two distinct tasks: (1) evidence set completion, retrieving missing evidence given context, and (2) sequential pool construction, iteratively building the top-K pool from scratch. Under these settings, we point out that conventional iterative retrieval frameworks often suffer from Semantic Anchoring, where previously fetched evidence traps the retriever and yields entity-centric redundancy. To break this trap, we propose GRAIL (Gap-aware Retrieval via Adaptive Implicit Localization), a paradigm that performs implicit query rewriting directly at the embedding level. Through context-subtractive query steering, GRAIL excels at compositional cross-modal reasoning, while additive embedding updates show strength in localized information aggregation. By dynamically routing queries based on task type, our Hybrid Framework achieves a 40.3% macro-averaged performance gain on MultimodalQA. Extensive evaluations demonstrate that sequential GRAIL retrieves in a superior, noise-resilient manner, significantly expanding the search horizon through iterative gap-aware optimization.
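The abstract does not give GRAIL's actual update rule, so what follows is only a minimal sketch of the general idea of embedding-level query steering, under loudly stated assumptions: the retrieved context is summarized as the mean (centroid) of the evidence embeddings fetched so far, and the steering strength `alpha`, the helper names (`steer_query`, `route`, `build_pool`), the task labels `"compositional"` and `"localized"`, and the toy 3-dimensional vectors are all hypothetical, not taken from the paper. Subtractive mode pushes the query away from evidence already held (toward the "gap"); additive mode pulls it toward that evidence; `build_pool` grows a top-K pool from scratch, re-steering the query after every pick.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def steer_query(query: Vector, context: List[Vector], mode: str, alpha: float = 1.0) -> Vector:
    """HYPOTHETICAL embedding-level query update (not the paper's exact rule).

    mode="subtract": move the query away from the centroid of fetched evidence,
    so the next hop targets what is still missing.
    mode="add": move the query toward that centroid (local aggregation).
    """
    if not context:
        return normalize(query)
    dim = len(query)
    centroid = [sum(c[i] for c in context) / len(context) for i in range(dim)]
    sign = -1.0 if mode == "subtract" else 1.0
    return normalize([q + sign * alpha * c for q, c in zip(query, centroid)])


def route(task_type: str) -> str:
    """HYPOTHETICAL router: subtractive steering for compositional questions,
    additive updates for localized aggregation."""
    return "subtract" if task_type == "compositional" else "add"


def build_pool(query: Vector, corpus: Dict[str, Vector], k: int,
               task_type: str, alpha: float = 1.0) -> List[str]:
    """Sequential pool construction: grow a top-k pool from scratch,
    re-steering the query with the evidence gathered so far before each pick."""
    mode = route(task_type)
    pool: List[str] = []
    for _ in range(min(k, len(corpus))):
        q = steer_query(query, [corpus[d] for d in pool], mode, alpha)
        best = max((d for d in corpus if d not in pool),
                   key=lambda d: cosine(q, corpus[d]))
        pool.append(best)
    return pool


# Toy corpus (hypothetical): "a1" and "a2" both describe the same entity,
# "b" is the bridge evidence the multi-hop question still needs.
corpus = {
    "a1": [1.0, 0.0, 0.0],
    "a2": [0.95, 0.0, 0.3],
    "b": [0.2, 1.0, 0.0],
}
query = [1.0, 0.6, 0.0]

print(build_pool(query, corpus, 2, "compositional"))  # ['a1', 'b']
print(build_pool(query, corpus, 2, "localized"))      # ['a1', 'a2']
```

Both runs first fetch `a1`. The subtractive update then cancels the entity direction the query shares with `a1` and lands on the bridge document `b`, while the additive update stays anchored on that entity and fetches the near-duplicate `a2`, a small-scale picture of the entity-centric redundancy the abstract calls Semantic Anchoring.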

Originally published by arXiv CS
