
MaxSim


Related Articles from SNS

Col-Bandit: Query-Time Top-$K$ Estimation for Late-Interaction Retrieval

Abstract: Multi-vector late-interaction retrievers such as ColBERT achieve state-of-the-art quality, but their query-time cost is dominated by exhaustively computing token-level MaxSim interactions for every candidate document. The MaxSim scores of $N$ candidates against $T$ query tokens form an $N\times T$ matrix whose row-sums are the late-interaction scores, and identifying the top-$K$ rarely requires every entry. We introduce Col-Bandit, a query-time estimator of...

arXiv CS 7d ago
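
The $N\times T$ MaxSim matrix mentioned in the Col-Bandit abstract can be illustrated with a small sketch. This is only the exhaustive baseline the abstract describes (every entry computed), not Col-Bandit's estimator, whose details are not given here. The function names (`maxsim_row`, `late_interaction_scores`, `top_k`) and the toy 2-dimensional embeddings are assumptions made for illustration, and dot product is assumed as the token similarity.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two token embeddings (assumed metric)."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_row(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> List[float]:
    """One row of the N x T matrix: for each query token, the best-matching
    document token's similarity (the MaxSim interaction)."""
    return [max(dot(q, d) for d in doc_tokens) for q in query_tokens]


def late_interaction_scores(
    query_tokens: Sequence[Vector], docs: Sequence[Sequence[Vector]]
) -> Tuple[List[List[float]], List[float]]:
    """Exhaustively build the N x T MaxSim matrix and its row-sums,
    which are the late-interaction scores of the N candidates."""
    matrix = [maxsim_row(query_tokens, doc) for doc in docs]
    scores = [sum(row) for row in matrix]
    return matrix, scores


def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring candidates, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data: T = 2 query tokens, N = 3 candidate documents (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens exactly
    [[0.5, 0.5]],              # single token, partial match to both
    [[0.0, 1.0], [0.2, 0.0]],  # strong match to the second query token only
]

matrix, scores = late_interaction_scores(query, docs)
best = top_k(scores, 2)
```

The cost of this baseline is one similarity per (query token, document token) pair for every candidate, which is the exhaustive work the abstract says a top-$K$ decision rarely needs in full.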

Argus-Retriever: Vision-LLM Late-Interaction Retrieval with Region-Aware Query-Conditioned MoE for Visual Document Retrieval

arXiv:2606.04300v1 Abstract: Late-interaction vision-language retrievers represent each document page as many visual token embeddings and score queries with MaxSim. In systems such as ColPali, ColQwen, ColNomic, and Nemotron ColEmbed, the document embeddings are produced without seeing the query, so the same page is represented identically for a table lookup, a chart question, and a layout-sensitive evidence request. We introduce Argus, a family of...

arXiv CS 6d ago

ColBERTSaR: Sparsified ColBERT Index via Product Quantization

Abstract: While ColBERT is an effective neural retrieval architecture, it requires a heavy index structure to support candidate set retrieval based on approximated token embeddings, gathering and decompressing document token embeddings, and applying the MaxSim operation. Indexes in PLAID and similar ColBERT implementations require five to ten times the disk storage of the original raw text, which limits their scalability. Furthermore, prior work has identified that the...

arXiv CS 5d ago

Test-Time Compute for Frozen Embedding Models through Agentic Program Search

arXiv:2605.11374v5 Abstract: Test-time compute is widely believed to benefit only large reasoning models, leaving small models with nothing to gain. We argue the opposite for dense retrieval, since modern small embedding models are distilled or adapted from large language model backbones and can inherit their latent test-time-compute potential. We ask how much retrieval quality a frozen embedding model gains at inference alone, with no auxiliary model and no parameters...

arXiv CS 8d ago