Home › Knowledge Base › Token F1

Token F1

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SAGE: A Novelty Gate for Efficient Memory Evolution in Agentic LLMs

Announce Type: new Abstract: Agentic LLMs must continuously decide whether newly extracted facts should be added, merged with existing memories, or ignored, yet prior work has focused more on retrieval and storage than on principled write-side control. We frame memory evolution as a novelty-detection problem and propose SAGE, a Spherical Adaptive Gate for memory Evolution that scores candidate facts with a von Mises-Fisher-based density estimator over memory embeddings and routes them with...

arXiv CS 9d ago

Scaling Multi-Hop Training Data via Graph-Constrained Path Selection

arXiv:2605.31238v1 Announce Type: new Abstract: Endowing large language models with compositional reasoning over specialized documents requires multi-hop training data at scale, where such data rarely exists outside of curated benchmarks built on structured sources. To construct it directly from plain, unannotated text, existing methods ask a single teacher model to jointly discover an evidence path through a document and verbalize it as a question-answer pair. However, these methods degrade...

arXiv CS 9d ago

Aligning Dense Retrievers with LLM Utility via Distillation

arXiv:2604.22722v2 Announce Type: replace Abstract: Dense vector retrieval is the practical backbone of Retrieval- Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a...

arXiv CS 9d ago

EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents

Announce Type: new Abstract: Long-horizon agents can archive large histories, but future answers still incur retrieval, rereading, and context costs. When retained memory misses answer-relevant evidence, the system must return to larger portions of the raw history. We study budgeted evidence survival: before the query is known, which source evidence should be retained so that it remains recoverable and usable under a fixed retained source-evidence token budget?

arXiv CS 5d ago

S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering

arXiv:2605.28831v2 Announce Type: replace Abstract: Long-horizon memory question answering often requires sparse evidence from heterogeneous histories, including events, object states, visual observations, temporal relations, and causal steps. Existing memory interfaces expand reader context, retrieve semantically related chunks, or expose graph neighborhoods, but they are not explicitly designed to select compact evidence for a fixed reader. We propose Structured Spatiotemporal Scene--Event...

arXiv CS 1d ago

FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG

new Abstract: When retrieved evidence contradicts parametric memory, language models frequently ignore context and default to memorized priors -- a failure that undermines the core purpose of retrieval augmentation. Contrastive decoding amplifies the context-conditioned output to suppress parametric bias, but existing methods rest on an implicit assumption that this bias is uniform across tokens. A single global contrastive weight over-penalizes safe tokens while leaving genuinely conflicted...

arXiv CS 5d ago

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-ClassArchitectures

arXiv:2606.02378v1 Announce Type: new Abstract: We track the developmental trajectory of attention-head circuit formation across three 1B-class language models spanning two architecture families (dense transformer, mixture-of-experts) and two pretraining corpora (The Pile, DCLM): Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924. At each of 10 log-spaced revisions per model -- 30 mechanistic-interpretability runs in total -- we apply a participation-ratio (PR) spectral signal and an all-head...

arXiv CS 8d ago

RASER: Recoverability-Aware Selective Escalation Router for Multi-Hop Question Answering

arXiv:2606.02488v1 Announce Type: new Abstract: Multi-hop question-answering systems often use expensive retrieval on every question. They may decompose the question, run several retrieval rounds, or search through bridge entities before answering. All of these strategies rely on repeated LLM calls to rewrite or decompose the question, which increases extra token cost, and it is not fitting when the LLM budget is tight.

arXiv CS 8d ago

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

Announce Type: replace Abstract: Software vulnerabilities often depend on cross-file data flow, build options, framework conventions, and runtime guards, so isolated function classifiers produce fragile and poorly calibrated warnings. Repository-level LLM agents can gather richer evidence, but prior variants under-specify reproducibility, verifier behavior, baseline fairness, and statistical uncertainty. We present VulnAgent-R2, a budget-aware agentic auditing framework with three additional...

arXiv CS 6d ago

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

arXiv CS 7d ago