Relevance Oracles

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

Abstract: Scalable information retrieval testing needs corpora that are large enough to stress index construction, ranking latency, query routing, and evaluation tooling, yet human-judged test collections remain expensive and may be unavailable when documents are private or still under design. This paper introduces SPECTRA, a reproducible framework for generating synthetic text corpora and retrieval test collections through a separation of latent topical structure, surface...

arXiv CS 9d ago

ORACLE-CT: Anatomy-Aware Support Pooling for CT Classification

Abstract: Abdominal CT disease classification is challenging because each scan is a large 3D volume with many possible findings, while diagnostic evidence is often confined to specific organs or anatomical compartments. Most study-level classifiers aggregate encoder features using anatomy-agnostic pooling or attention, creating a mismatch between localized disease evidence and global evidence aggregation. We propose ORACLE-CT, an encoder-agnostic anatomy-aware aggregation...

arXiv CS 5d ago

Precision Is Not Faithfulness: Coverage-Aware Evaluation of Grounded Generation with a Complete Oracle

Abstract: Reference-free faithfulness metrics verify each atomic claim a model makes against ground truth, and are increasingly used to evaluate grounded generation. We show they share a blind spot: they measure only precision (are the stated claims supported?) and therefore reward abstention, since a model can score near-perfect faithfulness by saying almost nothing.

arXiv CS 1d ago

QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples

Abstract: Many real-world questions over business, legal, and scientific corpora are natural-language versions of database-style queries over records latent in text. Existing retrieval-augmented generation (RAG) systems are optimized primarily for semantic relevance, but retrieving plausible passages does not guarantee correct query execution. We introduce QO-Bench, a diagnostic benchmark for query-operator question answering over typed event tuples.

arXiv CS 6d ago

Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance

arXiv:2606.06320v1. Abstract: Machine unlearning aims to remove targeted knowledge from a trained model while preserving its general capabilities. For autoregressive language models, not all tokens in a forget sample are equally relevant to forgetting. Existing approaches either ignore this heterogeneity or rely on auxiliary models, heuristics, or external annotations to estimate each token's relevance for forgetting.

arXiv CS 5d ago

PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

Abstract: Scientific paper recommendation is typically evaluated as static ranking over a fixed candidate set, yet real scientific reading unfolds as a daily, longitudinal process in which interests shift and feedback accumulates. We introduce PaperFlow, a framework that organizes this process into three coupled stages: Profiling, which constructs and maintains a structured, inspectable scholarly profile from heterogeneous cold-start evidence; Recommending, which ranks each...

arXiv CS 2d ago

MASER: Modality-Adaptive Specialist Routing for Embodied 3D Spatial Intelligence

arXiv:2606.02463v1. Abstract: In 3D environments, embodied agents answer spatially relevant questions by reasoning over a mixture of modalities, including natural language, RGB images, point clouds, depth maps, and camera poses. Existing vision-language models (VLMs) are fine-tuned on a single modality. This ignores the question semantics, which may favor a different modality than the fine-tuned one.

arXiv CS 8d ago

Linear Strategic Classification with Endogenous Improvements

arXiv:2606.01198v1. Abstract: Strategic classification studies settings in which agents respond to a deployed classifier by modifying observable features at a cost. Classical models typically treat such responses as cosmetic: features may change, but true labels remain fixed. We study an improvement-aware variant in which strategic responses can induce genuine changes in outcome-relevant features.

arXiv CS 8d ago

Bridging Expert Knowledge and Automated Feature Engineering via Self-Evolution

arXiv:2606.08800v1. Abstract: In high-stakes settings such as brand compliance, clinical care, and content moderation, machine learning models cannot be deployed as opaque oracles: practitioners inspect the features driving model decisions, and models must leverage the expert documentation governing these domains. In practice, the data arrives as unstructured content, and features extracted from it must be interpretable, discriminative, and aligned with what experts consider...

arXiv CS 1d ago

AI Job Grief: The Unnamed Psychological Crisis Hitting Tech Workers

In the summer of 2025, an Epic Games layoff cut a worker who was a terminally ill father. According to the most-discussed account of the episode, his family lost his life insurance along with the job.

Hacker News 11d ago