Home › Knowledge Base › LLM Dataset Inference

LLM Dataset Inference

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

SPECTRA: Revealing the Full Spectrum of User Preferences via Distributional LLM Inference

arXiv:2509.24189v4 Announce Type: replace Abstract: Large Language Models (LLMs) are increasingly used to model user preferences, with the typical output as a directly-generated ranked item list per user. However, this generative paradigm inherits the bias and opacity of autoregressive decoding. It over-emphasizes frequent (head) preferences and suppresses minority, long-tail ones.

arXiv CS 1d ago

Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations

arXiv:2606.01204v1 Announce Type: new Abstract: We investigate whether large language models produce different medical triage recommendations for identical symptoms based solely on the language of the patient prompt. Using Gemini 3.5 Flash, we evaluate a neurological symptom profile (persistent headache, blurred vision, nausea) across six languages (English, Spanish, Chinese, Hindi, Japanese, Arabic) with 30 runs per condition (n=450 total API calls). We find that the model recommends...

arXiv CS 8d ago

The Reliability Gap in Benchmark Auditing: Distribution Shift and Scale as Failure Modes of Contamination Detection

arXiv:2606.03305v1 Announce Type: new Abstract: Benchmark contamination, where evaluation examples appear in a model's training data, threatens the validity of LLM assessment. Statistical tools for detecting training-data membership exist, but have been validated almost exclusively in controlled academic regimes: large, homogeneous pre-training corpora and transparent, single-stage training pipelines. Whether these methods remain reliable in realistic auditing scenarios remains unclear.

arXiv CS 7d ago

Clairvoyant: Predictive SJF Scheduling to Mitigate Head-of-Line Blocking in Serial LLM Backends

Announce Type: new Abstract: Serial LLM inference backends -- such as Ollama -- process requests one at a time under FCFS admission, causing Head-of-Line Blocking (HOLB) under mixed workloads at high utilisation: short factual queries can be delayed by minutes behind long generation jobs. While cloud-scale deployments mitigate HOLB via continuous batching (vLLM, Orca), these solutions require tens of GB of VRAM for concurrent KV-caches -- infeasible for memory-constrained edge and local...

arXiv CS 2d ago

CauTion: Knowing When to Trust LLMs for Ensemble Causal Discovery

arXiv:2606.03602v1 Announce Type: new Abstract: Causal discovery from observational data remains challenging due to the fundamental limitations of purely statistical methods, such as statistical distinguishability within equivalence classes and sensitivity to finite sample sizes. While large language models (LLMs) offer a promising source of domain knowledge to complement statistical inference, existing LLM-augmented methods are vulnerable to LLM errors and incur high token costs. Moreover,...

arXiv CS 7d ago

Inference Cost Attacks for Retrieval-Augmented Large Language Models

arXiv:2606.02643v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG)-enhanced LLM systems, while powerful, introduce substantial inference costs due to the inclusion of an extra multi-stage pipeline that dynamically retrieves and synthesizes information from external knowledge sources. This high operational cost exposes a critical vulnerability to Inference Cost Attacks (ICAs). However, existing ICAs often rely on the impractical assumption of direct prompt manipulation.

arXiv CS 7d ago

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

Announce Type: cross Abstract: Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. In recent times, several deployed systems have explored the usage of Large Language Models (LLMs) as automated judges for this task while their inherent biases prevent direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) that combines minimal human annotations with LLM...

arXiv CS 6d ago

Emergence of Context Characteristics Sensitivity in Large Language Models

Announce Type: new Abstract: During instruction fine-tuning (IFT), large language models (LLMs) learn to follow instructions by using the provided context to answer a query. While prior work has studied how context characteristics correlate with context usage by the LLM, this analysis has been limited to inference time, leaving open how these relationships are acquired in the first place. Here, we measure how models' sensitivity to such characteristics shifts across successive IFT stages:...

arXiv CS 1d ago

Agent Tools Orchestration Leaks More: Dataset, Benchmark, and Mitigation

arXiv:2512.16310v3 Announce Type: replace Abstract: LLM-based agents increasingly use multiple external tools to complete complex tasks. We study Tools Orchestration Privacy Risk (TOP-R): an agent may combine individually non-sensitive tool returns and disclose an unintended sensitive conclusion. We formalize TOP-R with three conditions: conclusion sensitivity, single-source non-inferability, and compositional inferability.

arXiv CS 8d ago

The Regularizing Power of Language-Training Deepfake Detectors

arXiv:2605.31192v1 Announce Type: new Abstract: Recently, thanks to the advent of Multimodal-LLMs, deepfake detectors are striving not only to be generalizable but also interpretable. We propose that these two challenges can effectively be tackled jointly, since describable artifacts typically generalize better, opening the possibility to use language as a regularization mechanism.

arXiv CS 9d ago