The Uncertainty of Large Language Models

Related Articles from SNS

Inference-Time Conformal Reasoning with Valid Factuality Control for Large Language Models

Abstract: Large language models (LLMs) increasingly perform multi-step reasoning, where intermediate claims form implicit directed acyclic graphs whose node correctness is structurally conditioned on their ancestors. This makes factuality uncertainty structural, rather than a trivial accumulation of node-wise errors, and necessitates inference-time uncertainty quantification over the reasoning structure. While conformal prediction (CP) offers flexible user-specified...

arXiv CS 1d ago

LLM-XTM: Enhancing Cross-Lingual Topic Models with Large Language Models

Abstract: Cross-lingual topic modeling aims to discover shared semantic structures across languages, yet existing models depend on sparse bilingual resources and often yield incoherent or weakly aligned topics. Recent LLM-based refinements improve interpretability but are costly, document-level, and prone to hallucination, with prior white-box approaches requiring inaccessible token probabilities. We propose LLM-XTM, a framework that integrates LLM-guided topic...

arXiv CS 7d ago

Human-Alignment and Calibration of Inference-Time Uncertainty in Large Language Models

arXiv:2508.08204v2. Abstract: There has been much recent interest in evaluating large language models for uncertainty calibration to facilitate model control and modulate user trust. Inference time uncertainty, which may provide a real-time signal to the model or external control modules, is particularly important for applying these concepts to improve LLM-user experience in practice. While many of the existing papers consider model calibration, comparatively little...

arXiv CS 9d ago

SeSE: Black-Box Uncertainty Quantification for Large Language Models Based on Structural Information Theory

Abstract: Reliable uncertainty quantification (UQ) is essential for deploying large language models (LLMs) in safety-critical scenarios, as it enables them to abstain from responding when uncertain, thereby avoiding hallucinations, i.e., plausible yet factually incorrect responses. However, while semantic UQ methods have achieved advanced performance, they overlook latent semantic structural information that could enable more precise uncertainty estimates. In this...

arXiv CS 7d ago

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

arXiv:2605.04638v2. Abstract: Uncertainty quantification (UQ) is an important technique for ensuring the trustworthiness of LLMs, given their tendency to hallucinate. Existing state-of-the-art UQ approaches for free-form generation rely heavily on sampling, which incurs high computational cost and variance. In this work, we propose the first gradient-based UQ method for free-form generation, SemGrad, which is sampling-free and computationally efficient.

arXiv CS 8d ago

Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty

Abstract: Uncertainty Quantification is a large and growing subfield of large language model behavioral analysis. Primarily to recognize and combat hallucination, the field has largely focused on measuring and improving calibration, the accuracy of uncertainty judgments to task efficacy.

arXiv CS 9d ago

Empirical Characterization of Inference-Time Elicited Probability Transformations in Large Language Models

Abstract: Large language models increasingly rely on inference-time procedures such as chain-of-thought reasoning, self-refinement, retrieval augmentation, and verifier-guided revision, yet the structure of elicited probability transformations under these procedures remains poorly understood. We study externally elicited probability assignments over candidate answers and observe recurring approximate log-ratio relationships: \[ \log \tilde q_t(i) = \alpha_t \left( \log...

arXiv CS 9d ago

Clustered Self-Assessment: A Simple yet Effective Method for Uncertainty Quantification in Large Language Models

arXiv:2606.03846v1. Abstract: Large language models (LLMs) demonstrate remarkable performance across diverse tasks, but they often generate responses that appear plausible while being factually incorrect. This problem is compounded by the lack of explicit uncertainty estimates, which makes it difficult for users to judge the reliability of model outputs. Existing uncertainty quantification methods typically rely on indirect signals, such as entropy across sampled generations.

arXiv CS 7d ago

Latent Performance Profiling of Large Language Models

Abstract: Large language models (LLMs) frequently achieve impressive scores on standardized benchmarks, yet accuracy alone offers a limited view of their capabilities. Evaluating open-source LLMs through leaderboards faces persistent issues like data contamination, narrow task scope, and weak alignment with real-world reliability. Benchmark-based evaluations such as MMLU PRO, BBH, or IFEval primarily capture what a model outputs on fixed test sets, not how it processes...

arXiv CS 9d ago

LUNA-AD: Lightweight Uncertainty-Aware Language Model with Lifelong Learning for Autonomous Driving

arXiv:2606.08470v1. Abstract: While large language models (LLMs) offer promising reasoning capabilities, their integration into safety-critical driving systems is hindered by limited reasoning diversity, high computational overhead, and static learning paradigms. To address these challenges, we propose LUNA-AD, a lightweight uncertainty-aware language model with lifelong learning for autonomous driving (AD). LUNA-AD features a tri-system architecture that reconciles complex...

arXiv CS 1d ago