
the Expected Calibration Error

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Principled Uncertainty in Clinical AI: End-to-End Bayesian Modelling and Algorithmic Equity Auditing Across Multimodal Patient Data

arXiv:2606.09789v1. Abstract: Clinical artificial intelligence (AI) systems routinely produce predictions without principled quantification of uncertainty, limiting their trustworthiness in high-stakes medical environments. This paper presents an integrated research programme addressing two interconnected problems: (1) the development of a fully end-to-end Bayesian uncertainty modelling framework for multimodal clinical data, and (2) the application of calibrated...

arXiv CS 1d ago

Calibrated Uncertainty for Trustworthy Clinical Gait Analysis Using Probabilistic Multiview Markerless Motion Capture

arXiv:2601.22412v2. Abstract: Video-based human movement analysis holds potential for movement assessment in clinical practice and research. However, the clinical implementation and trust of multi-view markerless motion capture (MMMC) require that, in addition to being accurate, these systems produce reliable confidence intervals to indicate how accurate they are for any individual. Building on our prior work utilizing variational inference to estimate joint angle...

arXiv CS 9d ago

The Confidence Trap: Calibration Attacks for Graph Neural Networks

Abstract: While confidence calibration is essential for trustworthy decision-making in safety-critical applications, the robustness of calibrated GNNs to adversarial structural perturbations remains largely unexplored. However, studying calibration attacks on graphs presents unique technical challenges: (1) the discrete nature of graph structures complicates gradient-based optimization, (2) existing underconfidence objectives fail to drive predictions toward uniform...

arXiv CS 1d ago

MAAM: Anchor-Preserving Compression and Contextual Calibration for Chinese Discriminatory Language Detection

Abstract: Chinese discriminatory-language detection is challenging because harmful intent is often implicit and context-dependent. We propose MAAM (Myopia–Astigmatism Anchor Mechanism), a lightweight, model-agnostic framework inspired by functional visual blur: rather than preserving every token equally, MAAM retains discrimination-relevant semantic anchors and calibrates them with C–I–S contextual priors (Contextual Tone, Group Identity, and Stance Polarity). We also...

arXiv CS 1d ago

CaliDist: Calibrating Large Language Models via Behavioral Robustness to Distraction

Abstract: Existing calibration methods for Large Language Models (LLMs) often overlook a critical dimension of trustworthiness: a model's behavioral robustness to irrelevant or misleading information. In this paper, we argue that a model's true confidence should reflect its stability under cognitive pressure. We introduce CaliDist, a novel post-hoc calibration approach that directly measures and penalizes a model's susceptibility to distraction.

arXiv CS 5d ago

Reliable Multilingual Orthopedic Decision Support from Clinical Narratives: Language-Aware Adaptation and Verification-Guided Deferral

arXiv:2605.31512v1. Abstract: Multilingual orthopedic decision support remains challenging in low-resource healthcare settings, where clinical narratives contain specialized terminology, mixed scripts, incomplete evidence, label imbalance and language-dependent documentation patterns. This article presents a reliability-oriented framework for classifying free-text orthopedic notes in English, Hindi and Punjabi. We compare task-aligned multilingual transformer encoders, a...

arXiv CS 9d ago

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Abstract: As language models improve and become increasingly deployed to solve a variety of tasks, trustworthiness becomes essential. Calibration is a good proxy for trust: well-calibrated confidence estimates help inform the risk versus reward tradeoff when trusting a specific model output. Unfortunately, even as models improve, they remain poorly calibrated, often biasing towards overconfidence.

arXiv CS 1d ago

Cloud-tested quantum noise model predicts superconducting qubit errors with sevenfold better accuracy

Gaby Clark, Scientific Editor; Robert Egan, Associate Editor. Researchers from the Johns Hopkins Applied Physics Laboratory (APL) in Laurel, Maryland, and Johns Hopkins University in Baltimore have developed a practical, comprehensive noise-modeling framework for a popular class of superconducting quantum processors. Their work, published in the journal PRX Quantum, offers a sevenfold...

Phys.org 1d ago

STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems

arXiv:2605.02122v2. Abstract: Human evaluation remains the primary standard for assessing modern AI systems, yet annotator disagreement, bias, and variability make system rankings fragile under standard majority vote aggregation. Majority vote discards annotator reliability and item-level ambiguity, often yielding unstable comparisons across annotator subsets. We introduce STABLEVAL, a disagreement-aware evaluation framework that models latent item correctness and...

arXiv CS 8d ago

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

arXiv:2602.15327v2. Abstract: Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations with 5k existing and 2k newly evaluated model checkpoints spanning 2022-2026...

arXiv CS 1d ago