
The Uncertainty of Foundation Models


Related Articles from SNS

Quantifying the Uncertainty of Foundation Models with Singular Value Ensembles

arXiv:2601.22068v2 Announce Type: replace Abstract: Foundation models have become a dominant paradigm in machine learning, achieving remarkable performance across diverse tasks through large-scale pretraining. However, they often yield overconfident, uncalibrated predictions. The standard approach to quantifying epistemic uncertainty is an ensemble of multiple independently trained models.

arXiv CS 9d ago

On the Uncertainty Quantification Ability of Tabular Foundation Models

Announce Type: cross Abstract: Foundation models (FMs) have achieved substantial success in generalizing across tasks without problem-specific training or fine-tuning. However, many critical applications in mechanics and computational science require not only accurate predictions but also reliable uncertainty quantification (UQ). Herein we investigate the UQ capabilities of tabular FMs in regression tasks through a comprehensive empirical study comparing Tabular Prior-Data Fitted Networks...

arXiv CS 8d ago

Beyond Point Estimates: Benchmarking Uncertainty Quantification Methods on the AION-1 Astronomical Foundation Model

arXiv:2606.07771v1 Announce Type: cross Abstract: Foundation models for astronomical surveys offer powerful learned representations that can be transferred to downstream regression tasks such as galaxy property estimation. However, point predictions alone are insufficient for scientific inference; reliable uncertainty quantification (UQ) is essential. We compare seven UQ methods on galaxy property regression using frozen AION-1 foundation-model embeddings, predicting redshift, stellar mass,...

arXiv CS 1d ago

Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control

arXiv:2512.23292v4 Announce Type: replace Abstract: The prevailing paradigm in AI for physical systems, scaling general-purpose foundation models toward universal multimodal reasoning, confronts a barrier at the control interface. Frontier vision-language models achieve only 50-53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility while violating physical constraints. Safety-critical control demands outcome-space guarantees...

arXiv CS 9d ago

Agentic Physical AI toward a Domain-Specific Foundation Model for Energy Systems: A Case Study on Nuclear Reactor Control

arXiv:2512.23292v5 Announce Type: replace Abstract: The prevailing paradigm in AI for physical systems, scaling general-purpose foundation models toward universal multimodal reasoning, confronts a barrier at the control interface. Frontier vision-language models achieve only 50-53% accuracy on basic quantitative physics tasks, behaving as approximate guessers that preserve semantic plausibility while violating physical constraints. Safety-critical control demands outcome-space guarantees...

arXiv CS 2d ago

Multi-Task Crack Foundation Model for Engineering-Reliable Crack Representation and Topology Preservation in Civil Infrastructure

arXiv:2606.05641v1 Announce Type: new Abstract: Reliable crack assessment requires not only accurate pixel-level masks but also connected crack geometry and confidence estimates that remain stable under domain shift. However, existing segmentation models can achieve high overlap scores while fragmenting cracks, missing fine branches, and providing no calibrated uncertainty. To address this gap, this paper proposes CrackGeoFM, a multi-task framework that combines a frozen visual foundation...

arXiv CS 5d ago

Variational Routing: A Scalable Bayesian Framework for Calibrated Mixture-of-Experts Transformers

Announce Type: replace Abstract: Foundation models are increasingly being deployed in contexts where understanding the uncertainty of their outputs is critical to ensuring responsible deployment. While Bayesian methods offer a principled approach to uncertainty quantification, their computational overhead renders their use impractical for training or inference at foundation model scale. State-of-the-art models achieve parameter counts in the trillions through carefully engineered sparsity...

arXiv CS 9d ago

Evi-Steer: Learning to Steer Biomedical Vision-Language Models through Efficient and Generalizable Evidential Tuning

Announce Type: replace Abstract: Parameter-efficient adaptation of vision-language foundation models is crucial for precise multimodal understanding of biomedical images, yet existing methods remain deterministic and often struggle under domain shift or ambiguous image-text alignment. This limitation is particularly critical in the clinic, where models should remain robust in low-data regimes and domain shifts.

arXiv CS 8d ago

Instrumented data for causal scientific machine learning

arXiv:2606.07865v1 Announce Type: cross Abstract: Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that...

arXiv Physics 1d ago
