
The Variability of AI


Related Articles from SNS

The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs

arXiv:2606.08998v1. Abstract: Agentic AI systems can behave differently across runs: the same request may produce a different plan, a different tool call, a different code edit, or a different final answer. Such variability arises from several layers that are often conflated. A foundation model is a large pretrained model, usually adaptable to many downstream tasks, that maps an input context to predictions over outputs.

arXiv CS 1d ago
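The abstract above traces part of the run-to-run variability to how outputs are drawn from a model's predicted distribution. As a rough illustration only, and not the method of the paper, the Python sketch below assumes a hypothetical list of next-token scores (`LOGITS`) for one fixed context. It shows that temperature sampling can return different tokens under different random seeds, while greedy decoding returns the same token every time. The names `softmax`, `sample_token`, and `greedy_token` are illustrative helpers, not any library's API.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution using rng."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def greedy_token(logits):
    """Deterministic decoding: always pick the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Hypothetical next-token scores for a single, fixed context.
LOGITS = [2.0, 1.5, 0.3, -1.0]

if __name__ == "__main__":
    # Same context, different sampler state (seed): the chosen token may differ.
    for seed in range(5):
        rng = random.Random(seed)
        print(seed, sample_token(LOGITS, temperature=1.0, rng=rng))
    # Greedy decoding gives the same answer on every run.
    print("greedy:", greedy_token(LOGITS))
```

Fixing the seed makes this one sampler reproducible, but in a real agent other layers, such as state carried between steps, may still change the context and therefore the distribution being sampled.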

J-RAS: Mutual Adaptation for Medical Image Segmentation via Contrastive Retrieval-Augmented Joint Optimization

Abstract: Manual medical image segmentation by clinicians, though accurate, is time-consuming and variable across experts, whereas AI-based models automate this process but often underperform with limited data and domain shifts. Inspired by how pathology trainees acquire disease recognition skills through guided comparison with expert-annotated slides and histopathology atlas reference images, we propose Joint Retrieval-Augmented Segmentation (J-RAS). This framework...

arXiv CS 6d ago

STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems

arXiv:2605.02122v2. Abstract: Human evaluation remains the primary standard for assessing modern AI systems, yet annotator disagreement, bias, and variability make system rankings fragile under standard majority vote aggregation. Majority vote discards annotator reliability and item-level ambiguity, often yielding unstable comparisons across annotator subsets. We introduce STABLEVAL, a disagreement-aware evaluation framework that models latent item correctness and...

arXiv CS 8d ago
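The abstract's point about majority vote can be seen in a toy example. The Python sketch below is not STABLEVAL; it assumes five hypothetical annotators who each picked the better of two systems (`"A"` or `"B"`) on one item, plus made-up reliability scores. Majority vote over different annotator subsets can flip the winner. A simple reliability-weighted vote is shown only as one naive way to use annotator reliability; per its abstract, STABLEVAL instead models latent item correctness.

```python
from collections import Counter

def majority_vote(labels):
    """Unweighted majority vote; on a tie, Counter returns the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """Sum each annotator's reliability weight per label and return the top label."""
    scores = {}
    for label, weight in zip(labels, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

def subset(values, indices):
    """Select the entries of values at the given annotator indices."""
    return [values[i] for i in indices]

# Hypothetical data: five annotators each chose the better system on one item.
LABELS = ["A", "B", "B", "A", "A"]
# Hypothetical reliability scores (higher means a more trustworthy annotator).
WEIGHTS = [0.9, 0.3, 0.4, 0.8, 0.7]

if __name__ == "__main__":
    print("all annotators, majority:", majority_vote(LABELS))  # A (3 votes to 2)
    sub = (1, 2, 3)
    print("subset, majority:", majority_vote(subset(LABELS, sub)))  # B (2 votes to 1)
    print("subset, weighted:", weighted_vote(subset(LABELS, sub), subset(WEIGHTS, sub)))  # A
```

In this toy case the weighted vote on the subset agrees with the full panel, but that follows from the chosen numbers and is not a general guarantee.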

CASCADE Conformal Prediction: Uncertainty-Adaptive Prediction Intervals for Two-Stage Clinical Decision Support

arXiv:2605.20468v2. Abstract: Effective medication management in Parkinson's Disease (PD) is challenging due to heterogeneous disease progression, variable patient response, and medication side effects. While AI models can forecast levodopa equivalent daily dose (LEDD) as a measure of medication needs, standard uncertainty quantification often fails to communicate the reliability of these predictions, treating high- and low-confidence clinical decisions identically. We...

arXiv CS 6d ago

Too much hype? Research explores the best language to use for successful crowdfunding

Edited by Lisa Lock (Scientific Editor) and Andrew Zinin (Lead Editor). Entrepreneurs use a variety of strategies to achieve their goals, sometimes turning to online crowdfunding campaigns to increase their reach and raise money. Yet the success of fundraising campaigns is often variable, driven by what's being asked and the language that people decide to use in their campaigns. What makes some crowdfunding campaigns successful, and others...

Phys.org 6d ago

Nigel Farage forced to address bizarre AI ads showing gun toting Reform leader fighting Bank of England boss

Strange AI deepfakes of Nigel Farage posted on social media show the Reform UK leader brandishing a gun and physically confronting the Governor of the Bank of England. Nigel Farage and Andrew Bailey have been forced to speak out after they were featured in a series of bizarre AI adverts showing the Reform and Bank of England (BoE) leaders deepfaked into an ultra-violent confrontation....

Daily Mirror 1d ago

dashi: A Python library for Dataset Shift Characterization to Support Trustworthy AI Development and Deployment

arXiv:2605.31360v1. Abstract: The Artificial Intelligence (AI) life cycle requires a thorough understanding of the underlying data dynamics for robust, safe and cost-effective AI development and use. Dataset shifts are defined as changes between train and test data distributions. Whether occurring over time (temporal) or across different sites (multi-source), they can severely degrade model performance and compromise data quality.

arXiv CS 9d ago
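Dataset shift, as defined in the abstract above, is a change between train and test distributions. One common way to quantify it for a single numeric feature is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs. The Python sketch below computes it with the standard library only. It is not dashi's API, and the sample values (`TRAIN`, `TEST_SHIFTED`) are hypothetical. It covers only univariate shift in one feature; deciding what value counts as meaningful shift would need a significance test or a chosen threshold, which is not shown.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of values in sorted_sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(train, test):
    """Two-sample Kolmogorov-Smirnov statistic for one numeric feature.

    Returns the largest absolute gap between the two empirical CDFs, from 0.0
    (identical ECDFs) to 1.0 (one sample lies entirely below the other).
    Both ECDFs are step functions that only jump at observed values, so
    checking those points is enough to find the maximum gap.
    """
    a, b = sorted(train), sorted(test)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)

# Hypothetical values of one feature in a training split and a shifted test split.
TRAIN = [1.0, 2.0, 3.0, 4.0, 5.0]
TEST_SHIFTED = [3.0, 4.0, 5.0, 6.0, 7.0]

if __name__ == "__main__":
    print(ks_statistic(TRAIN, TRAIN))         # 0.0: same distribution
    print(ks_statistic(TRAIN, TEST_SHIFTED))  # larger: test values shifted upward
```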

Cross-Prompt Generalization in Detecting AI-Generated Fake News Using Interpretable Linguistic Features

Abstract: The increasing use of large language models has raised concerns about the spread of AI-generated fake news, particularly under varying prompting strategies. Most existing detection models are trained and evaluated under a single generation setting, leaving their ability to generalize across unseen prompts unclear. In this study, we investigate cross-prompt generalization in fake news detection using three datasets of AI-generated articles produced under distinct prompts,...

arXiv CS 6d ago