Improving Model Performance Under Distribution Shift

Related Articles from SNS

Entropic Projection Alignment: Estimating, Explaining, and Improving Model Performance Under Distribution Shift

Announce Type: cross Abstract: We propose a unified framework for addressing three key challenges of distribution shift: (1) estimating a model's performance on an unlabeled target domain, (2) explaining the shift by identifying the features responsible, and (3) improving the target domain performance. Our method, Entropic Projection Alignment (EPA), aligns the source distribution to the target by matching carefully selected moments while simultaneously minimising the KL divergence from the...

arXiv CS 9d ago

Target-Agnostic Calibration under Distribution Shift with Frequency-Aware Gradient Rectification

arXiv:2508.19830v2 Announce Type: replace Abstract: Real-world model deployments inevitably encounter distribution shifts, rendering the confidence estimates of deep neural networks highly unreliable, posing severe risks in safety-critical applications. Existing methods improve calibration via training-time regularization or post-hoc adjustment, but often rely on access to (or simulation of) target domains, limiting practicality. We propose Frequency-aware Gradient Rectification (FGR), a...

arXiv CS 9d ago

ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models

Announce Type: new Abstract: Data samples used for training often differ from those encountered during fine-tuning and deployment, and while ML models show promise, their performance remains limited when only small annotated datasets are available. Performance often degrades under distribution shifts caused by diverse sensors, populations, and application settings. Although pre-training helps, models frequently encounter out-of-distribution (OOD) data in real-world settings, leading to...

arXiv CS 6d ago

Bridging Domain Expertise and Generalization for Performance Estimation

arXiv:2606.06335v1 Announce Type: new Abstract: Performance estimation under distribution shift aims to predict how a model behaves on an unlabeled test set whose distribution differs from the training data, a scenario that requires reliable indicators that can faithfully reflect model behavior without ground-truth labels. Existing approaches rely solely on the outputs of the given model whose biases are amplified once the distribution shifts, weakening the correlation with the true...

arXiv CS 5d ago

FACT: A Simple and Efficient Framework for Active Finetuning

arXiv:2606.02079v1 Announce Type: new Abstract: The main goal of active finetuning is to improve a pretrained model's performance on a specific task or domain by finetuning it with carefully selected informative or challenging data. Previous research has predominantly focused on the active aspect (i.e., data selection) while uniformly employing full finetuning for model adaptation, which inevitably distorts pretrained features due to distribution shift.

arXiv CS 8d ago

Lagrangian Perturbation Diffusion Steering: Latent Reinforcement Learning for Generative Policies

Announce Type: new Abstract: Behavior cloning with high-capacity generative policies achieves strong imitation performance, but is often limited by demonstration coverage and distribution shift. Direct reinforcement learning fine-tuning can improve performance, but updating large action decoders is frequently unstable and sample inefficient. We propose Lagrangian Perturbation Diffusion Steering (LP-DS), a lightweight adaptation method that improves a frozen generative policy by learning a...

arXiv CS 8d ago

Curriculum-Adapted Robust Reinforcement Learning for UAV Deconfliction in Adversarial Environments

Announce Type: replace Abstract: Autonomous unmanned aerial vehicles (UAVs) increasingly rely on reinforcement learning (RL) for navigation. However, global navigation satellite system (GNSS) spoofing attacks can induce out-of-distribution observation shifts that corrupt value estimation and degrade mission performance. Existing robust RL approaches typically improve resilience against specific attack models but often fail to generalize to attacks not encountered during training.

arXiv CS 7d ago

CLaaS: Continual learning as a service for sample efficient online learning

arXiv:2606.05559v1 Announce Type: new Abstract: Deployed large language model agents must adapt to distribution shift in dynamic environments. Ideally, adaptation can be performed from accumulated agent experiences and retain prior capabilities while transferring to future tasks. However, agent actions and environmental transitions can only be sampled once per scenario, as real-world environments cannot be trivially reset.

arXiv CS 5d ago

Prescriptive Scaling Reveals the Evolution of Language Model Capabilities

arXiv:2602.15327v2 Announce Type: replace Abstract: Machine learning model performance improvements tend to arise from competition and application. For deployment, we consider prescriptive scaling laws: given a pre-training compute budget, what downstream accuracy is attainable with contemporary post-training practice, and how stable is that mapping as the field evolves? Using large-scale observational evaluations with 5k existing and 2k newly evaluated model checkpoints spanning 2022-2026...

arXiv CS 1d ago

Teach a Reward Model to Correct Itself: Reward Guided Adversarial Failure Discovery for Robust Reward Modeling

arXiv:2507.06419v3 Announce Type: replace Abstract: Reward modeling (RM), which captures human preferences to align large language models (LLMs), is increasingly employed in tasks such as model finetuning, response filtering, and ranking. However, due to the inherent complexity of human preferences and the limited coverage of available datasets, reward models often fail under distributional shifts or adversarial perturbations. Existing approaches for identifying such failure modes typically...

arXiv CS 2d ago