
Performative Stability


Related Articles from SNS

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers

The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures training stability at the cost of potential performance degradation in deep models, while the "PostNorm" architecture offers strong performance but suffers from severe training instability. In this work, we...

arXiv CS 5d ago
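
The normalization placement that the SpanNorm abstract refers to can be illustrated with a minimal sketch. This is not the paper's SpanNorm method; it only contrasts the two standard placements. The `sublayer` function is a hypothetical stand-in for an attention or feed-forward layer, and the layer norm here omits the learned scale and bias for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and (approximately) unit variance.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    # Hypothetical stand-in for attention or a feed-forward network.
    return [2.0 * v for v in x]

def pre_norm_block(x):
    # PreNorm: normalize the input to the sublayer; the residual path is untouched.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]

def post_norm_block(x):
    # PostNorm: add the residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])

x = [1.0, 2.0, 3.0]
pre = pre_norm_block(x)
post = post_norm_block(x)
```

In the PreNorm block the input passes through the residual connection unnormalized, so the output keeps the input's mean; in the PostNorm block the output is always renormalized.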

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to...

arXiv CS 1d ago

The Stability of Online Algorithms in Performative Prediction

The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we see, and later use to retrain on. This dynamic was formalized by Perdomo et al. 2020 in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution...

arXiv CS 5d ago
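
The feedback loop described in the abstract above can be made concrete with a toy sketch. It does not implement the paper's no-regret reduction; it uses plain repeated retraining (repeated risk minimization) on an assumed one-dimensional problem where deploying `theta` shifts the data mean to `mu + eps * theta`. The function names and the values of `mu` and `eps` are illustrative assumptions.

```python
def deployed_mean(theta, mu=1.0, eps=0.5):
    # Toy assumption: deploying theta shifts the mean of the data distribution D(theta).
    return mu + eps * theta

def repeated_risk_minimization(theta0=0.0, steps=50, mu=1.0, eps=0.5):
    theta = theta0
    for _ in range(steps):
        # The squared-loss minimizer on D(theta) is the mean of D(theta),
        # so each retraining step maps theta to deployed_mean(theta).
        theta = deployed_mean(theta, mu, eps)
    return theta

theta_ps = repeated_risk_minimization()
```

With `|eps| < 1` the retraining map is a contraction, and the iterates approach the fixed point `mu / (1 - eps)`: a model that is optimal for the distribution it induces, which is what performative stability means in this toy setting.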

Stabilization-Free H(curl) and H(div)-Conforming Virtual Element Method

arXiv:2501.15168v2. Standard Virtual Element Method (VEM) requires stabilization terms that significantly affect the numerical computation performance. In this work, we propose a stabilization-free VEM for general order \(\mathbf{H}(\operatorname{\mathbf{curl}})\) and \(\mathbf{H}(\operatorname{div})\)-conforming spaces by constructing novel serendipity projectors and corresponding serendipity spaces with minimum number of DoFs. Our approach handles the full...

arXiv CS 8d ago

Intel bit off more than it could chew with 18A process node

Intel is keen to reassure investors that its troubles with the 18A manufacturing process were a one-off, and that it is better positioned to capitalize on what it expects will be growing demand for CPUs used in AI inference workloads. Speaking at the Bank of America 2026 Global Technology Conference in San Francisco, Chipzilla’s chief financial officer David Zinsner claimed that the firm simply bit off more than it could chew in trying to move too fast with the new process node. “I would say...

The Register 7d ago

DKEKAN: A single-parameterized KAN surrogate for Drift Kinetic Equation Toward Fast Neoclassical Toroidal Viscosity Torque Modeling in Tokamaks

arXiv:2606.10310v1. The neoclassical toroidal viscosity (NTV) torque is a critical driver of toroidal rotation in tokamaks, profoundly influencing plasma stability and performance. Consequently, incorporating NTV effects is essential for modern integrated modeling frameworks that aim to self-consistently unify multiple physical processes.

arXiv Physics 17h ago

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility.

arXiv CS 1d ago

Re-Evaluating Continual Learning with Few-Shot Adaptation

arXiv:2606.03843v1. Continual learning methods aim to maximize the stability and plasticity of machine learning models that are trained on a sequence of tasks. The standard measure of stability (i.e., forgetting) is the 0-shot performance of a model on previously learned tasks, and plasticity, the performance on the most recently learned task. However, 0-shot evaluation does not fully measure a model or method's ability to retain learned information or adapt...

arXiv CS 7d ago

Online Learning for Supervisory Switching Control

arXiv:2603.14762v3. We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds.

arXiv CS 1d ago

Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations

In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations, along with stability and convergence analyses, to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the...

arXiv CS 2d ago