Performative Stability
Related Articles from SNS
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
Announce Type: replace Abstract: The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture ensures training stability at the cost of potential performance degradation in deep models, while the "PostNorm" architecture offers strong performance but suffers from severe training instability. In this work, we...
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
Announce Type: replace Abstract: Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to...
The Stability of Online Algorithms in Performative Prediction
Announce Type: replace Abstract: The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we see, and later use to retrain on. This dynamic was formalized by Perdomo et al. 2020 in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution...
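For readers new to the term, the feedback loop described above can be illustrated with a toy example. This is a minimal sketch and not the paper's reduction: it assumes a hypothetical one-dimensional setting in which deploying a parameter theta shifts the data mean to mu + eps * theta, and it uses exact repeated retraining under squared loss rather than a no-regret algorithm. A performatively stable point is one that is optimal for the distribution it induces itself. In this toy model that point is the fixed point mu / (1 - eps).

```python
# Toy performative setting (all names and constants here are illustrative assumptions).
# Deploying theta induces data z with mean mu + eps * theta.

def retrain(theta, mu=1.0, eps=0.5):
    """Return the squared-loss risk minimizer on the distribution induced by theta.

    For squared loss E[(z - t)^2], the minimizer t is the mean of z,
    which under the assumed shift model equals mu + eps * theta.
    """
    return mu + eps * theta


def repeated_retraining(theta0=0.0, steps=50, mu=1.0, eps=0.5):
    """Deploy, observe the induced distribution, retrain, and repeat."""
    theta = theta0
    for _ in range(steps):
        theta = retrain(theta, mu, eps)
    return theta


theta_final = repeated_retraining()
theta_stable = 1.0 / (1.0 - 0.5)  # closed-form fixed point mu / (1 - eps)
print(theta_final, theta_stable)
```

Because |eps| < 1 in this sketch, each retraining step contracts the distance to the fixed point by a factor of eps, so the iterates settle at the stable point. With a stronger feedback effect (|eps| >= 1), the same loop would not converge.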
Stabilization-Free H(curl) and H(div)-Conforming Virtual Element Method
arXiv:2501.15168v2 Announce Type: replace Abstract: Standard Virtual Element Method (VEM) requires stabilization terms that significantly affect the numerical computation performance. In this work, we propose a stabilization-free VEM for general order \(\mathbf{H}(\operatorname{\mathbf{curl}})\) and \(\mathbf{H}(\operatorname{div})\)-conforming spaces by constructing novel serendipity projectors and corresponding serendipity spaces with minimum number of DoFs. Our approach handles the full...
Intel bit off more than it could chew with 18A process node
Intel is keen to reassure investors that its troubles with the 18A manufacturing process were a one-off, and that it is better positioned to capitalize on what it expects will be growing demand for CPUs used in AI inference workloads. Speaking at the Bank of America 2026 Global Technology Conference in San Francisco, Chipzilla’s chief financial officer David Zinsner claimed that the firm simply bit off more than it could chew in trying to move too fast with the new process node. “I would say...
DKEKAN: A single-parameterized KAN surrogate for Drift Kinetic Equation Toward Fast Neoclassical Toroidal Viscosity Torque Modeling in Tokamaks
arXiv:2606.10310v1 Announce Type: new Abstract: The neoclassical toroidal viscosity (NTV) torque is a critical driver of toroidal rotation in tokamaks, profoundly influencing plasma stability and performance. Consequently, incorporating NTV effects is essential for modern integrated modeling frameworks that aim to self-consistently unify multiple physical processes.
Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Announce Type: replace Abstract: Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility.
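The remark about negligible gradients can be made concrete with the group-relative advantage commonly used in GRPO, where each rollout's reward is normalized by the mean and standard deviation of its group. The sketch below is an illustration under that assumption and does not reproduce the paper's prompt selection method. The function name and the eps constant are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's rollout group.

    Each advantage is (reward - group mean) / (group std + eps).
    eps is an assumed small constant that avoids division by zero.
    """
    m = mean(rewards)
    s = pstdev(rewards)
    return [(r - m) / (s + eps) for r in rewards]


# Every rollout solved the prompt: all advantages are zero, so no learning signal.
uniform = group_advantages([1.0, 1.0, 1.0, 1.0])

# Mixed outcomes: successes get positive advantages, failures negative ones.
mixed = group_advantages([1.0, 0.0, 1.0, 0.0])

print(uniform)
print(mixed)
```

A prompt whose rollouts all succeed or all fail therefore contributes nothing to the policy gradient under this normalization, even though its rollouts were paid for in compute.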
Re-Evaluating Continual Learning with Few-Shot Adaptation
arXiv:2606.03843v1 Announce Type: new Abstract: Continual learning methods aim to maximize the stability and plasticity of machine learning models that are trained on a sequence of tasks. The standard measure of stability (i.e., forgetting) is the 0-shot performance of a model on previously learned tasks, and plasticity, the performance on the most recently learned task. However, 0-shot evaluation does not fully measure a model or method's ability to retain learned information or adapt...
Online Learning for Supervisory Switching Control
arXiv:2603.14762v3 Announce Type: replace-cross Abstract: We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds.
Modeling AdaGrad, RMSProp, and Adam with Integro-Differential Equations
Announce Type: replace Abstract: In this paper, we propose a continuous-time formulation for the AdaGrad, RMSProp, and Adam optimization algorithms by modeling them as first-order integro-differential equations. We perform numerical simulations of these equations, along with stability and convergence analyses, to demonstrate their validity as accurate approximations of the original algorithms. Our results indicate a strong agreement between the behavior of the continuous-time models and the...