Convergence Bound
Related Articles from SNS
Convergence Bound and Critical Batch Size of Muon Optimizer
arXiv:2507.01598v5 Announce Type: replace Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as AdamW. This paper presents theoretical analysis to support its practical success. We provide convergence proofs for Muon across four practical settings, systematically examining its behavior with and without the...
Token Sample Complexity of Attention
arXiv:2512.10656v3 Announce Type: replace Abstract: As context windows in large language models continue to expand, it is essential to characterize how attention behaves at extreme sequence lengths. We introduce token sample complexity: the rate at which attention computed on $n$ tokens converges to its infinite-token limit. We estimate finite-$n$ convergence bounds at two levels: pointwise uniform convergence of the attention map, and convergence of moments for the transformed token...
Residual-based Kaczmarz methods for tensor linear equations with t-product
arXiv:2606.06057v1 Announce Type: new Abstract: Tensor linear systems arise widely in high-dimensional data mining and computing, for instance in natural language processing and machine learning. A class of residual-based tensor Kaczmarz methods is proposed for tensor linear equations with t-product. Theoretical analyses prove convergence and give an upper bound on the convergence rate of the proposed methods.
Proper Calibeating
arXiv:2605.26703v2 Announce Type: replace-cross Abstract: The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper...
In-Expectation Convergence of Stochastic Gradient Methods under Heavy-Tailed Noise
Announce Type: cross Abstract: Many stochastic gradient methods are believed not to converge when the noise in stochastic gradients has only a finite $p$-th moment for $p\in\left(1,2\right)$, a setting known as the heavy-tailed noise assumption. However, some recent studies have found that Stochastic Gradient Descent ($\textsf{SGD}$), without any modification to its update rule, can surprisingly converge in expectation for convex problems with bounded domains, highlighting the potential of...
Variable-preconditioned transformed primal-dual method for generalized Wasserstein Gradient Flows
arXiv:2509.15385v3 Announce Type: replace Abstract: We propose a Variable-Preconditioned Transformed Primal-Dual (VPTPD) method for solving generalized Wasserstein gradient flows based on the structure-preserving JKO scheme. This is a nontrivial extension of the TPD method [Chen et al.], incorporating proximal splitting techniques to address the challenges arising from the nonsmoothness of the objective function.
Optimality of quasi-Monte Carlo methods and suboptimality of the sparse-grid Gauss--Hermite rule in Gaussian Sobolev spaces
Announce Type: replace Abstract: Optimality of several quasi-Monte Carlo methods and suboptimality of the sparse-grid quadrature based on the univariate Gauss--Hermite rule is proved in the Sobolev spaces of mixed dominating smoothness of order $\alpha$, where the optimality is in the sense of worst-case convergence rate. For sparse-grid Gauss--Hermite quadrature, lower and upper bounds are established, with rates coinciding up to a logarithmic factor. The dominant rate is found to be only...
Approximations and Learning for Continuous State and Action MDPs under Average Cost Criteria
Announce Type: replace-cross Abstract: In this paper, for Markov Decision Processes (MDPs) with standard Borel spaces, (i) we first provide a discretization-based approximation method for MDPs with continuous spaces under average cost criteria, and provide error bounds for approximations when the dynamics are only weakly continuous (for asymptotic convergence of errors as the grid sizes vanish) or Wasserstein continuous (with a rate in approximation as the grid sizes vanish) under certain...
SHAP-Guided Kernel Actor-Critic for Explainable Reinforcement Learning
arXiv:2512.05291v3 Announce Type: replace Abstract: Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training. Rather, they treat all state features equally, thereby neglecting the heterogeneous impacts of individual state dimensions on the reward.
Quantum Reservoir Computing and Risk Bounds
arXiv:2501.08640v2 Announce Type: replace Abstract: We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits.