Nesterov
Related Articles from SNS
Inference of Online Newton Methods with Nesterov's Accelerated Sketching
Abstract: Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order...
Accelerated Multiple Wasserstein Gradient Flows for Multi-objective Distributional Optimization
arXiv:2601.19220v2 Abstract: We study multi-objective optimization over probability distributions in Wasserstein space. Recently, Nguyen et al. (2025) introduced the Multiple Wasserstein Gradient Descent (MWGraD) algorithm, which exploits the geometric structure of Wasserstein space to jointly optimize multiple objectives. Building on this approach, we propose an accelerated variant, A-MWGraD, inspired by Nesterov's acceleration.
Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization
Abstract: Decentralized stochastic optimization is a fundamental paradigm for large-scale learning over networks, where agents communicate only with their neighbors and no central coordinator is required. For strongly convex problems, communication efficiency is mainly determined by the condition number \(\kappa=L/\mu\) and the network spectral gap \(1-\beta\). Although deterministic decentralized methods can simultaneously achieve accelerated \(\sqrt{\kappa}\) and...
Learning to optimize with guarantees: a complete characterization of linearly convergent algorithms
Abstract: The design of many classical optimization algorithms is driven by the certification of linear convergence rates over classes of optimization problems. In this paper, we consider the problem of improving the average-case performance of an algorithm over a specific distribution of problem instances. While this task can be tackled by embedding trainable components into the algorithm updates, a key challenge is to preserve worst-case guarantees across the entire...
Convergence Bound and Critical Batch Size of Muon Optimizer
arXiv:2507.01598v5 Abstract: Muon, a recently proposed optimizer that leverages the inherent matrix structure of neural network parameters, has demonstrated strong empirical performance, indicating its potential as a successor to standard optimizers such as AdamW. This paper presents a theoretical analysis to support its practical success. We provide convergence proofs for Muon across four practical settings, systematically examining its behavior with and without the...
Breaking $1/\epsilon$ Barrier in Quantum Zero-Sum Games: Generalizing Metric Subregularity for Spectraplexes
arXiv:2509.21570v2 Abstract: Quantum zero-sum games provide a framework for non-local games, quantum interactive proofs, and quantum machine learning, where players optimize a bilinear payoff over quantum states. In contrast to classical bilinear games over polyhedral domains, for which gradient methods achieve linear last-iterate convergence, comparable guarantees over spectraplexes have remained open.