Polyak
No mentions found
This entity hasn't been tracked yet, or Iris is still building its knowledge base.
Related Articles from SNS
Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients
arXiv:2512.02342v3 Announce Type: replace-cross Abstract: The stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of...
Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler
arXiv:2606.01827v1 Announce Type: cross Abstract: Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive...
Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples
arXiv:2606.05967v1 Announce Type: cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.
Randomized Feasibility Methods for Constrained Optimization with Adaptive Step Sizes
arXiv:2601.20076v2 Announce Type: replace-cross Abstract: We consider minimizing an objective function subject to constraints defined by the intersection of lower-level sets of convex functions. We study two cases: (i) strongly convex and Lipschitz-smooth objective function and (ii) convex but possibly nonsmooth objective function. To deal with the constraints that are not easy to project on, we use a randomized feasibility algorithm with Polyak steps and a random number of sampled...
Fast and Robust Convergence Rate for TD(0) with Linear Function Approximation, Universal Learning Steps and I.I.D. Samples
arXiv:2606.05967v2 Announce Type: replace-cross Abstract: In this paper, we study the finite-time behavior of the TD(0) temporal-difference method with linear function approximation (LFA). We consider on-policy independent and identically distributed (i.i.d.) samples, a constant learning step, and the Polyak-Juditsky averaging method.
Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning
Announce Type: replace Abstract: Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of the soft Q-function together with a Langevin-type diffusion. Despite its appeal for continuous-control problems, its global convergence properties remain poorly...
Local linear convergence of gradient methods for overparameterized Gaussian mixtures
Announce Type: new Abstract: We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by...
On the sharp linear convergence rate of the circumcentered--reflection method on subspaces
arXiv:2606.07888v1 Announce Type: cross Abstract: For two subspaces $U,V\subseteq\RR^n$, the circumcentered--reflection method (CRM) of Behling, Bello-Cruz, and Santos~\cite{BBS2018} computes the projection onto $U\cap V$ using only the reflections across $U$ and $V$, with known linear-convergence rate $c_F$, the cosine of the Friedrichs angle. We prove that, when CRM is initialized in $V$, it contracts at the strictly smaller rate...