Lipschitz Hessians
Related Articles from SNS
Sharp First-Order Lower Bounds for Higher-Order Smooth Nonconvex Optimization
arXiv:2606.05438v1 Announce Type: new Abstract: We study the deterministic first-order oracle complexity of finding \(\epsilon\)-stationary points in smooth nonconvex optimization when the objective satisfies higher-order smoothness assumptions. While the classical \(\epsilon^{-2}\) rate is optimal under only Lipschitz gradients, higher-order smoothness leads to accelerated first-order upper bounds, most notably the \(\epsilon^{-7/4}\) rate under Lipschitz Hessians and the...
Efficiently Escaping Saddle Points under Generalized Smoothness via Self-Bounding Regularity
Announce Type: replace-cross Abstract: We study the optimization of non-convex functions that are not necessarily smooth (where smooth means the gradient and/or Hessian are Lipschitz) using first-order methods. Smoothness is a restrictive assumption in machine learning in both theory and practice, motivating significant recent work on finding first-order stationary points of functions satisfying generalizations of smoothness with first-order methods. We develop a novel framework that lets us systematically study...
Improved Guarantees for Langevin Monte Carlo with Average Smoothness
arXiv:2605.31413v1 Announce Type: cross Abstract: We establish improved nonasymptotic bounds for Langevin Monte Carlo in the strongly log-concave setting, when the error is measured by the Wasserstein distance. The main result shows that the discretization error is governed by an average coordinate-wise smoothness constant, rather than by the usual global smoothness constant. The proof is short and probabilistic, and relies on a refined use of the synchronous coupling.
Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise
arXiv:2605.18528v2 Announce Type: replace-cross Abstract: A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but also exploit input-output matrix norm geometry. At the same time, stochastic gradient noise in deep learning is often far from sub-Gaussian and may exhibit heavy tails.
Variable-preconditioned transformed primal-dual method for generalized Wasserstein Gradient Flows
arXiv:2509.15385v3 Announce Type: replace Abstract: We propose a Variable-Preconditioned Transformed Primal-Dual (VPTPD) method for solving generalized Wasserstein gradient flows based on the structure-preserving JKO scheme. This is a nontrivial extension of the TPD method [Chen et al. incorporating proximal splitting techniques to address the challenges arising from the nonsmoothness of the objective function.
Flatland: The Adventures of Gradient Descent with Large Step Sizes
Announce Type: new Abstract: The training of neural networks often entails objective functions that are not globally \(L\)-smooth. For these functions, it is both theoretically and practically difficult to answer the question: what is the largest possible step size that ensures the convergence of gradient descent (GD)? We address this longstanding open question in deep learning by providing a unifying definition of "large" step sizes that requires only local Lipschitz (or even Hölder)...
Mirror Descent Under Generalized Smoothness
arXiv:2502.00753v4 Announce Type: replace-cross Abstract: Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent studies relax the smoothness assumption by allowing the Lipschitz constant of the gradient to grow with respect to the gradient norm, which accommodates a broad range of objectives in practice.
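For orientation, the relaxation described in the abstract above, a gradient Lipschitz constant that is allowed to grow with the gradient norm, is often formalized as \((L_0, L_1)\)-smoothness. The sketch below states that condition for a twice-differentiable \(f\). The constants \(L_0\) and \(L_1\) and the use of the operator norm are assumptions made here for illustration; the paper's own condition may be stated differently (for example, in a non-Euclidean norm).

```latex
% (L_0, L_1)-smoothness: one common form of generalized smoothness.
% Assumed notation: f is twice differentiable, L_0, L_1 >= 0,
% and \|\cdot\| is the Euclidean (operator) norm.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\|
  \qquad \text{for all } x .
\]
% Setting L_1 = 0 recovers the classical condition
% \|\nabla^2 f(x)\| \le L_0, i.e. an L_0-Lipschitz gradient.
```

With \(L_1 = 0\) this reduces to the usual \(L\)-smoothness assumption, so it strictly widens the class of admissible objectives.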