Home Knowledge Base Local SGD

Local SGD

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning

arXiv:2606.01128v1 Announce Type: new Abstract: Communication overhead is a crucial bottleneck in scalable distributed learning. While existing methods aim to efficiently utilize data points, such as Local SGD, Minibatch SGD, and their accelerated variants, they still exhibit communication-round complexity that scales with the total number of samples $N$. In this paper, we introduce Local MixVR, a distributed framework that integrates local updates with variance-reduction techniques to...

arXiv CS 8d ago

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

arXiv:2604.09412v2 Announce Type: replace-cross Abstract: We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond...

arXiv CS 9d ago

On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature

Announce Type: replace Abstract: Stochastic Gradient Descent (SGD) introduces anisotropic noise that is correlated with the local curvature of the loss landscape, thereby biasing optimization toward flat minima. Prior work often assumes an equivalence between the Fisher Information Matrix and the Hessian for negative log-likelihood losses, leading to the claim that the SGD noise covariance $\mathbf{C}$ is proportional to the Hessian $\mathbf{H}$. We show that this assumption holds only under...

arXiv CS 1d ago

Ubiquity of Emergent Hebbian Dynamics in Regularized Learning

Announce Type: replace Abstract: Hebbian and anti-Hebbian plasticity are widely observed in the brain and are classically modeled as mechanistic, local homosynaptic rules stabilized by homeostatic constraints. This raises an identifiability question: does observing Hebbian/anti-Hebbian structure in synaptic updates uniquely imply an underlying Hebbian computation? We identify an alternative, emergent route.

arXiv CS 9d ago

Human-Like Neural Nets by Catapulting

Human-like Neural Nets by Catapulting Speculative proposal to create artificial neural nets with human-like performance by high-learning-rate/regularization training of overparameterized NNs to trigger catapulting/grokking. Over-parameterization as a route to true generalization would resolve many outstanding mysteries of artificial versus natural intelligence. There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are...

Hacker News 3d ago