
The Geometry of Grokking: Norm Minimization


Related Articles from SNS

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

arXiv:2511.01938v3 (announce type: replace)

Abstract: Grokking is a puzzling phenomenon in neural networks in which full generalization occurs only after a substantial delay following complete memorization of the training data. Previous research has linked this delayed generalization to representation learning driven by weight decay, but the precise underlying dynamics remain elusive. In this paper, we argue that post-memorization learning can be understood through the lens of constrained...

arXiv CS 8d ago