Home Knowledge Base Newton-Schulz

Newton-Schulz

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

DeMuon: A Decentralized Muon for Matrix Optimization over Graphs

arXiv:2510.01377v2 Announce Type: replace-cross Abstract: In this paper, we propose DeMuon, a method for decentralized matrix optimization over a given communication topology. DeMuon incorporates matrix orthogonalization via Newton-Schulz iterations-a technique inherited from its centralized predecessor, Muon-and employs gradient tracking to mitigate heterogeneity among local functions. Under heavy-tailed noise conditions and additional mild assumptions, we establish the iteration complexity...

arXiv CS 7d ago

LiMuon: Light and Fast Muon Optimizer for Large Models

arXiv:2509.14562v3 Announce Type: replace Abstract: Large models recently are widely applied in machine learning, so efficient training of large models has received widespread attention. More recently, the useful Muon optimizer is specifically designed for matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory for large models.

arXiv CS 9d ago

LiMuon: Light and Fast Muon Optimizer for Large Models

arXiv:2509.14562v4 Announce Type: replace Abstract: Large models recently are widely applied in machine learning, so efficient training of large models has received widespread attention. More recently, the useful Muon optimizer is specifically designed for matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory for large models.

arXiv CS 1d ago