Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

arXiv CS, Tuesday 09 June 2026, 04:00 UTC. By Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang

Key Points

arXiv:2604.09967v2. Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quality of Muon hinges on the number of Newton-Schulz (NS) iterations performed, which poses efficiency challenges because each iteration carries non-trivial computation and communication cost. We propose Muon$^2$, an extension of Muon that improves both quality and efficiency by applying Adam-style adaptive second-moment preconditioning before orthogonalization. Our key insight is that the core challenge of polar approximation in Muon lies in the ill-conditioned momentum matrix, whose spectrum Muon$^2$ substantially improves, leading to faster convergence toward a practically sufficient orthogonalization. We further characterize the practical orthogonalization quality via directional alignment, under which Muon$^2$ demonstrates dramatic improvement over Muon at each polar step. Across GPT, LLaMA, and Mixture-of-Experts pre-training experiments up to 13B parameters, Muon$^2$ (and its memory-efficient variant Muon$^2$-F, which preserves most of its benefits) consistently outperforms Muon and its variants while reducing NS iterations by 40%, and saves up to 1/4 of the training time over Muon when reaching the same loss.
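The abstract describes the method only at a high level, so the following is a minimal NumPy sketch of the general idea rather than the authors' implementation. It assumes an Adam-style elementwise second-moment estimate, divides the momentum matrix by its square root, and then orthogonalizes with the classic cubic Newton-Schulz iteration (the polynomial used in the paper may differ). The function names, the hyperparameters `beta1`, `beta2`, `eps`, and `ns_steps`, and the `orthogonality_error` metric are all hypothetical choices made for illustration.

```python
import numpy as np


def newton_schulz(M, steps):
    """Approximate the polar factor of M with the classic cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X.
    (Assumption: the paper may use a different polynomial.)"""
    transposed = M.shape[0] > M.shape[1]
    X = M.T if transposed else M  # work with the wide orientation
    # The Frobenius norm bounds the spectral norm, so after scaling
    # every singular value lies in [0, 1], inside the convergence region.
    X = X / (np.linalg.norm(X) + 1e-12)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T) @ X
    return X.T if transposed else X


def orthogonality_error(X):
    """Frobenius distance between the smaller Gram matrix and identity
    (a hypothetical quality measure, not the paper's alignment metric)."""
    G = X @ X.T if X.shape[0] <= X.shape[1] else X.T @ X
    return np.linalg.norm(G - np.eye(G.shape[0]))


def preconditioned_step(grad, m, v, beta1=0.95, beta2=0.99,
                        eps=1e-8, ns_steps=3):
    """One illustrative update direction: momentum, Adam-style second-moment
    preconditioning, then orthogonalization. Hyperparameters are assumed."""
    m = beta1 * m + (1.0 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # elementwise second moment
    preconditioned = m / (np.sqrt(v) + eps)     # rescale before orthogonalizing
    update = newton_schulz(preconditioned, ns_steps)
    return update, m, v


# Usage on a random 8 x 16 "gradient" with zero-initialized optimizer state.
rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 16))
m = np.zeros_like(grad)
v = np.zeros_like(grad)
update, m, v = preconditioned_step(grad, m, v)
print("orthogonality error of update:", orthogonality_error(update))
```

In this sketch the preconditioning happens strictly before the Newton-Schulz loop, which is the ordering the abstract states. Whether it actually reduces the number of iterations needed depends on the spectrum of the momentum matrix, which is the paper's empirical claim and is not demonstrated by this toy example.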


Originally published by arXiv CS.
