Muon

Related Articles from SNS

Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

arXiv:2604.09967v2 Announce Type: replace Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quality of Muon hinges on the number of Newton-Schulz (NS) iterations performed, which poses efficiency challenges due to its non-trivial computation and communication cost. We propose Muon$^2$, an extension of Muon, to...

arXiv CS 1d ago

Absolute intensity measurement of pulsed muon beams using in-beam activation

Announce Type: new Abstract: The absolute number of negative muons contained in a beam is essential for many experiments at accelerator facilities, but determining it in pulsed beams has been difficult, particularly at high intensities. A method that uses the yield of $\beta$-delayed $\gamma$ rays from the residual nuclei after the muon nuclear capture reaction has recently been developed to determine the muon number in a pulsed muon beam. In particular, the in-beam activation method...

arXiv Physics 12h ago

Denoise First, Orthogonalize Later: Understanding Momentum in Muon via Spectral Filtering

Announce Type: new Abstract: Muon has recently demonstrated strong empirical performance in large language model training, but the theoretical role of momentum in Muon remains unclear. Existing analyses of Muon either remove momentum to study spectral updates in isolation, or retain momentum without explaining why it improves empirical performance. Our work bridges this gap by showing momentum in Muon acts as a spectral filter.

arXiv CS 7d ago

Full Characterization of a Mock Nuclear Waste Barrel with Muon Tomography using Micromegas Detectors

Announce Type: new Abstract: Muon tomography based on multiple Coulomb scattering provides a non-destructive method to image dense and shielded objects using naturally occurring cosmic-ray muons. In the context of nuclear waste characterization, we present the experimental imaging of a 205-L mock waste barrel using a dedicated 1 m$^2$ muon scattering tomography test bench. The system employs multiplexed resistive Micromegas detectors, enabling stable and high-precision muon tracking.

arXiv Physics 8d ago

Why Muon Outperforms Adam: A Curvature Perspective

arXiv:2606.04662v1 Announce Type: new Abstract: Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training landscape and show that Muon achieves a larger one-step loss decrease than Adam at matched validation loss.

arXiv CS 6d ago

LiMuon: Light and Fast Muon Optimizer for Large Models

arXiv:2509.14562v4 Announce Type: replace Abstract: Large models have recently been widely applied in machine learning, so their efficient training has received widespread attention. More recently, the Muon optimizer was designed specifically for the matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory for large models.

arXiv CS 1d ago

LiMuon: Light and Fast Muon Optimizer for Large Models

arXiv:2509.14562v3 Announce Type: replace Abstract: Large models have recently been widely applied in machine learning, so their efficient training has received widespread attention. More recently, the Muon optimizer was designed specifically for the matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory for large models.

arXiv CS 9d ago

Muon Learns More Robust and Transferable Features than Adam

arXiv:2606.09658v1 Announce Type: new Abstract: Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustness and transferability.

arXiv CS 1d ago

Muon in Associative Memory Learning: Training Dynamics and Scaling Laws

arXiv:2602.05725v3 Announce Type: replace Abstract: Muon updates matrix parameters via the matrix sign of the gradient and has shown strong empirical gains, yet its dynamics and scaling behavior remain unclear in theory. We study Muon in a linear associative memory model with softmax retrieval and a hierarchical frequency spectrum over query-answer pairs, with and without label noise. In this setting, we show that Gradient Descent (GD) learns frequency components at highly imbalanced rates,...

arXiv CS 6d ago