
Muon Learns More Robust

Related Articles from SNS

Muon Learns More Robust and Transferable Features than Adam

arXiv:2606.09658v1 | Announce Type: new

Abstract: Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustness and transferability.

arXiv CS 1d ago
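The abstract names Muon only as an optimizer. To illustrate what sets it apart from Adam and SGD, here is a minimal NumPy sketch of a Muon-style update. It assumes the design of the public reference implementation: accumulate momentum, then approximately orthogonalize the 2D update matrix with a quintic Newton-Schulz iteration before applying it. The function names `newton_schulz_orthogonalize` and `muon_step`, the coefficients (3.4445, -4.7750, 2.0315), and the defaults `lr=0.02` and `beta=0.95` are illustrative assumptions, not details taken from the paper above. Shape-dependent learning-rate scaling varies between implementations and is omitted.

```python
import numpy as np


def newton_schulz_orthogonalize(g: np.ndarray, steps: int = 5, eps: float = 1e-7) -> np.ndarray:
    """Approximately map g = U S V^T toward U V^T with a quintic Newton-Schulz iteration."""
    # Coefficients assumed from the public Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    # Divide by the Frobenius norm so every singular value is at most 1.
    x = g / (np.linalg.norm(g) + eps)
    # Use the wide orientation so the Gram matrix x @ x.T is the smaller one.
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T
    for _ in range(steps):
        gram = x @ x.T
        # a*X + b*(X X^T) X + c*(X X^T)^2 X applies p(s) = a*s + b*s^3 + c*s^5
        # to each singular value s while keeping the singular vectors fixed.
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95, nesterov=True):
    """One hypothetical Muon-style update for a single 2D weight matrix."""
    # Heavy-ball momentum accumulation.
    momentum_buf = beta * momentum_buf + grad
    # Nesterov-style look-ahead, as in several public implementations (assumption).
    update = grad + beta * momentum_buf if nesterov else momentum_buf
    # Step along the approximately orthogonalized update instead of the raw one.
    return weight - lr * newton_schulz_orthogonalize(update), momentum_buf
```

With these coefficients, the iteration does not return an exactly orthogonal matrix. It pushes the singular values into a band around 1 rather than onto 1, trading exactness for fewer iterations.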

When Muon Optimizer Meets Adversarial Training: A Theoretical and Empirical Study

Announce Type: replace

Abstract: Adversarial training (AT) remains one of the most reliable empirical defenses against adversarial attacks. Its robustness critically depends on how the underlying min-max objective is optimized. In practice, the Stochastic Gradient Descent (SGD) optimizer remains the default optimization choice for AT, whereas adaptive optimizers often improve standard training but may yield inferior robustness.

arXiv CS 9d ago
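The abstract does not state the exact min-max objective. It is usually written in the standard adversarial-training form below, where fθ is the model, L the loss, D the data distribution, and ε the perturbation budget under an ℓp norm. These symbols are assumptions for illustration. The inner maximization is commonly approximated by projected gradient ascent (PGD). The second line shows the ℓ∞ variant with step size α:

```latex
\min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}}
  \Big[ \max_{\|\delta\|_p \le \epsilon} \mathcal{L}\big(f_\theta(x+\delta),\, y\big) \Big]

\delta^{(t+1)} = \Pi_{\|\delta\|_\infty \le \epsilon}
  \Big( \delta^{(t)} + \alpha \, \operatorname{sign}\big(\nabla_{\delta}\, \mathcal{L}(f_\theta(x+\delta^{(t)}),\, y)\big) \Big)
```

The optimizer choice (SGD, Adam, or Muon) enters through the outer minimization over θ, which is applied to the loss at the perturbed inputs.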