Muon$^2$
Related Articles from SNS
Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning
arXiv:2604.09967v2 Announce Type: replace Abstract: Muon has emerged as a promising optimizer for large-scale foundation model pre-training by exploiting the matrix structure of neural network updates through iterative orthogonalization. However, the orthogonalization quality of Muon hinges on the number of Newton--Schulz (NS) iterations performed, which poses efficiency challenges due to its non-trivial computation and communication cost. We propose Muon$^2$, an extension of Muon, to...
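The iterative orthogonalization mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is an assumption for illustration only. It uses the textbook cubic Newton-Schulz iteration $X \leftarrow 1.5X - 0.5XX^\top X$. Muon's reference implementation instead uses a tuned quintic polynomial with a small, fixed number of steps, and Muon$^2$ changes the scheme further. The function and helper names are hypothetical. The `steps` parameter exposes the trade-off the abstract describes: more iterations give a closer approximation of the orthogonal factor, but each one costs extra matrix multiplications.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def frobenius_norm(a: Matrix) -> float:
    return math.sqrt(sum(x * x for row in a for x in row))


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor U V^T of g = U S V^T.

    Hypothetical helper using the classic cubic Newton-Schulz iteration
        X <- 1.5 X - 0.5 X X^T X,
    which drives every singular value in (0, sqrt(3)) toward 1.
    Dividing by the Frobenius norm first puts all singular values in (0, 1].
    """
    norm = frobenius_norm(g)
    if norm == 0.0:
        raise ValueError("cannot orthogonalize a zero matrix")
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)]
             for ra, rb in zip(x, xxtx)]
    return x
```

For a full-rank input the result has orthonormal rows (or columns), so it approximates $UV^\top$ from the SVD $G = USV^\top$. In Muon this kind of step is applied to each weight matrix's momentum-averaged update. Small singular values grow by only about a factor of 1.5 per step at first, which is why too few iterations leave the update poorly orthogonalized.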
Analytic Derivation of Vertical Chromaticity in the Fermilab Muon $g{-}2$ Storage Ring
arXiv:2606.09903v1 Announce Type: new Abstract: We derive the vertical chromaticity $\xi_y$ of the Fermilab Muon g-2 storage ring in closed analytic form. Expanding the Hamiltonian as a Taylor polynomial in the dynamical variables and integrating the equations of motion order by order, we obtain the vertical second-order aberrations of the homogeneous magnetic dipole ($\mathtt{DI}$) and the combined-function dipole-and-electrostatic-quadrupole element ($\mathtt{DIQ}$) used in the muon...
Full Characterization of a Mock Nuclear Waste Barrel with Muon Tomography using Micromegas Detectors
Announce Type: new Abstract: Muon tomography based on multiple Coulomb scattering provides a non-destructive method to image dense and shielded objects using naturally occurring cosmic-ray muons. In the context of nuclear waste characterization, we present the experimental imaging of a 205-L mock waste barrel using a dedicated 1 m$^2$ muon scattering tomography test bench. The system employs multiplexed resistive Micromegas detectors, enabling stable and high-precision muon tracking.
When Muon Optimizer Meets Adversarial Training: A Theoretical and Empirical Study
Announce Type: replace Abstract: Adversarial training (AT) remains one of the most reliable empirical defenses against adversarial attacks. Its robustness critically depends on how the underlying min-max objective is optimized. In practice, the Stochastic Gradient Descent (SGD) optimizer remains the default choice for AT, whereas adaptive optimizers often improve standard training but may yield inferior robustness.
GPU optical photon Monte Carlo for noble liquid detectors: validation against Geant4 in a large liquid argon TPC benchmark
Announce Type: replace Abstract: Optical photon Monte Carlo simulation is a computational bottleneck for noble liquid Time Projection Chambers. Design studies require repeated, geometry-dependent simulations of timing, wavelength shifting, and optical response, while reconstruction and particle identification workflows need labeled optical datasets. We present Simphony (formerly EIC-Opticks), a GPU optical simulation tool built on Opticks with CUDA and NVIDIA OptiX. Simphony implements a GPU...
MuLoCo: Muon is a practical inner optimizer for DiLoCo
Announce Type: replace Abstract: DiLoCo is a powerful framework for training large language models (LLMs), enabling larger optimal batch sizes and increased accelerator utilization under networking constraints. However, DiLoCo's performance has been shown to degrade as the number of workers (K) increases (Charles et al., 2025). In this work, we posit that a related but often overlooked factor in DiLoCo's behavior is the choice of inner optimizer, which shapes the pseudogradient used by the...
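To make "inner optimizer" and "pseudogradient" concrete, here is a toy sketch of one DiLoCo round as we understand the original recipe. Each of the K workers starts from the shared global parameters and runs H local steps of an inner optimizer (AdamW in the original DiLoCo; MuLoCo proposes Muon). The pseudogradient is the average displacement from the global parameters to the workers' results, and it is applied with an outer SGD step using Nesterov momentum. Several parts are assumptions for illustration only: the plain-SGD inner optimizer, the per-worker quadratic losses in the usage example, and the function names.

```python
from typing import Callable, List

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def inner_sgd(theta: Vector, grad_fn: GradFn, steps: int, lr: float) -> Vector:
    """Hypothetical stand-in inner optimizer: plain SGD for `steps` steps."""
    theta = list(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        theta = [p - lr * gi for p, gi in zip(theta, g)]
    return theta


def diloco_round(theta: Vector, momentum: Vector, worker_grads: List[GradFn],
                 inner_steps: int, inner_lr: float,
                 outer_lr: float, beta: float) -> tuple[Vector, Vector]:
    # Every worker trains locally from the same global parameters.
    local = [inner_sgd(theta, g, inner_steps, inner_lr) for g in worker_grads]
    k = len(local)
    # Pseudogradient: mean of (global params - worker params) across workers.
    pseudo = [sum(theta[j] - w[j] for w in local) / k for j in range(len(theta))]
    # Outer step: SGD with Nesterov momentum on the pseudogradient.
    momentum = [beta * m + d for m, d in zip(momentum, pseudo)]
    theta = [p - outer_lr * (d + beta * m)
             for p, d, m in zip(theta, pseudo, momentum)]
    return theta, momentum
```

With `outer_lr=1.0` and `beta=0.0`, the outer step simply replaces the global parameters with the average of the workers' parameters. Swapping the inner optimizer changes each worker's displacement and therefore the pseudogradient, which is the factor the MuLoCo abstract focuses on.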
Recent application studies of an INTPIX4NA SOIPIX detector-based X-ray camera using an SiTCP-XG 10GbE-based high-speed readout system at KEK facilities
arXiv:2603.09461v3 Announce Type: replace Abstract: The Silicon-On-Insulator PIXel (SOIPIX) detector is a unique monolithic structure imaging device currently being developed by the SOIPIX group, led by the High Energy Accelerator Research Organization (KEK). Our detector team at the KEK Photon Factory (PF) has developed an X-ray camera based on the INTPIX4NA SOIPIX detector. This detector provides a sensitive area of 14.1 $\times$ 8.7 $\mathrm{mm^2}$, with 425,984 pixels arranged in an...
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
From the first roaring racer of the combustion age to the sonic boom that shattered the sound barrier, humanity's hunger for speed is written into our very DNA. The speed of AI reasoning is no different — it defines the boundaries of intelligence itself. When a model is fast enough, it ceases to be a tool you wait on and becomes an extension of your own thinking: responding in real time, iterating in an instant, collaborating without friction.
Mellum2 Technical Report
arXiv:2605.31268v1 Announce Type: new Abstract: We present Mellum 2, an open-weight 12B-parameter Mixture-of-Experts (MoE) language model with 2.5B active parameters per token. Mellum 2 is a general-purpose language model specialized in software engineering, spanning code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance, and it is the successor to the completion-focused 4B dense Mellum model. The...
Measurement of reactor neutrino oscillation with the first JUNO data
Abstract: Neutrino oscillations (see refs. 1, 2 and references therein), a quantum effect manifesting at macroscopic scales, are governed by lepton flavour mixing angles and neutrino mass-squared differences (ref. 3) that are fundamental parameters of particle physics, representing phenomena beyond the Standard Model. Precision measurements of these parameters are essential for testing the completeness of the three-flavour framework, determining the mass ordering of neutrinos and probing possible new...