Muon+: Towards Better Muon via One Additional Normalization Step
Abstract
Muon+ enhances the Muon optimizer with additional normalization after orthogonalization, demonstrating improved training efficiency and reduced perplexity across various model sizes and architectures.
The Muon optimizer has demonstrated promising performance in pre-training large language models through gradient (or momentum) orthogonalization. In this work, we propose a simple yet effective enhancement to Muon, namely Muon+, which introduces an additional normalization step after orthogonalization. We demonstrate the effectiveness of Muon+ through extensive pre-training experiments across a wide range of model scales and architectures. Our evaluation includes GPT-style models ranging from 130M to 774M parameters and LLaMA-style models ranging from 60M to 1B parameters. We evaluate Muon+ in the compute-optimal training regime and further extend the token-to-parameter (T2P) ratio to an industrial level of approximately 200. Experimental results show that Muon+ consistently improves training and validation perplexity over Muon. Our code is available at https://github.com/K1seki221/MuonPlus.
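To make the idea of "orthogonalize, then normalize" concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not say which normalization Muon+ uses, so the row-wise L2 normalization below is an assumption chosen only for illustration, and the function names (newton_schulz_orthogonalize, normalize_rows, muon_plus_step) are hypothetical. The orthogonalization uses the quintic Newton-Schulz iteration found in public Muon implementations. The sketch omits details such as Nesterov momentum and shape-dependent learning-rate scaling.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G (push its singular values toward 1).

    Uses the quintic Newton-Schulz iteration with the coefficients
    commonly seen in public Muon implementations.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.astype(np.float64)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
        X = X.T
    # Dividing by the Frobenius norm bounds the spectral norm by 1.
    X = X / (np.linalg.norm(X) + eps)
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def normalize_rows(O, eps=1e-7):
    """ASSUMPTION: row-wise L2 normalization as the extra step.

    The paper's actual normalization may differ; see the linked repository.
    """
    return O / (np.linalg.norm(O, axis=1, keepdims=True) + eps)


def muon_plus_step(W, grad, momentum_buf, lr=0.02, beta=0.95):
    """One illustrative update: momentum -> orthogonalize -> normalize -> step."""
    momentum_buf = beta * momentum_buf + grad
    O = newton_schulz_orthogonalize(momentum_buf)
    U = normalize_rows(O)  # the "one additional normalization step"
    return W - lr * U, momentum_buf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 6))
    grad = rng.standard_normal((4, 6))
    W_new, buf = muon_plus_step(W, grad, np.zeros_like(W))
    print(W_new.shape)
```

Removing the normalize_rows call gives a plain Muon-style update in this sketch, so the two variants differ by a single line.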
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning (2026)
- Muown: Row-Norm Control for Muon Optimization (2026)
- MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration (2026)
- Nora: Normalized Orthogonal Row Alignment for Scalable Matrix Optimizer (2026)
- Spectral Flattening Is All Muon Needs: How Orthogonalization Controls Learning Rate and Convergence (2026)
- MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization (2026)
- When and Why Grouping Attention Heads Accelerates Muon Optimization (2026)
Get this paper in your agent:
hf papers read 2602.21545
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash