Moonlight-A3B
by Community
Moonshot's compute-efficient MoE LLM, the first scale-up of the Muon optimizer
OSS
Added 1 June 2026
Overview
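Moonlight-A3B is an open-source Mixture-of-Experts (MoE) large language model developed by Moonshot AI. It is designed for compute efficiency and is presented as the first model to scale up the Muon optimizer for LLM training. As an MoE model, it activates only a subset of its parameters for each token (the "A3B" suffix denotes roughly 3B activated parameters), which lowers the compute cost per token relative to a dense model of the same total size.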
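To make "activates only a subset of parameters per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is illustrative only: the softmax gate, the top_k value, the renormalization step, and the toy scalar "experts" are all assumptions chosen for the example, not Moonlight's actual router configuration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of router logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and renormalize their gate weights.

    Returns a dict mapping expert index -> gate weight (weights sum to 1).
    top_k=2 is an assumed value for illustration.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_forward(x, router_logits, experts, top_k=2):
    """Run only the selected experts and combine their outputs by gate weight.

    Experts that are not selected are never called, which is where the
    per-token compute saving comes from.
    """
    gates = route_token(router_logits, top_k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Toy experts: each one just scales its input (hypothetical stand-ins for FFN blocks).
toy_experts = [lambda x, k=k: x * k for k in (1.0, 2.0, 3.0, 4.0)]
output = moe_forward(1.0, [0.1, 2.0, -1.0, 1.5], toy_experts)
```

With four experts and top_k=2, only two expert functions run for this token; the other two contribute no compute.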
Best for
Developers exploring efficient MoE language models or the Muon optimizer at scale
Use cases
- Fine-tuning for domain-specific text generation tasks
- Deploying cost-effective inference with MoE architectures
- Researching how the Muon optimizer behaves at scale
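For readers new to Muon, the core idea can be sketched in a few lines of NumPy: keep a momentum buffer for each 2D weight matrix, then approximately orthogonalize it with a Newton-Schulz iteration before applying the update. This is a simplified sketch, not Moonshot's training code. The quintic coefficients follow the publicly available Muon reference implementation, while the function names, the learning rate, the momentum value, and the omission of Nesterov momentum, weight decay, and update scaling are assumptions made for brevity.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=5, eps=1e-7):
    """Approximately map a 2D matrix to the nearest semi-orthogonal matrix.

    Uses a quintic Newton-Schulz iteration; the coefficients are taken from
    the public Muon reference implementation. The result has singular values
    close to 1 rather than exactly 1.
    """
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + eps)  # Frobenius norm bounds the spectral norm by 1
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # iterate on the wide orientation so the Gram matrix is small
    for _ in range(steps):
        gram = x @ x.T
        x = a * x + (b * gram + c * gram @ gram) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One simplified Muon update for a single 2D weight matrix.

    lr and beta are assumed values for illustration only.
    """
    momentum_buf = beta * momentum_buf + grad
    update = newton_schulz_orthogonalize(momentum_buf)
    return weight - lr * update, momentum_buf


rng = np.random.default_rng(0)
w = np.zeros((8, 16))
buf = np.zeros_like(w)
w, buf = muon_step(w, rng.standard_normal((8, 16)), buf)
```

The orthogonalization step equalizes the scale of the update across directions, which is the property that distinguishes Muon from element-wise optimizers such as AdamW.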
Notes
Pros
- Compute-efficient due to Mixture-of-Experts design
- First known scaling of Muon optimizer to a large language model
- Open-source and accessible on Hugging Face
Cons
- MoE inference may require specialized batching or hardware
- Limited community adoption and documentation as a new model
- Muon optimizer compatibility with existing training pipelines may be untested
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
ollama
Community
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
llama.cpp
Community
LLM inference in C/C++