OLMoE: Open Mixture-of-Experts Language Models
by Community
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input
OSS
Added 1 June 2026
Overview
OLMoE is a fully open language model built on a sparse Mixture-of-Experts (MoE) architecture. It has 7 billion total parameters but activates only 1 billion per input token, so its per-token compute is closer to that of a 1B-parameter dense model than a 7B one. The model was pretrained on 5 trillion tokens and then fine-tuned into an instruct version; the authors report that it outperforms larger models such as Llama2-13B-Chat and DeepSeekMoE-16B.
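To make the idea of sparse activation concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration only, not OLMoE's implementation: the expert count, the value of `top_k`, the toy experts, and the choice to use raw softmax probabilities as mixing weights are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k):
    """Return (expert_index, weight) pairs for the top_k most probable experts."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:top_k]]

def moe_layer(token, experts, router_logits, top_k):
    """Run only the selected experts and return their weighted sum."""
    selected = route_token(router_logits, top_k)
    output = [0.0] * len(token)
    for idx, weight in selected:
        expert_out = experts[idx](token)  # the other experts are never evaluated
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [idx for idx, _ in selected]

# Toy setup (hypothetical numbers): 8 experts, 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
# Each toy "expert" just scales its input by a different constant.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

token = [1.0, -2.0, 0.5]
# In a real model these logits come from a learned router; here they are fixed.
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.7]

output, used = moe_layer(token, experts, router_logits, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}")
```

Only the experts returned by `route_token` are evaluated for a given token. That is why the active parameter count per token can be much smaller than the total parameter count.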
Best for
Researchers and developers who need an efficient, open-source MoE language model with strong performance and full transparency.
Use cases
- Deploying efficient language models with low per-token compute cost
- Researching MoE training dynamics and expert specialization
- Building open-source applications that require state-of-the-art performance with limited resources
Pros
- Fully open-source with model weights, training data, and code
- Outperforms larger models despite using fewer active parameters per token
- Provides detailed analysis of MoE routing and expert specialization
Cons
- Requires understanding of MoE architecture for effective deployment
- All 7B parameters must still be held in memory even though only 1B are active per token, which may be too large for some edge devices
- Open research release rather than a vendor product, so it may have less commercial support than vendor-backed models
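The memory point above can be checked with rough arithmetic. The sketch below uses the rounded parameter counts from this entry (7B total, 1B active) and standard bytes-per-parameter values. It counts weights only and ignores the KV cache, activations, and runtime overhead, so treat the numbers as lower bounds rather than deployment requirements.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

TOTAL_PARAMS = 7e9   # rounded total parameter count from this entry
ACTIVE_PARAMS = 1e9  # rounded active parameters per token from this entry

def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    gb = weight_memory_gb(TOTAL_PARAMS, dtype)
    print(f"{dtype}: about {gb:.0f} GB of weights to load")

# Fraction of the weights that take part in computing any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.1%} of all parameters")
```

The gap between the two figures is the trade-off of a sparse MoE: compute per token scales with the active parameters, while memory scales with the total.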
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
PyTorch
Community
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DeepSpeed
Community
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
ollama
Community
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.