OLMoE: Open Mixture-of-Experts Language Models

by Community

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token.

Added 1 June 2026

Overview

OLMoE is a fully open language model built on a sparse Mixture-of-Experts (MoE) architecture. It has 7 billion total parameters but activates only 1 billion of them for each input token, so the compute spent per token is far lower than for a dense model of the same total size. The model was pretrained on 5 trillion tokens and then fine-tuned into an instruct version, which outperforms larger models such as Llama2-13B-Chat and DeepSeekMoE-16B.
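
To make "activates only 1 billion per input token" concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind sparse MoE layers. It is a toy illustration, not OLMoE's actual implementation: the expert count, the value of k, the tiny vector sizes, and every function name (softmax, route_top_k, moe_layer) are assumptions chosen for readability. The router weights are not renormalized after selection, which is one of several possible design choices.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


def moe_layer(x, experts, router_weights, k):
    """Run only the k selected experts on x and sum their weighted outputs."""
    # One router logit per expert: dot product of that expert's router row with x.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    selected = route_top_k(logits, k)
    out = [0.0] * len(x)
    for idx, weight in selected:
        y = experts[idx](x)  # the experts that were not selected are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, selected


# Toy configuration (hypothetical values, far smaller than any real model).
NUM_EXPERTS = 8
TOP_K = 2
DIM = 3

random.seed(0)
# Each toy "expert" just scales its input; a real expert is a feed-forward network.
scales = [random.uniform(0.5, 1.5) for _ in range(NUM_EXPERTS)]
experts = [lambda x, s=s: [s * xi for xi in x] for s in scales]
router_weights = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [1.0, -2.0, 0.5]
output, selected = moe_layer(token, experts, router_weights, TOP_K)
print("experts used:", sorted(i for i, _ in selected), "of", NUM_EXPERTS)
```

Because only TOP_K of the NUM_EXPERTS experts run for a given token, the per-token cost scales with the active experts rather than with the total, even though all expert weights still have to be stored.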

Best for
Researchers and developers who need an efficient, open-source MoE language model with strong performance and full transparency.

Use cases

  • Deploying efficient language models with low per-token compute cost
  • Researching MoE training dynamics and expert specialization
  • Building open-source applications that require state-of-the-art performance with limited resources

Pros

  • Fully open-source with model weights, training data, and code
  • Outperforms larger models despite using fewer active parameters per token
  • Provides detailed analysis of MoE routing and expert specialization

Cons

  • Requires understanding of MoE architecture for effective deployment
  • Total parameter count is still 7B, so the memory footprint may be too large for some edge devices
  • Community-driven project may have less commercial support than vendor-backed models
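
The edge-device concern above comes down to simple arithmetic: sparse activation reduces compute per token, but every parameter still has to be stored. The sketch below estimates weight storage from the nominal 7B total and 1B active figures quoted in this listing. The precision table and the helper name weight_memory_gb are illustrative assumptions, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Nominal figures from this listing; exact counts may differ slightly.
TOTAL_PARAMS = 7_000_000_000
ACTIVE_PARAMS = 1_000_000_000

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


footprint = {
    name: weight_memory_gb(TOTAL_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

for name, gb in footprint.items():
    print(f"{name:>8}: about {gb:.1f} GB of weights")
print(f"active fraction per token: {active_fraction:.1%}")
```

Under these assumptions the weights alone take roughly 14 GB in bfloat16 and about 3.5 GB at 4-bit precision, while only around one seventh of the parameters take part in computing any single token.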

Indexed from awesome-llm and enriched against its public facts.