OLMoE: Open Mixture-of-Experts Language Models

by Community

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input token.

Added 1 June 2026

Overview

OLMoE is a fully open language model built on a sparse Mixture-of-Experts (MoE) architecture. OLMoE-1B-7B has 7 billion total parameters but activates only 1 billion per input token, so its per-token compute is closer to that of a 1B dense model than a 7B one. The model was pretrained on 5 trillion tokens and then fine-tuned into an instruct version, which is reported to outperform larger models such as Llama2-13B-Chat and DeepSeekMoE-16B.
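
To make "activates only a fraction of its parameters per token" concrete, here is a minimal sketch of top-k expert routing in plain Python. It is an illustration of the general sparse MoE idea, not OLMoE's actual implementation: the expert count, the value of k, the scalar "experts", and the helper names (`route_token`, `moe_layer`) are all hypothetical, and the router weights are left unnormalized after selection, which is only one of several conventions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Return [(expert_index, weight), ...] for the k most probable experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i]) for i in top]


def moe_layer(token, experts, router_logits, k):
    """Run only the k selected experts and sum their weighted outputs."""
    out = 0.0
    for idx, weight in route_token(router_logits, k):
        out += weight * experts[idx](token)
    return out


# Hypothetical toy setup: 8 scalar "experts", 2 of which run per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits, TOP_K)
y = moe_layer(1.5, experts, logits, TOP_K)
print(len(selected), "of", NUM_EXPERTS, "experts ran for this token")
```

All experts exist in the model, but each token only pays for the k it is routed to. That is the sense in which total and active parameter counts differ.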

Best for

Researchers and developers who need an efficient, open-source MoE language model with strong performance and full transparency.

Use cases

Deploying efficient language models with low per-token compute cost
Researching MoE training dynamics and expert specialization
Building open-source applications that require state-of-the-art performance with limited resources
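
For the expert-specialization use case, one simple starting point is to log which experts each token was routed to and compare per-expert load across domains. The sketch below assumes a hypothetical routing log (a list of expert-index tuples, one per token); the function names and data are illustrative and do not come from the OLMoE codebase.

```python
from collections import Counter


def expert_load(assignments, num_experts):
    """Fraction of routed slots that went to each expert."""
    counts = Counter(e for token_experts in assignments for e in token_experts)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_experts
    return [counts.get(e, 0) / total for e in range(num_experts)]


def load_imbalance(load):
    """Largest expert share divided by the uniform share; 1.0 is perfectly balanced."""
    uniform = 1.0 / len(load)
    return max(load) / uniform


# Hypothetical log for one domain: each tuple holds the experts chosen for one token.
code_tokens = [(0, 1), (0, 2), (0, 1), (0, 3)]
load = expert_load(code_tokens, num_experts=4)
print(load, load_imbalance(load))
```

Running the same statistic over tokens from different domains and comparing the resulting load vectors gives a rough first signal of whether particular experts favor particular kinds of input.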

Notes

Pros

Fully open-source with model weights, training data, and code
Outperforms larger models despite using fewer active parameters per token
Provides detailed analysis of MoE routing and expert specialization

Cons

Requires understanding of MoE architecture for effective deployment
Total parameter count is still 7B, and all of it must be held in memory, which may be too large for some edge devices
Community-driven project may have less commercial support than vendor-backed models
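
The trade-off between low per-token compute and a 7B memory footprint can be estimated with back-of-the-envelope arithmetic. The sketch below uses the nominal 7B total and 1B active figures from this entry, counts weights only (no activations or KV cache), and uses the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass. All of these are simplifying assumptions, not measured numbers.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def forward_flops_per_token(active_params):
    """Rule-of-thumb estimate: about 2 FLOPs per active parameter per token."""
    return 2 * active_params


TOTAL_PARAMS = 7e9   # nominal total parameters, as stated in this entry
ACTIVE_PARAMS = 1e9  # nominal active parameters per token, as stated in this entry

# Every expert must be resident, so memory scales with the total count.
for name, nbytes in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: {weight_memory_gib(TOTAL_PARAMS, nbytes):.1f} GiB of weights")

# Compute scales with the active count instead.
ratio = forward_flops_per_token(TOTAL_PARAMS) / forward_flops_per_token(ACTIVE_PARAMS)
print(f"a dense model of the same total size would need about {ratio:.0f}x the FLOPs per token")
```

Under these assumptions the model computes like a 1B model but must be stored like a 7B one, which is why the total parameter count matters for memory-constrained devices.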

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Uses (2 entries)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 23d ago

DeepSpeed

Community

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

★ 42,436 updated 23d ago

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 23d ago

Pairs with (2 entries)

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 23d ago

ollama

Community

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

★ 172,846 updated 23d ago
