DeepSeek-v2-236B-MoE

by Community

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters.

OSS

Added 1 June 2026

Overview

DeepSeek-V2 is a Mixture-of-Experts language model with 236B total parameters, activating 21B per token. It uses Multi-head Latent Attention to compress the KV cache into a latent vector for efficient inference, and DeepSeekMoE for economical training via sparse computation. It supports a context length of 128K tokens.
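To make "sparse computation" concrete, the following is a minimal sketch of generic top-k expert routing, the mechanism by which a Mixture-of-Experts layer runs only a few experts per token. This entry does not describe DeepSeekMoE's actual router, so the function names, the toy scalar experts, and the choice of k=2 are all assumptions for illustration, not the model's implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of router scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Returns a dict mapping expert index -> gate weight (weights sum to 1).
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and combine their outputs by gate weight."""
    gates = route_top_k(router_scores, k)
    return sum(w * experts[i](x) for i, w in gates.items())

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
# Hypothetical router scores for one token; experts 1 and 3 score highest.
scores = [0.1, 2.0, -1.0, 1.5]

gates = route_top_k(scores, k=2)
output = moe_forward(1.0, experts, scores, k=2)
print(sorted(gates), output)
```

Only two of the four toy experts are evaluated for this token, which is the same idea, at a much smaller scale, as activating 21B of 236B parameters per token.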

Best for

Developers needing a cost-efficient large language model with long context and sparse activation

Use cases

Running large-scale language model inference with reduced memory footprint
Training large models with lower computational cost via sparse activation
Handling long-context tasks up to 128K tokens
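The memory benefit of caching one compressed latent vector per token, rather than full keys and values for every head, can be estimated with simple arithmetic. The sketch below assumes 16-bit cache entries and treats 128K as 128 × 1024 tokens; the layer count, head count, head size, and latent size are hypothetical values chosen for illustration, not DeepSeek-V2's published configuration.

```python
def kv_cache_bytes(tokens, layers, elements_per_token_per_layer, bytes_per_element=2):
    """Total cache size for one sequence, in bytes."""
    return tokens * layers * elements_per_token_per_layer * bytes_per_element

# Hypothetical dimensions for illustration only (not the model's real config).
LAYERS = 60
HEADS = 32
HEAD_DIM = 128
LATENT_DIM = 512
CONTEXT = 128 * 1024  # assumes "128K" means 131,072 tokens

# Standard multi-head attention caches a key and a value vector for every head.
full_per_token = 2 * HEADS * HEAD_DIM
# A latent-compressed cache stores a single shared vector per token instead.
latent_per_token = LATENT_DIM

full = kv_cache_bytes(CONTEXT, LAYERS, full_per_token)
latent = kv_cache_bytes(CONTEXT, LAYERS, latent_per_token)

GIB = 2 ** 30
print(f"full KV cache:   {full / GIB:.1f} GiB")
print(f"latent KV cache: {latent / GIB:.1f} GiB")
print(f"reduction:       {full // latent}x")
```

Under these assumed dimensions the cache shrinks by a factor equal to `2 * HEADS * HEAD_DIM / LATENT_DIM`; the real ratio depends on the model's actual configuration.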

Notes

Pros

Efficient inference due to KV cache compression with Multi-head Latent Attention
Economical training through sparse Mixture-of-Experts (only 21B of the 236B parameters are activated per token)
Supports very long context length of 128K tokens

Cons

Large total parameter count (236B) requires substantial hardware for full model storage
Community model may lack commercial support or polished documentation
MoE architectures can introduce load balancing challenges and inference complexity
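The storage cost noted above follows directly from the parameter counts. The sketch below estimates weight storage at a few precisions and the fraction of parameters active per token. The bytes-per-parameter values are assumptions about common storage formats, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
TOTAL_PARAMS = 236e9   # total parameters, from the entry above
ACTIVE_PARAMS = 21e9   # parameters activated per token, from the entry above

def weight_storage_gb(params, bytes_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

def active_fraction(total=TOTAL_PARAMS, active=ACTIVE_PARAMS):
    """Share of the model's parameters used for any single token."""
    return active / total

# Assumed storage formats: 2 bytes (16-bit), 1 byte (8-bit), 0.5 byte (4-bit).
for label, nbytes in (("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)):
    print(f"{label}: {weight_storage_gb(TOTAL_PARAMS, nbytes):.0f} GB")

print(f"active per token: {active_fraction():.1%}")
```

Sparse activation reduces compute per token, but every expert typically still has to be stored somewhere accessible, so the storage figure is driven by the 236B total rather than the 21B active parameters.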

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Built with (2 entries)

O OSS Obs medium

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 23d ago

O OSS Framework medium

DeepSpeed

Community

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

★ 42,436 updated 23d ago

Pairs with (3 entries)

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 23d ago

O OSS Framework medium

ollama

Community

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

★ 172,846 updated 23d ago

O OSS Framework medium

DeepSeek-R1

Community

First-generation reasoning models from DeepSeek.

★ 92,010 updated 12mo ago
