Open Source Frameworks · medium

DeepSeek-V3 Technical Report

by Community

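We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.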


OSS

Added 1 June 2026

Overview

DeepSeek-V3 is a Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. It builds on the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, and introduces an auxiliary-loss-free strategy for load balancing and a multi-token prediction training objective. The model is pre-trained on 14.8 trillion tokens, followed by supervised fine-tuning and reinforcement learning stages.
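The auxiliary-loss-free balancing idea can be illustrated with a toy router. This is a minimal sketch, assuming sigmoid affinity scores, top-k expert selection on score plus a per-expert bias, gating weights normalized from the unbiased scores, and a fixed bias step `gamma` applied after each batch. The function names, `gamma` value, logits, and expert count are hypothetical, not taken from the DeepSeek-V3 code, and any complementary safeguards described in the report are omitted.

```python
import math
from typing import List, Tuple


def route_token(scores: List[float], bias: List[float], k: int) -> Tuple[List[int], List[float]]:
    """Pick k experts for one token.

    The per-expert bias only influences *which* experts are selected;
    the gating weights are the unbiased scores of the selected experts,
    normalized to sum to 1.
    """
    biased = [s + b for s, b in zip(scores, bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:k]
    total = sum(scores[i] for i in chosen)
    gates = [scores[i] / total for i in chosen]
    return chosen, gates


def update_bias(bias: List[float], load: List[int], gamma: float) -> List[float]:
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    updated = []
    for b, n in zip(bias, load):
        if n > mean_load:
            updated.append(b - gamma)
        elif n < mean_load:
            updated.append(b + gamma)
        else:
            updated.append(b)
    return updated


if __name__ == "__main__":
    # Toy batch (hypothetical logits): every token prefers expert 0.
    logits = [[2.0, 0.5, 0.1, -0.3], [1.8, 0.2, 0.4, 0.0], [2.2, -0.1, 0.3, 0.6]]
    bias = [0.0] * 4
    for step in range(3):
        load = [0] * 4
        for row in logits:
            scores = [1 / (1 + math.exp(-x)) for x in row]  # sigmoid affinities
            chosen, _ = route_token(scores, bias, k=2)
            for e in chosen:
                load[e] += 1
        bias = update_bias(bias, load, gamma=0.1)  # gamma: hypothetical step size
        print(step, load, [round(b, 2) for b in bias])
```

In this toy batch expert 0 is overloaded at first, so its bias falls step by step until other experts start winning some selections, while the gate values are still computed from the unbiased scores. No extra loss term is added to the training objective.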

Best for

Researchers and engineers working on large-scale language models, particularly MoE architectures.

Use cases

Researching efficient MoE architectures for large language models
Implementing load balancing strategies without auxiliary losses
Applying multi-token prediction training to improve model performance
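To make the multi-token prediction objective concrete, here is a toy loss computation. It is a sketch assuming a formulation in which prediction depth k (k = 1..D) is trained to predict the token k + 1 positions ahead, each depth contributes a mean cross-entropy, and the depth losses are averaged and scaled by a weight `lam` before being added to the main next-token loss. The probability tables, `lam` value, and function names are hypothetical placeholders, not values from the report.

```python
import math
from typing import Dict, List

Dist = Dict[int, float]  # toy distribution: token id -> predicted probability


def cross_entropy(preds: List[Dist], targets: List[int]) -> float:
    """Mean negative log-likelihood of the target tokens."""
    assert len(preds) == len(targets)
    return -sum(math.log(p[t]) for p, t in zip(preds, targets)) / len(targets)


def mtp_objective(tokens: List[int], preds_by_depth: List[List[Dist]], lam: float) -> float:
    """Main next-token loss plus lam * (mean of the additional-depth losses).

    preds_by_depth[0]: main model, position i predicts tokens[i + 1].
    preds_by_depth[k]: depth k, position i predicts tokens[i + 1 + k].
    """
    losses = []
    for k, preds in enumerate(preds_by_depth):
        targets = tokens[1 + k:]  # later depths have fewer valid positions
        losses.append(cross_entropy(preds[: len(targets)], targets))
    main, extra = losses[0], losses[1:]
    mtp = lam * sum(extra) / len(extra) if extra else 0.0
    return main + mtp


if __name__ == "__main__":
    tokens = [5, 7, 9]                  # hypothetical token ids
    main_preds = [{7: 0.5}, {9: 0.25}]  # predicts 7 after 5, 9 after 7
    depth1_preds = [{9: 0.5}]           # predicts 9 two steps after 5
    # For these toy probabilities the result equals 1.8 * ln 2.
    print(mtp_objective(tokens, [main_preds, depth1_preds], lam=0.3))
```

The extra depths only add training signal; the main next-token loss is unchanged, which is why a small `lam` keeps the auxiliary predictions from dominating.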

Notes

Pros

Activates only 37B parameters per token for efficient inference
Auxiliary-loss-free load balancing reduces the performance cost that balance-enforcing auxiliary losses can impose
Strong performance from training on 14.8 trillion high-quality tokens
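The efficiency point above comes down to simple arithmetic. The sketch below compares activated and total parameters and applies the common rule of thumb that a forward pass costs roughly 2 FLOPs per parameter used per token; that rule ignores attention-score computation and is an assumed approximation, not a figure from the report.

```python
# Parameter counts from this listing; the 2-FLOPs-per-parameter rule is an assumed approximation.
TOTAL_PARAMS = 671e9
ACTIVE_PARAMS = 37e9


def active_fraction(active: float, total: float) -> float:
    """Share of all parameters that one token actually uses."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rough forward cost: ~2 FLOPs (multiply + add) per parameter used; ignores attention scores."""
    return 2 * params


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"MoE forward:   {forward_flops_per_token(ACTIVE_PARAMS):.2e} FLOPs/token")
    print(f"dense 671B:    {forward_flops_per_token(TOTAL_PARAMS):.2e} FLOPs/token")
```

With these figures about 5.5% of the weights take part in each token, so per-token compute is roughly 18 times lower than a hypothetical dense model of the same total size would need.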

Cons

Very large total parameter count (671B) demands significant hardware resources
Technical report may lack accessible implementation details and code
Pre-training on 14.8T tokens is extremely resource- and time-intensive
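Both resource concerns can be estimated with back-of-envelope arithmetic. This sketch assumes 1 byte per parameter for FP8 weights, 2 for BF16 and 4 for FP32, counts weights only (no KV cache, activations, or optimizer state), and uses the common 6·N·D approximation for training FLOPs with N taken as the activated parameter count. These are rough estimates, not numbers reported for DeepSeek-V3.

```python
# Counts from this listing; bytes-per-parameter and the 6*N*D rule are assumptions.
TOTAL_PARAMS = 671e9
ACTIVE_PARAMS = 37e9
TRAIN_TOKENS = 14.8e12

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2, "fp32": 4}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Weights only: every expert must be resident even though few are active per token."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def training_flops(active_params: float, tokens: float) -> float:
    """~6 FLOPs per activated parameter per token (forward + backward)."""
    return 6 * active_params * tokens


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(TOTAL_PARAMS, dtype):,.0f} GB of weights")
    print(f"pre-training: {training_flops(ACTIVE_PARAMS, TRAIN_TOKENS):.2e} FLOPs")
```

Under these assumptions the weights alone need about 671 GB in FP8 or 1,342 GB in BF16, far more than a single current accelerator holds, and pre-training comes to roughly 3.3 × 10^24 FLOPs.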

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Pairs with 1 entry

OSS Framework · medium

DeepSeek-R1

Community

First-generation reasoning models from DeepSeek.

★ 92,010 updated 12mo ago
