Qwen2.5-1M-7|14B
by Community
Tech Report HuggingFace ModelScope Qwen Chat HuggingFace Demo ModelScope Demo DISCORD Introduction Two months after upgrading Qwen2.5-Turbo to support context length up to one mi
OSS
Qwen2.5-1M-7|14B
Added 1 June 2026
Overview
Qwen2.5-1M-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M are open-source language models upgraded to handle up to 1 million tokens of context. They are accompanied by an inference framework designed to support this extended context length. The models are released as checkpoints and are available on HuggingFace and ModelScope.
Best for
Best for
Developers who need open-source models with very long context capabilities for document processing or code analysis
Use cases
- Analyzing and summarizing very long documents or books
- Processing large codebases for refactoring or debugging
- Handling extended multi-turn conversations with full history
Notes
Qwen2.5-1M-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M are open-source language models upgraded to handle up to 1 million tokens of context. They are accompanied by an inference framework designed to support this extended context length. The models are released as checkpoints and are available on HuggingFace and ModelScope.
Use cases
- Analyzing and summarizing very long documents or books
- Processing large codebases for refactoring or debugging
- Handling extended multi-turn conversations with full history
Pros
- Supports up to 1M tokens of context, surpassing many alternatives
- Open-source with community-accessible checkpoints
- Includes dedicated inference framework for efficient long-context usage
Cons
- Large models (7B and 14B) require significant GPU memory for inference
- Long context may lead to slower inference times compared to shorter models
- Relatively new with limited third-party tooling and optimization
Indexed from awesome-llm and enriched against its public facts.
Pros
- Supports up to 1M tokens of context, surpassing many alternatives
- Open-source with community-accessible checkpoints
- Includes dedicated inference framework for efficient long-context usage
Cons
- Large models (7B and 14B) require significant GPU memory for inference
- Long context may lead to slower inference times compared to shorter models
- Relatively new with limited third-party tooling and optimization
Pairs with
Other entries in the index that connect to this one. Click through to see the chain.
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang
Community
SGLang is a high-performance serving framework for large language models and multimodal models.
ollama
Community
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.