
Chinese Large Model Leaderboard

by Community

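NoneLinear (非线智能) - ReLE evaluation: a capability benchmark for Chinese AI large models (continuously updated). It currently includes 374 large models, covering commercial models such as chatgpt, gpt-5.4, Google gemini-3.1-pro, Claude-4.6, Wenxin ERNIE-X1.1, ERNIE-5.0, qwen3.6-max, qwen3.6-plus, Baichuan, iFlytek Spark, and SenseTime senseChat, as well as st…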

Added 1 June 2026

#agentic-ai #artificial-intelligence #llm-agent #llm-evaluation

Overview

A community-maintained benchmark for Chinese large language models, covering 374 commercial and open-source models including GPT, Gemini, Claude, ERNIE, Qwen, and others. It provides a continuously updated leaderboard and a defect library with over 2 million entries for analysis and improvement.

Best for

Developers and researchers evaluating Chinese large language models.

Use cases

  • Compare the performance of multiple LLMs on Chinese-language tasks
  • Identify common defects and weaknesses in large language models
  • Track benchmark trends and model improvements over time

Notes

6,103 stars on GitHub. Last updated 2026-05-30.

Pros

  • Covers a wide range of both proprietary and open-source Chinese LLMs
  • Includes a large defect library for deeper analysis
  • Regularly updated with community contributions

Cons

  • Focused on Chinese-language evaluation, limiting global applicability
  • Evaluation methodology is community-driven, not formally peer-reviewed
  • Interface and documentation are primarily in Chinese

Indexed from awesome-llm and enriched against its public facts.
