CompassRank
by Community
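The leaderboard aims to provide comprehensive, objective, and neutral scores and rankings for large language models and multimodal models, along with scoring references across multiple capability dimensions, so that users can form a more complete picture of each model's capabilities.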
OSS
Added 1 June 2026
Overview
CompassRank is a community-driven leaderboard that provides comprehensive, objective scores and rankings for large language models and multimodal models. It scores models across multiple capability dimensions, so users can see where each model is strong or weak. The leaderboard is openly accessible and aims to offer neutral comparisons.
Best for
Developers evaluating and comparing open-source LLMs and multimodal models
Use cases
- Comparing model performance across different capability dimensions
- Selecting the best model for a specific task or application
- Tracking model improvement over iterations or versions
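As an illustration of how multi-dimensional scores support these use cases, the sketch below ranks models for a task by weighting per-dimension scores, and computes per-dimension changes between two versions of one model. It assumes you already have per-dimension scores in hand (for example, read off the leaderboard). The dimension names, model names, scores, weights, and the helpers `weighted_score`, `rank_for_task`, and `score_deltas` are all hypothetical; this is not CompassRank's own scoring or ranking method.

```python
from dataclasses import dataclass


@dataclass
class ModelScores:
    """Per-dimension scores for one model version (hypothetical 0-100 scale)."""
    name: str
    scores: dict[str, float]


def weighted_score(model: ModelScores, weights: dict[str, float]) -> float:
    """Weighted mean over the dimensions named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(model.scores[dim] * w for dim, w in weights.items()) / total_weight


def rank_for_task(models: list[ModelScores], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted best-first under the task's weights."""
    ranked = [(m.name, weighted_score(m, weights)) for m in models]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def score_deltas(old: ModelScores, new: ModelScores) -> dict[str, float]:
    """Per-dimension change from one version of a model to the next."""
    return {dim: new.scores[dim] - old.scores[dim] for dim in old.scores if dim in new.scores}


# Illustrative data only: NOT real CompassRank results or dimension names.
model_a = ModelScores("model-a", {"reasoning": 80.0, "coding": 60.0, "knowledge": 70.0})
model_b = ModelScores("model-b", {"reasoning": 65.0, "coding": 85.0, "knowledge": 68.0})

# A coding-heavy task weights the "coding" dimension three times as much.
coding_task = {"reasoning": 1.0, "coding": 3.0, "knowledge": 1.0}
for name, score in rank_for_task([model_a, model_b], coding_task):
    print(f"{name}: {score:.1f}")
# model-b: 77.6
# model-a: 66.0

# Tracking a (hypothetical) new release of model-a across dimensions.
model_a_v2 = ModelScores("model-a", {"reasoning": 82.0, "coding": 67.0, "knowledge": 70.0})
print(score_deltas(model_a, model_a_v2))
# {'reasoning': 2.0, 'coding': 7.0, 'knowledge': 0.0}
```

A weighted mean is used here only because it is easy to reason about: changing the weights shows how a model that leads overall can fall behind on a task that stresses a single dimension.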
Notes
Pros
- Provides multi-dimensional scoring for nuanced model comparison
- Community-driven with open methodology and transparency
- Covers both language and multimodal models in a unified platform
Cons
- Limited coverage of proprietary models without public API access
- Benchmark results may not directly translate to real-world performance
- Relies on community contributions for updates and extensions
Indexed from awesome-llm and enriched with publicly available facts about the project.
Pairs with
Other entries in the index that connect to this one.
lm-evaluation-harness
Community
A framework for few-shot evaluation of language models.
OpenAI Evals
Community
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.