scikit-learn
by Community
scikit-learn: machine learning in Python
OSS
Added 1 June 2026
Overview
scikit-learn is a Python library providing supervised and unsupervised machine learning algorithms with a consistent API. It includes classification, regression, clustering, dimensionality reduction, and model evaluation tools built on NumPy, SciPy, and Matplotlib.
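As a minimal sketch of what the consistent API means in practice, the snippet below fits two unrelated classifiers through the same fit and score calls. The iris dataset, the two estimators, and the split settings are illustrative assumptions, not something this entry prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two different algorithms that expose the same estimator interface.
estimators = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

scores = {}
for name, estimator in estimators.items():
    estimator.fit(X_train, y_train)                    # same call for every estimator
    scores[name] = estimator.score(X_test, y_test)     # mean accuracy on held-out data

print(scores)
```

Swapping in another classifier should only require changing the entries in the dictionary, since the training and scoring loop does not depend on the algorithm.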
Best for
Python developers building traditional machine learning pipelines and prototyping models quickly.
Use cases
- Training and evaluating classification or regression models
- Clustering data and reducing feature dimensionality
- Comparing multiple algorithms with cross-validation and metrics
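The use cases above can be sketched in a few lines. This is an illustrative example, not a recommended workflow: the wine dataset, the two candidate models, and the choice of 3 clusters and 2 components are all assumptions made for the sake of a short demonstration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Compare two classifiers with 5-fold cross-validation on accuracy.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_means = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
print(cv_means)

# Reduce the 13 features to 2 principal components, then cluster without labels.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=0).fit_transform(X_scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(X_2d.shape, labels[:10])
```

Scaling is placed inside the pipeline for the logistic regression so that it is refit on each training fold rather than on the full dataset.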
Notes
66,218 stars on GitHub. Last updated 2026-06-01. Licensed BSD-3-Clause.
Pros
- Mature, well-documented library with extensive community support
- Unified API across diverse algorithms reduces learning curve
- Strong built-in tools for model selection, validation, and preprocessing
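To illustrate the model selection and preprocessing tools mentioned above, here is a hedged sketch that combines a scaling step and a classifier in a Pipeline and tunes it with GridSearchCV. The dataset, the SVC classifier, the step names, and the parameter grid are assumptions picked for brevity.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Preprocessing and model live in one object, so both are refit per CV fold.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])

# Hypothetical grid; "clf__" routes each parameter to the pipeline step named "clf".
param_grid = {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)  # accuracy of the best refit pipeline
print(search.best_params_, test_accuracy)
```

Keeping the scaler inside the pipeline means the held-out fold never influences the scaling statistics during the search.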
Cons
- Not designed for deep learning; neural network support is limited to basic multi-layer perceptrons
- Performance lags behind specialized libraries for very large datasets
- Limited GPU acceleration support
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
TensorFlow
Community
An Open Source Machine Learning Framework for Everyone
PyTorch
Community
Tensors and Dynamic neural networks in Python with strong GPU acceleration
XGBoost
Community
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and D
leap-laboratories/discovery-engine
Various
Discovery Engine — find novel, statistically validated patterns in tabular data
mindsdb/mindsdb
Various
Platform dedicated to building an open foundation for applied Artificial Intelligence, designed for people seeking production-ready AI systems they can truly control, extend and de
auto-sklearn
Community
Automated Machine Learning with scikit-learn
automl-gs
Community
Provide an input CSV and a target field to predict, generate a model + code to run it.
BentoML
Community
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
Comet
Community
Examples of Machine Learning code using Comet.ml
Deepchecks
Community
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling to thoroughly test
dtreeviz
Community
A python library for decision tree visualization and model interpretation.
EvalML
Community
EvalML is an AutoML library written in python.
FEDOT
Community
Automated modeling and machine learning framework FEDOT
Fiddler AI
Community
Fiddler Auditor is a tool to evaluate language models.
FLAML
Community
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
Hypernets
Community
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
HPOlib2
Community
Collection of hyperparameter optimization benchmark problems
LangFair
Community
LangFair is a Python library for conducting use-case level LLM bias and fairness assessments
metric-learn
Community
Metric learning algorithms in Python
MLflow
Community
MLflow - Open Source AI Platform for Agents, LLMs & Models
NNI
Community
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
PyCaret
Community
Open-source, low-code AutoML platform for Python. PyCaret 4.0: sklearn-native engine + React control plane.
TPOT
Community
The Tree-Based Pipeline Optimization Tool (TPOT) was one of the very first AutoML methods and open-source software packages developed for the data science community. TPOT was dev
Vegas
Community
AutoML tools chain
ZenML
Community
ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.
REMBO
Community
Bayesian optimization in high-dimensions via random embedding.
UQLM
Community
UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection
ASReview
Various
Our software ASReview is designed to accelerate the step of screening abstracts and titles with a minimum of papers to be read.
optuna/optuna-mcp
Various
The Optuna MCP Server is a Model Context Protocol (MCP) server to interact with Optuna APIs.
Aim
Community
Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.
Airflow
Community
Platform created by the community to programmatically author, schedule and monitor workflows.
BudgetML
Community
Deploy a ML inference service on a budget in less than 10 lines of code.
conda
Community
A system-level, binary package and environment manager running on all major operating systems and platforms.
Dragonfly
Community
An open source python library for scalable Bayesian optimisation.
Feast
Community
The Open Source Feature Store for AI/ML
Featureform
Community
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
FeatureTools
Community
An open source python library for automated feature engineering
Goptuna
Community
A hyperparameter optimization framework, inspired by Optuna.
Guild AI
Community
Experiment tracking, ML developer tools
Hopsworks
Community
Hopsworks - Data-Intensive AI platform with a Feature Store
HpBandSter
Community
a distributed Hyperband implementation on Steroids
Hyperband
Community
Tuning hyperparams fast with Hyperband
Hyperopt
Community
Distributed Asynchronous Hyperparameter Optimization in Python
hyperunity
Community
A toolset for black-box hyperparameter optimisation.
Jupyter Notebooks
Community
Jupyter Interactive Notebook
Kubeflow Pipelines
Community
Machine Learning Pipelines for Kubeflow
LabNotebook
Community
LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.
LightGBM
Community
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other
LUX
Community
Automatically visualize your pandas dataframe via a single print! 📊 💡
Manifold
Community
A model-agnostic visual debugging tool for machine learning
MegEngine
Community
MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation
Model Search
Community

ModelDB
Community
Open Source ML Model Versioning, Metadata, and Experiment Management
MOE
Community
A global, black box optimization engine for real world metric optimization.
ormb
Community
Docker for Your ML/DL Models Based on OCI Artifacts
RoBO
Community
RoBO: a Robust Bayesian Optimization framework
Sacred
Community
Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.
scalene
Community
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
scikit-optimize (skopt)
Community
Sequential model-based optimization with a scipy.optimize interface
Spearmint
Community
Spearmint Bayesian optimization codebase
Upgini
Community
Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in
whylogs
Community
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-pre