Enterprise DNA
O Open Source Observability medium

scikit-learn

by Community

scikit-learn: machine learning in Python

Added 1 June 2026

#data-analysis #data-science #machine-learning #python #statistics

Overview

scikit-learn is a Python library providing supervised and unsupervised machine learning algorithms with a consistent API. It includes classification, regression, clustering, dimensionality reduction, and model evaluation tools built on NumPy, SciPy, and Matplotlib.
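
The "consistent API" mentioned above can be illustrated with a minimal sketch: two unrelated classifiers are trained and scored through the same `fit` and `score` calls. The choice of the bundled iris dataset, the two estimators, and the variable names are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small bundled dataset (assumption: chosen only because it needs no download).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two different algorithms, driven through the same estimator interface.
estimators = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, estimator in estimators.items():
    estimator.fit(X_train, y_train)                    # same call for every estimator
    scores[name] = estimator.score(X_test, y_test)     # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Swapping in another classifier should only require changing the dictionary entry, since the loop never depends on the algorithm behind the estimator.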

Best for

Python developers building traditional machine learning pipelines and prototyping models quickly.

Use cases

  • Training and evaluating classification or regression models
  • Clustering data and reducing feature dimensionality
  • Comparing multiple algorithms with cross-validation and metrics
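
The use cases above can be sketched roughly as follows: reduce the features with PCA, cluster the reduced data with k-means, then compare two classifiers by 5-fold cross-validation. The dataset, the number of components and clusters, and the two candidate models are illustrative assumptions, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Dimensionality reduction: project the 4 original features onto 2 components.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group the reduced points into 3 clusters (labels are not used).
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)

# Model comparison: mean accuracy over 5 cross-validation folds per candidate.
candidates = {
    "svc": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_means = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, mean_accuracy in cv_means.items():
    print(f"{name}: mean CV accuracy = {mean_accuracy:.3f}")
```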

Notes

66,218 stars on GitHub. Last updated 2026-06-01. Licensed BSD-3-Clause.

Pros

  • Mature, well-documented library with extensive community support
  • Unified API across diverse algorithms reduces learning curve
  • Strong built-in tools for model selection, validation, and preprocessing
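
As a rough sketch of the model selection, validation, and preprocessing tools named above, the example below chains a scaler and a classifier in a `Pipeline` and tunes it with `GridSearchCV`. The step names (`scale`, `clf`), the parameter grid, and the dataset are assumptions picked for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Preprocessing and model in one object, so scaling is re-fit inside each CV fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", SVC()),
])

# Hypothetical search space; "<step>__<param>" addresses a parameter of a pipeline step.
param_grid = {
    "clf__C": [0.1, 1.0, 10.0],
    "clf__kernel": ["linear", "rbf"],
}

# Exhaustive search over the grid, scored by 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best mean CV accuracy: {search.best_score_:.3f}")
```

Keeping the scaler inside the pipeline means the validation folds never influence the scaling statistics, which is the main reason to search over the pipeline and not over the bare classifier.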

Cons

  • Not optimized for deep learning or neural networks
  • Performance lags behind specialized libraries for very large datasets
  • Limited GPU acceleration support

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Used by 22 entries
M MCP Dev low

leap-laboratories/discovery-engine

Various

Discovery Engine — find novel, statistically validated patterns in tabular data

★ 6 updated 5d ago
M MCP Dev low

mindsdb/mindsdb

Various

Platform dedicated to building an open foundation for applied Artificial Intelligence, designed for people seeking production-ready AI systems they can truly control, extend and de

★ 39,231 updated 6d ago
O OSS Obs medium

auto-sklearn

Community

Automated Machine Learning with scikit-learn

★ 8,102 updated 1mo ago
O OSS Obs medium

automl-gs

Community

Provide an input CSV and a target field to predict, generate a model + code to run it.

★ 1,866 updated 6y ago
O OSS Obs medium

BentoML

Community

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

★ 8,663 updated 2d ago
O OSS Obs medium

Comet

Community

Examples of Machine Learning code using Comet.ml

★ 173 updated 14d ago
O OSS Obs medium

Deepchecks

Community

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling to thoroughly test

★ 4,017 updated 5mo ago
O OSS Obs medium

dtreeviz

Community

A python library for decision tree visualization and model interpretation.

★ 3,148 updated 5mo ago
O OSS Obs medium

EvalML

Community

EvalML is an AutoML library written in python.

★ 849 updated 4mo ago
O OSS Obs medium

FEDOT

Community

Automated modeling and machine learning framework FEDOT

★ 704 updated 2d ago
O OSS Obs medium

Fiddler AI

Community

Fiddler Auditor is a tool to evaluate language models.

★ 191 updated 2y ago
O OSS Obs medium

FLAML

Community

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

★ 4,360 updated 2d ago
O OSS Obs medium

Hypernets

Community

A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.

★ 263 updated 1mo ago
O OSS Obs medium

HPOlib2

Community

Collection of hyperparameter optimization benchmark problems

★ 168 updated 1y ago
O OSS Orchestration medium

LangFair

Community

LangFair is a Python library for conducting use-case level LLM bias and fairness assessments

★ 258 updated 4mo ago
O OSS Obs medium

metric-learn

Community

Metric learning algorithms in Python

★ 1,433 updated 2mo ago
O OSS Framework medium

MLflow

Community

MLflow - Open Source AI Platform for Agents, LLMs & Models

O OSS Obs medium

NNI

Community

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

★ 14,352 updated 1y ago
O OSS Obs medium

PyCaret

Community

Open-source, low-code AutoML platform for Python. PyCaret 4.0: sklearn-native engine + React control plane.

★ 9,802 updated 2d ago
O OSS Obs medium

TPOT

Community

The Tree-Based Pipeline Optimization Tool (TPOT) was one of the very first AutoML methods and open-source software packages developed for the data science community. TPOT was dev

O OSS Obs medium

Vegas

Community

AutoML tools chain

★ 848 updated 3y ago
O OSS Obs medium

ZenML

Community

ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.

★ 5,429 updated 2d ago
Pairs with 34 entries
M MCP Dev low

optuna/optuna-mcp

Various

The Optuna MCP Server is a Model Context Protocol (MCP) server to interact with Optuna APIs.

★ 76 updated 6d ago
O OSS Obs medium

Aim

Community

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

★ 6,138 updated 2d ago
O OSS Obs medium

Airflow

Community

Platform created by the community to programmatically author, schedule and monitor workflows.

O OSS Obs medium

BudgetML

Community

Deploy a ML inference service on a budget in less than 10 lines of code.

★ 1,345 updated 2y ago
O OSS Obs medium

conda

Community

A system-level, binary package and environment manager running on all major operating systems and platforms.

★ 7,418 updated 2d ago
O OSS Obs medium

Dragonfly

Community

An open source python library for scalable Bayesian optimisation.

★ 893 updated 2y ago
O OSS Obs medium

Feast

Community

The Open Source Feature Store for AI/ML

★ 7,063 updated 2d ago
O OSS Obs medium

Featureform

Community

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

★ 1,981 updated 11mo ago
O OSS Obs medium

FeatureTools

Community

An open source python library for automated feature engineering

★ 7,655 updated 4mo ago
O OSS Obs medium

Goptuna

Community

A hyperparameter optimization framework, inspired by Optuna.

★ 277 updated 9mo ago
O OSS Obs medium

Guild AI

Community

Experiment tracking, ML developer tools

★ 899 updated 1y ago
O OSS Obs medium

Hopsworks

Community

Hopsworks - Data-Intensive AI platform with a Feature Store

★ 1,299 updated 1y ago
O OSS Obs medium

HpBandSter

Community

a distributed Hyperband implementation on Steroids

★ 630 updated 3y ago
O OSS Obs medium

Hyperband

Community

Tuning hyperparams fast with Hyperband

★ 598 updated 7y ago
O OSS Obs medium

Hyperopt

Community

Distributed Asynchronous Hyperparameter Optimization in Python

★ 7,576 updated 9d ago
O OSS Obs medium

hyperunity

Community

A toolset for black-box hyperparameter optimisation.

★ 136 updated 6y ago
O OSS Obs medium

Jupyter Notebooks

Community

Jupyter Interactive Notebook

★ 13,173 updated 5d ago
O OSS Obs medium

Kubeflow Pipelines

Community

Machine Learning Pipelines for Kubeflow

★ 4,151 updated 2d ago
O OSS Obs medium

LabNotebook

Community

LabNotebook is a tool that allows you to flexibly monitor, record, save, and query all your machine learning experiments.

★ 528 updated 8y ago
O OSS Obs medium

LightGBM

Community

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other

★ 18,416 updated 2d ago
O OSS Obs medium

LUX

Community

Automatically visualize your pandas dataframe via a single print! 📊 💡

★ 5,382 updated 2y ago
O OSS Obs medium

Manifold

Community

A model-agnostic visual debugging tool for machine learning

★ 1,671 updated 1y ago
O OSS Obs medium

MegEngine

Community

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation.

★ 4,808 updated 1y ago
O OSS Obs medium

Model Search

Community

★ 3,245 updated 1y ago
O OSS Obs medium

ModelDB

Community

Open Source ML Model Versioning, Metadata, and Experiment Management

★ 1,747 updated 1y ago
O OSS Obs medium

MOE

Community

A global, black box optimization engine for real world metric optimization.

★ 1,320 updated 3y ago
O OSS Obs medium

ormb

Community

Docker for Your ML/DL Models Based on OCI Artifacts

★ 473 updated 2y ago
O OSS Obs medium

RoBO

Community

RoBO: a Robust Bayesian Optimization framework

★ 490 updated 7y ago
O OSS Obs medium

Sacred

Community

Sacred is a tool to help you configure, organize, log and reproduce experiments developed at IDSIA.

★ 4,365 updated 7mo ago
O OSS Obs medium

scalene

Community

Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals

★ 13,436 updated 3d ago
O OSS Obs medium

scikit-optimize(skopt)

Community

Sequential model-based optimization with a scipy.optimize interface

★ 2,826 updated 2y ago
O OSS Obs medium

Spearmint

Community

Spearmint Bayesian optimization codebase

★ 1,568 updated 6y ago
O OSS Obs medium

Upgini

Community

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in

★ 350 updated 2mo ago
O OSS Obs medium

whylogs

Community

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-pre

★ 2,819 updated 1y ago