XGBoost
by Community
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and D
OSS
Added 1 June 2026
Overview
XGBoost is a gradient boosting library that trains decision tree ensembles for classification, regression, and ranking tasks. It runs on single machines or distributed systems like Spark, Hadoop, and Dask, with bindings for Python, R, Java, Scala, and C++.
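As a rough illustration of what "gradient boosting over decision trees" means, the sketch below fits a sequence of depth-1 trees (stumps) to the residuals of a squared-error loss in plain Python. It is not XGBoost's implementation: the function names (`fit_stump`, `fit_boosted`, and so on), the toy data, and the hyperparameter values are all assumptions made for this example, and XGBoost itself adds second-order gradients, regularization, deeper trees, and many systems-level optimizations.

```python
# Minimal gradient boosting sketch: depth-1 trees (stumps), squared-error loss,
# one numeric feature. Illustrative only; all names here are hypothetical.

def fit_stump(xs, residuals):
    """Pick the threshold whose two leaf means best fit the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy regression problem: y = x^2 on a small grid.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

model = fit_boosted(xs, ys)
mean_y = sum(ys) / len(ys)
baseline_mse = mean_squared_error(lambda x: mean_y, xs, ys)
boosted_mse = mean_squared_error(lambda x: predict_boosted(model, x), xs, ys)
print(f"baseline MSE: {baseline_mse:.4f}, boosted MSE: {boosted_mse:.4f}")
```

Each round adds a small correction to the running prediction, which is why the ensemble's training error falls below that of the constant baseline.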
Best for
Data scientists and ML engineers building production models on structured datasets.
Use cases
- Building high-accuracy predictive models for tabular data
- Training models at scale across distributed clusters
- Competing in machine learning competitions
Notes
28,431 stars on GitHub. Last updated 2026-05-28. Licensed Apache-2.0.
Pros
- Often delivers strong accuracy on structured data relative to other gradient boosting implementations, though results depend on the dataset and tuning
- Supports both single-machine and distributed training (Spark, Hadoop, Dask) from the same library
- Mature ecosystem with extensive documentation and community support
Cons
- Requires careful hyperparameter tuning to avoid overfitting
- Slower than simpler models for real-time inference on resource-constrained devices
- Works best on tabular data; not designed for raw image or text inputs
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
Comet
Community
Examples of Machine Learning code using Comet.ml
dtreeviz
Community
A Python library for decision tree visualization and model interpretation.
EvalML
Community
EvalML is an AutoML library written in Python.
FLAML
Community
A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.
Hypernets
Community
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
TPOT
Community
The Tree-Based Pipeline Optimization Tool (TPOT) was one of the very first AutoML methods and open-source software packages developed for the data science community. TPOT was dev
automl-gs
Community
Provide an input CSV and a target field to predict, generate a model + code to run it.
Deepchecks
Community
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling to thoroughly test
Dragonfly
Community
An open-source Python library for scalable Bayesian optimisation.
Feast
Community
The Open Source Feature Store for AI/ML
FeatureTools
Community
An open-source Python library for automated feature engineering.
Goptuna
Community
A hyperparameter optimization framework, inspired by Optuna.
Hyperopt
Community
Distributed Asynchronous Hyperparameter Optimization in Python
hyperunity
Community
A toolset for black-box hyperparameter optimisation.
Jupyter Notebooks
Community
Jupyter Interactive Notebook
MOE
Community
A global, black box optimization engine for real world metric optimization.
REMBO
Community
Bayesian optimization in high-dimensions via random embedding.
RoBO
Community
RoBO: a Robust Bayesian Optimization framework
Upgini
Community
Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in
whylogs
Community
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-pre