Enterprise DNA
O Open Source Observability medium

XGBoost

by Community

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and D

Added 1 June 2026

#distributed-systems #gbdt #gbm #gbrt #machine-learning #xgboost

Overview

XGBoost is a gradient boosting library that trains decision tree ensembles for classification, regression, and ranking tasks. It runs on single machines or distributed systems like Spark, Hadoop, and Dask, with bindings for Python, R, Java, Scala, and C++.
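
To make "gradient boosting over decision trees" concrete, here is a toy sketch in plain Python. It is not XGBoost's algorithm or API: it assumes a single numeric feature, squared-error loss, and depth-1 trees (stumps), and every function name is hypothetical. Each round fits a stump to the current residuals and adds a damped copy of it to the ensemble.

```python
# Toy gradient boosting for regression (squared error, one feature, stumps).
# Illustrative only; XGBoost itself adds regularization, second-order
# gradients, and many systems optimizations that are not shown here.

def fit_stump(xs, residuals):
    """Pick the threshold whose left/right means best fit the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value

def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Return (base prediction, list of stumps, learning rate)."""
    base = sum(ys) / len(ys)          # start from the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the residual is the negative gradient direction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate

def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((predict_boosted(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)

if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [x * x for x in xs]
    for rounds in (1, 10, 50):
        model = fit_boosted(xs, ys, n_rounds=rounds)
        print(rounds, "rounds, training MSE:", mse(model, xs, ys))
```

Training error shrinks as rounds are added, which is also why boosted ensembles can overfit if the number of rounds and the learning rate are not chosen with held-out data.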

Best for

Data scientists and ML engineers building production models on structured datasets.

Use cases

  • Building high-accuracy predictive models for tabular data
  • Training models at scale across distributed clusters
  • Competing in machine learning competitions

Notes

28,431 stars on GitHub. Last updated 2026-05-28. Licensed Apache-2.0.

Pros

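  • Often among the strongest performers on structured (tabular) data, though results vary by dataset and tuning
  • Supports both single-machine and distributed training; distributed backends such as Spark and Dask use their own interfaces, so some code changes are typically needed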
  • Mature ecosystem with extensive documentation and community support

Cons

  • Requires careful hyperparameter tuning to avoid overfitting
  • Slower than simpler models for real-time inference on resource-constrained devices
  • Works best on tabular data; not designed for raw images or text
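
One common guard against the overfitting mentioned above is early stopping on a validation set, which XGBoost exposes through its `early_stopping_rounds` option. The sketch below shows only the general patience rule in plain Python; the function name and the sample losses are hypothetical, and it is not the library's internal implementation.

```python
def best_round_with_patience(val_losses, patience):
    """Return (best_round, stopped_round) for a list of per-round validation losses.

    Training stops at the first round that is `patience` rounds past the
    best round seen so far without improving on it.
    """
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_round = 0
    best_loss = float("inf")
    for round_index, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_round = round_index
        elif round_index - best_round >= patience:
            return best_round, round_index
    # Never ran out of patience: training used every round.
    return best_round, len(val_losses) - 1

if __name__ == "__main__":
    # Hypothetical validation losses: they improve, then drift upward.
    losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9, 0.6]
    print(best_round_with_patience(losses, patience=3))
```

A small patience stops sooner but can miss a later improvement, as with the final value in the sample list; a larger patience trades extra training rounds for that safety.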

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Used by 6 entries
Pairs with 14 entries

automl-gs

Community

Provide an input CSV and a target field to predict, generate a model + code to run it.

★ 1,866 updated 6y ago

Deepchecks

Community

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling to thoroughly test

★ 4,017 updated 5mo ago

Dragonfly

Community

An open source python library for scalable Bayesian optimisation.

★ 893 updated 2y ago

Feast

Community

The Open Source Feature Store for AI/ML

★ 7,063 updated 2d ago

FeatureTools

Community

An open source python library for automated feature engineering

★ 7,655 updated 4mo ago

Goptuna

Community

A hyperparameter optimization framework, inspired by Optuna.

★ 277 updated 9mo ago

Hyperopt

Community

Distributed Asynchronous Hyperparameter Optimization in Python

★ 7,576 updated 9d ago

hyperunity

Community

A toolset for black-box hyperparameter optimisation.

★ 136 updated 6y ago

Jupyter Notebooks

Community

Jupyter Interactive Notebook

★ 13,173 updated 5d ago

MOE

Community

A global, black box optimization engine for real world metric optimization.

★ 1,320 updated 3y ago

REMBO

Community

Bayesian optimization in high-dimensions via random embedding.

★ 116 updated 12y ago

RoBO

Community

RoBO: a Robust Bayesian Optimization framework

★ 490 updated 7y ago

Upgini

Community

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in

★ 350 updated 2mo ago

whylogs

Community

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-pre

★ 2,819 updated 1y ago