Enterprise DNA
Upgini

by Community

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in

Added 1 June 2026

#automated-feature-engineering #automl #automl-pipeline #chatgpt #data-enrichment #data-science #feature-engineering #feature-extraction

Overview

Upgini is a Python library that searches hundreds of public and premium external data sources, including open and commercial LLMs, for features relevant to a machine learning dataset and adds them to it. It plugs into existing ML pipelines so that features which improve model performance are found and added automatically.
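
To make the idea of enrichment concrete, the sketch below joins external features onto a training table through a shared key (here, date and country). It uses only the Python standard library and does not call Upgini; the names `enrich` and `EXTERNAL_FEATURES`, along with all sample values, are hypothetical and made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical external source: features keyed by (date, country).
EXTERNAL_FEATURES: Dict[Tuple[str, str], Dict[str, float]] = {
    ("2024-01-01", "US"): {"holiday": 1.0, "avg_temp_c": 2.5},
    ("2024-01-02", "US"): {"holiday": 0.0, "avg_temp_c": 3.1},
}


def enrich(rows: List[dict], keys: Tuple[str, ...] = ("date", "country")) -> List[dict]:
    """Left-join external features onto each row; unmatched rows get None."""
    feature_names = sorted(
        {name for feats in EXTERNAL_FEATURES.values() for name in feats}
    )
    enriched = []
    for row in rows:
        lookup = tuple(row[k] for k in keys)
        found = EXTERNAL_FEATURES.get(lookup, {})
        new_row = dict(row)  # copy, so the input rows are left untouched
        for name in feature_names:
            new_row[name] = found.get(name)
        enriched.append(new_row)
    return enriched


train = [
    {"date": "2024-01-01", "country": "US", "sales": 120},
    {"date": "2024-01-03", "country": "US", "sales": 95},  # no external match
]
enriched_train = enrich(train)
```

Every input row is kept, and rows without a match in the external source carry `None` for the new columns, which mirrors the usual left-join behaviour one would expect from an enrichment step.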

Best for

Data scientists and ML engineers who need to quickly augment datasets with external features to improve model performance

Use cases

  • Augmenting training datasets with external features for better model accuracy
  • Automating feature discovery from public and premium data sources
  • Enriching ML pipelines with real-time external data without manual ETL

Notes

350 stars on GitHub. Last updated 2026-03-28. Licensed BSD-3-Clause.

Pros

  • Access to a wide range of external data sources, including LLMs
  • Automates feature search and enrichment, saving manual effort
  • Open source with a community-driven development model

Cons

  • Modest community size (350 stars) may limit support and contributions
  • Reliance on external data sources can introduce latency or cost
  • Requires careful evaluation of data quality and relevance for each use case
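
The last point can be approached with two rough first checks on any externally sourced feature: how many rows it actually covers, and how strongly it relates to the target. The sketch below is a standard-library illustration under those assumptions, not a description of how Upgini scores features; `coverage`, `relevance`, and the sample values are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def coverage(values: Sequence[Optional[float]]) -> float:
    """Share of rows for which the external feature is present (not None)."""
    return sum(v is not None for v in values) / len(values)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def relevance(values: Sequence[Optional[float]], target: Sequence[float]) -> float:
    """Correlation between feature and target, using matched rows only."""
    pairs = [(v, t) for v, t in zip(values, target) if v is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up example: two of six rows found no match in the external source.
feature: List[Optional[float]] = [1.0, None, 3.0, 4.0, None, 6.0]
target = [2.0, 5.0, 6.0, 8.0, 1.0, 12.0]

feature_coverage = coverage(feature)
feature_relevance = relevance(feature, target)
```

Low coverage or near-zero correlation is a hint to look closer, while a proper evaluation would compare model quality on held-out data with and without the new feature.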

Indexed from awesome-llmops and enriched against its public facts.
