Upgini

by Community

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in

Added 1 June 2026

#automated-feature-engineering #automl #automl-pipeline #chatgpt #data-enrichment #data-science #feature-engineering #feature-extraction

Overview

Upgini is a Python library that searches and enriches machine learning datasets with relevant features from hundreds of public and premium external data sources, including open and commercial LLMs. It integrates into ML pipelines to automatically find and add features that improve model performance.
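The enrichment idea described above can be illustrated with a minimal standard-library sketch. It does not use Upgini's API; the `enrich` helper, the `train` rows, the `external` lookup, and the `is_holiday` feature are all hypothetical names invented for illustration. It only shows the general pattern of joining external features onto a dataset by shared search keys (here, date and country).

```python
from datetime import date

# Hypothetical training rows: search keys (date, country) plus a target.
train = [
    {"date": date(2024, 1, 1), "country": "US", "target": 1},
    {"date": date(2024, 1, 1), "country": "DE", "target": 0},
    {"date": date(2024, 1, 2), "country": "US", "target": 1},
]

# Hypothetical external source, indexed by the same (date, country) pair.
external = {
    (date(2024, 1, 1), "US"): {"is_holiday": 1},
    (date(2024, 1, 1), "DE"): {"is_holiday": 1},
}


def enrich(rows, source, keys, feature_names):
    """Left-join external features onto rows; unmatched rows get None."""
    enriched = []
    for row in rows:
        match = source.get(tuple(row[k] for k in keys), {})
        extra = {name: match.get(name) for name in feature_names}
        enriched.append({**row, **extra})
    return enriched


enriched = enrich(
    train, external, keys=("date", "country"), feature_names=("is_holiday",)
)
for row in enriched:
    print(row["date"], row["country"], row["is_holiday"])
```

Rows without a match in the external source keep their original columns and receive `None` for the new feature, so the row count of the training set is unchanged.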

Best for

Data scientists and ML engineers who need to quickly augment datasets with external features to improve model performance

Use cases

Augmenting training datasets with external features for better model accuracy
Automating feature discovery from public and premium data sources
Enriching ML pipelines with real-time external data without manual ETL

Notes

350 stars on GitHub. Last updated 2026-03-28. Licensed BSD-3-Clause.

Pros

Access to a wide range of external data sources, including LLMs
Automates feature search and enrichment, saving manual effort
Open source with a community-driven development model

Cons

Modest community size (350 stars) may limit support and contributions
Reliance on external data sources can introduce latency or cost
Requires careful evaluation of data quality and relevance for each use case
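One way to approach the data quality and relevance check mentioned in the last point is sketched below, again in plain Python and independent of Upgini. The `coverage` and `pearson` helpers and the sample values are assumptions made for illustration. Correlation with the target is only a rough proxy for relevance; a held-out model comparison would be a stronger test.

```python
from math import sqrt


def coverage(values):
    """Share of rows where the external feature was actually filled in."""
    return sum(v is not None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the pairs where the feature is present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


# Hypothetical enriched feature (None = no match in the external source).
feature = [1.0, 2.0, None, 4.0]
target = [2.0, 4.0, 5.0, 8.0]

print("coverage:", coverage(feature))
print("correlation:", pearson(feature, target))
```

A feature with low coverage or near-zero correlation is a candidate for dropping before it adds latency or cost to the pipeline.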

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

scikit-learn

Community

scikit-learn: machine learning in Python

★ 66,218 updated 1mo ago

XGBoost

Community

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and D

★ 28,431 updated 1mo ago
