Upgini
by Community
Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, in
OSS
Added 1 June 2026
Overview
Upgini is a Python library that searches hundreds of public and premium external data sources, including open and commercial LLMs, for features relevant to a machine learning dataset and adds them to it. It plugs into existing ML pipelines to find and add such features automatically, with the aim of improving model performance.
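At its core, enrichment means joining external features onto training rows through a shared key such as a date or country. The sketch below shows that join in plain Python. It is a minimal illustration under stated assumptions, not Upgini's API or internals. The function `enrich`, the `weather` source, and its column names are hypothetical. Rows with no match in the external source get `None`, as in a SQL LEFT JOIN.

```python
from typing import Dict, List

Row = Dict[str, object]


def enrich(rows: List[Row], external: Dict[str, Dict[str, float]], key: str) -> List[Row]:
    """Hypothetical helper: left-join external features onto rows by `key`.

    Every row receives every external feature name; rows whose key is not
    present in `external` get None for those features.
    """
    # Collect all feature names the external source can provide.
    feature_names = sorted({name for feats in external.values() for name in feats})
    enriched = []
    for row in rows:
        match = external.get(str(row[key]), {})
        new_row = dict(row)  # copy so the caller's rows are not mutated
        for name in feature_names:
            new_row[name] = match.get(name)
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    train = [
        {"date": "2024-01-01", "sales": 120.0},
        {"date": "2024-01-02", "sales": 95.0},
        {"date": "2024-01-03", "sales": 130.0},
    ]
    # Hypothetical external source keyed by date (e.g. daily weather).
    weather = {
        "2024-01-01": {"temp_c": 4.0, "rain_mm": 0.0},
        "2024-01-02": {"temp_c": 1.5, "rain_mm": 12.0},
    }
    for r in enrich(train, weather, key="date"):
        print(r)
```

A library like Upgini automates the part this sketch leaves out: finding which external sources and keys match a given dataset.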
Best for
Data scientists and ML engineers who need to quickly augment datasets with external features to improve model performance
Use cases
- Augmenting training datasets with external features for better model accuracy
- Automating feature discovery from public and premium data sources
- Enriching ML pipelines with real-time external data without manual ETL
Notes
350 stars on GitHub. Last updated 2026-03-28. Licensed BSD-3-Clause.
Pros
- Access to a wide range of external data sources, including LLMs
- Automates feature search and enrichment, saving manual effort
- Open source with a community-driven development model
Cons
- Modest community size (350 stars) may limit support and contributions
- Reliance on external data sources can introduce latency or cost
- Requires careful evaluation of data quality and relevance for each use case
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
scikit-learn
Community
scikit-learn: machine learning in Python
XGBoost
Community
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and D
LightGBM
Community
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other
PyTorch
Community
Tensors and Dynamic neural networks in Python with strong GPU acceleration
TensorFlow
Community
An Open Source Machine Learning Framework for Everyone