DVC
by Community
Data Versioning and ML Experiments
OSS
Added 1 June 2026
Overview
DVC (Data Version Control) is a version control system for machine learning projects that tracks data, models, and experiment metadata alongside code. It integrates with Git. Git stores small metafiles that record each tracked file's hash, and the large files themselves live in a local cache and in remote storage. This keeps pipelines and ML workflows reproducible without committing binaries to the repository.
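The split between Git-tracked metafiles and a separate data cache can be illustrated with a small standard-library sketch. It is a conceptual model only. The JSON pointer file (`*.ptr`) and the helpers `track` and `checkout` are hypothetical. Real DVC uses `dvc add` to write a YAML `.dvc` file and `dvc checkout` to restore data, and its on-disk formats differ. The sketch shows the core idea: hash the file, store it once in a content-addressed cache, and commit only the tiny pointer.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets never have to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer uses a hypothetical JSON format (not DVC's real .dvc YAML).
    Only the pointer would be committed to Git; the bulky data stays in the cache.
    """
    md5 = file_md5(data_file)
    # Fan out by the first two hex characters so no single directory gets huge.
    target = cache_dir / md5[:2] / md5[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    meta = {"path": data_file.name, "md5": md5, "size": data_file.stat().st_size}
    pointer.write_text(json.dumps(meta))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the file a pointer refers to by looking up its hash in the cache."""
    meta = json.loads(pointer.read_text())
    source = cache_dir / meta["md5"][:2] / meta["md5"][2:]
    dest = pointer.with_name(meta["path"])
    shutil.copy2(source, dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("x,y\n1,2\n")
        ptr = track(data, root / "cache")
        print(ptr.read_text())  # the small file Git would track
        data.unlink()  # simulate a fresh clone without the data
        print(checkout(ptr, root / "cache").read_text())
```

Because the cache key is the content hash, identical files are stored once, and switching to another Git commit only requires that commit's pointers to locate the right bytes.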
Best for
ML teams building reproducible pipelines who need Git-like versioning for data and models
Use cases
- Track dataset versions and model artifacts across experiment iterations
- Reproduce ML pipelines and results from previous runs
- Collaborate on ML projects with versioned data and experiment history
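Reproducing a pipeline cheaply depends on knowing when a stage can be skipped. The following sketch uses only the standard library. It fingerprints a stage's input files and parameters and reruns the stage only when that fingerprint changes. The JSON lock-file format, `run_stage`, and `hash_inputs` are hypothetical. DVC itself declares stages in `dvc.yaml`, records hashes in `dvc.lock`, and reruns changed stages with `dvc repro`. It also tracks output hashes, which this sketch omits.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def hash_inputs(deps: Sequence[Path], params: dict) -> dict:
    """Fingerprint every dependency file plus the stage's parameters."""
    return {
        "deps": {str(p): hashlib.md5(p.read_bytes()).hexdigest() for p in deps},
        "params": params,  # must be JSON-serializable with string keys
    }


def run_stage(
    name: str,
    deps: Sequence[Path],
    params: dict,
    command: Callable[[dict], None],
    lock_path: Path,
) -> bool:
    """Run command only if inputs changed since the last recorded run.

    Returns True if the stage ran, False if it was skipped.
    """
    lock = json.loads(lock_path.read_text()) if lock_path.exists() else {}
    fingerprint = hash_inputs(deps, params)
    if lock.get(name) == fingerprint:
        return False  # inputs unchanged, so previous outputs are reused
    command(params)
    lock[name] = fingerprint
    lock_path.write_text(json.dumps(lock, indent=2, sort_keys=True))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_text("x\n1\n")
        lock = root / "stages.lock.json"
        train = lambda p: print("training with", p)
        print(run_stage("train", [data], {"lr": 0.01}, train, lock))  # True: first run
        print(run_stage("train", [data], {"lr": 0.01}, train, lock))  # False: nothing changed
```

Committing the lock file alongside code is what makes the skip decision reproducible for collaborators. Anyone who checks out the same commit sees the same recorded fingerprints.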
Notes
15,643 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Pros
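- Integrates with Git: small metafiles committed alongside code record which data and model versions each commit uses
- Keeps large files in a cache and remote storage (such as S3, Google Cloud Storage, Azure Blob Storage, or SSH servers) instead of bloating the Git repository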
- Tracks full experiment lineage including parameters, metrics, and outputs
Cons
- Requires Python and command-line familiarity for typical workflows
- Learning curve for teams unfamiliar with version control concepts
- Remote storage setup and configuration adds operational overhead
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
Awesome Production Machine Learning
Community
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Featureform
Community
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
Great Expectations
Community
Always know what to expect from your data.
Guild AI
Community
Experiment tracking, ML developer tools
JuiceFS
Community
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Kedro-Viz
Community
Visualise your Kedro data and machine-learning pipelines and track your experiments.
Kedro
Community
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducib
MLEM
Community
A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day.
ModelFox
Community
ModelFox makes it easy to train, deploy, and monitor machine learning models.
NNI
Community
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyperparameter tuning.
Piperider
Community
Code review for data in dbt
Prefect
Community
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Puzzlet AI
Community
Quilt
Community
Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.
Seldon-core
Community
An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
visenger/awesome-mlops
Community
A curated list of references for MLOps
whylogs
Community
An open-source data logging library for machine learning models and data pipelines. Provides visibility into data quality & model performance over time. Supports privacy-pre
ZenML
Community
ZenML: One AI Platform from Pipelines to Agents. https://zenml.io.
Zeno
Community
AI Data Management & Evaluation Platform
ArtiVC
Community
A version control system to manage large files.
LakeFS
Community
lakeFS - Data version control for your data lake | Git for data
ModelDB
Community
Open Source ML Model Versioning, Metadata, and Experiment Management
Pachyderm
Community
Data-Centric Pipelines and Data Versioning