Enterprise DNA
Open Source · Observability · medium

DVC

by Community

🦉 Data Versioning and ML Experiments

Added 1 June 2026

#ai #data-science #data-version-control #developer-tools #machine-learning #reproducibility #unstructured-data

Overview

DVC (Data Version Control) is a version control system for machine learning projects that tracks data, models, and experiment metadata alongside code. It integrates with Git to manage large files and pipelines, enabling reproducible ML workflows without storing binaries in repositories.
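The idea of keeping large files out of the repository while still versioning them can be illustrated with a small content-addressed cache. The sketch below is only an assumption-laden illustration, not DVC's implementation: the function names, the `.pointer.json` suffix, the SHA-256 hash, and the cache layout are all hypothetical, and DVC's real metafiles and cache differ in detail.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track_file(data_path: Path, cache_dir: Path) -> Path:
    """Copy data_path into a content-addressed cache and write a small pointer file."""
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    # Hypothetical cache layout: <cache>/<first two hex chars>/<remaining chars>.
    cached = cache_dir / digest[:2] / digest[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_path, cached)
    # The pointer is the only file that would be committed to Git.
    pointer = data_path.parent / (data_path.name + ".pointer.json")
    pointer.write_text(json.dumps({"hash": digest, "path": data_path.name}, indent=2))
    return pointer


def restore_file(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    cached = cache_dir / meta["hash"][:2] / meta["hash"][2:]
    target = pointer.parent / meta["path"]
    shutil.copy2(cached, target)
    return target


def demo() -> bool:
    """Track a file, delete it, and restore it from the cache."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        workspace = root / "workspace"
        workspace.mkdir()
        data = workspace / "train.csv"
        data.write_bytes(b"id,label\n1,0\n2,1\n")
        original = data.read_bytes()
        pointer = track_file(data, root / "cache")
        data.unlink()  # simulate a fresh checkout that has only the pointer
        restored = restore_file(pointer, root / "cache")
        return restored.read_bytes() == original
```

In this sketch the repository would hold only the small pointer file, while the data itself lives in the cache (or, by extension, in remote storage) and is looked up by its hash.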

Best for

ML teams building reproducible pipelines who need Git-like versioning for data and models

Use cases

  • Track dataset versions and model artifacts across experiment iterations
  • Reproduce ML pipelines and results from previous runs
  • Collaborate on ML projects with versioned data and experiment history
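Reproducing a pipeline generally relies on knowing whether a stage's inputs have changed since it last ran. A minimal sketch of that idea follows, assuming a hypothetical JSON lock file and made-up helper names (`fingerprint`, `run_stage`); DVC's own pipeline and lock formats and its change detection are more involved than this.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def fingerprint(paths: Iterable[Path]) -> str:
    """Hash the names and contents of all dependency files into one digest."""
    h = hashlib.sha256()
    for p in sorted(paths):
        h.update(str(p).encode())
        h.update(p.read_bytes())
    return h.hexdigest()


def run_stage(name: str, deps: Iterable[Path], action: Callable[[], None],
              lock_file: Path) -> bool:
    """Run action only if the dependencies changed; return True if it ran."""
    deps = list(deps)
    lock = json.loads(lock_file.read_text()) if lock_file.exists() else {}
    current = fingerprint(deps)
    if lock.get(name) == current:
        return False  # inputs unchanged, previous outputs are still valid
    action()
    lock[name] = current
    lock_file.write_text(json.dumps(lock, indent=2))
    return True


def demo() -> list:
    """Run a stage three times: fresh, unchanged, and after editing the input."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        raw = root / "raw.csv"
        clean = root / "clean.csv"
        lock_file = root / "stages.lock.json"
        raw.write_text("a,b\n1,2\n")

        def clean_data() -> None:
            clean.write_text(raw.read_text().upper())

        results = [run_stage("clean", [raw], clean_data, lock_file)]
        results.append(run_stage("clean", [raw], clean_data, lock_file))
        raw.write_text("a,b\n3,4\n")  # changing the input invalidates the stage
        results.append(run_stage("clean", [raw], clean_data, lock_file))
        return results
```

Committing a lock file like this alongside the code is what would let a collaborator check out an old revision and see which inputs produced a given result.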

Notes

15,643 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Pros

  • Integrates seamlessly with Git for unified project versioning
  • Handles large files and remote storage without bloating repositories
  • Tracks full experiment lineage including parameters, metrics, and outputs

Cons

  • Requires Python and command-line familiarity for typical workflows
  • Learning curve for teams unfamiliar with version control concepts
  • Remote storage setup and configuration adds operational overhead

Indexed from awesome-llmops and enriched with the project's public facts.

Pairs with

Other entries in the index that connect to this one.

19 entries

OSS · Observability · medium

Awesome Production Machine Learning

Community

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

★ 20,585 updated 2d ago

OSS · Observability · medium

Featureform

Community

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

★ 1,981 updated 11mo ago

OSS · Observability · medium

Great Expectations

Community

Always know what to expect from your data.

★ 11,532 updated 2d ago

OSS · Observability · medium

Guild AI

Community

Experiment tracking, ML developer tools

★ 899 updated 1y ago

OSS · Observability · medium

JuiceFS

Community

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

★ 13,645 updated 2d ago

OSS · Observability · medium

Kedro-Viz

Community

Visualise your Kedro data and machine-learning pipelines and track your experiments.

★ 749 updated 5d ago

OSS · Observability · medium

Kedro

Community

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducib

★ 10,867 updated 2d ago

OSS · Observability · medium

MLEM

Community

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞

★ 718 updated 2y ago

OSS · Observability · medium

ModelFox

Community

ModelFox makes it easy to train, deploy, and monitor machine learning models.

★ 1,467 updated 1y ago

OSS · Observability · medium

NNI

Community

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

★ 14,352 updated 1y ago

OSS · Observability · medium

Piperider

Community

Code review for data in dbt

★ 494 updated 1y ago

OSS · Observability · medium

Prefect

Community

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

★ 22,518 updated 2d ago

OSS · Observability · medium

Puzzlet AI

Community

OSS · Observability · medium

Quilt

Community

Quilt is a Scientific Data Management Platform on AWS that helps teams and AI find, trust, and reuse data through deeply versioned, context-rich data packages.

★ 1,364 updated 2d ago

OSS · Observability · medium

Seldon-core

Community

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

★ 4,752 updated 2mo ago

OSS · Observability · medium

visenger/awesome-mlops

Community

A curated list of references for MLOps

★ 13,923 updated 1y ago

OSS · Observability · medium

whylogs

Community

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-pre

★ 2,819 updated 1y ago

OSS · Observability · medium

ZenML

Community

ZenML 🙏: One AI Platform from Pipelines to Agents. https://zenml.io.

★ 5,429 updated 2d ago

OSS · Observability · medium

Zeno

Community

AI Data Management & Evaluation Platform

★ 214 updated 2y ago