Open Source Observability medium

Pachyderm

by Community

Data-Centric Pipelines and Data Versioning

Added 1 June 2026

#analytics #big-data #containers #data-analysis #data-science #distributed-systems #docker #go

Overview

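Pachyderm is an open-source platform for data-centric pipelines and data versioning. It provides version control for datasets and enables reproducible data processing workflows. Written in Go, it treats data as a first-class citizen in the pipeline lifecycle: pipelines run when new data is committed to their input repositories, and outputs are tracked back to the data versions that produced them.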

Best for

Data engineers and ML teams needing reproducible data pipelines

Use cases

Versioning datasets for machine learning experiments
Building reproducible data pipelines
Tracking data lineage and provenance

Notes

6,295 stars on GitHub. Last updated 2025-02-03. Licensed Apache-2.0.

Pros

Open source with a strong community (over 6,000 stars)
Data versioning similar to Git for code
Scalable pipeline execution with parallel processing

Cons

Steep learning curve for data versioning concepts
Requires significant infrastructure setup (e.g., Kubernetes)
Limited to data-centric workflows, not general observability

Indexed from the awesome-llmops list and enriched with the project's public facts.

Pros

Open source with a strong community (over 6,000 stars)
Data versioning similar to Git for code
Scalable pipeline execution with parallel processing

Cons

Steep learning curve for data versioning concepts
Requires significant infrastructure setup (e.g., Kubernetes)
Limited to data-centric workflows, not general observability

Pairs with

Other entries in the index that connect to this one.

Alternative to (2 entries)

DVC

Community

🦉 Data Versioning and ML Experiments

★ 15,643 updated 1mo ago

Prefect

Community

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

★ 22,518 updated 1mo ago