Enterprise DNA

lakeFS

by Community

lakeFS - Data version control for your data lake | Git for data

OSS

Added 1 June 2026

#apache-spark #apache-sparksql #aws-s3 #azure-blob-storage #azure-storage #data-engineering #data-lake #data-quality

Overview

lakeFS is an open-source tool that provides Git-like version control for data lakes built on object storage such as Amazon S3 and Azure Blob Storage. It lets teams branch, commit, merge, and revert changes to data the same way they manage source code. It is written in Go and is a community-driven project with more than 5,000 GitHub stars.
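
To make the branch/commit/merge/revert model concrete, here is a minimal in-memory sketch in Python. It is a hypothetical toy: the class ToyDataRepo and all of its methods are invented for illustration and are not the lakeFS API. Commits are immutable snapshots that map object paths to content addresses, and branches are only pointers to commits, which is why creating a branch in this model copies no data. For brevity it supports only fast-forward merges and does not handle conflicts.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Commit:
    tree: Dict[str, str]   # logical object path -> content address of its bytes
    parent: Optional[str]  # parent commit id (None for the root commit)
    message: str


class ToyDataRepo:
    """Hypothetical in-memory model of Git-like data versioning (not the lakeFS API)."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}            # content-addressed object store
        self.commits: Dict[str, Commit] = {}         # commit id -> immutable snapshot
        self.branches: Dict[str, str] = {}           # branch name -> head commit id
        self.staged: Dict[str, Dict[str, str]] = {}  # uncommitted writes per branch
        self.branches["main"] = self._save(Commit({}, None, "initial commit"))
        self.staged["main"] = {}

    def _save(self, commit: Commit) -> str:
        key = repr((sorted(commit.tree.items()), commit.parent, commit.message))
        cid = hashlib.sha256(key.encode()).hexdigest()[:12]
        self.commits[cid] = commit
        return cid

    def _head_tree(self, branch: str) -> Dict[str, str]:
        return self.commits[self.branches[branch]].tree

    def _ancestors(self, cid: Optional[str]) -> Iterator[str]:
        while cid is not None:
            yield cid
            cid = self.commits[cid].parent

    def write(self, branch: str, path: str, data: bytes) -> None:
        addr = hashlib.sha256(data).hexdigest()
        self.blobs[addr] = data  # identical bytes are stored only once
        self.staged[branch][path] = addr

    def read(self, ref: str, path: str) -> bytes:
        # A branch ref sees its staged writes; a commit id is an immutable snapshot
        if ref in self.branches:
            tree = {**self._head_tree(ref), **self.staged[ref]}
        else:
            tree = self.commits[ref].tree
        return self.blobs[tree[path]]

    def commit(self, branch: str, message: str) -> str:
        tree = {**self._head_tree(branch), **self.staged[branch]}
        cid = self._save(Commit(tree, self.branches[branch], message))
        self.branches[branch] = cid
        self.staged[branch] = {}
        return cid

    def create_branch(self, name: str, source: str) -> None:
        # Zero-copy: the new branch only records the source branch's head commit id
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def merge(self, source: str, dest: str) -> str:
        # Simplification: fast-forward only; a full system needs three-way merges
        if self.branches[dest] not in self._ancestors(self.branches[source]):
            raise ValueError("non-fast-forward merge is not supported in this sketch")
        self.branches[dest] = self.branches[source]
        return self.branches[dest]

    def revert(self, branch: str, cid: str) -> str:
        # Add a new commit on `branch` that undoes the changes made by commit `cid`
        target = self.commits[cid]
        before = self.commits[target.parent].tree if target.parent else {}
        tree = dict(self._head_tree(branch))
        for path in set(before) | set(target.tree):
            if before.get(path) == target.tree.get(path):
                continue  # path untouched by `cid`
            if path in before:
                tree[path] = before[path]  # restore the pre-commit version
            else:
                tree.pop(path, None)       # path was added by `cid`, so drop it
        new = self._save(Commit(tree, self.branches[branch], f"revert {cid}"))
        self.branches[branch] = new
        return new
```

A typical flow in this toy: write and commit a table on main, create an experiment branch, change and commit the table there, and observe that main still returns the old bytes until the branch is merged. Calling revert on main afterwards adds a new commit that restores the earlier version, while reading by the old commit id always returns the original snapshot.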

Best for

Data engineers and teams managing data lakes that need version control for data pipelines and experiments

Use cases

  • Versioning data lake tables for reproducibility
  • Enabling data experimentation with isolated branches
  • Rolling back data changes to a previous state
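
For the reproducibility use case, one approach is to read data by commit ID rather than by branch name. Through the lakeFS S3-compatible gateway, objects are addressed as repository, then ref, then path, where the ref can be a branch, a tag, or a commit ID. The sketch below assumes that layout and Spark's s3a scheme; the helper lakefs_uri, the repository name example-repo, and the commit ID are hypothetical, and rejecting refs that contain "/" is a simplification of this sketch.

```python
def lakefs_uri(repo: str, ref: str, path: str, scheme: str = "s3a") -> str:
    """Hypothetical helper: build <scheme>://<repo>/<ref>/<path> for the lakeFS S3 gateway.

    `ref` may be a branch, a tag, or a commit ID; pinning to a commit ID means
    the same URI always resolves to the same snapshot of the data.
    """
    if not repo or "/" in repo:
        raise ValueError("repo must be a single, non-empty name")
    if not ref or "/" in ref:
        raise ValueError("ref must be a single, non-empty name (simplification)")
    key = path.lstrip("/")
    if not key:
        raise ValueError("path must not be empty")
    return f"{scheme}://{repo}/{ref}/{key}"


# "example-repo" and the commit ID below are made-up values for illustration
print(lakefs_uri("example-repo", "main", "tables/events/"))      # s3a://example-repo/main/tables/events/
print(lakefs_uri("example-repo", "a1b2c3d4", "tables/events/"))  # s3a://example-repo/a1b2c3d4/tables/events/
```

Reading through the branch URI follows the branch as it moves, which suits live pipelines; recording the commit-ID URI alongside an experiment's results lets the same input be read again later.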

Notes

5,388 stars on GitHub. Last updated 2026-05-30. Licensed Apache-2.0.

Pros

  • Open-source and free to use with no vendor lock-in
  • Large community adoption, evidenced by 5,388 GitHub stars
  • Works on top of existing object storage (such as Amazon S3 and Azure Blob Storage) without requiring a specific file or table format

Cons

  • No commercial support guarantees for the open-source, community-driven edition
  • Learning curve for users unfamiliar with Git-like workflows
  • May introduce operational overhead for very large datasets

Indexed from the awesome-llmops list and enriched with the project's public metadata.
