Enterprise DNA

Delta-Lake

by Community

An open-source storage framework for building a Lakehouse architecture. It works with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and also provides APIs.

Added 1 June 2026

#acid #analytics #big-data #delta-lake #spark

Overview

An open-source storage framework that provides ACID transactions and schema enforcement on data lakes. It supports compute engines such as Spark, PrestoDB, Flink, Trino, and Hive, enabling a Lakehouse architecture.

Best for

Data engineers building scalable, reliable Lakehouse architectures on existing data lakes

Use cases

  • Building a reliable Lakehouse with ACID transactions on data lakes
  • Running batch and streaming pipelines with unified metadata management
  • Enforcing schema evolution and data quality constraints across multiple engines
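
To make the schema enforcement use case concrete, here is a minimal sketch of the idea in plain Python. It is a toy in-memory model, not Delta Lake's API: `ToyTable`, `SchemaMismatch`, and the `merge_schema` flag are hypothetical names chosen for illustration. A batch is validated in full before anything is written, so a bad row leaves the table unchanged, and new columns are accepted only when the writer explicitly opts in.

```python
from dataclasses import dataclass, field


class SchemaMismatch(Exception):
    """Raised when a row does not fit the table's declared schema."""


@dataclass
class ToyTable:
    # Hypothetical in-memory stand-in for a table; not a Delta Lake class.
    schema: dict[str, type]
    rows: list[dict] = field(default_factory=list)

    def append(self, batch: list[dict], merge_schema: bool = False) -> None:
        """Append a batch, rejecting it as a whole if any row violates the schema."""
        new_schema = dict(self.schema)
        for row in batch:
            for name, value in row.items():
                if name not in new_schema:
                    if not merge_schema:
                        raise SchemaMismatch(f"unexpected column: {name}")
                    # Schema evolution: only when the writer opted in.
                    new_schema[name] = type(value)
                elif not isinstance(value, new_schema[name]):
                    raise SchemaMismatch(
                        f"{name}: expected {new_schema[name].__name__}, "
                        f"got {type(value).__name__}"
                    )
        # Nothing is mutated until the whole batch has passed validation.
        self.schema = new_schema
        self.rows.extend(batch)
```

Validating before mutating is the design choice that matters here: a rejected batch never leaves partial data behind, which is the property the use case relies on.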

Notes

8,829 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Pros

  • Open-source with strong community backing and 8,829 GitHub stars
  • Integrates with a wide range of compute engines and APIs
  • Provides time travel and versioning for data recovery and auditing
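
The time travel point can be illustrated with a small sketch of an append-only commit log. This is a simplified toy model under stated assumptions, not Delta Lake's implementation or API: `ToyLog`, `commit`, and `snapshot` are hypothetical names. Each commit records which data files were added or removed, and the state of the table at any version is rebuilt by replaying commits up to that version.

```python
import json


class ToyLog:
    """Hypothetical append-only log of file-level actions (illustration only)."""

    def __init__(self) -> None:
        # Each entry is one serialized commit; entries are never rewritten.
        self._commits: list[str] = []

    def commit(self, add=(), remove=()) -> int:
        """Record one commit and return its version number (0-based)."""
        actions = [{"op": "add", "path": p} for p in add]
        actions += [{"op": "remove", "path": p} for p in remove]
        self._commits.append(json.dumps(actions))
        return len(self._commits) - 1

    def snapshot(self, version=None) -> set[str]:
        """Return the set of live files at `version` (latest if omitted)."""
        if version is None:
            version = len(self._commits) - 1
        if not 0 <= version < len(self._commits):
            raise ValueError(f"no such version: {version}")
        files: set[str] = set()
        # Replay commits 0..version in order to rebuild that table state.
        for raw in self._commits[: version + 1]:
            for action in json.loads(raw):
                if action["op"] == "add":
                    files.add(action["path"])
                else:
                    files.discard(action["path"])
        return files
```

Because old commits are kept rather than overwritten, reading an earlier version is just a shorter replay, which is what makes recovery and auditing possible in this model.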

Cons

  • Originally designed for Spark, so integration with other engines can require extra configuration
  • Scala codebase may be less accessible to teams primarily using Python or SQL
  • Setup and tuning in non-Spark environments can add operational complexity

Indexed from awesome-llmops and enriched against its public facts.
