Datatrove

by Community

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Added 1 June 2026

Overview

Datatrove is an open-source Python framework for building platform-agnostic data processing pipelines. It provides customizable blocks that users assemble into workflows, reducing the need for custom scripting. The project is maintained by the Hugging Face community and has over 3,000 stars on GitHub.
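
To make the block idea concrete, here is a minimal sketch in plain Python. It does not use Datatrove's own classes or API; the names `drop_short`, `normalize_whitespace`, and `run_pipeline` are hypothetical and exist only to illustrate the general pattern of chaining small, reusable steps over a stream of records.

```python
from typing import Callable, Iterable, Iterator

# Illustrative only: a "block" here is any callable that takes a stream of
# records and yields a (possibly filtered or transformed) stream.
# These names are hypothetical and are NOT part of Datatrove's API.
Record = dict
Block = Callable[[Iterable[Record]], Iterator[Record]]


def normalize_whitespace(records: Iterable[Record]) -> Iterator[Record]:
    """Transform block: collapse runs of whitespace in each record's text."""
    for record in records:
        yield {**record, "text": " ".join(record["text"].split())}


def drop_short(min_chars: int) -> Block:
    """Filter block factory: keep records with at least min_chars characters."""
    def block(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            if len(record["text"]) >= min_chars:
                yield record
    return block


def run_pipeline(records: Iterable[Record], blocks: list) -> list:
    """Chain the blocks in order and materialize the final stream."""
    stream = records
    for block in blocks:
        stream = block(stream)
    return list(stream)


docs = [
    {"id": 1, "text": "  hello   world  "},
    {"id": 2, "text": "hi"},
]
result = run_pipeline(docs, [normalize_whitespace, drop_short(5)])
```

Because each step only depends on the stream it receives, the same blocks can be reordered or reused in another pipeline without changes, which is the kind of reuse the overview describes.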

Best for

Developers who need flexible, reusable data pipelines without locking into a specific platform

Use cases

Assembling modular data preprocessing pipelines for machine learning
Creating reusable data cleaning and transformation workflows
Building scalable ETL processes without writing glue code

Notes

3,076 stars on GitHub. Last updated 2026-05-26. Licensed Apache-2.0.

Pros

Modular block-based design promotes code reuse and clarity
Platform-agnostic, works across different execution environments
Backed by a large open-source community with active development

Cons

Requires learning the block abstraction paradigm
May introduce overhead for simple or one-off data tasks
Documentation and examples may lag behind rapid development

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

100,318 stars, updated 1 month ago.
