Enterprise DNA

Horovod

by Community

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Added 1 June 2026

#baidu #deep-learning #deeplearning #keras #machine-learning #machinelearning #mpi #mxnet

Overview

Horovod is a distributed training framework that scales deep learning across multiple GPUs and nodes for TensorFlow, Keras, PyTorch, and Apache MXNet. It abstracts communication patterns like all-reduce to simplify multi-machine training without requiring extensive code rewrites. Developers add a few lines to existing training scripts to enable distributed execution.
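The "few lines" mentioned above can be sketched for PyTorch roughly as follows. This is a minimal illustration, not a reference script: the `Linear` model, the random batches, the base learning rate of 0.01, and the helper names `scaled_lr` and `train` are placeholders invented for this example. Scaling the learning rate by the worker count is a common convention rather than a requirement. The Horovod-specific steps are numbered in the comments.

```python
def scaled_lr(base_lr: float, num_workers: int) -> float:
    """Linear learning-rate scaling (hypothetical helper for this sketch)."""
    return base_lr * num_workers


def train(epochs: int = 1) -> None:
    # Imported here so the module still loads on machines without Horovod or PyTorch.
    import torch
    import horovod.torch as hvd

    hvd.init()  # 1. initialize Horovod in every worker process
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.set_device(hvd.local_rank())  # 2. pin one GPU per process

    model = torch.nn.Linear(10, 1)  # placeholder model
    if use_cuda:
        model.cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr(0.01, hvd.size()))

    # 3. wrap the optimizer so gradients are averaged across workers (all-reduce)
    optimizer = hvd.DistributedOptimizer(
        optimizer, named_parameters=model.named_parameters()
    )

    # 4. start every worker from rank 0's weights and optimizer state
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    device = next(model.parameters()).device
    for _ in range(epochs):
        x = torch.randn(32, 10, device=device)  # placeholder batch
        y = torch.randn(32, 1, device=device)
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
```

Assuming the code is saved as `train.py` (a hypothetical filename) and ends with a call to `train()`, it would typically be launched with something like `horovodrun -np 4 python train.py`, which starts four worker processes.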

Best for
ML engineers training large models who need to scale across multiple GPUs or nodes without rewriting training logic

Use cases

  • Training large models faster across multiple GPUs
  • Scaling PyTorch or TensorFlow experiments to multi-node clusters
  • Reducing training time for production ML pipelines

Notes

14,696 stars on GitHub. Last updated 2025-12-01.

Pros

  • Works with major frameworks (PyTorch, TensorFlow, Keras, MXNet) with minimal code changes
  • Handles communication optimization automatically, reducing boilerplate
  • Well-tested in production, with 14k+ GitHub stars and an active community

Cons

  • Requires infrastructure setup (multiple GPUs/nodes) to see benefits
  • Learning curve for distributed training concepts and debugging across machines
  • Performance gains depend on network bandwidth and cluster configuration
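
To make the bandwidth dependence concrete, here is a rough back-of-the-envelope model. It assumes a textbook ring all-reduce, in which each worker sends about 2(N-1)/N times the gradient size per step, and it ignores latency, compute overlap, and compression. The collective that Horovod's backend actually runs on a given cluster may behave differently, so treat the numbers as an order-of-magnitude guide only. The function name and the example figures are illustrative.

```python
def ring_allreduce_seconds(
    num_workers: int, payload_bytes: float, bandwidth_bytes_per_s: float
) -> float:
    """Bandwidth-only time estimate for one ring all-reduce (illustrative model)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    if num_workers == 1:
        return 0.0  # a single worker has nothing to exchange
    # Each worker sends (and receives) 2 * (N - 1) / N of the payload in total.
    traffic = 2 * (num_workers - 1) / num_workers * payload_bytes
    return traffic / bandwidth_bytes_per_s


if __name__ == "__main__":
    grad_bytes = 100_000_000 * 4  # 100M float32 parameters, about 400 MB
    for gbit in (10, 100):
        bandwidth = gbit * 1e9 / 8  # convert Gbit/s to bytes/s
        t = ring_allreduce_seconds(8, grad_bytes, bandwidth)
        print(f"{gbit} Gbit/s: {t:.3f} s per all-reduce")
```

Under these assumptions, eight workers exchanging 400 MB of gradients spend about 0.56 s per step on a 10 Gbit/s link and about 0.056 s on a 100 Gbit/s link, which is why interconnect speed can dominate scaling efficiency.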

Indexed from awesome-llmops and enriched against its public facts.
