x-stable-diffusion

by Community

Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6

Added 1 June 2026

#aitemplate #automl #cuda #docker #inference #notebook #nvfuser #onnx

Overview

x-stable-diffusion targets real-time inference for Stable Diffusion, with a reported latency of 0.88 seconds. It combines AITemplate, nvFuser, TensorRT, and FlashAttention to accelerate model execution, primarily on NVIDIA GPUs.

Best for

Developers and researchers optimizing Stable Diffusion for low-latency inference on NVIDIA hardware.

Use cases

  • Deploying Stable Diffusion for near-real-time image generation tasks
  • Benchmarking inference performance across different optimization backends
  • Experimenting with accelerated attention and template-based compilation
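
For the benchmarking use case, a minimal timing harness might look like the sketch below. It is not taken from the x-stable-diffusion repository: measure_latency and fake_backend are hypothetical names, and fake_backend is only a stand-in for whatever function runs one image generation on a given backend. When timing real GPU work with PyTorch, the callable would also need to wait for the device to finish (for example by calling torch.cuda.synchronize()) before returning, otherwise the clock stops too early.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    run_once: Callable[[], object],
    warmup: int = 2,
    repeats: int = 5,
) -> Dict[str, float]:
    """Time a zero-argument callable and return latency statistics in seconds."""
    # Warm-up runs are discarded so one-time costs (compilation, caching)
    # do not skew the measured samples.
    for _ in range(warmup):
        run_once()

    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)

    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def fake_backend() -> None:
    # Hypothetical stand-in for one inference call on some backend.
    time.sleep(0.01)


if __name__ == "__main__":
    stats = measure_latency(fake_backend)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

Running the same harness against one callable per backend, with identical prompts, step counts, and image sizes, keeps the comparison fair. The median is usually a more stable summary than the mean when a few runs are slowed by unrelated system activity.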

Notes

560 stars on GitHub. Last updated 2023-12-04. Licensed Apache-2.0.

Pros

  • Reports low inference latency (0.88s) by combining several GPU optimizations
  • Integrates multiple state-of-the-art optimization techniques in one repository
  • Open source with an active Discord community for support and updates

Cons

  • Primarily targets NVIDIA GPUs due to reliance on CUDA-based libraries
  • Requires manual setup and configuration of each optimization backend
  • Limited documentation beyond the README and community Discord

Indexed from awesome-llmops and enriched against its public facts.
