O Open Source Frameworks medium

SGLang

by Community

SGLang is a high-performance serving framework for large language models and multimodal models.

Visit Community View repo Submit your build →

OSS

SGLang

Added 1 June 2026

#attention #blackwell #cuda #deepseek #diffusion #glm #gpt-oss #inference

Overview

SGLang is a Python framework for serving large language models and multimodal models with optimized performance. It provides APIs and tools to deploy, batch, and run inference on LLMs efficiently at scale.

Best for

Best for
Teams building production LLM services who need performance-optimized serving infrastructure

Use cases

Deploying LLMs with low-latency inference serving
Running multimodal model inference in production
Batching and optimizing throughput for concurrent requests

Notes

28,885 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Use cases

Deploying LLMs with low-latency inference serving
Running multimodal model inference in production
Batching and optimizing throughput for concurrent requests

Pros

High-performance serving optimized for LLM inference
Supports both language and multimodal models
Active community project with substantial adoption (28k+ stars)

Cons

Python-only, limiting integration in non-Python stacks
Requires operational expertise to deploy and tune effectively
Community-maintained, not backed by a commercial vendor

Indexed from awesome-llm and enriched against its public facts.

Pros

High-performance serving optimized for LLM inference
Supports both language and multimodal models
Active community project with substantial adoption (28k+ stars)

Cons

Python-only, limiting integration in non-Python stacks
Requires operational expertise to deploy and tune effectively
Community-maintained, not backed by a commercial vendor

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Uses1entry

O OSS Obs medium

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

Built with1entry

O OSS Obs medium

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

Pairs with1entry

O OSS Framework medium

LangChain

Community

The agent engineering platform.

★ 138,234 updated 1mo ago

Alternative to3entries

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 1mo ago

O OSS Framework medium

LMDeploy

Community

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

★ 7,876 updated 1mo ago

Used by3entries

O OSS Framework medium

DeepSeek-R1

Community

First-generation reasoning models from DeepSeek.

★ 92,010 updated 1y ago

O OSS Framework medium

GPUStack

Community

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

★ 5,082 updated 1mo ago

O OSS Obs medium

OpenModelZ

Community

Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others)

★ 283 updated 2y ago

Pairs with8entries

O OSS Framework medium

DeepSeek-Math-7B

Community

DeepSeek Math series

O OSS Framework medium

GPUStack

Community

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

★ 5,082 updated 1mo ago

O OSS Framework medium

MInference

Community

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to

★ 1,217 updated 3mo ago

O OSS Framework medium

Moonlight-A3B

Community

Moonshot's Compute-efficient MoE LLM, first Scaling Up of Muon Optimizer

O OSS Framework medium

Nemotron-4-340B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Outlines

Community

Structured Outputs

★ 13,914 updated 1mo ago

O OSS Framework medium

SkyPilot

Community

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

★ 10,051 updated 1mo ago

O OSS Framework medium

Transformer Engine

Community

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide b

★ 3,374 updated 1mo ago

Alternatives3entries

O OSS Framework medium

mistral.rs

Community

Fast, flexible LLM inference

★ 7,205 updated 1mo ago

O OSS Framework medium

TGI

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

Free 27-page guide

Get the free Developer’s Field Guide

A 27-page field guide to the AI coding workflow with Claude. Claude Code, MCP servers, the prompt patterns that work, and what to delegate. Free.

Enter your work email. We send it straight over, plus a few short notes worth knowing. Unsubscribe any time.

← Back to Open Source Submit your own entry →