Open Source Observability medium

CTranslate2

by Community

Fast inference engine for Transformer models

Added 1 June 2026

#avx #avx2 #cpp #cuda #deep-learning #deep-neural-networks #gemm #inference

Overview

CTranslate2 is a fast inference engine for Transformer models, written in C++ with Python bindings. It speeds up model execution with techniques such as weight quantization and layer fusion, which lowers latency and memory usage compared with general-purpose deep learning frameworks.
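To make the idea of weight quantization concrete, here is a minimal pure-Python sketch of symmetric int8 quantization for a single row of weights. It is only an illustration of the general technique, not CTranslate2's actual implementation; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one scale for the row."""
    max_abs = max(abs(w) for w in weights)
    # Assumed scheme: scale so that the largest magnitude maps to 127.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w * scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q / scale for q in quantized]


# Illustrative weights only; a real model row would be much longer.
weights = [0.4, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error)
```

Storing each weight in one byte instead of four is where the memory saving comes from; the cost is the small rounding error measured by `max_error`.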

Best for

Developers deploying Transformer models in production who need maximum inference speed on CPU or limited hardware

Use cases

Deploy Transformer models for real-time translation or text generation
Optimize CPU-based inference for production serving
Integrate into pipelines requiring high-throughput model execution

Notes

4,507 stars on GitHub. Last updated 2026-05-29. Licensed MIT.

Pros

Written in C++ for high performance and low overhead
Supports weight quantization (for example int8 and float16) for efficient inference
Compatible with models from OpenNMT-py and Hugging Face Transformers

Cons

Limited to specific Transformer architectures and model formats
Requires a conversion step to load models from common frameworks
Less flexible for custom layers or training tasks
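The conversion step listed under Cons is usually a single command. The sketch below assembles an invocation of the `ct2-transformers-converter` script that ships with the `ctranslate2` Python package. The helper `build_convert_command`, the model name, and the output directory are illustrative assumptions, and the command is only printed here, not executed.

```python
import shlex


def build_convert_command(model, output_dir, quantization=None):
    """Build the argument list for converting a Hugging Face model."""
    cmd = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
    ]
    # Quantization is optional; when omitted, no --quantization flag is passed.
    if quantization is not None:
        cmd += ["--quantization", quantization]
    return cmd


# Illustrative model and directory names, chosen only for this example.
cmd = build_convert_command(
    "Helsinki-NLP/opus-mt-en-de", "opus-mt-en-de-ct2", quantization="int8"
)
print(shlex.join(cmd))
```

With `ctranslate2` and `transformers` installed, the list can be passed to `subprocess.run(cmd, check=True)`; the resulting directory is what the engine loads at inference time.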

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Pairs with (2 entries)

Faster Whisper

by Community (OSS, Observability, medium)

Faster Whisper transcription with CTranslate2

★ 23,312, updated 7mo ago

whisper

by Community (OSS, Observability, medium)

Robust Speech Recognition via Large-Scale Weak Supervision

★ 101,156, updated 3mo ago

Alternative to (1 entry)

llama.cpp

by Community (OSS, Framework, medium)

LLM inference in C/C++

★ 114,160, updated 1mo ago
