tokenizers

by Community

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Added 1 June 2026

#bert #gpt #language-model #natural-language-processing #natural-language-understanding #nlp #transformers

Overview

A Rust implementation of fast tokenizers, optimized for both research and production NLP pipelines. It provides subword tokenization algorithms such as BPE, WordPiece, and Unigram with full alignment tracking. The library is framework-agnostic and includes Python bindings for easy integration.
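
To make "subword tokenization with alignment tracking" concrete, here is a minimal pure-Python sketch of BPE. It is a toy illustration of the general technique, not the library's implementation or API: the function names (`train_bpe`, `apply_merge`, `encode_with_offsets`), the whitespace pre-splitting, and the lexicographic tie-breaking are all assumptions made for this example.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Every word starts out as a sequence of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair; break ties lexicographically (an assumption).
        best = min(pair_counts, key=lambda p: (-pair_counts[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[tuple(apply_merge(list(symbols), best))] += freq
        corpus = new_corpus
    return merges


def encode_with_offsets(text, merges):
    """Return (token, (start, end)) pairs, where text[start:end] == token."""
    result, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)  # character position of this word in `text`
        symbols = list(word)
        for pair in merges:  # apply merges in the order they were learned
            symbols = apply_merge(symbols, pair)
        offset = start
        for sym in symbols:
            result.append((sym, (offset, offset + len(sym))))
            offset += len(sym)
        pos = start + len(word)
    return result


merges = train_bpe(["low", "low", "lower", "lowest"], num_merges=2)
encoded = encode_with_offsets("low lower", merges)
print(merges)   # [('l', 'o'), ('lo', 'w')]
print(encoded)  # [('low', (0, 3)), ('low', (4, 7)), ('e', (7, 8)), ('r', (8, 9))]
```

Each offset pair maps a token back to the exact character span it came from, which is the property that alignment tracking preserves. A production tokenizer adds normalization, pre-tokenization rules, special tokens, and a far faster merge loop.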

Best for

Developers needing high-throughput tokenization for NLP model training or serving

Use cases

Tokenizing large text corpora for model training
Integrating tokenization into production inference systems
Building custom tokenizers for specialized vocabularies

Notes

10,782 stars on GitHub. Last updated 2026-05-26. Licensed Apache-2.0.

Pros

Fast tokenization thanks to the Rust core
Supports multiple tokenization algorithms (BPE, WordPiece, Unigram) through one consistent API
Python bindings for integration with ML workflows

Cons

Limited to tokenization; offers no broader NLP utilities
Requires a Rust toolchain to build from source when no pre-built wheel is available
Smaller community compared to Python-native tokenizers

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago
