Enterprise DNA
O Open Source Observability medium

whisper

by Community

Robust Speech Recognition via Large-Scale Weak Supervision

Added 1 June 2026

Overview

Open-source speech-to-text model trained on 680,000 hours of multilingual audio data from the web. Whisper copes with background noise, accents, and technical vocabulary without requiring fine-tuning. It runs locally via Python and supports 99 languages.
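As a minimal sketch of local use, the snippet below assumes the `openai-whisper` package and `ffmpeg` are installed. `transcribe_file` and its `load_model` parameter are hypothetical names introduced here for illustration and are not part of the library; the only library calls used are `whisper.load_model` and the model's `transcribe` method, whose result is a dictionary with a `text` field.

```python
from typing import Any, Callable, Optional


def transcribe_file(
    audio_path: str,
    model_name: str = "base",
    load_model: Optional[Callable[[str], Any]] = None,
) -> str:
    """Return the transcript text for one audio file.

    `load_model` is a hypothetical injection point so the function can be
    exercised without downloading model weights; by default it falls back
    to `whisper.load_model`.
    """
    if load_model is None:
        # Imported lazily so this module still loads where the package is absent.
        import whisper  # assumption: installed via `pip install openai-whisper`

        load_model = whisper.load_model

    model = load_model(model_name)
    result = model.transcribe(audio_path)
    # The transcript usually begins with a space, so trim surrounding whitespace.
    return result["text"].strip()
```

Calling `transcribe_file("meeting.mp3")` (a placeholder file name) would download the `base` weights on first use and then run entirely on the local machine.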

Best for
Developers building privacy-first or offline-capable voice features with multilingual requirements

Use cases

  • Transcribing user audio in applications without cloud API dependency
  • Building multilingual voice interfaces and accessibility features
  • Processing noisy or accented speech in production systems
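For the accessibility use case, one common step is turning a transcript into captions. The sketch below assumes the `segments` list found in the dictionary returned by Whisper's `transcribe` method, where each item has `start` and `end` times in seconds and a `text` string. `format_timestamp` and `segments_to_srt` are hypothetical helpers written for this example, not library functions.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Build SRT caption text from Whisper-style segments."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        # Each cue: running number, time range, caption text, trailing newline.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Given a transcription result `result`, `segments_to_srt(result["segments"])` yields text that can be saved with an `.srt` extension.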

Notes

101,156 stars on GitHub. Last updated 2026-04-15. Licensed MIT.

Pros

  • Multilingual support across 99 languages with robust handling of accents and background noise
  • Runs entirely on-device, with no external API calls required
  • Strong community adoption and integration support across frameworks

Cons

  • Slower inference than cloud APIs, and it requires local compute resources
  • Model size (up to 3GB for the largest variant) increases the deployment footprint
  • Accuracy varies by language and audio quality, and the model is not optimized for real-time streaming
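The size and speed trade-offs above usually come down to which checkpoint is loaded. The sketch below picks the largest multilingual model that fits a memory budget. The VRAM figures are approximate values as listed in the project README at the time of writing and should be treated as assumptions (they are runtime memory, not the download size quoted above); `APPROX_VRAM_GB` and `pick_model` are hypothetical names for this example.

```python
# Approximate VRAM needed per model, in GB (assumption: taken from the
# project README; re-check against the current README before relying on it).
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb: float) -> str:
    """Return the largest model name whose approximate need fits the budget."""
    chosen = None
    # The table is ordered smallest to largest, so the last fit is the largest.
    for name, needed_gb in APPROX_VRAM_GB:
        if needed_gb <= available_vram_gb:
            chosen = name
    if chosen is None:
        raise ValueError("no model fits the given memory budget")
    return chosen
```

The returned name can be passed to `whisper.load_model`. Smaller models run faster and use less memory, generally at some cost in accuracy.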

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Used by 19 entries
M MCP Dev low

eviscerations/whisper-windows-mcp

Various

Windows-native MCP server for local audio transcription — GPU accelerated via Vulkan, works with Claude Desktop

★ 0 updated 11d ago
M MCP Dev low

JuhongPark/mcp-server-pronunciation

Various

Local MCP voice coach with English pronunciation, grammar, and fluency feedback.

★ 0 updated 10d ago
M MCP Dev low

samson-art/transcriptor-mcp

Various

An MCP server (stdio + HTTP/SSE) that fetches video transcripts/subtitles via yt-dlp, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, V

★ 10 updated 2d ago
M MCP Dev low

transcribe-app/mcp-transcribe

Various

Add transcription tools to your AI-powered assistants.

★ 6 updated 2mo ago
O OSS Orchestration medium

AudioGPT

Community

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

★ 10,179 updated 1y ago
O OSS Obs medium

Off Grid

Community

The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phon

★ 2,335 updated 5d ago
O OSS Orchestration medium

OpenDAN

Community

OpenDAN is an open source Personal AI OS, which consolidates various AI modules in one place for your personal use.

★ 2,032 updated 2mo ago
O OSS Orchestration medium

Pipecat

Community

Open Source framework for voice and multimodal conversational AI

★ 12,588 updated 2d ago
O OSS Obs medium

whisper-ctranslate2

Community

Whisper command line client compatible with original OpenAI client based on CTranslate2.

★ 1,309 updated 3mo ago
P Apps Productivity one click

Fireflies

Fireflies.ai

AI meeting assistant. Records, transcribes, summarises, and pipes the output to your stack.

P Apps Productivity one click

Granola

Granola

AI notepad for meetings. Take your own notes, Granola enhances them after the call with the audio context.

P Apps Productivity low

Loopin AI

Various

loopinhq.com

P Apps Productivity low

Otter.ai

Various

Otter AI Meeting Agent supports real-time transcription, live chat, automated summaries, insights, and action items.

P Apps Productivity low

PyGPT

Various

PyGPT is an open‑source desktop AI assistant for Windows, macOS and Linux. Chat, agents, web search, run Python, TTS/STT, plugins, long‑term memory.

P Apps Productivity low

Read AI

Various

Read AI, the fastest growing AI meeting assistant, ever, delivers real-time transcription, smart summaries, and enables AI search and discovery across all your content including

P Apps Productivity low

Screenpipe

Various

YC (S26) | AI that knows what you've seen, said, or heard. Records everything you do, say, hear 24/7, local, private, secure

★ 19,049 updated 2d ago
P Apps Productivity low

Teleprompter

Various

An on-device AI for your meetings that listens to you and makes charismatic quote suggestions.

★ 335 updated 3y ago
P Apps Productivity low

Vibe Transcribe

Various

Local-first transcription for audio and video with AI summaries, multilingual support, and privacy-focused processing.

P Apps Productivity low

Wispr Flow

Various

Flow makes writing quick and clear with seamless voice dictation. It is the fastest, smartest way to type with your voice.