whisper
by Community
Robust Speech Recognition via Large-Scale Weak Supervision
OSS
whisper
Added 1 June 2026
Overview
Open-source speech-to-text model trained on 680,000 hours of multilingual audio data collected from the web. Whisper handles varied audio conditions, accents, and technical vocabulary without fine-tuning. It runs locally via Python and supports 99 languages.
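A minimal sketch of local transcription, assuming the open-source `openai-whisper` package (`pip install -U openai-whisper`) and `ffmpeg` on the PATH. `whisper.load_model` and `model.transcribe` are that package's documented entry points; the `transcribe_file` wrapper and its defaults are hypothetical and exist only for illustration.

```python
from typing import Optional


def transcribe_file(path: str, model_name: str = "base",
                    language: Optional[str] = None) -> dict:
    """Transcribe an audio file locally with openai-whisper (hypothetical wrapper)."""
    # Imported inside the function so this module still imports
    # on machines where openai-whisper is not installed.
    import whisper

    # Downloads and caches the checkpoint on first use; later runs
    # load it from disk without network calls.
    model = whisper.load_model(model_name)

    # language=None lets Whisper auto-detect the spoken language.
    # The returned dict includes "text", "segments", and "language".
    return model.transcribe(path, language=language)
```

For example, `transcribe_file("meeting.mp3")["text"]` (file name hypothetical) returns the transcript, and passing `language="de"` skips automatic language detection.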
Best for
Developers building privacy-first or offline-capable voice features with multilingual requirements
Use cases
- Transcribing user audio in applications without cloud API dependency
- Building multilingual voice interfaces and accessibility features
- Processing noisy or accented speech in production systems
Notes
101,156 stars on GitHub. Last updated 2026-04-15. Licensed MIT.
Pros
- Multilingual support across 99 languages with robust handling of accents and background noise
- Runs entirely on-device once the model weights are downloaded; no external API calls required
- Strong community adoption and integration support across frameworks
Cons
- Slower inference than cloud APIs and requires local compute resources
- Model size (up to 3GB for largest variant) impacts deployment footprint
- Accuracy varies by language and audio quality; not optimized for real-time streaming
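Whisper's encoder consumes fixed 30-second windows of 16 kHz audio, which is plausibly one reason it is not a natural fit for low-latency streaming: audio has to be buffered into windows before it can be decoded. The sketch below shows only that windowing step in plain Python. `fixed_windows` is a hypothetical helper; the real `transcribe()` in `openai-whisper` advances its window using predicted timestamps rather than fixed, non-overlapping steps.

```python
from typing import List

SAMPLE_RATE = 16_000                     # Whisper works on 16 kHz mono audio
CHUNK_SECONDS = 30                       # the encoder sees a fixed 30-second window
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def fixed_windows(samples: List[float], n: int = N_SAMPLES) -> List[List[float]]:
    """Split audio into consecutive n-sample windows, zero-padding the last one.

    Simplified illustration only: it does not reproduce whisper's
    timestamp-driven seeking.
    """
    windows = []
    for start in range(0, len(samples), n):
        window = samples[start:start + n]
        # Pad a short tail with silence so every window has exactly n samples.
        window = window + [0.0] * (n - len(window))
        windows.append(window)
    return windows
```

For 70 seconds of audio this yields three windows, the last holding 10 seconds of audio followed by 20 seconds of zero padding.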
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
eviscerations/whisper-windows-mcp
Various
Windows-native MCP server for local audio transcription: GPU-accelerated via Vulkan and works with Claude Desktop
JuhongPark/mcp-server-pronunciation
Various
Local MCP voice coach with English pronunciation, grammar, and fluency feedback.
samson-art/transcriptor-mcp
Various
An MCP server (stdio + HTTP/SSE) that fetches video transcripts/subtitles via yt-dlp, with pagination for large responses. Supports YouTube, Twitter/X, Instagram, TikTok, Twitch, V
transcribe-app/mcp-transcribe
Various
Add transcription tools to your AI-powered assistants.
AudioGPT
Community
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Off Grid
Community
The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phon
OpenDAN
Community
OpenDAN is an open-source Personal AI OS that consolidates various AI modules in one place for your personal use.
Pipecat
Community
Open Source framework for voice and multimodal conversational AI
whisper-ctranslate2
Community
Whisper command line client compatible with original OpenAI client based on CTranslate2.
Fireflies
Fireflies.ai
AI meeting assistant. Records, transcribes, summarises, and pipes the output to your stack.
Granola
Granola
AI notepad for meetings. Take your own notes, Granola enhances them after the call with the audio context.
Loopin AI
Various
loopinhq.com
Otter.ai
Various
Otter AI Meeting Agent supports real-time transcription, live chat, automated summaries, insights, and action items.
PyGPT
Various
PyGPT is an open‑source desktop AI assistant for Windows, macOS and Linux. Chat, agents, web search, run Python, TTS/STT, plugins, long‑term memory.
Read AI
Various
Read AI, the fastest growing AI meeting assistant, ever, delivers real-time transcription, smart summaries, and enables AI search and discovery across all your content including
Screenpipe
Various
YC (S26) | AI that knows what you've seen, said, or heard. Records everything you do, say, hear 24/7, local, private, secure
Teleprompter
Various
An on-device AI for your meetings that listens to you and makes charismatic quote suggestions.
Vibe Transcribe
Various
Local-first transcription for audio and video with AI summaries, multilingual support, and privacy-focused processing.
Wispr Flow
Various
Flow makes writing quick and clear with seamless voice dictation. It is the fastest, smartest way to type with your voice.