Gemini 3.5 Live Translate: Real-Time Speech in 70 Languages

On June 9, 2026, Google released Gemini 3.5 Live Translate, a new audio model that streams translated speech within seconds of the speaker and covers more than 70 languages and 2,000 language combinations. The consumer version landed in Google Translate on Android and iOS the same day. A private enterprise preview in Google Meet began rolling out to Workspace customers this month.

For anyone building or deploying voice AI, the significance is the price. Real-time multilingual voice translation has moved from expensive specialist infrastructure to a standard API call at $0.023 per minute.

What It Actually Does

Unlike older translation systems that wait for a speaker to finish before processing, Gemini 3.5 Live Translate translates while the speaker is still talking. It stays a few seconds behind the speaker, so listeners do not have to wait for a full sentence before hearing anything back.

According to Google, the model preserves intonation, tempo, and vocal pitch, so the translated voice sounds closer to the original speaker than the flat, robotic audio typical of earlier translation tools. Google also says the model handles background noise and overlapping voices, which would make it usable in real-world settings such as an open office, a video call with household noise, or a busy café, and not just in controlled environments.

Supported languages include the major European languages, Mandarin, Cantonese, Japanese, Korean, Arabic, Hindi, and dozens more across Asia, Africa, and Latin America. With 2,000 supported language pairs, the model covers the vast majority of business communication scenarios.

Where It’s Available

Consumer: Google Translate on Android and iOS, globally, starting June 9. No sign-up required, no additional cost for existing users.

Developers: Public preview via the Gemini Live API and Google AI Studio. Pricing is $0.023 per minute — well below most competing real-time translation services.

Enterprise: Private preview in Google Meet starting this month for Workspace customers. Full rollout is scheduled for later in 2026. Partners building on the model include Grab, Agora, and LiveKit.

What This Means for Business

International teams no longer need separate tools. The friction of running a real-time translation layer on top of video meetings has been high enough that most teams work around it — hiring bilingual staff, relying on imperfect automated captions, or simply restricting who attends which meetings. A native translation layer in Google Meet removes that friction.

Voice AI goes global without localization overhead. For businesses deploying voice AI agents — customer service lines, internal knowledge tools, automated scheduling — multilingual capability has typically required separate models or significant custom development per language. A single API at $0.023 per minute changes that calculus. A voice agent that handles English, Spanish, and Mandarin without three separate deployments is now a real option at reasonable cost.

The economics shifted further than the headlines suggest. Most enterprise translation tools charge per word or per character and require human review for accuracy. At $0.023 per minute for real-time streaming audio, the cost of multilingual voice capability drops from thousands of dollars per project to cents per conversation.
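The per-minute arithmetic behind that claim is easy to check. The sketch below is a minimal illustration, assuming the $0.023 per minute preview price quoted above applies flatly to every streamed minute. The function name, the six-minute call length, and the 10,000-call monthly volume are hypothetical figures chosen for illustration, not Google's numbers, and actual billing rules (minimums, tiers, rounding of partial minutes) may differ.

```python
from decimal import Decimal

# Preview price quoted in this article (USD per minute of streamed audio).
RATE_PER_MINUTE = Decimal("0.023")


def translation_cost(minutes: float, rate: Decimal = RATE_PER_MINUTE) -> Decimal:
    """Return the estimated cost in USD, rounded to the nearest cent.

    Assumes flat per-minute billing with no minimums, tiers, or rounding
    of partial minutes; real billing rules may differ.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (Decimal(str(minutes)) * rate).quantize(Decimal("0.01"))


# Hypothetical workload: one 6-minute support call, and 10,000 such calls a month.
per_call = translation_cost(6)
per_month = translation_cost(6 * 10_000)

print(f"One 6-minute call: ${per_call}")
print(f"10,000 calls/month: ${per_month}")
```

Under these assumptions a six-minute call costs about 14 cents and 10,000 such calls cost $1,380 a month, which is consistent with the "cents per conversation" framing. Swapping in your own call length and volume gives a first-pass budget figure.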

It raises the floor for what customers expect. If a free app like Google Translate handles more than 70 languages in real time, business customers will expect that capability as a baseline from any voice AI system. The bar for what counts as a functioning multilingual product just moved.

The Practical Constraint

The Workspace enterprise rollout is still in private preview. For most business teams, this means waiting for general availability rather than deploying it in production today. Developers can access the Gemini Live API now and build on it — but enterprise procurement and compliance reviews take time, and the full rollout timeline is still described as “later in 2026.”

The capability is real. The enterprise timeline is measured in months, not days.

Why It Matters for Voice AI in Particular

The voice AI category has been growing fast, but multilingual capability has been one of the more difficult problems to solve well. Most voice agents work in one language natively and bolt on translation as an afterthought, which degrades quality and adds latency.

What Gemini 3.5 Live Translate demonstrates is that the latency and quality problems are solvable at the API level. Preserving vocal pitch and intonation across languages, which means handling the acoustic characteristics of speech and not just the words, is the kind of technical problem that looked hard three years ago. It’s now a capability you can call via API for less than three cents a minute.

For any business thinking about where multilingual voice AI fits in its operations, the answer in June 2026 is: closer than it was six months ago, and accessible at a price that makes the business case straightforward.

Enterprise DNA helps businesses evaluate and deploy AI capabilities across their operations, including voice AI and multilingual workflows. Book a discovery call with Sam McKay to discuss where live translation fits in your AI strategy.

Source

Google Blog
