Google Releases Gemini 3.5 Live Translate, a Streaming Speech-to-Speech Audio Model Covering 70+ Languages Across Meet, Translate, and the Live API
English summary
Google announced Gemini 3.5 Live Translate, a dedicated speech-to-speech audio model that continuously translates spoken audio into 70+ languages while preserving the speaker's intonation, pacing, and pitch. Unlike turn-based agents, it processes audio as a continuous stream and produces translated speech a few seconds behind the speaker. Developers configure it through the Gemini Live API with a translationConfig that specifies a BCP-47 target language code; the model accepts only raw 16-bit, 16kHz mono PCM audio as input and outputs 24kHz audio. It is rolling out in public preview on the Live API and Google AI Studio and in private preview in Google Meet (expanding from 5 to 70+ languages), and it will launch in the Google Translate app on Android and iOS. All generated audio carries an imperceptible SynthID watermark so that it can be identified as AI-generated.
Chinese summary
谷歌发布了 Gemini 3.5 Live Translate,一个专用的语音到语音音频模型,能实时将口语翻译成 70 多种语言,并保留说话人的语调、语速和音高。它采用连续流处理,翻译延迟仅几秒,不同于基于轮次的交互模式。开发者可通过 Gemini Live API 配置 translationConfig,指定 BCP-47 目标语言代码;输入为 16kHz 16-bit 单声道 PCM 音频,输出 24kHz 音频。该模型已在 Live API 和 AI Studio 上开放公开预览,Google Meet 正进行企业私有预览(语言支持从 5 种提升至 70 种以上),并将登陆 Android 和 iOS 版 Google 翻译应用。所有生成音频均嵌入不可察觉的 SynthID 水印。
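The configuration step described in the summaries can be pictured with a small sketch. It only builds and validates a setup payload; it does not call any Google service. The model name and the translationConfig block come from the announcement, while the surrounding "setup" envelope, the field name targetLanguageCode, and the helper build_translate_setup are assumptions made for illustration and may not match the real Live API schema. The BCP-47 check is a loose shape test, not a full validator.

```python
import re

# Model name as given in the announcement.
MODEL_ID = "gemini-3.5-live-translate-preview"

# Loose BCP-47 shape check: language, optional script, optional region.
# This is an approximation for illustration, not a full RFC 5646 validator.
_BCP47_RE = re.compile(
    r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$"
)


def build_translate_setup(target_language: str) -> dict:
    """Build a hypothetical setup payload for translation mode.

    Only `translationConfig` and the model name come from the announcement;
    the "setup" wrapper and "targetLanguageCode" key are assumed names.
    """
    if not _BCP47_RE.match(target_language):
        raise ValueError(f"not a plausible BCP-47 tag: {target_language!r}")
    return {
        "setup": {
            "model": MODEL_ID,
            "translationConfig": {
                # Hypothetical field name for the BCP-47 target language.
                "targetLanguageCode": target_language,
            },
        }
    }
```

Calling build_translate_setup("es-MX") returns a dictionary whose translationConfig carries the tag unchanged, while a string such as "spanish" raises ValueError before anything would be sent.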
Key points
New streaming translation model: Gemini 3.5 Live Translate (gemini-3.5-live-translate-preview) is a dedicated audio model, not a chat assistant, optimized for continuous speech-to-speech translation.
新流式翻译模型:Gemini 3.5 Live Translate(gemini-3.5-live-translate-preview)是专用音频模型,优化了连续语音到语音翻译,非聊天助手。
Supports 70+ languages automatically; the generated speech mirrors the speaker's prosody and trails the live audio by a few seconds.
自动支持 70 多种语言,生成语音保留说话人的韵律特征,实时翻译延迟仅几秒。
Available through the Gemini Live API (configured with a translationConfig block) and in a Google Meet private preview (expanding from 5 to 70+ languages); it is also coming to the Google Translate app.
可通过 Gemini Live API(使用 translationConfig 配置块)使用,Google Meet 提供私有预览(从 5 种语言扩展到 70 种以上),并将登陆 Google 翻译应用。
Technical constraints: input is audio only, as raw 16-bit 16kHz PCM; output is 24kHz audio; translation mode does not support text input, tool use, or system instructions.
技术限制:仅限音频、16kHz PCM 输入、24kHz 输出,翻译模式下不支持文本输入、工具使用或系统指令。
All output audio carries an imperceptible SynthID watermark; integration partners include Agora, LiveKit, and Pipecat.
所有输出音频嵌入 SynthID 数字水印;集成合作伙伴包括 Agora、LiveKit、Pipecat 等。
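The audio constraints in the key points translate into simple byte arithmetic on the client side. The sketch below converts floating-point samples to 16-bit PCM and splits the result into fixed-duration chunks for streaming. The 100 ms chunk size, the little-endian byte order, and the helper names are assumptions chosen for illustration; the announcement specifies only 16-bit 16kHz PCM input and 24kHz output, and the sample width of the output is assumed here to be 16-bit as well.

```python
import struct

INPUT_RATE_HZ = 16_000    # input sample rate from the announcement
OUTPUT_RATE_HZ = 24_000   # output sample rate from the announcement
BYTES_PER_SAMPLE = 2      # 16-bit samples (assumed for output as well)


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clip first so out-of-range values cannot overflow the 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm, chunk_ms=100, rate_hz=INPUT_RATE_HZ):
    """Split PCM bytes into chunks of chunk_ms each; the last may be shorter."""
    chunk_bytes = rate_hz * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def duration_seconds(pcm, rate_hz):
    """Playback duration of mono 16-bit PCM at the given sample rate."""
    return len(pcm) / (rate_hz * BYTES_PER_SAMPLE)
```

Under these assumptions, one second of mono input is 16,000 samples or 32,000 bytes, which chunk_pcm splits into ten 3,200-byte chunks; one second of 24kHz output at the same sample width would be 48,000 bytes.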