Google Releases DiffusionGemma: Diffusion-Based LLM Claiming Up to 4x Faster Text Generation
English summary
Google has introduced DiffusionGemma, a language model that generates text with a diffusion-based decoding process and is claimed to be up to 4× faster than standard autoregressive decoding. Instead of producing one token at a time, the model generates multiple tokens in parallel, avoiding the strictly sequential, token-by-token loop of autoregressive models. Unsloth has published instructions for running the model locally. The V2EX Chat service already uses a Gemma 4 26B model, though it is not necessarily DiffusionGemma.
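To illustrate why parallel decoding can cut the number of sequential steps, here is a minimal toy sketch. It assumes a masked-diffusion style of decoding, in which generation starts from a fully masked sequence and each model call fills in several positions at once. That is an assumption: the summary does not describe DiffusionGemma's actual sampler, and `make_toy_denoiser` and `parallel_unmask_decode` are hypothetical names, not part of any Google or Unsloth API. The toy "model" simply reads the answer from a target list.

```python
MASK = None  # placeholder for a position that has not been generated yet


def make_toy_denoiser(target):
    """Hypothetical stand-in for a trained diffusion LM (illustration only).

    In a single call it proposes a (token, confidence) pair for every masked
    position. This toy reads tokens from `target` and treats earlier
    positions as more confident.
    """
    def denoise(seq):
        return {
            i: (target[i], 1.0 / (1 + i))
            for i, tok in enumerate(seq)
            if tok is MASK
        }
    return denoise


def parallel_unmask_decode(length, tokens_per_step, denoise):
    """Start fully masked; each loop iteration is one model call that commits
    the `tokens_per_step` most confident proposals at the same time."""
    seq = [MASK] * length
    calls = 0
    while any(tok is MASK for tok in seq):
        proposals = denoise(seq)
        calls += 1
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (tok, _conf) in ranked[:tokens_per_step]:
            seq[pos] = tok
    return seq, calls


if __name__ == "__main__":
    target = "diffusion models can decode many tokens per step".split()
    denoise = make_toy_denoiser(target)
    # One token per call: as many sequential steps as autoregressive decoding.
    seq1, calls1 = parallel_unmask_decode(len(target), 1, denoise)
    # Four tokens per call: a quarter of the sequential model calls.
    seq4, calls4 = parallel_unmask_decode(len(target), 4, denoise)
    print(" ".join(seq1), calls1)  # 8 calls
    print(" ".join(seq4), calls4)  # 2 calls
```

For this 8-token sequence, committing one token per call takes 8 sequential calls, while committing four per call takes 2. Fewer sequential calls do not automatically yield a proportional wall-clock speedup, because per-call cost, hardware, and the number of refinement steps a real model needs for good output quality all matter. The 4× figure is Google's claim and is not derived from this toy.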
Key points
DiffusionGemma uses a diffusion-based approach to generate text, enabling parallel decoding; Google claims generation up to 4× faster than standard autoregressive models.
Unsloth offers a dedicated guide for running DiffusionGemma locally, making it easier to experiment with the model on your own hardware.