Introducing Gemma 4 12B: a unified, encoder-free multimodal model
English summary
Google DeepMind released Gemma 4 12B, a 12-billion-parameter open multimodal model. Its unified architecture handles both text and images without a separate vision encoder. It is part of the Gemma family of open models. The announcement highlights the encoder-free design but gives no further performance or capability details.
Key points
Gemma 4 12B is a 12-billion-parameter multimodal model from Google DeepMind.
It uses an encoder-free architecture: text and images are processed by one unified model, with no separate vision encoder.
The model is released as part of the Gemma family of open models.
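
The announcement gives no architectural details, so the sketch below is only a generic illustration of what "encoder-free" commonly means, not a description of Gemma 4 12B's actual design. It assumes that raw image patches are mapped into the same embedding space as text tokens by a single linear projection, and that the resulting vectors are placed in one sequence for a shared model to consume. Every name and size here (patchify, build_sequence, D, VOCAB, PATCH) is hypothetical.

```python
import random


def patchify(image, patch):
    """Split an H x W grid of pixel values into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def linear(vec, weights):
    """Project vec (length n) with a weight matrix of shape n x d."""
    return [sum(v * row[j] for v, row in zip(vec, weights))
            for j in range(len(weights[0]))]


def build_sequence(token_ids, image, embed_table, patch_proj, patch):
    """Build one input sequence: projected image patches, then text embeddings.

    No vision encoder runs here; patches only pass through one linear map
    (an assumption made for illustration).
    """
    text = [embed_table[t] for t in token_ids]
    vision = [linear(p, patch_proj) for p in patchify(image, patch)]
    return vision + text


# Hypothetical toy sizes, chosen only to keep the example small.
rng = random.Random(0)
D, VOCAB, PATCH = 8, 16, 2
embed_table = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(VOCAB)]
patch_proj = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(PATCH * PATCH)]
image = [[rng.random() for _ in range(4)] for _ in range(4)]

sequence = build_sequence([3, 7, 1], image, embed_table, patch_proj, PATCH)
print(len(sequence), len(sequence[0]))  # 7 8 (4 patches + 3 tokens, width 8)
```

In an encoder-based design, the patch vectors would instead pass through a separate vision model before reaching the language model; here the only image-specific component is the projection matrix.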