Repos | Source: GitHub | June 3, 2026 | Importance: 5/5

huggingface/transformers: Release v5.10.1

English summary

This release of Hugging Face Transformers adds four new models: Gemma4 Unified (an encoder-free multimodal model), Sapiens2 (a high-resolution vision transformer for human-centric tasks), DeepSeek-OCR-2 (an OCR-specialized vision-language model), and Mellum (a code-focused Mixture-of-Experts language model from JetBrains). Breaking changes include a fix for float16 overflow in the Gemma4 vision pooler and a new base class, without a language-modeling head, for audio language models. Numerous bug fixes and improvements cover model parallelism, caching, quantization, and distributed training. Community contributions include Romanian documentation translations and pull requests from multiple contributors.

Key points

New models: Gemma4 Unified (encoder-free multimodal), Sapiens2 (human-centric vision), DeepSeek-OCR-2 (OCR and document understanding), and Mellum (code-focused MoE).
Breaking changes: float16 overflow fix in the Gemma4 vision pooler; a new base class, without a language-modeling head, for audio language models.
Parallelization improvements: tensor parallelism, expert parallelism, and beam search fixes; FSDP initialization via from_pretrained.
Quantization support: DeepGEMM BF16, mixed FP8/FP4, and MegaMoE; bug fixes for FP8 MoE and BitsAndBytes quantization.
Community contributions: major Romanian documentation translations and significant pull requests from multiple contributors.
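
The float16 overflow named under breaking changes can be illustrated generically. The sketch below is not the Gemma4 pooler code and does not use any Transformers API; it only shows why accumulating a pooling sum in half precision can overflow (float16 tops out at 65504) and how a wider accumulator avoids it. The helper names `to_f16`, `mean_pool_f16_accum`, and `mean_pool_wide_accum` are hypothetical, and a Python float stands in for the wider accumulator.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the float16 range; hardware would produce inf.
        return math.copysign(math.inf, x)


def mean_pool_f16_accum(values):
    """Average pooling where the running sum is itself stored in float16."""
    total = 0.0
    for v in values:
        total = to_f16(total + to_f16(v))
    return to_f16(total / len(values))


def mean_pool_wide_accum(values):
    """Average pooling with a wider accumulator, cast to float16 only at the end."""
    total = sum(to_f16(v) for v in values)
    return to_f16(total / len(values))


if __name__ == "__main__":
    # 64 activations of 2048 sum to 131072, well past the float16 maximum.
    activations = [2048.0] * 64
    print(mean_pool_f16_accum(activations))
    print(mean_pool_wide_accum(activations))
```

The true mean here is representable in float16; only the intermediate sum is not, which is why moving the accumulation to a wider type is a common remedy for this class of bug.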
