vLLM v0.23.0 brings 408 commits from 200 contributors and deepens support for recent models. DeepSeek-V4 support was substantially hardened, with sparse MLA decoupling, TRTLLM-gen attention, EPLB mega-MoE, and sliding-window KV cache retention. Model Runner V2 is now the default for Llama and Mistral dense models and adds FlashInfer sampling, breakable CUDA graphs, and pipeline-parallel bubble elimination. The Rust frontend gained streaming generate, dynamic LoRA endpoints, and /version and /server_info endpoints, plus new tool parsers for InternLM2, Phi-4-mini, and Gemma4. Newly supported models include Gemma 4 Unified (encoder-free), MiMo-V2.5, Step-3.7-Flash, Cosmos3 Reasoner, and Cohere Mini Code. The release also deprecates Transformers v4, unifies reasoning and tool-call parsing, and introduces a multi-tier KV cache offloading framework with an object-store secondary tier.
Repos | Source: GitHub | Importance: 3/5
MoneyPrinterTurbo is an open-source tool that leverages AI large language models to automatically generate high-definition short videos with a single click. It abstracts the entire video creation pipeline, enabling users to produce content without manual editing or scripting. The repository provides a straightforward interface for rapid video production, targeting content creators and marketers. The project is available on GitHub under the harry0703 account.
Repos | Source: GitHub | Importance: 3/5
Ollama v0.30.4 introduces support for the NVIDIA Nemotron 3 Ultra model, which is optimized for high-throughput reasoning and long-running agent workflows. It fixes a bug where multimodal models did not use the GPU on the llama.cpp backend; they now use Metal GPU offload on Apple Silicon for improved performance. The update also adds new experimental flags for model creation and cleanup scripts for Codex and Pi configurations. One known issue remains: gemma4:12b crashes with a floating point exception.
Ollama v0.30.5-rc0 bumps the underlying llama.cpp version to b9509. The key change is a fix for the Gemma 4 12B multimodal projector, which caused a divide-by-zero crash on x86, CUDA, Linux, and Windows systems. The update resolves several reported issues (e.g., #16479, #16489). Users running Gemma 4 models should upgrade to avoid this crash.
This release of Hugging Face Transformers introduces several new models, including Gemma4 Unified (an encoder-free multimodal model), Sapiens2 (a high-resolution vision transformer for human-centric tasks), DeepSeek-OCR-2 (an OCR-specialized vision-language model), and Mellum (a code-focused Mixture-of-Experts language model by JetBrains). It includes breaking changes such as a fix for float16 overflow in the Gemma4 vision pooler and a new base class for audio language models. Numerous bug fixes and improvements address model parallelism, caching, quantization, and distributed training. The release also features contributions from the community, including Romanian documentation translations and key pull requests.
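To illustrate the general class of problem behind a float16 overflow in a pooling step, here is a minimal standard-library sketch. It is not the Transformers fix, which the summary above does not describe; the function names `to_f16`, `mean_pool_f16_accum`, and `mean_pool_wide_accum` are hypothetical, and the pooler is assumed to be a plain mean over token activations.

```python
import math
import struct

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value.

    struct raises OverflowError for finite values beyond the float16 range
    (largest finite value: 65504), so that case is mapped to infinity,
    which is what half-precision arithmetic would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def mean_pool_f16_accum(values):
    """Hypothetical mean pool that keeps the running sum in float16."""
    acc = 0.0
    for v in values:
        acc = to_f16(acc + to_f16(v))  # every partial sum is rounded to float16
    return to_f16(acc / len(values))

def mean_pool_wide_accum(values):
    """Same pool, but the sum is kept in wider precision and rounded once."""
    acc = sum(to_f16(v) for v in values)  # Python floats are 64-bit
    return to_f16(acc / len(values))

# 256 activations of 300.0 sum to 76800, which exceeds the float16 maximum
# even though the mean itself (300.0) is easily representable.
activations = [300.0] * 256
narrow = mean_pool_f16_accum(activations)
wide = mean_pool_wide_accum(activations)
print(math.isinf(narrow), wide)
```

Accumulating in a wider type and casting back only at the end is one common way to avoid this kind of overflow; whether the Gemma4 change takes that approach is not stated in the release summary.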