Google has released the diffusiongemma-26B-A4B-it model under an Apache 2.0 license, building on its earlier experimental Gemini Diffusion. The model is openly available on Hugging Face, and NVIDIA offers free access through its NIM cloud API, where it demonstrates generation speeds of over 500 tokens per second. In one test, the model generated 2,409 tokens in 4.4 seconds (roughly 547 tokens per second), highlighting its efficiency for text generation tasks.
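The throughput implied by that test can be checked with simple arithmetic. The following is a minimal Python sketch; the function name `tokens_per_second` is hypothetical and chosen for illustration, and the only inputs are the two figures quoted above.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Return average generation throughput in tokens per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


# Figures from the reported test: 2,409 tokens in 4.4 seconds.
rate = tokens_per_second(2409, 4.4)
print(f"{rate:.1f} tokens/s")
```

This is an average over the whole response, so it says nothing about time to first token or variation during generation.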
Version 0.32a3 of llm, the open-source command-line tool for LLMs, has been released. The update was almost entirely written by Anthropic's new Claude Fable 5 model. Developer Simon Willison detailed the experience in a separate write-up, highlighting the model's code generation capabilities.
Simon Willison shares a method for adding custom pricing for newly released models to AgentsView, a token usage exploration tool by Wes McKinney. Claude Fable 5 was not yet in AgentsView’s default pricing database, so he reverse-engineered the application to configure its price manually. He then used the tool to plot his Claude Fable 5 usage across local projects as a treemap. This is a practical tip for tracking costs when a coding agent uses a model whose price the tool does not yet know.
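For readers unfamiliar with how a per-model price turns token counts into a cost figure, here is a minimal Python sketch of the usual per-million-token calculation. It does not reflect AgentsView's actual configuration format or internals, which are not described here. The `Pricing` class, the `estimate_cost` function, the token counts, and the prices are all hypothetical placeholders, not real Claude Fable 5 rates or usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price entry, in dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Estimate cost in dollars from token counts and per-million prices."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


# Placeholder values only; substitute the published prices for the model.
placeholder = Pricing(input_per_million=1.0, output_per_million=2.0)
cost = estimate_cost(500_000, 250_000, placeholder)
print(f"${cost:.2f}")
```

Real pricing schemes often add separate rates for cached or batched tokens, which this sketch leaves out.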