MiniMax Sparse Attention (MSA) is a new method for efficient processing of ultra-long contexts (hundreds of thousands to millions of tokens) in large language models. It uses blockwise sparsity and an optimized GPU execution path to achieve significant speedups in both training and inference while maintaining performance. The method is built on Grouped Query Attention (GQA), introducing a lightweight Index Branch for group-specific sparse token retrieval and a Main Branch for exact block-sparse attention. MSA is co-designed with GPU kernels for cross-GPU scalability and has been deployed in a production-grade multimodal model, reducing per-token attention compute. Its inference kernel and model are openly available online.
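The two-branch idea can be illustrated with a toy sketch in plain Python. This is not MiniMax's kernel or indexer: scoring blocks by their mean key, summing scores over the query heads in a group, and the names `select_blocks` and `block_sparse_attention` are all assumptions made for illustration. It shows only the general pattern of one cheap block selection per GQA group followed by exact softmax attention restricted to the selected blocks.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(group_queries, keys, block_size, top_k):
    """Index-branch stand-in (assumed scoring rule): pick top_k key blocks per GQA group."""
    n_blocks = math.ceil(len(keys) / block_size)
    scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        dim = len(block[0])
        mean_key = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        # Sum over the query heads of the group so they share one selection.
        scores.append(sum(dot(q, mean_key) for q in group_queries))
    ranked = sorted(range(n_blocks), key=lambda b: scores[b], reverse=True)
    return sorted(ranked[:top_k])


def block_sparse_attention(q, keys, values, blocks, block_size):
    """Main-branch stand-in: exact softmax attention over the selected blocks only."""
    idx = [
        i
        for b in blocks
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / total
        for d in range(dim)
    ]
```

In this sketch the attention cost per query scales with `top_k * block_size` rather than with the full context length, which is the kind of per-token saving the summary describes.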
Existing image generators perform well in single-image generation but struggle with creating interleaved image-text sequences. InterleaveThinker tackles this limitation by introducing a multi-agent pipeline consisting of a planner agent and a critic agent. The planner organizes the input sequence, while the critic evaluates the generator’s interim outputs and refines the instructions, forming an iterative improvement loop. This model-agnostic approach elevates the generation quality of several existing image generators, bringing their performance close to top-tier models. Notably, InterleaveThinker demonstrates large gains on reasoning-oriented benchmarks, highlighting its effectiveness in structured, multi-step generation tasks.
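The planner and critic loop described above can be sketched as follows. This is a minimal illustration, not InterleaveThinker's implementation: the `plan`, `generate`, and `critique` callables, the `max_rounds` cap, and the function names are hypothetical placeholders for whatever models a real pipeline would call.

```python
from typing import Callable, List, Tuple


def refine_step(
    instruction: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> str:
    """Generate one output, letting the critic rewrite the instruction until it accepts."""
    output = generate(instruction)
    for _ in range(max_rounds):
        accepted, revised = critique(instruction, output)
        if accepted:
            break
        # The critic returns a refined instruction; regenerate from it.
        instruction = revised
        output = generate(instruction)
    return output


def interleave(
    prompt: str,
    plan: Callable[[str], List[str]],
    generate: Callable[[str], str],
    critique: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> List[str]:
    """Planner splits the prompt into steps; each step goes through the critic loop."""
    return [refine_step(step, generate, critique, max_rounds) for step in plan(prompt)]
```

Because the generator is passed in as a plain callable, the loop does not depend on any particular image model, which mirrors the model-agnostic framing in the summary.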
CakewordAI is a new mobile app that enables children to point their device's camera at any object, such as a cup or toy, and instantly learn its name in any language. The app uses on-device AI to segment the object into a sticker, pronounce its name in the target language, and save it to a personal Word Dex. It operates entirely offline with no accounts, no ads, and no data collection, emphasizing privacy. The launch on Product Hunt targets parents seeking a safe, multilingual learning tool for young children.
Kimi AI assistant, listed on Product Hunt, has been updated to version K2.6. The update introduces real-time web search across over 100 websites, analysis of up to 50 files (PDF, DOC, PPT, images), an AI slides and website maker, state-of-the-art coding capabilities, and enhanced image understanding beyond basic text extraction. The features aim to support everyday productivity tasks with multimodal and code-specific improvements.
Social | Source: X | Importance: 3/5
MiniMax-M3, an open-weight native multimodal model from MiniMax, is now available on Together AI, the company’s preferred cloud partner. The model features a 1 million token context window, uses MiniMax Sparse Attention for efficiency, and supports both thinking and non-thinking inference modes. Together AI has optimized inference for MiniMax-M3, achieving up to 125% higher throughput across various concurrency levels.
Tutorials | Source: SIMON WILLISON | Importance: 2/5
Simon Willison's browser-based audio conversation tool, originally built in December 2024 to test the OpenAI WebRTC realtime audio API, has been updated. It now supports the GPT‑Realtime‑2 model, which OpenAI promotes as its first voice model with GPT‑5-class reasoning and a knowledge cutoff of September 30, 2024. A new feature allows users to paste document context, enabling interactive voice Q&A about the provided content. The update makes the newer model available for experimentation before it has appeared in the ChatGPT iPhone app.