Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

4 items

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026 | Highlight

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

JoyAI-VL-Interaction is an 8B-scale, vision-first model that decides on its own whether to respond or delegate, without waiting for a user prompt, aiming to react to environmental changes as a human would. The system processes live video streams for real-time interaction, with pluggable ASR/TTS modules and a background brain. In evaluations, human raters preferred this model over existing video-call assistants across multiple scenarios. The model and system are open-source and are presented as a new paradigm in interaction modeling for always-on, perceptive agents.

TELEGRAM | HUGGINGFACEPAPERS | Jun 13, 2026 | Highlight

MiniMax Sparse Attention

MiniMax Sparse Attention (MSA) is a new method for efficient processing of ultra-long contexts (hundreds of thousands to millions of tokens) in large language models. It uses blockwise sparsity and an optimized GPU execution path to achieve significant speedups in both training and inference while maintaining performance. The method is built on Grouped Query Attention (GQA), introducing a lightweight Index Branch for group-specific sparse token retrieval and a Main Branch for exact block-sparse attention. MSA is co-designed with GPU kernels for cross-GPU scalability and has been deployed in a production-grade multimodal model, reducing per-token attention compute. Its inference kernel and model are openly available online.
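The two-branch idea in this summary can be illustrated with a toy, pure-Python sketch for a single query group. It is not the MSA implementation: the block scoring rule (dot product between the group's mean query and each block's mean key), the function names, and the parameters `block_size` and `top_k` are all assumptions made for illustration, and the real method relies on fused GPU kernels rather than Python loops.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(group_query, keys, block_size, top_k):
    """Index-branch stand-in (assumed scoring rule): rank key blocks by the
    dot product of the pooled group query with each block's mean key."""
    n_blocks = math.ceil(len(keys) / block_size)
    scored = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(group_query, mean_key), b))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(b for _, b in best)


def block_sparse_attention(query, keys, values, block_ids, block_size):
    """Main-branch stand-in: exact softmax attention, restricted to the
    tokens that fall inside the selected blocks."""
    token_ids = [
        i
        for b in block_ids
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in token_ids]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, token_ids)) / total
        for d in range(dim)
    ]


def msa_group(group_queries, keys, values, block_size=2, top_k=1):
    """All query heads in one GQA group share the same keys/values and,
    in this sketch, the same set of retrieved blocks."""
    pooled = [sum(col) / len(group_queries) for col in zip(*group_queries)]
    block_ids = select_blocks(pooled, keys, block_size, top_k)
    outputs = [
        block_sparse_attention(q, keys, values, block_ids, block_size)
        for q in group_queries
    ]
    return block_ids, outputs
```

With `top_k` equal to the number of blocks, the sketch reduces to ordinary dense attention; smaller values trade coverage for less attention compute per token, which is the trade-off the summary describes.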

TELEGRAM | HUGGINGFACEPAPERS | Jun 10, 2026 | Highlight

SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

Researchers introduce SearchSwarm-30B-A3B, an agentic LLM designed for long-horizon research tasks. It employs delegation intelligence to decompose complex problems, delegate subtasks to subagents, and integrate summarized results, thereby optimizing the main agent's context budget. Because natural training data is scarce, the team synthesized data and used a harness to guide task decomposition and subagent coordination. The model outperforms similarly sized counterparts and will be open-sourced to foster further investigation.
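The decompose, delegate, and integrate loop described here can be sketched in a few lines. This is a hypothetical illustration, not SearchSwarm's harness: `decompose` and `search` are caller-supplied stand-ins for model and tool calls, the "summary" is simply the first sentence of the first document, and tokens are approximated by whitespace-separated words.

```python
from dataclasses import dataclass


@dataclass
class SubagentResult:
    subtask: str
    summary: str
    tokens_used: int  # tokens consumed inside the subagent's own context


def count_tokens(text):
    # Crude stand-in for a tokenizer: whitespace-separated words.
    return len(text.split())


def run_subagent(subtask, search):
    """Hypothetical subagent: reads raw documents in its own context and
    hands back only a short summary."""
    documents = search(subtask)
    tokens_used = sum(count_tokens(doc) for doc in documents)
    if documents:
        summary = documents[0].split(".")[0] + "."
    else:
        summary = "No results."
    return SubagentResult(subtask, summary, tokens_used)


def delegate(question, decompose, search):
    """Main agent: split the question, delegate each subtask, and keep only
    the summaries in its own context."""
    main_context = [question]
    results = [run_subagent(s, search) for s in decompose(question)]
    for r in results:
        main_context.append(f"{r.subtask}: {r.summary}")
    main_tokens = sum(count_tokens(entry) for entry in main_context)
    sub_tokens = sum(r.tokens_used for r in results)
    return main_context, main_tokens, sub_tokens
```

The point of the design is visible in the two token counts: the raw documents are charged to the subagents, while the main agent's context grows only by one summary line per subtask.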

TELEGRAM | HUGGINGFACEPAPERS | Jun 6, 2026 | Highlight

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code2LoRA is a hypernetwork framework built on Qwen2.5-Coder-32B-Instruct that generates repository-specific LoRA adapters for code language models without adding token overhead at inference. It supports both static adaptation for stable codebases and evolving adaptation for actively changing ones, injecting repository context such as imports, APIs, and project conventions. The method was evaluated on RepoPeftBench, a benchmark of 604 Python repositories, where it achieved high accuracy on both tracks and outperformed traditional fine-tuning approaches. The code, model checkpoints, and datasets are publicly available.
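To make the "no token overhead" claim concrete, here is a toy sketch of a hypernetwork that maps a repository embedding to LoRA factors, which are then folded into a layer's forward pass instead of being added to the prompt. Everything in it is assumed for illustration: the `ToyHypernetwork` class, its single linear projection, the dimensions, and the repository embedding itself. The summary does not describe Code2LoRA's actual architecture.

```python
import random


def matvec(matrix, vector):
    return [sum(x * y for x, y in zip(row, vector)) for row in matrix]


class ToyHypernetwork:
    """Hypothetical linear hypernetwork: repository embedding -> LoRA
    factors A (rank x d_in) and B (d_out x rank)."""

    def __init__(self, embed_dim, d_out, d_in, rank, seed=0):
        rng = random.Random(seed)
        self.d_out, self.d_in, self.rank = d_out, d_in, rank
        n_params = rank * d_in + d_out * rank
        # One projection row per generated adapter parameter.
        self.proj = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
            for _ in range(n_params)
        ]

    def generate(self, repo_embedding):
        flat = matvec(self.proj, repo_embedding)
        split = self.rank * self.d_in
        a = [flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        b_flat = flat[split:]
        b = [b_flat[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return a, b


def adapted_forward(w, a, b, x, scale=1.0):
    """LoRA forward pass: y = W x + scale * B (A x). The repository context
    lives in A and B, so the input x (the prompt) is unchanged."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base, delta)]
```

Under these assumptions, an evolving codebase would be handled by recomputing the repository embedding after changes and calling `generate` again, which yields a fresh adapter without retraining the base weights `w`.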