Thinkgap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

46 items

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026 | Highlight

Geometric Action Model for Robot Policy Learning

The paper introduces the Geometric Action Model (GAM), which leverages a pretrained geometric foundation model to enhance language-conditioned manipulation in 3D physical environments. GAM splits the foundation model into an observation encoding layer and a future prediction layer, enabling it to predict future tokens from language, proprioception, and action history before decoding them into actions. This 3D-aware approach significantly improves accuracy, robustness, efficiency, and speed over standard 2D vision-language-action models in both simulated and real-robot contact-rich tasks.

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026 | Highlight

JoyAI-VL-Interaction: Real-Time Vision-Language Interaction Intelligence

JoyAI-VL-Interaction is an 8B-scale, vision-first model that decides on its own, without user prompting, whether to respond or delegate, aiming to react to environmental changes the way a human would. The system processes streaming video for real-time interaction, with pluggable ASR/TTS modules and a background brain. In evaluations, human raters preferred this model over existing video-call assistants across multiple scenarios. The model and system are open-source, representing a new paradigm in interaction modeling for always-on, perceptive agents.

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026

Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories

The paper proposes Data2Story, a multi-agent framework that automates data journalism by mimicking a virtual newsroom with distinct roles. It generates evidence-based news stories in multiple formats, such as text articles, interactive maps, and audio, each linked to data sources for verifiability. In evaluations against expert human journalists, Data2Story showed competitive performance, particularly excelling in transparency and auditability. Human journalists still outperform in editorial angle and creative design. The system is designed as a collaborative tool for journalists, not a replacement.

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026 | Highlight

FastContext: Training Efficient Repository Explorer for Coding Agents

FastContext is a system that decouples repository exploration from code solving in LLM coding agents to reduce token waste from irrelevant snippets. It deploys specialized exploration models as a dedicated subagent, issuing parallel tool calls and delivering focused context via concise file paths and line ranges. The approach cuts token consumption by up to 60% and improves resolution rates by up to 5.5% relative to baseline agents.
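
The idea of handing the solver concise file paths and line ranges, rather than whole files, can be illustrated with a minimal sketch. This is not FastContext's implementation: `ContextRef`, `merge_refs`, and `render_context` are hypothetical names, and the repository is assumed to be an in-memory mapping from path to a list of lines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRef:
    """Hypothetical pointer to a span of a file (1-indexed, inclusive)."""
    path: str
    start: int
    end: int


def merge_refs(refs):
    """Merge overlapping or adjacent ranges within the same file."""
    merged = []
    for ref in sorted(refs, key=lambda r: (r.path, r.start, r.end)):
        last = merged[-1] if merged else None
        if last and last.path == ref.path and ref.start <= last.end + 1:
            # Extend the previous range instead of emitting a duplicate span.
            merged[-1] = ContextRef(last.path, last.start, max(last.end, ref.end))
        else:
            merged.append(ref)
    return merged


def render_context(files, refs):
    """Return only the referenced lines, each chunk labelled path:start-end."""
    chunks = []
    for ref in merge_refs(refs):
        lines = files[ref.path][ref.start - 1:ref.end]
        chunks.append(f"{ref.path}:{ref.start}-{ref.end}\n" + "\n".join(lines))
    return "\n\n".join(chunks)
```

Merging ranges before rendering avoids sending the same lines twice when several exploration calls point at overlapping spans, which is one plausible way such a design could save tokens.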

TELEGRAM | HUGGINGFACEPAPERS | Jun 16, 2026

"DreamX-World 1.0: A General-Purpose Interactive World Model" by DreamX Team, Yancheng Bai, Rui Chen, Xiangxiang Chu, Rujing Dang, Hao Dou, Bing...

Processing is temporarily unavailable. The original item should be reviewed from its source link.

TELEGRAM | HUGGINGFACEPAPERS | Jun 15, 2026 | Highlight

APPO: Agentic Procedural Policy Optimization

APPO is a new agentic reinforcement learning method that improves multi-turn tool use in large language model agents. It refines branching and credit assignment by focusing on fine-grained token-level decision points rather than coarse heuristic interaction units. The method selects branching locations using token uncertainty and policy-induced likelihood gains, leading to more precise exploration and better credit distribution across branched rollouts. Experiments across 13 benchmarks show APPO consistently outperforms existing agentic RL methods by approximately 4 points. The approach also keeps tool calls efficient and maintains behavioral interpretability.
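
To make the notion of choosing branch points by token uncertainty concrete, here is a minimal sketch that ranks positions by the entropy of their next-token distribution. It is an illustration under stated assumptions, not APPO's algorithm: the summary also mentions policy-induced likelihood gains, which are not modeled here, and `token_entropy` and `pick_branch_points` are hypothetical helper names.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def pick_branch_points(step_probs, k):
    """Return the indices of the k most uncertain token positions, ascending.

    step_probs: one probability distribution per generated token.
    Entropy stands in for "token uncertainty"; this is an assumption.
    """
    entropies = [token_entropy(p) for p in step_probs]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])


# Example: four token positions with increasingly spread-out distributions.
rollout = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.25, 0.25, 0.25, 0.25]]
branch_at = pick_branch_points(rollout, k=2)
```

In this example the uniform four-way distribution and the 50/50 split are the two most uncertain positions, so those are the ones a branching scheme of this kind would expand first.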