Infogap feed

AI signal, minus the noise.

Curated items are read from the processed items table and served as a bilingual feed.

Papers · Source: arXiv · Jun 12, 2026 · Importance: 4/5

This paper introduces EvoArena, a benchmark that evaluates LLM agents under progressive environmental changes across terminal, software, and social domains. Current agents achieve only 39.6% average accuracy on EvoArena. The authors propose EvoMem, a patch-based memory paradigm that records structured update histories so that agents can reason about how the environment has evolved. EvoMem raises EvoArena accuracy by 1.5 points and improves results on the GAIA and LoCoMo benchmarks by 6.1 and 4.8 percentage points, respectively. On chain-level tasks, which require sequences of related subtasks, EvoMem raises accuracy by 3.7 points. A mechanistic analysis indicates that EvoMem's memory evidence better preserves the complete, evolving state of the environment.
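The summary describes EvoMem only at a high level, so the following is a minimal sketch of the general idea of a patch-based memory, not the authors' implementation. It assumes a hypothetical schema in which the environment state is a key-value snapshot and every change is stored as a structured patch (step, key, old value, new value); the names `Patch`, `PatchMemory`, and `server_port` are illustrative. Keeping patches instead of overwriting values lets an agent both reconstruct the state at any step and inspect how a single value evolved.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Patch:
    # One recorded change to the environment state (hypothetical schema).
    step: int
    key: str
    old: Any
    new: Any


@dataclass
class PatchMemory:
    # Hypothetical patch-based memory: a base snapshot plus an ordered update history.
    # Assumes patches are recorded in non-decreasing step order.
    base: dict[str, Any]
    patches: list[Patch] = field(default_factory=list)

    def record(self, step: int, key: str, new: Any) -> None:
        # Store the change as a structured patch instead of overwriting the old value.
        old = self.state_at(step).get(key)
        self.patches.append(Patch(step, key, old, new))

    def state_at(self, step: int) -> dict[str, Any]:
        # Replay every patch up to and including `step` on top of the base snapshot.
        state = dict(self.base)
        for p in self.patches:
            if p.step <= step:
                state[p.key] = p.new
        return state

    def history(self, key: str) -> list[Patch]:
        # All recorded changes to one key, oldest first.
        return [p for p in self.patches if p.key == key]


mem = PatchMemory(base={"server_port": 8080})
mem.record(step=1, key="server_port", new=9090)
mem.record(step=3, key="server_port", new=7070)
print(mem.state_at(2)["server_port"])  # 9090
print([(p.old, p.new) for p in mem.history("server_port")])  # [(8080, 9090), (9090, 7070)]
```

The history query is the part a plain overwrite-style memory cannot answer: it shows not only the current port but every earlier value and when it changed.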

Papers · Source: arXiv · Jun 12, 2026 · Importance: 4/5

The paper presents SpatialClaw, a training-free framework that uses code execution as the action interface for agentic spatial reasoning. It maintains a stateful Python kernel pre-loaded with the input frames and a suite of perception and geometry primitives; at each step, a VLM-backed agent writes one executable cell conditioned on all prior outputs. Evaluated on 20 static and dynamic 3D/4D spatial reasoning benchmarks, SpatialClaw achieves an average accuracy of 59.9%, outperforming the prior spatial agent by 11.2 percentage points. The gains are consistent across six vision-language model backbones from two model families, with no benchmark- or model-specific tuning. The authors conclude that a flexible, iterative code-based interface outperforms single-pass and structured tool-call designs on open-ended spatial tasks.
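To illustrate what "code execution as the action interface" means mechanically, the sketch below keeps a persistent namespace in which each cell sees the variables defined by earlier cells, and the agent is shown every prior (cell, output) pair. Everything here is a hypothetical stand-in, not SpatialClaw's code: `StatefulKernel` replaces a real Python kernel, `distance_3d` replaces a perception or geometry primitive, and a scripted function plays the role of the VLM-backed agent.

```python
import contextlib
import io
from typing import Any, Callable, Optional

History = list[tuple[str, str]]  # (cell source, captured output) pairs


class StatefulKernel:
    # Hypothetical stand-in for a persistent Python kernel: names defined in one
    # cell stay visible to later cells, as in a notebook.
    def __init__(self, preload: dict[str, Any]) -> None:
        # Pre-load inputs (e.g. frames) and helper primitives into the shared namespace.
        self.namespace: dict[str, Any] = dict(preload)

    def run_cell(self, code: str) -> str:
        # Execute one cell; return captured stdout, or the error message on failure.
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, self.namespace)
        except Exception as exc:  # surface errors to the agent instead of crashing
            return f"{type(exc).__name__}: {exc}"
        return buf.getvalue()


def agent_loop(
    kernel: StatefulKernel,
    propose_cell: Callable[[History], Optional[str]],
    max_steps: int = 8,
) -> History:
    # `propose_cell` stands in for the VLM: it sees every prior (cell, output)
    # pair and returns the next cell, or None when it is finished.
    history: History = []
    for _ in range(max_steps):
        cell = propose_cell(history)
        if cell is None:
            break
        history.append((cell, kernel.run_cell(cell)))
    return history


def distance_3d(a, b):
    # Hypothetical geometry primitive: Euclidean distance between two 3D points.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


scripted_cells = iter([
    "p = (0.0, 0.0, 0.0)\nq = (3.0, 4.0, 0.0)",
    "print(distance_3d(p, q))",
])
kernel = StatefulKernel({"distance_3d": distance_3d})
trace = agent_loop(kernel, lambda history: next(scripted_cells, None))
print(trace[-1][1].strip())  # 5.0
```

The second cell uses `p` and `q` defined by the first, which is the property that distinguishes a stateful kernel from independent single-pass tool calls. Returning errors as text, rather than raising, lets the agent read a failure and write a corrected cell on the next step.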

Papers · Source: arXiv · Jun 12, 2026 · Importance: 4/5

The paper presents Agents-K1, an end-to-end pipeline that turns raw documents into agent-native scientific knowledge graphs. It has three components: a multimodal parser that uses a five-module schema to capture entities, evidence, citations, and typed cross-entity relations from full papers; a 4B-parameter information-extraction backbone trained with GRPO under a rule-based reward; and a GraphAnything CLI that unifies web search, multimodal graph retrieval, and cross-document traversal. The authors process 2.46 million scientific papers across six subjects to build Scholar-KG and release a one-million-paper subset. Their experiments report superior performance on scientific information extraction, knowledge graph construction, and multi-hop scientific reasoning. According to the authors, the pipeline extends to general-domain corpora and to schema-conformant data synthesis.
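The summary does not specify the rule-based reward, so the sketch below assumes one purely for illustration: triple-level F1 between predicted and gold (subject, relation, object) triples, with zero reward for output that is not valid JSON. It pairs that reward with the group-relative advantage used in GRPO, where each sampled output's reward is normalized by the mean and standard deviation of its own sampling group. The triples and function names are hypothetical, and GRPO implementations differ in details such as whether the population or sample standard deviation is used.

```python
import json
import statistics

Triple = tuple[str, str, str]


def rule_based_reward(output: str, gold: set[Triple]) -> float:
    # Hypothetical rule-based reward: 0.0 for unparseable output, otherwise
    # F1 between predicted and gold (subject, relation, object) triples.
    try:
        pred = {tuple(t) for t in json.loads(output)}
    except (json.JSONDecodeError, TypeError):
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    # GRPO-style advantage: normalize each reward against its own sampling group.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


gold = {("PaperA", "cites", "PaperB"), ("PaperA", "uses", "DatasetX")}
group = [
    '[["PaperA", "cites", "PaperB"], ["PaperA", "uses", "DatasetX"]]',  # complete
    '[["PaperA", "cites", "PaperB"]]',  # partial
    "not json",  # malformed
]
rewards = [rule_based_reward(o, gold) for o in group]
advantages = group_relative_advantages(rewards)
print([round(r, 3) for r in rewards])  # [1.0, 0.667, 0.0]
```

Because advantages are relative within the group, the partially correct extraction still receives a positive advantage here, since its sibling samples score lower on average; no separate value model is needed to produce the baseline.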

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

Recursive Agent Harnesses