The paper presents Agents-K1, an end-to-end pipeline that transforms raw documents into agent-native scientific knowledge graphs. It combines three components: a multimodal parser that uses a five-module schema to capture entities, evidence, citations, and typed cross-entity relations from full papers; a 4B-parameter information-extraction backbone trained with GRPO under a rule-based reward; and a GraphAnything CLI that unifies web search, multimodal graph retrieval, and cross-document traversal. The authors process 2.46 million scientific papers across six subjects to construct Scholar-KG and release a one-million-paper subset. Experiments show superior performance on scientific information extraction, knowledge graph construction, and multi-hop scientific reasoning. The pipeline is extensible to general-domain corpora and schema-conformant data synthesis.
Papers | Source: ARXIV | Importance: 3/5
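As a rough illustration of the "GRPO under a rule-based reward" idea mentioned in the Agents-K1 summary, the sketch below scores a group of sampled extractions with a simple rule and then normalizes the rewards within the group, which is the standard group-relative advantage used by GRPO. The reward rule, the `REQUIRED_KEYS` schema fields, and the function names are assumptions made for illustration; the summary does not describe the paper's actual reward.

```python
import json
import statistics

# Hypothetical schema fields; the paper's five-module schema is not specified here.
REQUIRED_KEYS = {"entities", "relations"}


def rule_reward(output: str) -> float:
    """Hypothetical rule: 0.0 for invalid JSON or a non-object,
    0.5 for a valid object, 1.0 if it also has every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(parsed, dict):
        return 0.0
    return 1.0 if REQUIRED_KEYS.issubset(parsed) else 0.5


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled outputs for the same (imaginary) prompt.
samples = [
    '{"entities": ["BERT"], "relations": []}',
    '{"entities": ["BERT"]}',
    "not json",
    '{"entities": [], "relations": [["a", "uses", "b"]]}',
]
rewards = [rule_reward(s) for s in samples]
advantages = group_relative_advantages(rewards)
```

Outputs that satisfy the rule receive positive advantages and malformed ones receive negative advantages, so no learned reward model is needed in this toy setup.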
Large language model training data curation requires data attribution methods to identify how individual samples influence model outputs. Traditional influence functions, though effective, are too slow and memory-intensive for large-scale use. The paper proposes Influcoder, which distills gradient influence rankings from decoder models into a dedicated encoder. This yields a quick, cost-effective approach to influence-based data attribution at scale.
Papers | Source: ARXIV | Importance: 4/5
Current tool-augmented LLM agents suffer from an execution-granularity mismatch, as step-wise atomic tool calls expose low-level dataflow and waste context windows. HyperTool proposes a unified MCP-style tool interface where the agent invokes a code block that internally calls multiple tools, manipulates returned values, and passes intermediate results locally, collapsing deterministic subroutines into a single model-visible call. The system is trained on synthesized trajectories from cross-tool compositional tasks and verified in real MCP environments. On the MCP-Universe benchmark, HyperTool raises average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, outperforming GPT-OSS and Kimi-k2.5. The results show that moving beyond step-wise tool calls significantly improves multi-step tool use in agents.
Papers | Source: ARXIV | Importance: 4/5
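The granularity contrast described in the HyperTool summary can be sketched with a toy example. Everything here is hypothetical: `list_files` and `read_size` stand in for MCP tools, and the `Session` class is an illustrative harness, not HyperTool's interface. Step-wise mode counts every tool call as a model-visible turn, while composed mode runs one code block that chains the tools locally.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for MCP tools.
def list_files(directory: str) -> List[str]:
    return [f"{directory}/a.txt", f"{directory}/b.txt", f"{directory}/c.txt"]


def read_size(path: str) -> int:
    sizes = {"a.txt": 120, "b.txt": 4096, "c.txt": 17}
    return sizes[path.rsplit("/", 1)[-1]]


TOOLS: Dict[str, Callable] = {"list_files": list_files, "read_size": read_size}


class Session:
    """Illustrative harness that counts model-visible calls."""

    def __init__(self, tools: Dict[str, Callable]):
        self.tools = tools
        self.visible_calls = 0

    def call_tool(self, name: str, **kwargs):
        # Step-wise mode: each atomic tool call is one model-visible turn.
        self.visible_calls += 1
        return self.tools[name](**kwargs)

    def run_block(self, source: str):
        # Composed mode: one model-visible call; tools run locally inside it.
        # exec is NOT sandboxed here; a real system would need isolation.
        self.visible_calls += 1
        scope = dict(self.tools)
        exec(source, scope)
        return scope["result"]


# Step-wise: one listing call plus one size call per file.
stepwise = Session(TOOLS)
paths = stepwise.call_tool("list_files", directory="docs")
sizes = [stepwise.call_tool("read_size", path=p) for p in paths]
largest_stepwise = paths[sizes.index(max(sizes))]

# Composed: intermediate values never leave the code block.
composed = Session(TOOLS)
block = """
paths = list_files("docs")
result = max(paths, key=read_size)
"""
largest_composed = composed.run_block(block)
```

Both modes find the same file, but the step-wise session uses four model-visible calls where the composed session uses one, which is the context saving the summary attributes to collapsing deterministic subroutines.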
The paper presents EurekAgent, an environment-engineered agent system for metric-driven autonomous scientific discovery. It argues the key bottleneck has shifted from designing agent workflows to engineering agent environments that amplify productive behaviors and suppress harmful ones. EurekAgent engineers environments across four dimensions: permissions engineering for bounded execution and isolated evaluation, artifact engineering for filesystem and Git-based collaboration, budget engineering for budget-aware exploration, and human-in-the-loop engineering for easy oversight. The system achieves new state-of-the-art results on mathematics, kernel engineering, and machine learning tasks, including a novel 26-circle packing solution discovered with under $11 total API cost. Code and results are open-sourced, and the authors call for environment engineering as a core research direction for reliable autonomous research agents.
Papers | Source: ARXIV | Importance: 3/5
This paper analyzes three frameworks—Tri-System Theory, Thinkframes, and System 0—for understanding AI's cognitive and epistemic impact. It argues that System 0 holds a distinct theoretical position that the other two cannot fully capture. The authors introduce the concept of 'cognitive colonization' to describe how AI systems embed external interests into the architecture of the self in ways that are invisible to users. They highlight that such systems are already widely deployed, making the study of these hidden influences an urgent philosophical and practical concern.
Papers | Source: ARXIV | Importance: 3/5
The paper analyzes on-policy distillation (OPD), a post-training method that combines on-policy student trajectories with dense teacher supervision. The study finds that OPD-style updates are small, coordinate-sparse, spread across layers, and concentrated in FFN parameters. Training only the discovered sparse subnetwork recovers nearly full OPD performance, but the sparsity-inducing SGD optimizer underperforms AdamW because dense supervision preserves heterogeneous gradient scales that benefit from adaptive scaling. Geometrically, the updates are numerically full-rank but spectrally concentrated, lying away from the principal singular subspaces of the source weights and disproportionately on coordinates where source weights are near zero. The results show that OPD retains geometric signatures of on-policy post-training rather than behaving as dense parameter rewriting.
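The geometric quantities named in the OPD summary (coordinate sparsity, numerical rank, spectral concentration, and overlap with the source weights' principal singular subspaces) can be measured on any weight update with a few lines of NumPy. The sketch below is an assumption-laden illustration on synthetic matrices: the function names, the tolerance, and the 5% update mask are made up for the example and do not reproduce the paper's measurements.

```python
import numpy as np


def update_sparsity(delta: np.ndarray, tol: float = 1e-8) -> float:
    """Fraction of coordinates whose update magnitude is at most tol."""
    return float(np.mean(np.abs(delta) <= tol))


def numerical_rank(delta: np.ndarray) -> int:
    """Rank under NumPy's default singular-value tolerance."""
    return int(np.linalg.matrix_rank(delta))


def top_k_energy(delta: np.ndarray, k: int) -> float:
    """Share of squared Frobenius norm carried by the top-k singular values."""
    s = np.linalg.svd(delta, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))


def principal_overlap(source: np.ndarray, delta: np.ndarray, k: int) -> float:
    """Share of update energy lying inside the top-k left and right
    singular subspaces of the source weights."""
    u, _, vt = np.linalg.svd(source, full_matrices=False)
    projected = u[:, :k].T @ delta @ vt[:k, :].T
    return float(np.sum(projected ** 2) / np.sum(delta ** 2))


# Synthetic stand-ins: a random "source" matrix and a small sparse update.
rng = np.random.default_rng(0)
source = rng.normal(size=(64, 64))
mask = rng.random((64, 64)) < 0.05  # assumed 5% of coordinates updated
delta = np.where(mask, rng.normal(scale=1e-3, size=(64, 64)), 0.0)

sparsity = update_sparsity(delta)
rank = numerical_rank(delta)
energy = top_k_energy(delta, k=8)
overlap = principal_overlap(source, delta, k=8)
```

On real checkpoints, `source` would be a pre-distillation weight matrix and `delta` the difference between post- and pre-distillation weights; a low `overlap` alongside a high `energy` would correspond to the "spectrally concentrated but away from the principal subspaces" pattern the summary describes.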