EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
English summary
The paper introduces EvoArena, a benchmark that simulates real-world dynamic changes for LLM agents, and EvoMem, a memory paradigm that models progressive updates and structured memory evolution. Current LLM agents struggle on EvoArena's evolving tasks. EvoMem consistently improves agent performance on EvoArena and also increases accuracy on existing benchmarks such as GAIA and LoCoMo. By recording memory evolution and update histories, EvoMem lets agents reason more effectively about environmental shifts. The work demonstrates the importance of incorporating evolution modeling into both evaluation and memory for effective agent deployment.
Key points
EvoArena is a new benchmark that models real-world dynamic, evolving environments for LLM agents.
Current state-of-the-art LLM agents perform poorly on EvoArena tasks, revealing a gap in handling progressive changes.
EvoMem is a memory paradigm that explicitly tracks memory evolution and update histories, enabling agents to reason about environmental changes.
EvoMem improves agent performance on EvoArena and boosts accuracy on the existing GAIA and LoCoMo benchmarks.
The work highlights the necessity of evolution-aware evaluation and memory for building robust, practical LLM agents.
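The key point about tracking memory evolution and update histories can be made concrete with a small sketch. The Python below is only an illustration of the general idea (a memory that keeps every version of a fact instead of overwriting it); it is not EvoMem's implementation, and the names `MemoryVersion`, `VersionedMemory`, `step`, and `note`, as well as the `api_endpoint` example, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MemoryVersion:
    """One recorded state of a fact (hypothetical structure)."""
    step: int       # logical time at which the fact was observed
    value: Any      # the fact as observed at that step
    note: str = ""  # optional reason for the change


@dataclass
class VersionedMemory:
    """Append-only memory: updates add versions instead of overwriting."""
    _log: Dict[str, List[MemoryVersion]] = field(default_factory=dict)

    def update(self, key: str, value: Any, step: int, note: str = "") -> None:
        versions = self._log.setdefault(key, [])
        if versions and step < versions[-1].step:
            raise ValueError("updates must arrive in non-decreasing step order")
        versions.append(MemoryVersion(step, value, note))

    def current(self, key: str) -> Any:
        """Latest known value, or None if the key was never recorded."""
        versions = self._log.get(key)
        return versions[-1].value if versions else None

    def as_of(self, key: str, step: int) -> Any:
        """Value that was valid at the given step, or None if not yet known."""
        result = None
        for version in self._log.get(key, []):
            if version.step > step:
                break
            result = version.value
        return result

    def history(self, key: str) -> List[MemoryVersion]:
        """Full update history for a key, oldest first."""
        return list(self._log.get(key, []))


# Usage: the environment changes between step 1 and step 5.
memory = VersionedMemory()
memory.update("api_endpoint", "/v1/search", step=1, note="initial docs")
memory.update("api_endpoint", "/v2/search", step=5, note="endpoint migrated")

latest = memory.current("api_endpoint")    # value after the migration
earlier = memory.as_of("api_endpoint", 3)  # value before the migration
```

Keeping the log append-only is the design choice that matters here: an agent can answer both "what is true now" and "what was true earlier, and why did it change", which a store that overwrites values in place cannot do.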