InterleaveThinker: Enhancing Agentic Interleaved Generation
Abstract
Existing image generators perform well in single-image generation but struggle with creating interleaved image-text sequences. InterleaveThinker tackles this limitation by introducing a multi-agent pipeline consisting of a planner agent and a critic agent. The planner organizes the input sequence, while the critic evaluates the generator’s interim outputs and refines the instructions, forming an iterative improvement loop. This model-agnostic approach elevates the generation quality of several existing image generators, bringing their performance close to top-tier models. Notably, InterleaveThinker demonstrates large gains on reasoning-oriented benchmarks, highlighting its effectiveness in structured, multi-step generation tasks.
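The planner/critic loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Step`, `Review`, `run_interleaved`, and `max_rounds`, along with the toy planner, generator, and critic, are all hypothetical stand-ins. The generator is passed in as a plain callable to reflect the model-agnostic design, so any image generator could in principle be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    kind: str          # "text" or "image" (hypothetical step types)
    instruction: str   # text content, or the prompt for the image generator


@dataclass
class Review:
    accepted: bool
    revised_instruction: str


Planner = Callable[[str], List[Step]]
Generator = Callable[[str], str]
Critic = Callable[[Step, str], Review]


def run_interleaved(
    request: str,
    planner: Planner,
    generator: Generator,
    critic: Critic,
    max_rounds: int = 3,   # assumed cap on refinement rounds per image
) -> List[Tuple[str, str]]:
    """Plan an interleaved sequence, then refine each image step with the critic."""
    outputs: List[Tuple[str, str]] = []
    for step in planner(request):
        if step.kind == "text":
            outputs.append(("text", step.instruction))
            continue
        instruction = step.instruction
        image = generator(instruction)
        for _ in range(max_rounds):
            review = critic(Step("image", instruction), image)
            if review.accepted:
                break
            # The critic rewrites the instruction; the generator tries again.
            instruction = review.revised_instruction
            image = generator(instruction)
        outputs.append(("image", image))
    return outputs


# Toy stand-ins so the sketch runs without any real model.
def toy_planner(request: str) -> List[Step]:
    return [Step("text", f"Step 1 of: {request}"), Step("image", "draw step 1")]


def toy_generator(instruction: str) -> str:
    return f"<image for '{instruction}'>"


def toy_critic(step: Step, output: str) -> Review:
    ok = "detailed" in step.instruction
    revised = step.instruction if ok else step.instruction + ", detailed"
    return Review(ok, revised)


if __name__ == "__main__":
    for kind, content in run_interleaved(
        "make tea", toy_planner, toy_generator, toy_critic
    ):
        print(kind, content)
```

In this sketch the critic only ever sees the current instruction and the generator's output, which is what would let the same loop wrap different generators without modification.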
Key takeaways
InterleaveThinker introduces a multi-agent pipeline with planner and critic agents to enable interleaved image-text generation.
The method is model-agnostic and substantially improves several existing image generators.
Performance reaches levels comparable to top models, with particularly strong gains on reasoning-based benchmarks.