Building Reflective Prompt Optimization with GEPA: Multi-Component Prompts, Structured Feedback, and Held-Out Validation

Summary

This tutorial demonstrates how to use the GEPA framework for reflective prompt optimization on arithmetic word problems. It covers setting up a deterministic benchmark, defining a structured evaluator that returns both a score and feedback, and evolving a multi-component prompt (instructions and format rules) with a reflection model. The process starts from a weak seed prompt and improves it iteratively based on actionable feedback. The optimized prompt is then compared with the seed prompt on a held-out validation set to assess generalization. The tutorial provides a complete workflow with code, highlighting the shift from manual trial and error to automated prompt evolution.
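
To make the idea of a structured evaluator concrete, here is a minimal sketch of a scorer that returns a number together with written feedback a reflection model could act on. The names `EvalResult` and `evaluate`, the "ANSWER: <integer>" output convention, and the 1.0 / 0.5 / 0.0 scoring scale are all assumptions made for illustration; they are not taken from the tutorial or from the GEPA API.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float   # 1.0 correct and well formatted, 0.5 correct value only, 0.0 otherwise
    feedback: str  # plain-language note that a reflection model could act on


# Assumed output convention (hypothetical): the last line reads "ANSWER: <integer>".
ANSWER_LINE = re.compile(r"^ANSWER:\s*(-?\d+)\s*$")


def evaluate(output: str, expected: int) -> EvalResult:
    """Score one model output for correctness and format, and explain the score."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    match = ANSWER_LINE.match(lines[-1]) if lines else None
    if match:
        if int(match.group(1)) == expected:
            return EvalResult(1.0, "Correct answer in the required format.")
        return EvalResult(
            0.0,
            f"Format is fine, but the answer {match.group(1)} is wrong; "
            f"expected {expected}. Recheck the arithmetic step by step.",
        )
    # No well-formed final line: fall back to the last integer anywhere in the text.
    numbers = re.findall(r"-?\d+", output)
    if numbers and int(numbers[-1]) == expected:
        return EvalResult(
            0.5, "Correct value, but the final line must be exactly 'ANSWER: <integer>'."
        )
    return EvalResult(
        0.0,
        f"Wrong or missing answer (expected {expected}), "
        "and no final 'ANSWER: <integer>' line was found.",
    )
```

Separating the format failure from the arithmetic failure in the feedback text is the point of the design: a bare score tells the optimizer that something went wrong, while the message says which part of the prompt (instructions or format rules) most likely needs to change.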

Key Takeaways

Install GEPA and LiteLLM, then configure the task model and the reflection model.

Create a deterministic arithmetic benchmark dataset with 18 problems divided into train and validation sets.

Define an evaluator that scores model outputs based on correctness and format, providing structured feedback.

Run GEPA optimization to evolve multi-component prompts (instructions and format rules) from a weak seed.

Compare baseline and optimized prompts on the held-out validation set to evaluate generalization.

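
The benchmark and comparison steps listed above can be sketched without any model access. The example below is a rough illustration under stated assumptions: the problem template, the 12/6 train/validation split, the component names `instructions` and `format_rules`, and the helpers `build_benchmark`, `mean_score`, and `stub_model` are all hypothetical. The stub stands in for a real LiteLLM call, and the GEPA optimization run itself is deliberately left out.

```python
import random
import re
from statistics import mean
from typing import Callable


def build_benchmark(n: int = 18, seed: int = 0) -> list[dict]:
    """Generate n arithmetic word problems; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    problems = []
    for _ in range(n):
        a, b, c = rng.randint(2, 20), rng.randint(2, 20), rng.randint(2, 9)
        problems.append({
            "question": f"A shop has {a} boxes with {c} pens each and {b} loose pens. "
                        "How many pens are there in total?",
            "answer": a * c + b,
        })
    return problems


def mean_score(prompt: dict[str, str], dataset: list[dict],
               run_model: Callable[[str, str], str]) -> float:
    """Average exact-match score of a multi-component prompt on a dataset."""
    system = prompt["instructions"] + "\n" + prompt["format_rules"]
    return mean(
        1.0 if run_model(system, ex["question"]).strip() == f"ANSWER: {ex['answer']}" else 0.0
        for ex in dataset
    )


def stub_model(system: str, question: str) -> str:
    """Stand-in for a real model call: obeys the format only when the prompt spells it out."""
    a, c, b = map(int, re.findall(r"\d+", question))
    total = a * c + b
    return f"ANSWER: {total}" if "ANSWER:" in system else f"The total is {total}."


data = build_benchmark()
train, val = data[:12], data[12:]  # assumed 12/6 split; the actual ratio is not stated here

# Weak seed prompt versus a hand-written stand-in for an optimized prompt.
seed_prompt = {"instructions": "Solve the problem.", "format_rules": "Give the answer."}
optimized_prompt = {
    "instructions": "Work through the problem step by step.",
    "format_rules": "End with a final line of the form 'ANSWER: <integer>'.",
}

baseline_val = mean_score(seed_prompt, val, stub_model)
optimized_val = mean_score(optimized_prompt, val, stub_model)
print(f"validation score: baseline={baseline_val:.2f}, optimized={optimized_val:.2f}")
```

Only `train` would be shown to the optimizer; `val` is touched once, at the end, so the reported difference reflects generalization rather than fitting to the examples the prompt was tuned on.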