Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b
Summary
Harness-1 is a 20B-parameter retrieval subagent that uses a stateful harness to separate search decisions from bookkeeping. It achieves an average curated recall of 0.730 across eight benchmarks, outperforming other open models and approaching frontier performance. Training has two stages: supervised fine-tuning teaches the model to operate the harness interface, and reinforcement learning shapes its search policy. At inference time the model works with a finite set of tools and a working memory. The weights and harness code are publicly released on Hugging Face and GitHub.
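To make the split between search decisions and bookkeeping concrete, here is a minimal sketch of what a stateful harness with a finite tool set and a working memory could look like. It is an illustration under stated assumptions, not the released harness: the class name SearchHarness, the two tools (search, keep), the field names, and the toy index are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class SearchHarness:
    """Hypothetical harness: it owns the bookkeeping, the policy only picks actions."""
    search_fn: Callable[[str], List[str]]             # query -> ranked document IDs
    seen: Set[str] = field(default_factory=set)       # every document ID returned so far
    curated: List[str] = field(default_factory=list)  # working memory of kept documents
    history: List[str] = field(default_factory=list)  # queries issued so far

    def search(self, query: str) -> List[str]:
        """Run a query and return only documents the policy has not seen yet."""
        self.history.append(query)
        fresh = [d for d in self.search_fn(query) if d not in self.seen]
        self.seen.update(fresh)
        return fresh

    def keep(self, doc_id: str) -> bool:
        """Add a seen document to the curated set; ignore duplicates and unknown IDs."""
        if doc_id in self.seen and doc_id not in self.curated:
            self.curated.append(doc_id)
            return True
        return False

    def call(self, tool: str, arg: str) -> object:
        """Dispatch one action from the finite tool set (assumed: search, keep)."""
        tools: Dict[str, Callable[[str], object]] = {
            "search": self.search,
            "keep": self.keep,
        }
        if tool not in tools:
            raise ValueError(f"unknown tool: {tool}")
        return tools[tool](arg)


# Toy corpus and keyword index, used only to exercise the sketch.
TOY_CORPUS = {
    "d1": "annual report revenue",
    "d2": "revenue guidance",
    "d3": "risk factors",
}


def toy_search(query: str) -> List[str]:
    """Return IDs of documents that share at least one word with the query."""
    words = set(query.lower().split())
    return sorted(d for d, text in TOY_CORPUS.items() if words & set(text.split()))
```

In this sketch the policy never tracks which documents it has already seen or kept. It emits a tool name and an argument, and the harness deduplicates results and maintains the curated set, which is the kind of offloading the summary describes.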
Key points
Core idea: Stateful cognitive offloading separates the model's policy decisions from the harness's environment bookkeeping.
Training: SFT on 899 GPT-5.4 trajectories, followed by on-policy CISPO RL on SEC queries with a diversity bonus.
Performance: 0.730 average curated recall, 11.4 points above the next-best open subagent; the largest gains are on held-out benchmarks.
Open-source: Weights and harness code are available on Hugging Face and GitHub, and the model can be served via vLLM, SGLang, or Transformers.
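The headline metric can be sketched in a few lines. This summary does not define curated recall, so the code below assumes it is the fraction of relevant documents that end up in the final curated set, and that the reported average is an unweighted mean over benchmarks. The function names and the toy inputs are hypothetical.

```python
from typing import Dict, Iterable, Set


def curated_recall(curated: Iterable[str], relevant: Set[str]) -> float:
    """Assumed definition: share of relevant documents found in the curated set."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    return len(set(curated) & relevant) / len(relevant)


def macro_average(per_benchmark: Dict[str, float]) -> float:
    """Unweighted mean of per-benchmark scores (assumed aggregation)."""
    if not per_benchmark:
        raise ValueError("need at least one benchmark score")
    return sum(per_benchmark.values()) / len(per_benchmark)


# Toy values for illustration only; these are not results from the source.
toy_recall = curated_recall(["a", "b", "x"], {"a", "b", "c", "d"})
toy_average = macro_average({"bench_1": 0.5, "bench_2": 1.0})
```

If "points" means hundredths of recall, an 11.4-point lead over a 0.730 average would place the next-best open subagent near 0.616; that figure is an inference from the numbers above, not a value stated in the source.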