🔬Scaling Past Informal AI - Carina Hong, Axiom Math
English summary
In 2025, Axiom scored a perfect 12/12 on the Putnam exam, with 8 of the 12 problems solved within the time limit, surpassing top undergraduates and other AI systems. The startup's approach, Verified AI, uses formal verification in Lean to provide a stronger reward signal for reinforcement learning. Its open-source toolkit, AXLE, enables interactive Lean applications. On the Verina code generation benchmark, Axiom reached a 99% success rate, far exceeding OpenAI o3's 4.9%. CEO Carina Hong argues that verified generation is essential for AGI.
Key points
Axiom solved all 12 Putnam problems: 8/12 within the time limit and 12/12 in its final result.
Formal verification using Lean provides a stronger RL signal than statistical methods.
AXLE is Axiom's open-source toolkit for Lean proofs and interactive Lean applications.
Verina benchmark: Axiom 99%, OpenAI o3 4.9%.
Carina Hong believes verification is key to AGI scaling.
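
The point about verification as a reward signal can be illustrated with a minimal sketch. This is not Axiom's pipeline or the AXLE API: the names `verified_reward`, `make_factor_checker`, and `pass_rate` are hypothetical, and a toy arithmetic checker stands in for a Lean proof check. The idea it shows is that a checker which either accepts or rejects a candidate gives a binary reward with no partial credit for output that merely looks plausible.

```python
from typing import Callable, Iterable


def verified_reward(candidate: str, checker: Callable[[str], bool]) -> float:
    """Return 1.0 only if the checker accepts the candidate, else 0.0.

    Malformed candidates that make the checker raise are scored 0.0,
    the same as candidates the checker rejects.
    """
    try:
        return 1.0 if checker(candidate) else 0.0
    except Exception:
        return 0.0


def make_factor_checker(n: int) -> Callable[[str], bool]:
    """Toy stand-in for a proof checker (an assumption, not Lean itself).

    A candidate such as "7*13" is accepted only if it is a nontrivial
    factorization of n. Checking is cheap even when finding one is hard.
    """
    def check(candidate: str) -> bool:
        parts = candidate.split("*")
        if len(parts) != 2:
            return False
        a, b = int(parts[0]), int(parts[1])  # raises ValueError if malformed
        return a > 1 and b > 1 and a * b == n
    return check


def pass_rate(candidates: Iterable[str], checker: Callable[[str], bool]) -> float:
    """Fraction of candidates that the checker accepts."""
    rewards = [verified_reward(c, checker) for c in candidates]
    return sum(rewards) / len(rewards) if rewards else 0.0


if __name__ == "__main__":
    checker = make_factor_checker(91)
    for attempt in ["7*13", "7*14", "seven*13", "91*1"]:
        print(attempt, verified_reward(attempt, checker))
```

In a real setting the checker would be a call into a proof assistant, and the reward would feed a reinforcement learning update; both are outside the scope of this sketch.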