A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
English summary
This paper reinterprets supervised fine-tuning (SFT) as a target distribution design problem. Its Q-target framework decomposes SFT supervision into two choices: how strongly to rely on the observed token, and how to allocate the remaining probability mass among alternative tokens. Under this view, many existing SFT variants are implicit choices of the target distribution Q. The authors propose Target-SFT, which constructs the training objective directly from the desired target distribution. Across ten reasoning dataset-model combinations, Target-SFT consistently outperforms conventional SFT and other variants, suggesting that target distribution design is a more fundamental principle for SFT.
Key points
SFT is reframed as target distribution design, not merely a loss objective.
The Q-target framework decomposes SFT supervision into two choices: how much to rely on the observed token, and how to allocate the remaining probability mass to alternative tokens.
Existing SFT variants are unified as implicit choices of the target distribution Q.
Target-SFT constructs the training objective directly from a desired target distribution and consistently outperforms conventional SFT and other variants across ten reasoning dataset-model combinations.
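To make the two design choices concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the exact form of Q, so the mixture parametrization, the names `build_target` and `target_cross_entropy`, and the parameters `alpha` and `alt_weights` are all hypothetical. The sketch puts mass `alpha` on the observed token, spreads the remaining `1 - alpha` over the alternatives, and trains against that target with cross-entropy.

```python
import math


def build_target(vocab_size, observed, alpha, alt_weights=None):
    """Build a per-token target distribution Q (hypothetical parametrization).

    alpha       -- choice 1: probability mass placed on the observed token
    alt_weights -- choice 2: unnormalized weights for sharing the remaining
                   mass among the other tokens (uniform if None)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if alt_weights is None:
        alt_weights = [1.0] * vocab_size
    # The observed token never receives any of the "alternative" mass.
    weights = [0.0 if i == observed else w for i, w in enumerate(alt_weights)]
    total = sum(weights)
    if total <= 0.0:
        if alpha < 1.0:
            raise ValueError("no alternative tokens to receive the remaining mass")
        total = 1.0  # all weights are zero, so the division below is harmless
    q = [(1.0 - alpha) * w / total for w in weights]
    q[observed] = alpha
    return q


def target_cross_entropy(logits, q):
    """Cross-entropy between target Q and the softmax of the model logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(qi * (x - log_z) for qi, x in zip(q, logits) if qi > 0.0)


logits = [2.0, 0.5, -1.0, 0.0]
q_hard = build_target(4, observed=0, alpha=1.0)  # one-hot target
q_soft = build_target(4, observed=0, alpha=0.9)  # 0.1 shared uniformly
loss_hard = target_cross_entropy(logits, q_hard)
loss_soft = target_cross_entropy(logits, q_soft)
```

With `alpha = 1.0` the target is one-hot and the loss reduces to the usual negative log-likelihood of the observed token. Lowering `alpha` or changing `alt_weights` gives other targets under the same loss, which is the sense in which different SFT variants can be read as different choices of Q.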