Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
English summary
This paper introduces a reinforcement learning (RL) method that trains large language models (LLMs) to translate previously unseen languages by leveraging contextual linguistic knowledge rather than memorization. Prior approaches, such as continued pretraining or incorporating grammar books, led to overfitting and limited transfer. Models trained with RL, using a surface-level translation metric as the reward, surpass in-context learning and supervised fine-tuning baselines. The results indicate that RL can cultivate meta-learning abilities for extremely low-resource translation, extending its utility beyond traditional reasoning tasks.
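The idea of rewarding a model with a surface-level translation metric can be sketched as follows. This is a minimal illustration under stated assumptions. The summary does not name the metric or the RL algorithm, so the code uses a simplified chrF-style character n-gram F-score (orders 1 to 6, beta = 2, whitespace ignored, mirroring common chrF defaults) as a stand-in metric. It also uses a hypothetical group-mean baseline to turn the rewards for several sampled translations into advantages. The names `chrf_reward` and `advantages` are illustrative and do not come from the paper.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # chrF-style metrics ignore whitespace, so strip it before extracting n-grams.
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf_reward(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF-style score in [0, 1], used here as a stand-in surface-level reward."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp and not ref:
            continue  # both strings are too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()) if hyp else 0.0)
        recalls.append(overlap / sum(ref.values()) if ref else 0.0)
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # average precision over n-gram orders
    r = sum(recalls) / len(recalls)        # average recall over n-gram orders
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)  # F-beta, weighting recall more than precision


def advantages(samples: list[str], reference: str) -> list[float]:
    # Hypothetical baseline: score several sampled translations of one source
    # sentence and subtract the group-mean reward (REINFORCE-with-baseline style).
    rewards = [chrf_reward(s, reference) for s in samples]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    print(chrf_reward(ref, ref))  # 1.0
    print(advantages([ref, "a dog ran"], ref))
```

A surface-level metric like this is cheap to compute for every sampled output, which makes it practical as a dense RL reward. For reported evaluation numbers, a maintained implementation such as sacrebleu's `CHRF` is preferable; this simplified version may not reproduce its scores exactly.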
Key points
Reinforcement learning enables LLMs to translate unseen languages by learning to leverage contextual linguistic cues rather than by memorizing language-specific data.
Compared with continued pretraining or explicit grammar encoding, RL avoids overfitting and generalizes better to extremely low-resource languages.
Using a surface-level translation metric as the reward yields better translation performance than in-context learning or supervised fine-tuning baselines.
The work shows that RL can be extended from reasoning tasks to language translation, fostering meta-skills for learning from context.