Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection
The paper argues that entropy-based token selection is insufficient for reinforcement learning on visual reasoning tasks, because choosing tokens by entropy alone misses critical contextual visual cues. The authors propose vision-anchored token selection, which forces the agent to prioritize task-relevant visual features during decision-making. In the reported experiments, this method yields more robust and more interpretable behavior on visual reasoning tasks than entropy-driven baselines. The work underscores the need for more sophisticated attention mechanisms to improve RL agents' understanding of visual environments.
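To make the contrast concrete, here is a minimal Python sketch of the two selection styles. It is an illustration under stated assumptions, not the authors' implementation: the summary does not say how vision anchoring is computed, so the attention-mass score below is a hypothetical stand-in, and all function names (`token_entropy`, `top_fraction_mask`, `visual_attention_mass`, `masked_policy_loss`) are invented for this example.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution at each position."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.exp(log_probs) * log_probs).sum(axis=-1)

def top_fraction_mask(scores, fraction):
    """Boolean mask keeping the ceil(fraction * T) highest-scoring tokens."""
    k = max(1, int(np.ceil(fraction * scores.shape[0])))
    keep = np.argsort(-scores, kind="stable")[:k]
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[keep] = True
    return mask

def visual_attention_mass(attn, is_image_token):
    """ASSUMPTION: one possible 'vision anchor' score, namely the share of each
    generated token's attention that lands on image tokens.
    attn has shape (T_generated, T_context); is_image_token has shape (T_context,)."""
    return attn[:, is_image_token].sum(axis=-1) / attn.sum(axis=-1)

def masked_policy_loss(log_probs, advantages, mask):
    """REINFORCE-style loss averaged over the selected tokens only."""
    return -(log_probs * advantages * mask).sum() / mask.sum()

# Entropy-driven baseline: update only on the most uncertain tokens.
logits = np.array([[2.0, 0.0, 0.0], [0.1, 0.0, 0.1], [5.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
entropy_mask = top_fraction_mask(token_entropy(logits), fraction=0.5)

# Hypothetical vision-anchored variant: update on tokens that attend to the image.
attn = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
is_image_token = np.array([True, True, False])
vision_mask = top_fraction_mask(visual_attention_mass(attn, is_image_token), fraction=0.5)
```

In this toy input the two masks pick different tokens, which is the situation the summary describes: a token can be high-entropy without depending much on the image, and the reverse.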