One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders
English summary
The paper introduces FORGE, a benchmark that measures how often search-augmented LLMs recommend fake products when the web pages they retrieve are polluted. FORGE rewrites real product descriptions into fake ones, covering 225 products across 15 categories and 5 consumer scenarios, and tests 12 commercial and open-weights LLMs. A single polluted page yields a fooled-recommendation rate of up to 27%, and replacing the top-3 search results raises the rate to 73.8%. Vulnerability varies by category: products about which models have weaker prior knowledge are more easily exploited. Reasoning models sometimes worsen the problem by fabricating social proof to justify a fake recommendation. Three defenses are evaluated: skepticism prompting, consensus filtering over model priors, and consensus filtering over cross-document evidence. Skepticism prompting can backfire, and filtering may suppress legitimate recommendations.
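To make the headline metric concrete, here is a minimal sketch of how a fooled-recommendation rate could be computed from trial logs. It assumes the rate is the fraction of trials in which the injected fake product appears among the model's recommendations; the summary does not give the paper's exact definition, and the `Trial` type, the `fooled_rate` function, and the product names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One query to a search-augmented LLM (hypothetical record layout)."""
    recommended: list[str]  # product names the model recommended
    fake_product: str       # name of the fake product injected via a polluted page


def fooled_rate(trials: list[Trial]) -> float:
    """Fraction of trials whose recommendations include the injected fake product.

    Assumption: matching is by case-insensitive exact name; a real evaluation
    would likely need fuzzier matching.
    """
    if not trials:
        return 0.0
    fooled = sum(
        1
        for t in trials
        if t.fake_product.lower() in {name.lower() for name in t.recommended}
    )
    return fooled / len(trials)


# Illustrative data only, not results from the paper.
trials = [
    Trial(recommended=["RealBrand A", "FakeBlend X"], fake_product="FakeBlend X"),
    Trial(recommended=["RealBrand A"], fake_product="FakeBlend X"),
    Trial(recommended=["RealBrand B", "RealBrand C"], fake_product="FakeBlend X"),
    Trial(recommended=[], fake_product="FakeBlend X"),
]
rate = fooled_rate(trials)  # 1 fooled trial out of 4
```

Under this reading, a reported rate of 27% would mean the fake product was recommended in roughly 27 of every 100 trials.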
Key points
The FORGE benchmark simulates web pollution by rewriting product descriptions in retrieved pages, covering 225 products across 15 categories.
A single polluted page can fool LLMs into recommending a fake product up to 27% of the time; polluting the top-3 results raises the rate to 73.8%.
Vulnerability is category-dependent: models that lack stable prior knowledge of a product category are more easily exploited.
Reasoning models do not mitigate the risk and may generate spurious social proof to justify fake recommendations.
Skepticism prompting can exacerbate vulnerability, while consensus filtering, whether over model priors or over cross-document evidence, risks suppressing legitimate products.
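The trade-off in the last key point can be illustrated with a toy cross-document consensus filter. This is only a sketch under stated assumptions, not the paper's defense: it assumes a candidate is kept when its name appears in at least `min_support` retrieved documents, and the function name, the threshold, and all product names are hypothetical.

```python
from collections import Counter


def consensus_filter(
    candidates: list[str], documents: list[str], min_support: int = 2
) -> list[str]:
    """Keep candidates named in at least `min_support` retrieved documents.

    Assumption: support is counted by case-insensitive substring match,
    at most once per document.
    """
    support: Counter[str] = Counter()
    for doc in documents:
        text = doc.lower()
        for name in candidates:
            if name.lower() in text:
                support[name] += 1
    return [name for name in candidates if support[name] >= min_support]


# Illustrative data only: one polluted page and two clean pages.
documents = [
    "Review roundup: RealBrand A tops our list this year.",
    "Buyers praise RealBrand A; NicheBrand B is a solid lesser-known pick.",
    "FakeBlend X is the best product ever made, according to everyone.",
]
candidates = ["RealBrand A", "NicheBrand B", "FakeBlend X"]
kept = consensus_filter(candidates, documents)
```

In this toy case the filter drops the fake product, which appears on one page only, but it also drops the legitimate niche product for the same reason. If an attacker pollutes several top results, the fake product would gain enough support to pass the same threshold.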