Characterizing Cultural Localization in AI-Generated Stories
Abstract
A method is proposed for measuring whether cultural localization in AI-generated stories is templated or holistic: it identifies the lexical tokens that distinguish narratives across nationalities, then measures how similar the narratives remain once those tokens are removed. Evaluated on stories from five models across 125 topics and 193 nationalities, the method finds that only 9–17% of the vocabulary accounts for cross-national variation; the remaining text exhibits repeated multi-word sequences, indicating a shared, culturally agnostic template. The study also characterizes the identified cultural markers for stereotypicality and offensiveness, and finds that the markers for 19 countries, predominantly in the Global South, are offensive on average.
Key points
Proposes a measurement method that distinguishes templated (marker-only) from holistic (plot- and value-level) cultural localization in AI-generated stories.
Across 5 models, 125 topics, and 193 nationalities, only 9–17% of the vocabulary acts as cultural markers that drive cross-national narrative variation.
Once those markers are removed, the remaining narratives contain repeated multi-word sequences, suggesting a shared, culturally agnostic template.
Cultural markers for 19 countries, mostly in the Global South, are characterized as offensive on average.
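The two-step idea summarized above (find the tokens that distinguish nationalities, remove them, then check how much repeated text is left) can be illustrated with a toy sketch. This is not the paper's implementation: the marker rule (a token counts as a marker if it appears in the stories of at most `max_share` of nationalities), the trigram Jaccard overlap, the helper names, and the three example stories are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokenizer; a stand-in for whatever tokenizer is really used."""
    return re.findall(r"[a-z']+", text.lower())


def find_markers(stories, max_share=0.5):
    """Assumed rule: a token is a cultural marker if it occurs in the stories
    of at most `max_share` of the nationalities."""
    doc_freq = Counter()
    for text in stories.values():
        doc_freq.update(set(tokenize(text)))  # count each token once per nationality
    limit = max_share * len(stories)
    return {tok for tok, df in doc_freq.items() if df <= limit}


def strip_markers(text, markers):
    """Return the story's tokens with all marker tokens removed."""
    return [tok for tok in tokenize(text) if tok not in markers]


def ngrams(tokens, n=3):
    """Set of contiguous n-token sequences."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def mean_ngram_overlap(token_lists, n=3):
    """Mean pairwise Jaccard overlap of n-gram sets (assumed similarity measure)."""
    scores = []
    for a, b in combinations(token_lists, 2):
        ga, gb = ngrams(a, n), ngrams(b, n)
        union = ga | gb
        scores.append(len(ga & gb) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical stories for one topic, keyed by nationality.
stories = {
    "Japan": "Yuki walked to the market in Kyoto and bought mochi for her grandmother",
    "Mexico": "Sofia walked to the market in Oaxaca and bought tamales for her grandmother",
    "Kenya": "Wanjiru walked to the market in Nairobi and bought mandazi for her grandmother",
}

markers = find_markers(stories, max_share=0.5)
vocab = {tok for text in stories.values() for tok in tokenize(text)}
marker_share = len(markers) / len(vocab)

overlap_before = mean_ngram_overlap([tokenize(t) for t in stories.values()])
overlap_after = mean_ngram_overlap([strip_markers(t, markers) for t in stories.values()])

print(sorted(markers))
print(f"marker share of vocabulary: {marker_share:.2f}")
print(f"trigram overlap before/after removal: {overlap_before:.2f} / {overlap_after:.2f}")
```

In this toy case only the names, cities, and foods are flagged as markers, and the stories become identical once they are removed, which is the templated pattern the summary describes. A holistically localized set of stories would be expected to keep a low overlap even after marker removal.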