Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs


English summary

The paper presents ModSleuth, an agentic system that recursively reconstructs LLM dependency graphs from public artifacts such as model cards, datasets, and evaluation reports. It addresses two challenges: defining what counts as a dependency, and reconciling dependency records across sources. It does so by formalizing direct versus indirect relations and by resolving artifact identities across inconsistent documentation. Applied to four public LLM releases, ModSleuth recovered 1,060 source-verified dependencies. These revealed multi-hop license obligations, train-evaluation coupling, and discrepancies between the artifacts that were released and those used at training time. The system and the resulting dependency graphs are publicly released to support transparent analysis of increasingly complex LLM development ecosystems.
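
The recursive reconstruction and the direct/indirect distinction described above can be illustrated with a small graph traversal. This is a minimal sketch, not ModSleuth's implementation. The artifact names, the alias table, and the helpers `canonical`, `build_graph`, and `classify` are all hypothetical. Identity resolution is reduced to string normalization plus a lookup table. A dependency one hop from the release counts as direct, and anything reached through further hops counts as indirect. An iterative breadth-first search stands in for literal recursion.

```python
from collections import deque

# Hypothetical edges as they might be scraped from inconsistent documentation:
# the same base model appears as "Base_Model v1" and "base-model-v1".
RAW_EDGES = {
    "Example-LLM-7B": ["Base_Model v1", "corpus-a"],
    "base-model-v1": ["corpus-b"],
    "corpus-a": ["corpus-b"],
    "corpus-b": [],
}

# Hypothetical alias table; real identity resolution would also need to
# reconcile versions and repository locations.
ALIASES = {"base-model v1": "base-model-v1"}


def canonical(name):
    """Normalize case and separators, then map known aliases to one ID."""
    key = name.strip().lower().replace("_", "-")
    return ALIASES.get(key, key)


def build_graph(raw_edges):
    """Merge edges whose endpoints resolve to the same canonical ID."""
    graph = {}
    for src, deps in raw_edges.items():
        graph.setdefault(canonical(src), set()).update(canonical(d) for d in deps)
    return graph


def classify(graph, root):
    """Return {artifact: hop count}; 1 = direct, >= 2 = indirect."""
    root = canonical(root)
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in hops:
                hops[dep] = hops[node] + 1
                queue.append(dep)
    del hops[root]
    return hops


if __name__ == "__main__":
    hops = classify(build_graph(RAW_EDGES), "Example-LLM-7B")
    print("direct:", sorted(a for a, h in hops.items() if h == 1))
    print("indirect:", sorted(a for a, h in hops.items() if h >= 2))
    # direct: ['base-model-v1', 'corpus-a']
    # indirect: ['corpus-b']
```

Breadth-first search records the shortest hop count. As a result, an artifact that is both a direct and an indirect dependency is labeled direct here. An audit that needs every path would have to enumerate paths instead.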

Key points

ModSleuth is an agentic system that recursively reconstructs LLM dependency graphs from heterogeneous public artifacts.

ModSleuth是一个从异构公开制品中递归重建LLM依赖图的智能系统。

It formalizes direct and indirect dependencies and resolves artifact identity across names, versions, and repositories.

它形式化了直接和间接依赖关系，并解决了跨名称、版本和仓库的制品身份对齐问题。

Applied to four LLM releases, it recovered 1,060 source-verified dependencies and uncovered hidden multi-hop license obligations and train-evaluation coupling.

在四个LLM版本上应用，恢复了1060个来源验证的依赖，并揭示了隐藏的多跳许可义务和训练-评估耦合。

The system and resulting dependency graphs are publicly released to promote transparency in LLM ecosystems.

该系统及生成的依赖图已公开释放，以促进LLM生态系统的透明度。
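
The multi-hop license and train-evaluation findings can both be framed as reachability queries over a labeled dependency graph. The following is a minimal sketch under stated assumptions, not the paper's method. The edges, the relation labels (`train`, `eval`, `derived`), the license assignments, and the helper names `paths_from`, `license_obligations`, and `train_eval_coupling` are invented for illustration. The sketch also assumes that only artifacts reached through a training edge pass license terms to the model, which is a deliberate simplification of real license analysis.

```python
from collections import deque

# Hypothetical labeled dependency edges: artifact -> [(dependency, relation)].
# All artifact names, relations, and licenses here are illustrative only.
EDGES = {
    "my-llm": [("base-llm", "train"), ("bench-x", "eval")],
    "base-llm": [("web-corpus", "train")],
    "bench-x": [("web-corpus", "derived")],
    "web-corpus": [],
}
LICENSES = {
    "base-llm": "custom-community-license",
    "web-corpus": "CC-BY-SA-4.0",
    "bench-x": "MIT",
}


def paths_from(root, first_relation=None):
    """Breadth-first search from root, returning {artifact: shortest path}.

    If first_relation is given, only root's outgoing edges with that relation
    are followed; deeper hops follow edges of any relation.
    """
    paths = {}
    queue = deque()
    for dep, rel in EDGES.get(root, []):
        if (first_relation is None or rel == first_relation) and dep not in paths:
            paths[dep] = [root, dep]
            queue.append(dep)
    while queue:
        node = queue.popleft()
        for dep, _rel in EDGES.get(node, []):
            if dep != root and dep not in paths:
                paths[dep] = paths[node] + [dep]
                queue.append(dep)
    return paths


def license_obligations(root):
    """(artifact, license, path) for each licensed artifact on the training side.

    Assumption: only artifacts reached through a "train" edge from root
    contribute license terms to root.
    """
    return sorted(
        (artifact, LICENSES[artifact], " -> ".join(path))
        for artifact, path in paths_from(root, "train").items()
        if artifact in LICENSES
    )


def train_eval_coupling(root):
    """Artifacts reachable both through training edges and evaluation edges."""
    return set(paths_from(root, "train")) & set(paths_from(root, "eval"))


if __name__ == "__main__":
    for artifact, lic, path in license_obligations("my-llm"):
        print(f"{lic}: {path}")
    print("train-eval coupling:", sorted(train_eval_coupling("my-llm")))
    # train-eval coupling: ['web-corpus']
```

In this toy graph, `web-corpus` is flagged twice. Its license reaches `my-llm` only through `base-llm`, two hops away. It is also reachable from both the training side and the evaluation benchmark.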