Source: arXiv | June 16, 2026 | Importance: 4/5

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Abstract

KVEraser is a learned method for post-hoc context erasing in long-context LLMs that avoids full recomputation. It replaces only the KV states of the span to be erased with learned steering states and keeps the rest of the cache intact. A two-stage training pipeline first pre-trains on a generic span-neighbor suppression objective and then fine-tunes on downstream tasks. On in-domain tasks with 1K–32K-token contexts, KVEraser nearly matches the post-erasure performance of full recomputation while adding only 24% latency overhead, compared with a 17.6× increase for full recomputation. The method also generalizes to unseen long-document QA with harmful (misleading factual) distractors, outperforming all approximate baselines and running 3–4× faster than full recomputation.
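
As a rough illustration of the cache-editing step, the sketch below overwrites only the entries of an erased span and leaves every other position untouched. It assumes a single layer and head stored as plain Python lists; `erase_span_in_place` and `attend` are hypothetical names, and the steering states are fixed placeholders supplied by the caller, whereas KVEraser learns them. It is a sketch under those assumptions, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def erase_span_in_place(
    keys: List[Vector],
    values: List[Vector],
    start: int,
    end: int,
    steer_keys: List[Vector],
    steer_values: List[Vector],
) -> None:
    """Overwrite cache positions [start, end) with steering states.

    Hypothetical helper for one layer and one head. The steering states are
    passed in by the caller (KVEraser learns them). Every position outside
    the span keeps its original key and value, and the cache length is
    unchanged, so no later token is recomputed.
    """
    if not 0 <= start <= end <= len(keys) or len(keys) != len(values):
        raise ValueError("invalid span or mismatched cache")
    if len(steer_keys) != end - start or len(steer_values) != end - start:
        raise ValueError("need one steering key/value per erased position")
    for offset, pos in enumerate(range(start, end)):
        keys[pos] = list(steer_keys[offset])
        values[pos] = list(steer_values[offset])


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the (edited) cache."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    values = [[1.0, 0.0], [9.0, 9.0], [9.0, 9.0], [0.0, 1.0]]
    # Erase positions 1 and 2 using placeholder (not learned) steering states.
    erase_span_in_place(keys, values, 1, 3,
                        [[-4.0, -4.0]] * 2, [[0.0, 0.0]] * 2)
    print(attend([1.0, 1.0], keys, values))
```

Because the span is overwritten rather than deleted, the cache keeps its length and every later entry stays at its original position, so those entries are reused as-is instead of being recomputed.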

Key Takeaways

Post-hoc erasing over a KV cache is hard: every token after the span attended to it when its KV states were computed, so dropping the span's entries leaves its influence in place, and removing it exactly requires recomputing all subsequent tokens. KVEraser avoids this by injecting learned steering states only at the erased positions.
The method uses a two-stage training pipeline: generic span-neighbor pre-training to suppress the erased span's influence, followed by task-specific fine-tuning.
On in-domain tasks with 1K–32K-token contexts, KVEraser performs nearly identically to full recomputation with only 24% latency overhead, versus 17.6× for full recomputation.
KVEraser generalizes to unseen long-document QA with harmful (misleading factual) distractors, outperforming all approximate baselines and running 3–4× faster than full recomputation.
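
The dependency behind the first takeaway can be seen in a toy model. Here a single causal attention layer with identity projections (an assumption made for brevity; the toy also ignores positional encodings) produces the states a next layer would cache. Dropping the span's entries from those states is compared with rerunning the layer without the span. `causal_self_attention` is a hypothetical helper, not part of KVEraser.

```python
import math
from typing import List

Vector = List[float]


def causal_self_attention(xs: List[Vector]) -> List[Vector]:
    """Toy single-head causal attention with identity projections.

    Output i attends over xs[0..i], using the same vectors as queries, keys
    and values. The outputs stand in for the states a next layer would cache.
    """
    outputs = []
    for i, query in enumerate(xs):
        visible = xs[: i + 1]  # causal mask: earlier positions and itself
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in visible]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, visible)) / total
            for d in range(len(query))
        ])
    return outputs


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [1.0, 1.0]]
    span_start, span_end = 2, 3  # erase token 2

    cached = causal_self_attention(tokens)
    # Naive deletion: drop the span's entries, keep every other cached state.
    naive = cached[:span_start] + cached[span_end:]
    # Full recomputation: rerun the layer on the context without the span.
    recomputed = causal_self_attention(tokens[:span_start] + tokens[span_end:])

    print("prefix identical:", naive[:span_start] == recomputed[:span_start])
    print("last token, naive deletion:", naive[-1])
    print("last token, recomputed:", recomputed[-1])
```

Positions before the span come out identical either way, but the cached state of the token after the span still reflects the erased token under naive deletion, which is the gap that full recomputation closes at the cost of reprocessing every later token.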
