KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing
Abstract
KVEraser is a learned method for post-hoc context erasing in long-context LLMs that avoids full recomputation. It replaces only the KV states of the span to be erased with learned steering values and leaves the rest of the cache intact. A two-stage training pipeline first pre-trains on generic span-neighbor suppression, then fine-tunes on downstream tasks. On in-domain tasks with 1K–32K context, KVEraser's post-erasure performance nearly matches that of full recomputation while increasing latency by only 24%, versus a 17.6× increase for full recomputation. The method also generalizes to unseen long-document QA with harmful distractors, where it performs best among the approximate baselines and runs 3–4× faster than full recomputation.
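The cache operation described in the abstract can be illustrated with a minimal sketch. It assumes a toy cache stored as plain Python lists, one (keys, values) pair per layer; the function name `erase_span` and the `steer_keys` / `steer_values` arguments are hypothetical, and the steering states are placeholder zeros rather than anything learned. The sketch shows only the overwrite itself, not how KVEraser produces its steering values.

```python
from typing import List, Tuple

Vector = List[float]
# One layer of cache: (keys, values), each holding one vector per token position.
LayerCache = Tuple[List[Vector], List[Vector]]


def erase_span(
    cache: List[LayerCache],
    start: int,
    end: int,
    steer_keys: List[List[Vector]],
    steer_values: List[List[Vector]],
) -> List[LayerCache]:
    """Return a new cache with positions [start, end) overwritten.

    `steer_keys[layer]` and `steer_values[layer]` are hypothetical stand-ins
    for learned steering states; how they are trained is not modelled here.
    """
    span_len = end - start
    new_cache: List[LayerCache] = []
    for layer, (keys, values) in enumerate(cache):
        if not 0 <= start <= end <= len(keys):
            raise ValueError("span must lie inside the cached context")
        if len(steer_keys[layer]) != span_len or len(steer_values[layer]) != span_len:
            raise ValueError("need exactly one steering state per erased token")
        # Entries outside the span are reused as-is; nothing is recomputed.
        new_keys = keys[:start] + steer_keys[layer] + keys[end:]
        new_values = values[:start] + steer_values[layer] + values[end:]
        new_cache.append((new_keys, new_values))
    return new_cache


# Toy cache: 1 layer, 6 tokens, 1-dimensional keys and values.
cache = [([[float(t)] for t in range(6)], [[10.0 + t] for t in range(6)])]
steer_k = [[[0.0], [0.0]]]  # placeholder values, not learned
steer_v = [[[0.0], [0.0]]]
erased = erase_span(cache, 2, 4, steer_k, steer_v)
```

The cache keeps its original length, so the positions of the tokens after the span do not shift and their cached entries can be reused unchanged.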
Key Points
Post-hoc erasing from a KV cache is hard because deleting a span forces recomputation of every subsequent token; KVEraser avoids this by injecting learned steering states only for the erased tokens.
The method uses a two-stage training pipeline: generic span-neighbor pre-training to suppress the influence of the erased span, followed by task-specific fine-tuning.
On in-domain tasks across 1K–32K context, KVEraser performs nearly identically to full recomputation with only a 24% latency overhead (vs. a 17.6× increase for full recomputation).
KVEraser generalizes to unseen long-document QA with harmful distractors, outperforming all approximate baselines and delivering a 3–4× speedup over full recomputation.
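The first key point can be made concrete by counting cache positions. The sketch below is a rough illustration under stated assumptions: the helper names are hypothetical, the context length and span boundaries are made-up numbers, and the counts are token positions rather than latency, so they do not reproduce the reported 24% or 17.6× figures.

```python
def stale_after_deletion(context_len: int, start: int, end: int) -> int:
    """Count cached positions that follow the span [start, end).

    Under causal attention their states were computed with the span visible,
    so deleting the span leaves all of them stale.
    """
    if not 0 <= start <= end <= context_len:
        raise ValueError("span must lie inside the context")
    return context_len - end


def overwritten_by_steering(start: int, end: int) -> int:
    """Count positions touched when only the span's own entries are replaced."""
    if not 0 <= start <= end:
        raise ValueError("span must satisfy 0 <= start <= end")
    return end - start


# Illustrative numbers only: a 500-token span near the start of a 32,000-token context.
context_len, start, end = 32_000, 1_000, 1_500
stale = stale_after_deletion(context_len, start, end)
overwritten = overwritten_by_steering(start, end)
```

With these numbers, deleting the span leaves 30,500 cached positions stale, while a span-local replacement touches 500. The gap widens the earlier the span sits in the context.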