Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation
Abstract
The paper introduces a dataset that combines system, network, and browser activity logs: 2.3 million events from 870 sessions (70 attack, 800 benign). Every malicious event is labeled with a MITRE ATT&CK technique ID, covering 12 tactics and 53 techniques, and the attacks were generated with real tools, including remote access trojans (RATs), C2 tunnels, and cloud exfiltration tools. The authors fine-tuned three Small Language Models (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) with LoRA and evaluated them on two tasks: chunk classification and ATT&CK technique identification. Fine-tuning raised chunk classification accuracy from about 8% for the base models to 90–97%. Technique identification remained hard: the best exact-match accuracy was 42%, although high partial-match scores suggest the models captured much of the underlying reasoning.
Key Points
Introduces a multi-source log dataset of 2.3M events in which every malicious event carries an ATT&CK technique label (12 tactics, 53 techniques), with attacks generated using real tools.
Fine-tuned Qwen2.5-1.5B, Llama-3.2-3B, and Phi-4-Mini with LoRA, improving chunk classification accuracy from ~8% to 90–97%.
ATT&CK technique identification reached only 42% exact-match accuracy at best, but high partial-match scores suggest the models captured much of the underlying reasoning.
Addresses the lack of a public dataset that combines system, network, and browser logs with fine-grained ATT&CK labels.
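The gap between exact-match and partial-match scores on technique identification can be illustrated with a small scoring sketch. The summary does not say how the paper defines partial match, so the Jaccard overlap used here is an assumption, and the function names (`exact_match`, `partial_match`, `evaluate`) and the technique IDs in the usage note are purely illustrative and not taken from the dataset.

```python
from typing import Dict, Iterable, List, Set, Tuple


def normalize(ids: Iterable[str]) -> Set[str]:
    """Strip whitespace and upper-case technique IDs such as ' t1059.001 '."""
    return {i.strip().upper() for i in ids if i.strip()}


def exact_match(predicted: Iterable[str], gold: Iterable[str]) -> bool:
    """True only if the predicted set of technique IDs equals the gold set."""
    return normalize(predicted) == normalize(gold)


def partial_match(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Jaccard overlap between predicted and gold sets.

    ASSUMPTION: the paper's partial-match metric may be defined differently.
    """
    p, g = normalize(predicted), normalize(gold)
    if not p and not g:
        return 1.0  # both empty: treat as full agreement
    return len(p & g) / len(p | g)


def evaluate(pairs: List[Tuple[Iterable[str], Iterable[str]]]) -> Dict[str, float]:
    """Average both scores over (predicted, gold) pairs."""
    if not pairs:
        raise ValueError("need at least one (predicted, gold) pair")
    materialized = [(list(p), list(g)) for p, g in pairs]
    n = len(materialized)
    return {
        "exact_match": sum(exact_match(p, g) for p, g in materialized) / n,
        "partial_match": sum(partial_match(p, g) for p, g in materialized) / n,
    }
```

For instance, a prediction of `["T1059.001", "T1071.001"]` against a gold set of `["T1059.001", "T1071.001", "T1567.002"]` fails exact match yet scores 2/3 under this overlap measure, which is the kind of pattern that would produce low exact-match but high partial-match results.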