Moonshot AI Releases Kimi K2.7-Code: Open-Weight 1T-Parameter Coding Model with 256K Context and +21.8% on Kimi Code Bench v2 Over K2.6
English summary
Moonshot AI released Kimi K2.7-Code, an open-weight, coding-specialized agentic model under Modified MIT license. It is a Mixture-of-Experts architecture with 1T total parameters, 32B active per token, 384 experts with 8 selected, MLA attention, SwiGLU feed-forward, and a 400M-parameter MoonViT vision encoder. The model supports a 256K-token context window, ships with native INT4 quantization, and enforces mandatory thinking mode with fixed sampling parameters (temperature 1.0, top_p 0.95, n 1). In company-reported benchmarks, K2.7-Code achieves 62.0 on Kimi Code Bench v2 (+21.8% over K2.6), 81.1 on MCP Mark Verified (beating Claude Opus 4.8’s 76.4), and demonstrates approximately 30% lower reasoning-token usage than K2.6, reducing cost and latency in agentic workflows. The 595 GB model weights are available on Hugging Face and can be self-hosted via vLLM, SGLang, or KTransformers; API access uses the kimi-k2.7-code model name with OpenAI-compatible endpoints.
Chinese summary
月之暗面发布了 Kimi K2.7-Code,一款基于 Modified MIT 许可的开源代码专用代理模型。它采用混合专家架构,总参数 1 万亿,每令牌激活 32B,包含 384 个专家(每步选 8+1 共享),使用 MLA 注意力、SwiGLU 前馈网络和一个 400M 参数的 MoonViT 视觉编码器。模型支持 256K 上下文窗口,原生 INT4 量化,强制开启思考模式并固定采样参数(温度 1.0、top_p 0.95、n 1)。公司公布的基准测试显示,K2.7-Code 在 Kimi Code Bench v2 上得分为 62.0(相对 K2.6 提升 21.8%),在 MCP Mark Verified 上得分为 81.1(超过 Claude Opus 4.8 的 76.4),推理 token 消耗比 K2.6 减少约 30%,在代理工作流中降低成本和延迟。模型权重约 595 GB,已发布在 Hugging Face,可通过 vLLM、SGLang 或 KTransformers 自托管;API 使用 kimi-k2.7-code 模型名,兼容 OpenAI 接口。
Key points
Open-weight coding-specialized model under Modified MIT license, with 1T total parameters and 32B active per token.
Modified MIT 许可的开源代码专用模型,总参数 1 万亿,每令牌激活 32B。
Achieves 62.0 on Kimi Code Bench v2, a +21.8% improvement over Kimi K2.6, and beats Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4).
在 Kimi Code Bench v2 上得 62.0 分,比 K2.6 提升 21.8%;并在 MCP Mark Verified 上以 81.1 分超过 Claude Opus 4.8(76.4)。
Uses approximately 30% fewer reasoning tokens than K2.6, reducing per-task cost and improving interactive agent speed.
推理 token 消耗比 K2.6 减少约 30%,降低单任务成本并提升交互式代理速度。
Mandatory thinking mode and fixed sampling parameters (temperature 1.0, top_p 0.95, n 1) limit usage flexibility.
强制思考模式且固定采样参数(温度 1.0,top_p 0.95,n 1),使用灵活性受限。
Weighing 595 GB on disk, the model supports self-hosting via vLLM/SGLang/KTransformers; API is OpenAI-compatible at $0.95/$4.00 per 1M input/output tokens.
模型磁盘占用 595 GB,支持通过 vLLM/SGLang/KTransformers 自托管;API 兼容 OpenAI,价格为每百万输入/输出 token 0.95/4.00 美元。