TutorialsSource: MARKTECHPOSTJune 13, 2026Importance: 4/5

Moonshot AI Releases Kimi K2.7-Code: Open-Weight 1T-Parameter Coding Model with 256K Context and +21.8% on Kimi Code Bench v2 Over K2.6

English summary

Moonshot AI released Kimi K2.7-Code, an open-weight, coding-specialized agentic model under Modified MIT license. It is a Mixture-of-Experts architecture with 1T total parameters, 32B active per token, 384 experts with 8 selected, MLA attention, SwiGLU feed-forward, and a 400M-parameter MoonViT vision encoder. The model supports a 256K-token context window, ships with native INT4 quantization, and enforces mandatory thinking mode with fixed sampling parameters (temperature 1.0, top_p 0.95, n 1). In company-reported benchmarks, K2.7-Code achieves 62.0 on Kimi Code Bench v2 (+21.8% over K2.6), 81.1 on MCP Mark Verified (beating Claude Opus 4.8’s 76.4), and demonstrates approximately 30% lower reasoning-token usage than K2.6, reducing cost and latency in agentic workflows. The 595 GB model weights are available on Hugging Face and can be self-hosted via vLLM, SGLang, or KTransformers; API access uses the kimi-k2.7-code model name with OpenAI-compatible endpoints.

Chinese summary

月之暗面发布了 Kimi K2.7-Code，一款基于 Modified MIT 许可的开源代码专用代理模型。它采用混合专家架构，总参数 1 万亿，每令牌激活 32B，包含 384 个专家（每步选 8+1 共享），使用 MLA 注意力、SwiGLU 前馈网络和一个 400M 参数的 MoonViT 视觉编码器。模型支持 256K 上下文窗口，原生 INT4 量化，强制开启思考模式并固定采样参数（温度 1.0、top_p 0.95、n 1）。公司公布的基准测试显示，K2.7-Code 在 Kimi Code Bench v2 上得分为 62.0（相对 K2.6 提升 21.8%），在 MCP Mark Verified 上得分为 81.1（超过 Claude Opus 4.8 的 76.4），推理 token 消耗比 K2.6 减少约 30%，在代理工作流中降低成本和延迟。模型权重约 595 GB，已发布在 Hugging Face，可通过 vLLM、SGLang 或 KTransformers 自托管；API 使用 kimi-k2.7-code 模型名，兼容 OpenAI 接口。

Key points

Open-weight coding-specialized model under Modified MIT license, with 1T total parameters and 32B active per token.
Modified MIT 许可的开源代码专用模型，总参数 1 万亿，每令牌激活 32B。
Achieves 62.0 on Kimi Code Bench v2, a +21.8% improvement over Kimi K2.6, and beats Claude Opus 4.8 on MCP Mark Verified (81.1 vs 76.4).
在 Kimi Code Bench v2 上得 62.0 分，比 K2.6 提升 21.8%；并在 MCP Mark Verified 上以 81.1 分超过 Claude Opus 4.8（76.4）。
Uses approximately 30% fewer reasoning tokens than K2.6, reducing per-task cost and improving interactive agent speed.
推理 token 消耗比 K2.6 减少约 30%，降低单任务成本并提升交互式代理速度。
Mandatory thinking mode and fixed sampling parameters (temperature 1.0, top_p 0.95, n 1) limit usage flexibility.
强制思考模式且固定采样参数（温度 1.0，top_p 0.95，n 1），使用灵活性受限。
Weighing 595 GB on disk, the model supports self-hosting via vLLM/SGLang/KTransformers; API is OpenAI-compatible at $0.95/$4.00 per 1M input/output tokens.
模型磁盘占用 595 GB，支持通过 vLLM/SGLang/KTransformers 自托管；API 兼容 OpenAI，价格为每百万输入/输出 token 0.95/4.00 美元。

Open original