Cohere Officially Launches North Mini Code Model with Open Weights and vLLM Support
English summary
Cohere has officially released the North Mini Code model, following positive community feedback on an earlier version. The model weights are available on Hugging Face in FP8 format, and the model can be tried for free on OpenCode. A technical blog post and an announcement provide additional details. Serving the model with vLLM requires a build from the vLLM main branch plus the cohere_melody library (>=0.9.0); the setup supports tool-call parsing, reasoning parsing, and a maximum context length of 320,000 tokens. Community members have already created MLX versions, and Cohere is exploring quantization and llama.cpp support internally.
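As a rough illustration of the vLLM deployment described above, the following sketch assembles a `vllm serve` command line in Python. The flags `--max-model-len`, `--tool-call-parser`, `--reasoning-parser`, and `--enable-auto-tool-choice` are standard vLLM options, but the summary does not give the Hugging Face repository ID or the parser names, so those appear as placeholders that must be filled in from Cohere's own instructions. Including `--enable-auto-tool-choice` is an assumption; the summary does not mention it.

```python
import shlex


def build_vllm_serve_command(model_id, tool_call_parser, reasoning_parser,
                             max_model_len=320_000):
    """Assemble an argv list for `vllm serve`.

    model_id, tool_call_parser and reasoning_parser are NOT given in the
    summary; callers must take them from Cohere's deployment notes.
    """
    return [
        "vllm", "serve", model_id,
        # 320,000 tokens is the maximum context length quoted in the summary.
        "--max-model-len", str(max_model_len),
        # Assumption: automatic tool choice is wanted alongside the tool parser.
        "--enable-auto-tool-choice",
        "--tool-call-parser", tool_call_parser,
        "--reasoning-parser", reasoning_parser,
    ]


# Placeholder values only; replace them before running the command.
command = build_vllm_serve_command(
    model_id="<hf-repo-id>",
    tool_call_parser="<tool-parser-name>",
    reasoning_parser="<reasoning-parser-name>",
)
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without vLLM or a GPU; the resulting line can be pasted into a shell once the placeholders are replaced.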
Key points
Open weights for the North Mini Code model are published on Hugging Face (FP8 available).
Free trial access is offered via OpenCode, alongside detailed deployment instructions for vLLM.
Deployment requires the vLLM main branch and cohere_melody>=0.9.0, with tool-call and reasoning parsers and a maximum model length of 320k tokens.
Community-built MLX support already exists; Cohere is exploring quantization and llama.cpp compatibility.
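The cohere_melody>=0.9.0 requirement noted in the key points can be checked before starting a server. The sketch below is a minimal, standard-library-only check; it assumes the installed distribution is registered under the name `cohere_melody` as written in the summary, and it compares only the leading numeric components of the version string, so pre-release suffixes such as `rc1` are ignored (a simplification relative to full PEP 440 ordering).

```python
import re
from importlib import metadata

MIN_MELODY = (0, 9, 0)  # minimum version quoted in the summary


def parse_release(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_MELODY):
    """True if `version` is at least `minimum`, ignoring pre-release tags."""
    release = parse_release(version)
    # Pad both tuples so that "0.9" compares equal to (0, 9, 0).
    width = max(len(release), len(minimum))
    release = release + (0,) * (width - len(release))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return release >= minimum


def installed_melody_ok():
    """Check the installed cohere_melody distribution, if there is one."""
    try:
        return meets_minimum(metadata.version("cohere_melody"))
    except metadata.PackageNotFoundError:
        return False


print("cohere_melody>=0.9.0 installed:", installed_melody_ok())
```

If the package is missing or too old, the check simply reports `False`; installing or upgrading it is left to whatever package manager the environment uses.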