Apple Announces CoreAI On-Device Inference Engine at WWDC, Targeting Larger Models on Apple Silicon
English summary
Apple announced CoreAI at WWDC as the planned successor to CoreML: an inference engine optimized for on-device use on Apple Silicon, including iPhone and iPad. It supports larger models than CoreML, and Apple demonstrated a 20-billion-parameter Mixture of Experts (MoE) model, with lazily loaded weights, that can be deployed on device. Supported models are listed on GitHub and are currently limited to releases from mid-2025; as with CoreML, weights must be converted with a Python script. The announcement implies a major update to the operations supported by the Apple Neural Engine (ANE), though Apple has not yet released performance benchmarks. CoreAI is positioned as an alternative to MLX, llama.cpp, and PyTorch for on-device deployment.
Key points
CoreAI is a new on-device inference engine that Apple announced at WWDC; it targets Apple Silicon and is intended to eventually replace CoreML.
It supports larger models than CoreML, as shown by a 20B-parameter MoE model deployable on device, apparently made possible by updates to the Apple Neural Engine (ANE).
Model weights must be converted via a Python script, and currently supported models are limited to those from mid-2025.
No performance figures have been published. CoreAI likely trails pure MLX running on the GPU, but its aim is to let apps load larger models.
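To illustrate what "lazily loaded" can mean for a Mixture of Experts model, here is a minimal Python sketch. It is not CoreAI's API or Apple's implementation, which the announcement does not detail. LazyExpertStore, fake_loader, route_top_k, and moe_forward are hypothetical names, and the least-recently-used eviction policy is an assumption chosen for illustration. The idea shown is only that each token activates a few experts, so expert weights can be loaded on first use and the rest left out of memory.

```python
from collections import OrderedDict
import random


class LazyExpertStore:
    """Hypothetical store: loads an expert's weights on first use and keeps
    at most `capacity` experts resident (LRU eviction is an assumption)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader            # callable: expert_id -> weights
        self.resident = OrderedDict()   # expert_id -> weights, oldest first
        self.loads = 0                  # number of loads actually performed

    def get(self, expert_id):
        if expert_id in self.resident:
            # Already in memory: mark as most recently used.
            self.resident.move_to_end(expert_id)
            return self.resident[expert_id]
        weights = self.loader(expert_id)
        self.loads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            # Evict the least recently used expert.
            self.resident.popitem(last=False)
        return weights


def fake_loader(expert_id, dim=4):
    """Stand-in for reading converted weights from disk (illustrative only)."""
    rng = random.Random(expert_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def route_top_k(scores, k):
    """Return the indices of the k highest router scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def moe_forward(x, router_scores, store, k=2):
    """Combine the outputs of the top-k experts, weighted by normalized
    router score. Assumes the selected scores are positive."""
    chosen = route_top_k(router_scores, k)
    total = sum(router_scores[i] for i in chosen)
    out = 0.0
    for i in chosen:
        w = store.get(i)  # triggers a load only if the expert is not resident
        expert_out = sum(wi * xi for wi, xi in zip(w, x))
        out += (router_scores[i] / total) * expert_out
    return out, chosen
```

With a store created as LazyExpertStore(capacity=3, loader=fake_loader), calling moe_forward twice with the same router scores performs loads only on the first call; experts that are never routed to are never loaded.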