Apple Announces CoreAI On-Device Inference Engine at WWDC, Targeting Larger Models on Apple Silicon


English summary

Apple revealed CoreAI at WWDC as the future replacement for CoreML: an inference engine optimized for on-device inference on Apple Silicon hardware, including iPhone and iPad. CoreAI supports larger models than CoreML, and Apple demonstrated a lazily loaded 20-billion-parameter Mixture of Experts (MoE) model that can be deployed on device. The list of supported models is published on GitHub and currently covers only models released around mid-2025. As with CoreML, model weights must first be converted with a Python script. Support for models of this size implies a major update to the operations the Apple Neural Engine (ANE) can run, although Apple has not yet released any performance benchmarks. CoreAI is positioned as an alternative to MLX, llama.cpp, and PyTorch for on-device deployment.
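Apple has not described how CoreAI implements lazy loading, so the following is only a generic sketch of the idea behind a lazily loaded MoE model. A router picks the top-k experts for each input, and expert weights are loaded on first use and kept in a small least-recently-used (LRU) cache, so only a few experts need to be resident in memory at once. All names here (`LazyExpertCache`, `moe_forward`, `load_expert`) are hypothetical and are not part of any CoreAI API. Experts are stand-in functions rather than real weight matrices, and renormalizing the softmax over only the selected experts is one common routing choice, assumed here.

```python
import math
from collections import OrderedDict
from typing import Callable, List

# Hypothetical expert type: maps an input vector to an output vector.
Expert = Callable[[List[float]], List[float]]


class LazyExpertCache:
    """Hypothetical LRU cache: keeps at most `capacity` experts in memory."""

    def __init__(self, loader: Callable[[int], Expert], capacity: int):
        self.loader = loader      # called to load an expert that is not resident
        self.capacity = capacity
        self.resident: "OrderedDict[int, Expert]" = OrderedDict()
        self.loads = 0            # counts how many times weights were loaded

    def get(self, idx: int) -> Expert:
        if idx in self.resident:
            self.resident.move_to_end(idx)  # mark as most recently used
            return self.resident[idx]
        expert = self.loader(idx)           # e.g. read weights from storage
        self.loads += 1
        self.resident[idx] = expert
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        return expert


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: List[float], router_logits: List[float],
                cache: LazyExpertCache, top_k: int = 2) -> List[float]:
    # Select the top-k experts by router score (ties keep index order).
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalize routing weights over the chosen experts only (assumed).
    weights = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = cache.get(i)(x)  # only now is expert i loaded, if not resident
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


def load_expert(idx: int) -> Expert:
    # Stand-in loader: expert i simply scales its input by (i + 1).
    scale = float(idx + 1)
    return lambda v: [scale * t for t in v]


cache = LazyExpertCache(load_expert, capacity=2)
y = moe_forward([1.0, 2.0], [0.0, 3.0, 3.0, -1.0], cache)
print(y)  # [2.5, 5.0]: experts 1 and 2, each weighted 0.5
```

The `capacity` parameter trades resident memory against reload cost: if successive inputs route to experts outside the cache, weights are evicted and reloaded from storage each time.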

Key points

CoreAI is a new on-device inference engine for Apple Silicon, announced by Apple at WWDC as the successor to CoreML.

It supports larger models, as shown by a 20B-parameter MoE model deployable on device, enabled by updates to the Apple Neural Engine (ANE).

Model weights must be converted with a Python script, and the currently supported models are limited to those released around mid-2025.

Apple has not published performance details. CoreAI will likely trail pure MLX running on the GPU, but its goal is to let apps ship larger models.

性能细节缺失，可能在GPU上不及纯MLX，但旨在让应用加载更大的模型。
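To see why lazy loading matters for a 20B-parameter model, a back-of-the-envelope estimate of weight storage helps. The sketch below assumes 1 GB = 10^9 bytes and counts only the weights (no activations, KV cache, or file metadata). Apple has not stated which precision its demo used, so 16-, 8-, and 4-bit weights are shown purely as assumptions; `weight_footprint_gb` is a hypothetical helper.

```python
def weight_footprint_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for model weights in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache, and file metadata, so real usage is higher.
    """
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


# Assumed precisions; Apple has not said which one its 20B demo used.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_footprint_gb(20e9, bits):.0f} GB")
```

This prints 40 GB, 20 GB, and 10 GB respectively. Even at 4 bits the full weight set is large for a phone, which helps explain why such a deployment would keep only some experts resident and load the rest on demand.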