AFM 3 Core Advanced: how Apple fit 20 billion parameters into an iPhone

At WWDC 2026, Apple unveiled AFM 3 Core Advanced, its flagship third-generation multimodal model. The model runs entirely on-device on iPhone and iPad, with no personal data sent to external servers. Here's how Apple pulled it off.

How it works

**Sparse architecture.** Instead of running all 20 billion parameters at once, the model [activates](https://machinelearning.apple.com/research/introducing-third-generation-of-apple-foundation-models) only 1–4 billion parameters per request.
**Flash storage.** Large models normally need all of their weights loaded into fast RAM. Apple instead keeps the full model in slower NAND flash, so it doesn't hog the device's memory, which is always tight on phones.
**Novel routing.** Conventional sparse models choose which compute blocks to run on every token. Mobile memory bandwidth can't keep up with swapping weights that often, so AFM 3 makes its routing decision once per user request.
**Smart partitioning.** The model has a core set of always-active "shared experts", plus specialized experts that are loaded into RAM only when the task actually needs them.
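
To make the last two points concrete, here is a minimal toy sketch of per-request routing with an always-on shared expert and lazily loaded specialized experts. It is not Apple's implementation: `ExpertStore`, `route_request`, `run_request`, the keyword-overlap scoring, and the single-number "weights" are all hypothetical stand-ins chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class ExpertStore:
    """Hypothetical stand-in for flash storage plus a small RAM cache."""
    flash: dict                               # expert name -> weights (toy: one float)
    ram: dict = field(default_factory=dict)   # experts currently resident in RAM
    loads: int = 0                            # number of flash -> RAM copies so far

    def fetch(self, name):
        # Copy an expert out of flash only the first time it is needed.
        if name not in self.ram:
            self.ram[name] = self.flash[name]
            self.loads += 1
        return self.ram[name]


def route_request(prompt, keywords, top_k=1):
    """Choose experts once for the whole request (toy keyword-overlap score)."""
    words = set(prompt.lower().split())
    scores = {name: len(words & kws) for name, kws in keywords.items()}
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:top_k]


def run_request(prompt, tokens, store, keywords, shared_scale=1.0):
    chosen = route_request(prompt, keywords)      # single routing decision
    experts = [store.fetch(n) for n in chosen]    # at most one flash read per expert
    outputs = []
    for t in tokens:
        # The shared expert runs for every token; the chosen experts are
        # reused for every token instead of being re-selected each time.
        outputs.append(shared_scale * t + sum(w * t for w in experts))
    return chosen, outputs


store = ExpertStore(flash={"code": 0.5, "photo": 0.25, "speech": 0.125})
keywords = {
    "code": {"swift", "function"},
    "photo": {"image", "photo"},
    "speech": {"transcribe", "audio"},
}
chosen, out = run_request("write a swift function", [1.0, 2.0, 3.0], store, keywords)
print(chosen, out, store.loads)  # ['code'] [1.5, 3.0, 4.5] 1
```

In this sketch the cost of touching flash is paid once per request rather than once per token, which is the trade-off the routing description above is pointing at; a per-token router would call `fetch` inside the loop and could pull a different expert on every step.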

Where it's used

The model is deeply integrated into iOS 27 and Apple's other new operating systems, where it powers an upgraded Siri, image generation, and advanced voice recognition. MacStories calls it a historic breakthrough: Apple's engineers implemented Instruction-Following Pruning, an algorithm that elegantly routes around the mobile memory bottleneck.
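
As a rough illustration of the general idea behind instruction-conditioned pruning, the toy sketch below scores the hidden units of one feed-forward layer from a feature vector describing the instruction, keeps the top few, and then runs every token through the smaller layer. It is a sketch under stated assumptions, not Apple's algorithm: `predict_mask`, `prune_ffn`, `ffn`, the linear predictor, and all of the numbers are hypothetical.

```python
def predict_mask(instruction_features, predictor, keep):
    """Score each hidden unit from the instruction and keep the top `keep`."""
    scores = [sum(w * x for w, x in zip(row, instruction_features))
              for row in predictor]
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:keep]
    return sorted(top)


def prune_ffn(w_in, w_out, kept):
    """Keep only the selected hidden units: rows of w_in, columns of w_out."""
    small_in = [w_in[i] for i in kept]
    small_out = [[row[i] for i in kept] for row in w_out]
    return small_in, small_out


def ffn(x, w_in, w_out):
    """Two-layer feed-forward block with a ReLU in between."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


# Toy layer: 2 inputs, 4 hidden units, 1 output (all values are made up).
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
w_out = [[1.0, 1.0, 1.0, 1.0]]
predictor = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]

# The mask is computed once from the instruction, then reused for every token.
kept = predict_mask([1.0, 0.0], predictor, keep=2)
small_in, small_out = prune_ffn(w_in, w_out, kept)
print(kept)                                  # [0, 2]
print(ffn([2.0, 3.0], small_in, small_out))  # [7.0]
print(ffn([2.0, 3.0], w_in, w_out))          # [10.0]
```

The pruned layer gives a different answer from the full one (7.0 versus 10.0 here), so a scheme like this only pays off if the predictor reliably keeps the units that matter for the instruction at hand.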

Limitations

The architecture is tightly optimized, but it reaches its full potential only on the latest Apple Silicon chips. Older devices will still offload heavy tasks to the cloud.