The new silicon: a closer look at on-device AI chips
NPUs are landing in everything from phones to laptops. What they can do, and why it matters for privacy.
There’s a chip in your phone that did almost nothing five years ago and now runs a neural network every time you unlock it, dictate a message, or search your photos for “dog.” It’s the NPU, the neural processing unit, and it’s quietly becoming the most interesting piece of silicon in consumer hardware.
NPUs are now standard in flagship phones, shipping in laptops under the “AI PC” banner, and increasingly the reason a device can do something genuinely useful without ever talking to a server. Here’s what they actually are, why they arrived now, and why the privacy story is the part worth paying attention to.
What an NPU actually is
A CPU is a generalist. A GPU is a parallel workhorse built for graphics that happens to be good at AI. An NPU is neither — it’s purpose-built hardware that does one narrow thing with extreme efficiency: the dense matrix multiplications at the heart of a neural network.
The trick is specialization. NPUs run at low numerical precision — eight-bit integers, sometimes four — because neural networks tolerate it, and lower precision means dramatically less energy per operation. That’s why a phone can run a model continuously without melting or draining the battery, something a CPU doing the same math could never sustain.
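To make the low-precision idea concrete, here is a minimal sketch in plain Python of symmetric eight-bit quantization. It assumes one scale factor per tensor, which is only one of several schemes real toolchains use, and the function names and sample values are made up for illustration. The dot product is accumulated entirely in integers and rescaled once at the end, which is the kind of arithmetic an NPU is built to do cheaply.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] plus one shared scale.

    Assumption: symmetric, per-tensor scaling. Real toolchains may use
    per-channel scales, zero points, or calibration data instead.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a_codes, a_scale, b_codes, b_scale):
    """Dot product using integer multiply-accumulate, rescaled once at the end."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))  # integer-only inner loop
    return acc * a_scale * b_scale


# Illustrative values only, not taken from any real model.
weights = [0.42, -1.30, 0.07, 0.95]
inputs = [1.0, 0.5, -2.0, 0.25]

w_q, w_s = quantize_int8(weights)
x_q, x_s = quantize_int8(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(w_q, w_s, x_q, x_s)
print(f"float result: {exact:.4f}  int8 result: {approx:.4f}")
```

The two results differ slightly, and that small error is the price of the cheaper arithmetic. Neural networks usually absorb it, which is the tolerance the paragraph above describes.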
Vendors quote performance in TOPS — trillions of operations per second. The number is marketing-flavored, but the trend underneath it is real: the NPUs in 2025-era laptops cleared 40 TOPS, the bar Microsoft set for its Copilot+ tier, up from a small fraction of that just a couple of years earlier.
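A rough sketch of where a headline TOPS figure can come from, assuming the common convention of counting each multiply-accumulate as two operations. The unit count, clock speed, and utilization figure below are hypothetical and describe no real chip.

```python
def peak_tops(mac_units, clock_hz, ops_per_mac=2, utilization=1.0):
    """Trillions of operations per second.

    Assumptions: each multiply-accumulate counts as two operations, and
    `utilization` is the fraction of cycles the MAC units do useful work.
    A value of 1.0 gives the theoretical peak.
    """
    return mac_units * ops_per_mac * clock_hz * utilization / 1e12


# Hypothetical NPU: 16,384 eight-bit MAC units clocked at 1.25 GHz.
print(peak_tops(16_384, 1.25e9))                   # theoretical peak
print(peak_tops(16_384, 1.25e9, utilization=0.5))  # if the units sit idle half the time
```

The peak assumes every unit is busy on every cycle, which real workloads rarely manage. That gap is one reason to read the number as a ceiling rather than a measurement.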
Why now
The hardware is only half the story. The other half is that models got small enough to meet it. Quantization, distillation, and better architectures mean a model that would have demanded a data center in 2021 now fits in a few gigabytes and runs locally at acceptable quality.
- Apple’s Neural Engine has shipped in every iPhone since 2017 and in every Apple-silicon Mac, and now drives on-device language and image features.
- Qualcomm’s Hexagon NPU powers most Android flagships and the Snapdragon-based AI PCs.
- Google’s Tensor line leans on its NPU for on-device speech and computational photography.
- Intel, AMD, and Microsoft pushed the “AI PC” spec that made a 40+ TOPS NPU table stakes for Windows laptops.
The privacy dividend
Here’s the part that matters beyond benchmarks. When inference happens on the device, your data never leaves it. The photo you search, the message you dictate, the document you summarize — none of it has to travel to a server, get logged, or sit in someone else’s training pipeline.
That’s a genuine structural shift. For a decade, “smart feature” and “send your data to the cloud” were effectively the same sentence. On-device AI breaks the coupling: you can have the feature and keep the data. For sensitive categories — health, location, private messages — that’s not a nicety, it’s the difference between a feature you’ll actually turn on and one you won’t.
On-device inference quietly decouples two things we’d come to assume were inseparable: intelligence and surveillance.
The catch
It isn’t magic. On-device models are smaller, and therefore less capable, than the frontier models in the cloud — you’re trading raw quality for privacy and latency. And memory bandwidth, not compute, is often the real ceiling: a model has to fit in RAM and stream its weights fast enough to feel instant.
- Capability gap — local models trail the biggest cloud models today, and will keep trailing.
- Memory pressure — a 7-billion-parameter model wants several gigabytes you weren’t spending before.
- Thermal limits — sustained inference on a thin phone is a heat problem, not just a math one.
- Fragmented software — Core ML, NNAPI, DirectML, and ONNX Runtime all target NPUs differently, so “runs on the NPU” is rarely write-once.
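The memory and bandwidth limits above come down to back-of-envelope arithmetic. The sketch below assumes that weights dominate the footprint and that generating each token means reading every weight from memory once. That is a simplification: it ignores activations, caches, and any batching. The 100 GB/s bandwidth figure is hypothetical.

```python
def weight_bytes(params, bits_per_weight):
    """Storage for the weights alone, ignoring activations and runtime overhead."""
    return params * bits_per_weight / 8


def max_tokens_per_second(model_bytes, bandwidth_bytes_per_s):
    """Upper bound on generation speed if every token streams all weights once.

    Assumption: generation is purely memory-bandwidth-bound.
    """
    return bandwidth_bytes_per_s / model_bytes


PARAMS = 7_000_000_000  # the 7-billion-parameter model from the list above
BANDWIDTH = 100e9       # hypothetical 100 GB/s of memory bandwidth

for bits in (16, 8, 4):
    size = weight_bytes(PARAMS, bits)
    rate = max_tokens_per_second(size, BANDWIDTH)
    print(f"{bits:>2}-bit weights: {size / 1e9:.1f} GB, at most {rate:.1f} tokens/s")
```

Under these assumptions, halving the precision halves the footprint and doubles the speed ceiling, so quantization eases the memory and bandwidth problems together.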
What to expect
The likely future isn’t local-or-cloud; it’s both. Expect hybrid systems that run the quick, private, common cases on the NPU and quietly escalate the hard ones to a server, ideally telling you which is which. The interesting design question stops being “can it run on-device” and becomes “what deserves to.”
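One way such a hybrid policy could look, as a minimal sketch. The `Request` fields, the difficulty score, and the 0.6 threshold are all invented for illustration. The rule that sensitive requests stay local even when they are hard is an assumption about what a privacy-first default might be, not a description of any shipping system.

```python
from dataclasses import dataclass


@dataclass
class Request:
    text: str
    sensitive: bool    # e.g. health, location, private messages
    difficulty: float  # hypothetical score: 0.0 (easy) to 1.0 (hard)


def route(req, local_max_difficulty=0.6, allow_cloud_for_sensitive=False):
    """Return "local" or "cloud" so the caller can also show the user which ran."""
    if req.difficulty <= local_max_difficulty:
        return "local"  # quick, common case: the NPU handles it
    if req.sensitive and not allow_cloud_for_sensitive:
        return "local"  # assumed default: accept lower quality, keep the data
    return "cloud"      # hard and not sensitive: escalate to a server


print(route(Request("search photos for dog", sensitive=False, difficulty=0.2)))
print(route(Request("summarize my lab results", sensitive=True, difficulty=0.9)))
print(route(Request("draft a long market report", sensitive=False, difficulty=0.9)))
```

Returning the decision as a plain label, rather than hiding it inside the call, is what lets an interface tell the user which path a request took.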
The silicon is already in your pocket and on your desk. What’s still being figured out is the software and the defaults — and, more than anything, whether companies actually use the privacy headroom the hardware just handed them.