{"id":377847,"date":"2026-01-19T02:51:14","date_gmt":"2026-01-19T02:51:14","guid":{"rendered":"https:\/\/www.newsbeep.com\/uk\/377847\/"},"modified":"2026-01-19T02:51:14","modified_gmt":"2026-01-19T02:51:14","slug":"its-been-8-years-of-phone-ai-chips-and-theyre-still-wasting-their-potential","status":"publish","type":"post","link":"https:\/\/www.newsbeep.com\/uk\/377847\/","title":{"rendered":"It\u2019s been 8 years of phone AI chips \u2014 and they\u2019re still wasting their potential"},"content":{"rendered":"<p>It\u2019s been a little over eight years since we first started talking about Neural Processing Units (NPUs) inside our smartphones and the early prospects of on-device AI. Big points if you remember that the HUAWEI Mate 10\u2019s Kirin 970 processor was the first, though similar ideas had been floating around, particularly in imaging, before then.<\/p>\n<p>Of course, a lot has changed in the last eight years \u2014 Apple has finally embraced AI, albeit with mixed results, and Google has obviously leaned heavily into its <a href=\"https:\/\/www.androidauthority.com\/google-tensor-g5-3583168\/\" rel=\"nofollow noopener\" target=\"_blank\">Tensor Processor Unit<\/a> for everything from imaging to on-device language translation. Ask any of the big tech companies, from <a href=\"https:\/\/www.androidauthority.com\/arm-local-ai-interview-3627374\/\" rel=\"nofollow noopener\" target=\"_blank\">Arm<\/a> and Qualcomm to Apple and Samsung, and they\u2019ll all tell you that AI is the future of smartphone hardware and software.<\/p>\n<p>And yet the landscape for mobile AI still feels quite confined; we\u2019re restricted to a small but growing pool of on-device AI features, curated mostly by Google, with very little in the way of a creative developer landscape, and NPUs are partly to blame \u2014 not because they\u2019re ineffective, but because they\u2019ve never been exposed as a real platform. Which begs the question, what exactly is this silicon sitting in our phones really good for?<\/p>\n<p>What is an NPU anyway?<\/p>\n<p><img class=\"e_jg\" decoding=\"async\" loading=\"lazy\"  title=\"SoC resting on Google Pixel phone\"  alt=\"SoC resting on Google Pixel phone\" src=\"https:\/\/www.newsbeep.com\/uk\/wp-content\/uploads\/2026\/01\/SoC-resting-on-Google-Pixel-phone.jpg\"\/><\/p>\n<p>Robert Triggs \/ Android Authority<\/p>\n<p>Before we can decisively answer whether phones really \u201cneed\u201d an NPU, we should probably acquaint ourselves with what it actually does.<\/p>\n<p>Just like your phone\u2019s general-purpose CPU for running apps, GPU for rendering games, or its ISP dedicated to crunching image and video data, an NPU is a purpose-built processor for running AI workloads as quickly and efficiently as possible. 
Mobile NPUs have taken hold to run AI workloads that traditional processors struggle with.

Now, as I said back in 2017, you don't strictly need an NPU to run machine learning workloads; plenty of smaller algorithms can run on even a modest CPU, while the data centers powering today's large language models run on hardware that's closer to an NVIDIA graphics card than to the NPU in your phone.

However, a dedicated NPU can help you run models that your CPU or GPU can't handle at pace, and it can often perform those tasks more efficiently. What this heterogeneous approach to computing costs in complexity and silicon area, it can gain back in power and performance, which are obviously key for smartphones. No one wants their phone's AI tools to eat up the battery.

Wait, but doesn't AI also run on graphics cards?

[Image: NVIDIA GeForce 4090 in its box. Credit: Oliver Cragg / Android Authority]

If you've been following the ongoing RAM price crisis (https://www.androidauthority.com/ram-shortage-prices-explained-3626017/), you'll know that AI data centers and the demand for powerful AI and GPU accelerators, particularly those from NVIDIA, are driving the shortages.

What makes NVIDIA's CUDA architecture so effective for AI workloads (as well as graphics) is that it's massively parallel, with tensor cores that handle matrix multiply-accumulate (MMA) operations across a wide range of matrix shapes and data formats, including the tiny bit-depths used for modern quantized models.
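Stripped of all the hardware parallelism, the multiply-accumulate primitive that tensor cores and NPUs are built around is just the loop below. This is a purely illustrative scalar Kotlin version, with no real GPU or NPU API involved; it only makes the D = A * B + C pattern explicit.

```kotlin
// Toy illustration of the matrix multiply-accumulate (MMA) primitive: D = A * B + C.
// Dedicated hardware issues huge numbers of the inner `d += a * b` steps per cycle,
// often at 8-bit or lower precision; this scalar loop just shows the underlying math.
fun mma(a: Array<FloatArray>, b: Array<FloatArray>, c: Array<FloatArray>): Array<FloatArray> {
    val m = a.size          // rows of A
    val k = a[0].size       // shared inner dimension
    val n = b[0].size       // columns of B
    val d = Array(m) { i -> c[i].copyOf() }      // start from the accumulator C
    for (i in 0 until m)
        for (j in 0 until n)
            for (p in 0 until k)
                d[i][j] += a[i][p] * b[p][j]     // one multiply-accumulate (MAC)
    return d
}
```

Neural network inference is overwhelmingly built from these operations, which is why an accelerator's usefulness largely comes down to how many MACs it can issue per watt at the precisions a model actually uses.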
While modern mobile GPUs, like Arm's Mali and Qualcomm's Adreno lineups, support 16-bit and increasingly 8-bit data types with highly parallel math, they don't execute very small, heavily quantized models, such as INT4 or lower, with anywhere near the same efficiency. And despite supporting these formats on paper and offering substantial parallelism, they aren't optimized for AI as a primary workload.

Mobile GPUs focus on efficiency; they're far less powerful for AI than desktop rivals.

Unlike beefy desktop graphics chips, mobile GPU architectures are designed first and foremost for power efficiency, using concepts such as tile-based rendering pipelines and sliced execution units that aren't entirely conducive to sustained, compute-intensive workloads. Mobile GPUs can definitely perform AI compute, and are quite good in some situations, but for highly specialized operations there are often more power-efficient options.

Software development is the other, equally important half of the equation. NVIDIA's CUDA exposes key architectural attributes to developers, allowing for deep, kernel-level optimizations when running AI workloads. Mobile platforms lack comparable low-level access for developers and device manufacturers, instead relying on higher-level and often vendor-specific abstractions such as Qualcomm's Neural Processing SDK or Arm's Compute Library.

This highlights a significant pain point for the mobile AI development environment. While desktop development has mostly settled on CUDA (though AMD's ROCm is gaining traction), smartphones run a variety of NPU architectures: Google's proprietary Tensor, Qualcomm's Snapdragon Hexagon, Apple's Neural Engine, and more, each with its own capabilities and development platforms.

NPUs haven't solved the platform problem

[Image: Gemini image generation. Credit: Taylor Kerns / Android Authority]

Smartphone chipsets that boast NPU capabilities (which is essentially all of them) are built to solve one problem: supporting smaller data values, complex math, and challenging memory patterns efficiently, without having to retool GPU architectures. However, discrete NPUs introduce new challenges of their own, especially when it comes to third-party development.

While APIs and SDKs are available for Apple, Snapdragon, and MediaTek chips, developers have traditionally had to build and optimize their applications separately for each platform. Even Google doesn't yet provide easy, general developer access on its AI showcase Pixels: the Tensor ML SDK remains in experimental access, with no guarantee of a general release. Developers can experiment with higher-level Gemini Nano features (https://www.androidauthority.com/gemini-nano-features-devices-3490062/) via Google's ML Kit, but that stops well short of true, low-level access to the underlying hardware.

Worse, Samsung withdrew support for its Neural SDK altogether, and Google's more universal Android NNAPI has since been deprecated. The result is a labyrinth of specifications and abandoned APIs that makes efficient third-party mobile AI development exceedingly difficult. Vendor-specific optimization was never going to scale, leaving us stuck with cloud-based and in-house compact models controlled by a few major vendors, such as Google.

LiteRT runs on-device AI on Android, iOS, Web, IoT, and PC environments.

Thankfully, Google introduced LiteRT (https://ai.google.dev/edge/litert/overview) in 2024, effectively a repositioning of TensorFlow Lite, as a single on-device runtime that supports CPUs, GPUs, and vendor NPUs (currently Qualcomm and MediaTek). It was specifically designed to maximize hardware acceleration at runtime, leaving the software to choose the most suitable method and addressing NNAPI's biggest flaw. While NNAPI was intended to abstract away vendor-specific hardware, it ultimately standardized the interface rather than the behavior, leaving performance and reliability to vendor drivers, a gap LiteRT attempts to close by owning the runtime itself.
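In practice, targeting that runtime looks much like the long-standing TensorFlow Lite Interpreter flow that LiteRT repackages. The sketch below is a minimal, assumption-laden example rather than anything from the article: the asset name model.tflite, the single float input and output, and the four-thread CPU fallback are all placeholders, and the app would need the LiteRT/TensorFlow Lite core and GPU delegate dependencies. Vendor NPU delegates slot into the same Interpreter.Options in a similar way.

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.gpu.CompatibilityList
import org.tensorflow.lite.gpu.GpuDelegate
import java.io.FileInputStream
import java.nio.MappedByteBuffer
import java.nio.channels.FileChannel

// Hypothetical asset name; any LiteRT/TensorFlow Lite flatbuffer bundled with the app works.
private const val MODEL_ASSET = "model.tflite"

private fun loadModel(context: Context): MappedByteBuffer =
    context.assets.openFd(MODEL_ASSET).use { fd ->
        FileInputStream(fd.fileDescriptor).use { stream ->
            stream.channel.map(FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength)
        }
    }

fun createInterpreter(context: Context): Interpreter {
    val options = Interpreter.Options()
    val gpuCompat = CompatibilityList()
    if (gpuCompat.isDelegateSupportedOnThisDevice) {
        // Offload supported ops to the GPU delegate; anything unsupported falls back to the CPU.
        options.addDelegate(GpuDelegate(gpuCompat.bestOptionsForThisDevice))
    } else {
        // Plain multi-threaded CPU path when no accelerator is usable on this device.
        options.setNumThreads(4)
    }
    return Interpreter(loadModel(context), options)
}

// Usage sketch: shapes depend entirely on the bundled model; this assumes a single
// [1, N] float input and a [1, M] float output purely for illustration.
fun runOnce(interpreter: Interpreter, input: FloatArray): FloatArray {
    val output = Array(1) { FloatArray(interpreter.getOutputTensor(0).shape().last()) }
    interpreter.run(arrayOf(input), output)
    return output[0]
}
```

The key point is that the app asks for acceleration and the runtime, informed by the device's compatibility data, decides what is actually safe to use, rather than the developer hard-coding a particular chip vendor.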
Interestingly, LiteRT is designed to run inference entirely on-device across Android, iOS, embedded systems, and even desktop-class environments, signaling Google's ambition to make it a truly cross-platform runtime for compact models. Still, unlike desktop AI frameworks or diffusion pipelines that expose dozens of runtime tuning parameters, a TensorFlow Lite model is fully specified, with precision, quantization, and execution constraints decided ahead of time so it can run predictably on constrained mobile hardware.
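To see what "decided ahead of time" means in practice, a model's data types and quantization constants can be inspected, but not altered, once it's loaded. This is a hypothetical helper that reuses the createInterpreter() sketch from earlier and the standard Tensor accessors in the TensorFlow Lite Java API.

```kotlin
import android.content.Context
import org.tensorflow.lite.Interpreter

// Illustrative only: everything printed here was fixed when the model was converted,
// which is part of why a .tflite file behaves predictably across CPU, GPU, and NPU backends.
fun logModelConstraints(context: Context) {
    val interpreter: Interpreter = createInterpreter(context)   // helper from the earlier sketch
    val input = interpreter.getInputTensor(0)
    val quant = input.quantizationParams()
    println("input type: ${input.dataType()}, shape: ${input.shape().joinToString()}")
    // For a plain float model these are typically zero; for a quantized model they are
    // the fixed scale and zero point the converter baked in.
    println("input quantization: scale=${quant.scale}, zeroPoint=${quant.zeroPoint}")
    interpreter.close()
}
```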
[Image: LiteRT hardware accelerator support table]

While abstracting away the vendor-NPU problem is a major perk of LiteRT, it's still worth considering whether NPUs will remain as central as they once were in light of other recent developments.

For instance, Arm's new SME2 (Scalable Matrix Extension 2) instructions for its latest C1 series of CPUs (https://www.androidauthority.com/arm-c1-cpu-mali-g1-gpu-deep-dive-3595933/) provide up to 4x CPU-side AI acceleration for some workloads, with wide framework support and no need for dedicated SDKs. It's also possible that mobile GPU architectures will shift to better support advanced machine learning workloads, reducing the need for dedicated NPUs altogether. Samsung is reportedly exploring its own GPU architecture (https://www.androidauthority.com/samsung-custom-gpu-galaxy-s28-3628005/) specifically to better leverage on-device AI, which could debut as early as the Galaxy S28 series. Likewise, Imagination's E-Series is built specifically for AI acceleration, debuting support for FP8 and INT8. Maybe a Pixel will adopt it eventually.

LiteRT complements these advancements, freeing developers to worry less about exactly how the hardware market shakes out. Growing support for matrix and vector instructions can make CPUs increasingly efficient tools for running machine learning workloads rather than a mere fallback. Meanwhile, GPUs with better quantization support might eventually become the default accelerators instead of NPUs, and LiteRT can handle the transition. That makes LiteRT feel closer to the mobile-side equivalent of CUDA we've been missing, not because it exposes hardware, but because it finally abstracts it properly.

Dedicated mobile NPUs are unlikely to disappear, but apps may finally start leveraging them.

Dedicated mobile NPUs are unlikely to disappear any time soon, but the NPU-centric, vendor-locked approach that defined the first wave of on-device AI clearly isn't the endgame. For most third-party applications, CPUs and GPUs will continue to shoulder much of the practical workload, particularly as they gain more efficient support for modern machine learning operations. What matters more than any single block of silicon is the software layer that decides how, and if, that hardware is used.

If LiteRT succeeds, NPUs become accelerators rather than gatekeepers, and on-device mobile AI finally becomes something developers can target without betting on a specific chip vendor's roadmap. With that in mind, there's probably still some way to go before on-device AI has a vibrant ecosystem of third-party features to enjoy, but we are finally inching a little closer.