Maximizing Qualcomm NPU Performance with LiteRT

Modern smartphones utilize advanced system-on-chip (SoC) configurations that combine CPUs, GPUs, and NPUs, enabling on-device AI experiences that surpass traditional server-based systems in speed and interactivity. GPUs, present in about 90% of Android devices, handle most on-device AI workloads today, but they can become a bottleneck when multiple demanding AI tasks run simultaneously. For example, running a text-to-image model while processing a live camera feed can overwhelm the GPU, leading to lag and a poor user experience.

Enter the Neural Processing Unit (NPU), a processor specialized for AI workloads. It delivers tens of TOPS (tera operations per second) of dedicated performance, far exceeding what a mobile GPU can sustain, and it consumes less power per operation than a CPU or GPU, making it ideal for battery-powered devices. Currently, over 80% of recent Qualcomm SoCs include NPUs, which operate in parallel with the GPU and CPU, freeing those resources for rendering and general system work and enabling smoother, faster AI applications.

To leverage this power on mobile devices, Google has developed the LiteRT Qualcomm AI Engine Direct (QNN) Accelerator, an enhancement over the previous TFLite QNN delegate. The new accelerator simplifies deployment by abstracting away complex, vendor-specific SDKs and unifying the workflow across different SoCs. Developers can deploy models either as pre-compiled artifacts or through on-device conversion, including models from sources like Qualcomm AI Hub.
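
To make this concrete, here is a minimal sketch of what selecting the NPU might look like from Kotlin. It assumes the LiteRT (LiteRT Next) CompiledModel API with an Accelerator.NPU option and a hypothetical model file name; exact class names, overloads, and fallback behavior vary by release, so treat it as an illustration and check the current LiteRT documentation.

```kotlin
// Minimal sketch: compiling and running a LiteRT model on the Qualcomm NPU.
// Assumes the LiteRT (LiteRT Next) Kotlin API; names such as CompiledModel,
// Accelerator.NPU, and the buffer helpers may differ across releases.
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

fun runOnNpu(context: Context, input: FloatArray): FloatArray {
    // Request compilation for the NPU; whether unsupported ops fall back
    // to GPU/CPU depends on the device and the LiteRT release.
    val model = CompiledModel.create(
        context.assets,
        "my_model.tflite",                  // hypothetical model file
        CompiledModel.Options(Accelerator.NPU)
    )

    // Allocate input/output buffers matching the model signature.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    // Write the input tensor, run inference, and read the first output.
    inputBuffers[0].writeFloat(input)
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```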

The QNN Accelerator supports a broad set of LiteRT operations, ensuring maximum utilization of the NPU and full model delegation for top-tier performance. It includes optimized kernels tailored for large language models and complex GenAI tasks, achieving state-of-the-art results with models like Gemma and FastVLM.

Benchmark tests across 72 AI models spanning vision, audio, and natural language processing show remarkable improvements: NPU acceleration can deliver up to 100 times faster inference than the CPU and up to 10 times faster than the GPU. With support for 90 LiteRT ops, the new accelerator enables a significant leap in on-device AI performance, making real-time, high-quality AI experiences more accessible on mobile devices.
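
As a rough illustration of how such per-accelerator comparisons can be made, the sketch below times the same model on CPU, GPU, and NPU using the same assumed CompiledModel API as above. It is not the harness behind the published figures, and the API names remain assumptions.

```kotlin
// Rough latency comparison of one model across CPU, GPU, and NPU.
// Uses the same assumed LiteRT Kotlin API as the sketch above; intended
// as an illustration only, not the published benchmark methodology.
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

fun benchmarkAccelerators(context: Context, iterations: Int = 50) {
    for (accel in listOf(Accelerator.CPU, Accelerator.GPU, Accelerator.NPU)) {
        val model = CompiledModel.create(
            context.assets,
            "my_model.tflite",              // hypothetical model file
            CompiledModel.Options(accel)
        )
        val inputs = model.createInputBuffers()
        val outputs = model.createOutputBuffers()

        // Warm-up run so one-time compilation cost does not skew the timing.
        model.run(inputs, outputs)

        val start = System.nanoTime()
        repeat(iterations) { model.run(inputs, outputs) }
        val avgMs = (System.nanoTime() - start) / 1e6 / iterations
        println("$accel: average %.2f ms per inference".format(avgMs))
    }
}
```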

Conclusion:
The integration of Qualcomm’s NPU with LiteRT’s latest QNN accelerator unlocks unprecedented on-device AI speed and efficiency. This advancement paves the way for more sophisticated, responsive, and power-efficient AI applications directly on smartphones, enhancing user experiences across various industries.

FAQ:

Q: What is the main benefit of using an NPU in smartphones?
A: NPUs provide high-performance, energy-efficient AI processing, enabling faster, more responsive AI features without draining the battery.

Q: How does LiteRT improve deploying AI models on mobile?
A: LiteRT simplifies the deployment process by unifying workflows across different SoCs and abstracting low-level SDK complexities, making model integration easier.

Q: How much faster is NPU acceleration compared to CPU and GPU?
A: NPU acceleration can deliver up to 100 times speedup over CPUs and 10 times over GPUs, significantly enhancing on-device AI performance.

Q: Is the new accelerator compatible with existing models?
A: Yes, it supports a wide range of LiteRT operations, allowing existing models from sources like Qualcomm AI Hub to be optimized and deployed efficiently.

Q: Why is on-device AI important?
A: On-device AI reduces latency, increases privacy, and enables real-time interactions without relying on internet connectivity.
