Accelerating Web AI on Arm
By Chrome for Developers
Key Concepts
- ARM Architecture: The underlying processor design used in a vast majority of mobile devices, many laptops, cloud servers, and IoT devices.
- Web AI: Artificial intelligence applications and models that run directly within a web browser.
- On-Device Inference: Running AI models locally on a user's device rather than sending data to a cloud server.
- KleidiAI: ARM's AI acceleration libraries, designed to optimize AI workloads on ARM CPUs.
- SVE2 (Scalable Vector Extension 2): An ARM CPU feature that provides scalable vector processing for data-parallel workloads.
- SME2 (Scalable Matrix Extension 2): A more recent ARM CPU feature that adds multi-vector operations, significantly boosting AI performance.
- Ethos NPUs (Neural Processing Units): ARM's specialized IP blocks for bringing ML capabilities to edge devices.
- Neural Accelerator (NX): ARM's upcoming IP block for optimized AI and neural graphics workloads on GPUs, expected in 2026.
- WebGPU: A modern web API that provides access to GPU capabilities for graphics and general-purpose computation, including ML.
- WebAssembly (Wasm): A binary instruction format for a stack-based virtual machine, designed as a portable compilation target for high-level languages like C, C++, and Rust, enabling near-native performance in web browsers.
- V8: The JavaScript engine used in Chrome and Node.js, which ARM actively contributes to for performance optimization.
- LLVM: A compiler infrastructure project that serves as the de facto compiler for WebAssembly.
- WebNN (Web Neural Network API): A proposed web API designed to provide a consistent interface for accessing hardware ML accelerators (CPU, GPU, NPU) from web applications.
- ONNX Runtime Web: A JavaScript-based version of ONNX Runtime that allows ONNX models to run in web browsers.
- Vulkan ML Extensions: Extensions to the Vulkan graphics API that introduce native support for tensors and graphs, enabling ML workloads on GPUs.
- Benchmarks: Tools and methodologies used to measure and compare the performance of hardware and software.
ARM's Role in Enabling Web AI on ARM Architecture
Will Lord, a Product Manager at ARM, discusses ARM's mission to empower developers to build AI and application experiences across their pervasive compute platform, from mobile to cloud. He emphasizes ARM's commitment to making web AI on their architecture a reality.
ARM's Pervasive Compute Platform and AI Revolution
ARM is the creator of the most pervasive compute platform globally, with its architecture powering 99% of smartphones and a significant and growing portion of cloud infrastructure (50% of new hyperscaler compute capacity added this year was ARM-based). Over 325 billion ARM chips have been shipped, reaching virtually every connected person. This widespread adoption is fueled by a robust software development community.
ARM's product portfolio includes:
- CPUs: Enhanced with Neon matrix multiplication, SVE2 for vector support, and SME2 for multi-vector operations. SME2, combined with KleidiAI integrations, offers up to a 5x AI workload performance uplift compared to previous generations. Devices like the iPhone 16+, M4-based Macs, and upcoming Android devices benefit from these advancements.
- GPUs: The Mali and Immortalis product lines deliver exceptional graphics and gaming performance. The new Neural Accelerator (NX) technology, an IP block for optimized AI and neural graphics workloads, will be available in 2026.
- Ethos NPUs: These bring ML capabilities to the edge, enabling AI integration in IoT devices and heterogeneous systems.
Developer Experience and Enablement for Web AI
ARM recognizes that hardware is a vehicle for software and invests heavily in developer experience. Web applications have always been crucial for ARM, whether running in the browser or natively. Notably, ARM added a dedicated JavaScript data type conversion instruction (FJCVTZS) to its architecture in 2016, underscoring how central JavaScript is to ARM's success on the web.
ARM's efforts in web AI focus on two key areas:
1. On-Device Inference
Concerns about cloud inference costs, privacy, and latency are driving a demand for on-device AI. ARM addresses this with:
- KleidiAI and the ARM Kleidi Libraries: Optimizations for ARM CPUs that leverage features like SVE2 for maximum AI performance.
- Framework Integration: These libraries are integrated into popular AI/ML frameworks and runtimes such as PyTorch, ExecuTorch, ONNX Runtime, MediaPipe, and LiteRT.
- Performance Gains: Demonstrations show 2x to 6x local model performance increases on Armv9 devices for Stable Diffusion, speech recognition, chat, and audio generation. For example, Phi-3 on a Windows on ARM PC shows 2.4x to 4x faster prompt processing, and a Vivo X200 flagship sees a 2.6x uplift thanks to KleidiAI integration with ONNX Runtime.
- Web AI Potential: ARM believes that connecting the browser to KleidiAI through on-device ML frameworks and web APIs can provide similar performance uplifts for web developers without requiring code changes.
2. Web APIs, Libraries, and Frameworks
- GPU for the Web: ARM is actively involved in the W3C WebGPU working group and has made significant contributions to Dawn, Chrome's WebGPU implementation, focusing on native usage with Vulkan.
- ML Extensions for Vulkan: These extensions introduce native support for tensors and graphs, making ML a first-class citizen in Vulkan. A demonstration of a neural upscaler showed an efficient GPU workload while upscaling 540p rendering to 1080p.
- WebGPU Performance Improvements: ARM is collaborating with Google on WebGPU performance enhancements in the ML Drift framework and on tooling for debugging and optimizing WebGPU graphics in the browser.
- WebAssembly (Wasm): ARM has a dedicated team working on V8 (the JavaScript and WebAssembly engine used in Chrome and Node.js) to achieve optimal performance across billions of ARM CPUs.
- V8 Contributions: Hundreds of patches have been contributed to V8, with a recent focus on memory optimizations. ARM has ensured that larger models and memory-intensive applications run optimally on ARM as the Wasm spec evolves (e.g., the Memory64 proposal).
- Wasm Benchmark: ARM is developing a public Wasm benchmark to help the industry identify performance bottlenecks.
- LLVM Contributions: ARM has made over 14,000 patches to LLVM, the de facto compiler for WebAssembly, impacting its overall performance.
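Because Wasm is a compilation target rather than a source language, its role in the pipeline above is easy to see end to end with a minimal hand-assembled module. The sketch below (plain JavaScript, no toolchain assumed) instantiates a module exporting `add` through the same JS API that V8 optimizes, then probes for Memory64 support with a try/catch; the `supportsMemory64` helper is illustrative, not a standard API.

```javascript
// A minimal hand-assembled Wasm module, equivalent to the WAT:
//   (module (func (export "add") (param i32 i32) (result i32)
//     local.get 0  local.get 1  i32.add))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);
const { add } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;
console.log(add(2, 3)); // → 5

// Feature-probe for the Memory64 proposal: 64-bit memories take BigInt
// sizes and an 'address' field of 'i64' (earlier drafts spelled it 'index').
function supportsMemory64() {
  for (const field of ['address', 'index']) {
    try {
      new WebAssembly.Memory({ initial: 1n, maximum: 2n, [field]: 'i64' });
      return true;
    } catch {
      // Not supported under this field name; try the older spelling.
    }
  }
  return false;
}
console.log(typeof supportsMemory64()); // → 'boolean'
```

Shipping engines run this same path: the bytes a compiler like LLVM emits are validated, compiled, and executed by V8's Wasm tiers, which is where the memory optimizations mentioned above land.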
Bridging Native and Web AI Experiences
ARM's success in the mobile market has been driven by investments in web technologies and software support, particularly for Chrome and Chromium. Their teams work on ensuring Chromium-based browsers are secure and performant on ARM hardware across various operating systems (Linux, Mac, Windows, iOS, Android), even for future hardware generations.
The challenge is to provide web app developers with the same direct access to hardware ML accelerators as native app developers. This involves connecting the browser to the right acceleration libraries and APIs.
Proposed Path for Web AI Acceleration on ARM:
ARM envisions a scenario where a JavaScript app using ONNX Runtime Web can leverage hardware acceleration without developers needing to manage device specifics. This is where WebNN plays a crucial role.
- GPU Path via WebNN:
- A WebNN backend wired to ARM's Vulkan ML extensions and data graph pipeline.
- This pipeline would route to shader cores today and to neural accelerators on upcoming ARM GPUs without code changes.
- Ideal for CNNs, transformers, and camera/video use cases.
- CPU Path via WebNN:
- Wiring WebNN to XNNPACK and then to KleidiAI.
- Suitable for bursty workloads and as a fallback when GPU is unavailable, busy, or not performant.
Developers can still bind a GPU device to WebNN or use WebGPU directly for more granular control. KleidiAI support is already integrated into XNNPACK, offering some acceleration via the WebNN backend in Chromium.
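The two paths above map naturally onto ONNX Runtime Web's `executionProviders` option. Below is a minimal sketch, assuming the provider names ('webnn', 'webgpu', 'wasm') from onnxruntime-web; the `pickExecutionProviders` helper and its capability flags are illustrative, not part of any API:

```javascript
// Hypothetical helper: order ONNX Runtime Web execution providers from
// feature-detected capabilities, GPU-first with a guaranteed CPU fallback.
function pickExecutionProviders(caps) {
  const providers = [];
  if (caps.webnn) providers.push('webnn');   // WebNN: CPU/GPU/NPU via whatever backend the browser wires up
  if (caps.webgpu) providers.push('webgpu'); // direct GPU shader path
  providers.push('wasm');                    // CPU fallback (XNNPACK, where KleidiAI plugs in)
  return providers;
}

// Browser usage sketch (not executed here; `ort` is the onnxruntime-web global):
//   const caps = { webnn: 'ml' in navigator, webgpu: 'gpu' in navigator };
//   const session = await ort.InferenceSession.create('model.onnx', {
//     executionProviders: pickExecutionProviders(caps),
//   });

console.log(pickExecutionProviders({ webnn: false, webgpu: true }));
// → [ 'webgpu', 'wasm' ]
```

This mirrors the fallback behavior described above: when WebNN is absent, busy, or not performant for a workload, the session silently lands on the next provider in the list without code changes in the app.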
Challenges and Future Directions:
- WebNN Support: WebNN is not yet widely supported across all browsers.
- Platform Limitations: Current WebNN implementations offer a narrow path to web AI acceleration on ARM with limited device and platform support.
ARM believes benchmarks are key to driving continuous improvements and identifying the best combinations of web APIs, browser integrations, and hardware. They propose a benchmark suite, potentially built on Speedometer, that includes AI-specific tasks like image generation, object detection, and chatbot performance, in addition to traditional browsing metrics.
Call to Action and Collaboration
ARM actively seeks feedback from application developers. They encourage developers to share their challenges, whether related to memory management in WebAssembly, WebGPU debugging, or adding AI features to browser games. ARM has a dedicated team available to support integrations, migrations, and optimizations in partnership with developers.
Developers can find ARM representatives at demo stations, explore learning paths and developer.arm.com, and sign up for the ARM Developer Program for news and direct support.
ARM is committed to working with the community to build the future of web AI on ARM.