On-device LLM inference

Technology Radar

Published : Oct 23, 2024

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more

Oct 2024

Assess

Large language models (LLMs) can now run in web browsers and on edge devices like smartphones and laptops, enabling on-device AI applications. This allows for secure handling of sensitive data without cloud transfer, extremely low latency for tasks like edge computing and real-time image or video processing, reduced costs by performing computations locally and functionality even when internet connectivity is unreliable or unavailable. This is an active area of research and development. Previously, we highlighted MLX, an open-source framework for efficient machine learning on Apple silicon. Other emerging tools include Transformers.js and Chatty. Transformers.js lets you run transformers in the browser using ONNX Runtime, supporting models converted from PyTorch, TensorFlow and JAX. Chatty leverages WebGPU to run LLMs natively and privately in the browser, offering a feature-rich in-browser AI experience.