Model distillation

Technology Radar

Published: Apr 02, 2025

Apr 2025

Trial

Scaling laws, the principle that larger models, datasets and compute resources lead to more powerful AI systems, have been a key driver of the AI boom. However, consumer hardware and edge devices often lack the capacity to run large-scale models, which creates the need for model distillation.

Model distillation transfers knowledge from a larger, more powerful model (the teacher) to a smaller, cost-efficient model (the student). The process typically involves generating a sample dataset from the teacher model and fine-tuning the student to capture its statistical properties. Pruning compresses a model by removing parameters, and quantization compresses it by lowering the numerical precision of its weights. Distillation differs from both: it aims to retain the teacher's domain-specific knowledge in a smaller model while minimizing accuracy loss. It can also be combined with quantization for further optimization.
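To make "capturing the teacher's statistical properties" concrete, here is a minimal sketch of the classic soft-target loss from Hinton et al., written in plain Python for a single example. It is an illustration only. The function names, the temperature of 2.0 and the weighting `alpha` of 0.5 are hypothetical choices and do not come from any particular library or provider guide. For LLMs, the workflow described above (fine-tuning the student on samples generated by the teacher) can instead use ordinary cross-entropy on the teacher's outputs, without any logit matching.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: alpha * soft-target term + (1 - alpha) * hard-label term.

    The soft term compares the temperature-softened teacher and student
    distributions. It is multiplied by temperature**2 so that its gradient
    scale stays comparable to the hard-label term as the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Standard cross-entropy against the ground-truth class at temperature 1.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.2]
    close_student = [3.5, 1.2, 0.1]  # agrees with the teacher's ranking
    far_student = [0.1, 3.0, 1.0]    # prefers the wrong class
    # A student that mimics the teacher gets a lower loss.
    print(distillation_loss(close_student, teacher, true_label=0)
          < distillation_loss(far_student, teacher, true_label=0))  # True
```

Raising the temperature spreads probability mass onto the non-top classes, which is where much of the teacher's extra signal lives. The `temperature ** 2` factor follows the paper's observation that soft-target gradients shrink as 1/T².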

Popularized by Geoffrey Hinton et al., model distillation has gained widespread adoption. A notable example is DeepSeek R1's distilled variants built on Qwen and Llama, which preserve strong reasoning capabilities in much smaller models. With its growing maturity, the technique is no longer confined to research labs; it's now being applied in everything from industrial to personal projects. Providers such as OpenAI and Amazon Bedrock offer guides to help developers distill their own small language models (SLMs). We believe adopting model distillation can help organizations manage LLM deployment costs while unlocking the potential of on-device LLM inference.