The landscape of Artificial Intelligence is rapidly evolving, with a significant shift towards edge computing. Microsoft is at the forefront of this transformation, enabling developers to run powerful AI models directly on Copilot+ PCs. This article delves into the exciting possibilities of running distilled DeepSeek R1 models locally, powered by the Windows Copilot Runtime, and how this innovative approach is revolutionizing AI development.
AI is no longer confined to the cloud. Copilot+ PCs, equipped with powerful Neural Processing Units (NPUs), are ushering in a new era of on-device AI processing. This means faster, more efficient AI performance, reduced latency, and enhanced privacy for users.
Key Benefits of On-Device AI:
- Lower latency: inference runs on the local NPU, with no round trip to the cloud.
- Greater efficiency: the NPU executes AI workloads quickly and power-efficiently, leaving the CPU and GPU free for other tasks.
- Enhanced privacy: prompts and data never have to leave the device.
Microsoft is collaborating with DeepSeek to bring NPU-optimized versions of the DeepSeek R1 models directly to Copilot+ PCs. Support begins with Qualcomm Snapdragon X and will soon expand to Intel Core Ultra 200V and other platforms. The initial release features the DeepSeek-R1-Distill-Qwen-1.5B model, with the 7B and 14B variants to follow. These models are specifically optimized to leverage the capabilities of the NPU, enabling developers to build and deploy AI-powered applications that run efficiently on-device.
Experimenting with DeepSeek R1 models on your Copilot+ PC is straightforward. Simply download the AI Toolkit VS Code extension.
Steps to get started:
1. Install the AI Toolkit extension from the VS Code Marketplace.
2. Open the extension's model catalog and download the NPU-optimized DeepSeek-R1-Distill-Qwen-1.5B model to your Copilot+ PC.
3. Open the model in the Playground and start sending prompts.
The AI Toolkit provides a seamless developer workflow, allowing you to test models locally and prepare them for deployment. You can also try the cloud-hosted source model in Azure AI Foundry by clicking the “Try in Playground” button under “DeepSeek R1”.
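If you prefer the terminal, the extension can also be installed with the VS Code CLI. The extension ID below is an assumption based on the current Marketplace listing; confirm it with `code --list-extensions` or in the Extensions view before relying on it.

```shell
# Install the AI Toolkit extension via the VS Code CLI.
# EXT_ID is the assumed Marketplace ID; verify against the listing.
EXT_ID="ms-windows-ai-studio.windows-ai-studio"
if command -v code >/dev/null 2>&1; then
  code --install-extension "$EXT_ID"
else
  echo "VS Code CLI not found; install '$EXT_ID' from the Extensions view instead."
fi
```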
The distilled Qwen 1.5B model comprises several components, including a tokenizer, embedding layer, context processing model, token iteration model, a language model head, and a de-tokenizer.
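To make those stages concrete, here is a toy pipeline in Python. Every function, weight, and the tiny vocabulary are invented for illustration; the real ONNX components are learned neural networks, but the data flow is the same: tokenize, embed, run context processing once over the whole prompt (prefill), then iterate the token model one step at a time, project through the language model head, and de-tokenize.

```python
# Toy sketch of the distilled pipeline's stages (illustrative only; all
# functions and the vocabulary are invented, not the real ONNX components).

VOCAB = ["<eos>", "hello", "world", "npu", "ai"]

def tokenize(text):                      # tokenizer: text -> token ids
    return [VOCAB.index(w) for w in text.split()]

def embed(ids):                          # embedding layer: ids -> vectors
    return [[float(i), float(i) * 0.5] for i in ids]

def prefill(vectors):                    # context processing: whole prompt at once
    s0 = sum(v[0] for v in vectors)
    s1 = sum(v[1] for v in vectors)
    return [s0, s1]                      # toy stand-in for the cached state

def decode_step(state):                  # token iteration: one step per new token
    return [state[0] * 0.5, state[1] * 0.5]

def lm_head(state):                      # language model head: state -> token id
    return int(state[0]) % len(VOCAB)

def detokenize(ids):                     # de-tokenizer: ids -> text
    return " ".join(VOCAB[i] for i in ids)

def generate(prompt, max_new_tokens=3):
    state = prefill(embed(tokenize(prompt)))   # prefill runs once
    out = []
    for _ in range(max_new_tokens):            # decode runs per token
        state = decode_step(state)
        tok = lm_head(state)
        if tok == 0:                           # <eos> ends generation
            break
        out.append(tok)
    return detokenize(out)
```

The split matters for the NPU: prefill processes the full prompt in one pass, while the per-token decode loop is what dominates interactive latency.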
Optimization Techniques Overview:
To achieve a low memory footprint and fast inference, several key optimizations were implemented:
- Low-bit quantization: weights are quantized to 4 bits (block-wise for the embeddings and language model head, per-channel for the context processing and token iteration models), dramatically shrinking memory use.
- CPU/NPU work split: the memory-access-heavy embeddings and language model head run on the CPU, while the compute-intensive transformer blocks run on the NPU.
- Sliding-window design: the prompt is processed in fixed-size windows, delivering fast time to first token and long-context support without requiring dynamic tensor shapes from the hardware stack.
- ONNX QDQ format: the quantized model ships in ONNX Runtime's QDQ format, so it can scale across the breadth of Windows devices.
These optimizations enable the DeepSeek R1 models to deliver performance comparable to larger models, all while maintaining a compact memory footprint.
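Low-bit block-wise quantization is a standard way to shrink such models, and the core idea is easy to sketch. The snippet below is a minimal illustrative implementation of 4-bit block-wise quantization in plain Python (an assumed simplification, not Microsoft's actual code): each block of weights maps to integers in [0, 15] plus a per-block scale and offset, and dequantizes with error bounded by half the block's scale.

```python
# Illustrative 4-bit block-wise weight quantization (not production code).

def quantize_block(block):
    """Map a block of floats to 4-bit values [0, 15] plus scale and offset."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15 or 1.0        # avoid zero scale for constant blocks
    q = [round((x - lo) / scale) for x in block]
    return q, scale, lo

def dequantize_block(q, scale, lo):
    """Reconstruct approximate floats from the quantized block."""
    return [v * scale + lo for v in q]

def blockwise_quantize(weights, block_size=4):
    """Quantize a flat weight list one block at a time."""
    return [quantize_block(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]
```

Smaller blocks give tighter scales (lower error) at the cost of more per-block metadata; that trade-off is why block-wise schemes outperform a single scale for the whole tensor.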
The NPU-optimized version of the DeepSeek R1 models delivers impressive performance, enabling users to interact with groundbreaking AI models entirely locally. This opens the door to new, innovative PC experiences, empowering developers to create applications that were previously impossible.
Performance Metrics (Feb 3, 2025 Update):
- Time to first token: roughly 130 ms on the 1.5B model.
- Throughput: up to 16 tokens per second for short prompts (under 64 tokens).
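Time to first token and throughput combine into end-to-end response latency in a simple way, which this small helper shows. The function name and the figures in the example are illustrative assumptions, not measured results.

```python
def response_latency_s(ttft_ms: float, throughput_tok_s: float, new_tokens: int) -> float:
    """End-to-end latency: time to first token, plus steady-state decode
    time for the remaining tokens at the given throughput."""
    return ttft_ms / 1000.0 + (new_tokens - 1) / throughput_tok_s

# Illustrative figures (assumed, not measurements): 130 ms TTFT,
# 16 tok/s decode, generating a 17-token reply.
latency = response_latency_s(130, 16, 17)
```

With those assumed figures the reply completes in about 1.13 seconds, which illustrates why time to first token dominates the perceived snappiness of short interactions.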
Microsoft's commitment to bringing AI to the edge is transforming the way we interact with technology. By enabling developers to run powerful models like DeepSeek R1 locally on Copilot+ PCs, powered by the Windows Copilot Runtime and ONNX Runtime, Microsoft is empowering a new generation of AI-powered applications and experiences. This groundbreaking capability promises a future where AI is more accessible, efficient, and integrated into our daily lives.