DeepSeek R1 has recently gained immense popularity, sparking numerous discussions and tutorials on local deployment. This article provides a simplified guide to deploying and using DeepSeek R1 locally, treating it as a digital companion to experiment with. For advanced usage, consider referring to the official research paper and deploying the full version.
Released on January 20, 2025, DeepSeek R1 is DeepSeek AI's first-generation reasoning model, positioned as a competitor to OpenAI's o1. It excels at complex tasks such as mathematical reasoning, code generation, and logical inference.
DeepSeek R1 comes in multiple versions, including a full version (671B parameters) and distilled versions (1.5B to 70B parameters). While the full version offers superior performance, it demands substantial hardware resources. The distilled versions are more accessible for local deployment on standard hardware.
Full Version (671B parameters): best quality, but demands data-center-class hardware.
Distilled Versions (1.5B to 70B parameters): much lighter, practical for local deployment on consumer hardware.
Distilled vs. Full Version: Key Differences
Feature | Distilled Version | Full Version |
---|---|---|
Parameter Count | Fewer parameters (1.5B to 70B), approaches full-model performance | 671B parameters, top performance |
Hardware | Lower VRAM/RAM requirements | Higher VRAM/RAM requirements |
Use Cases | Lightweight tasks, resource-constrained devices | High-precision tasks, professional environments |
Deep Dive into Distilled Model Variants
Model Version | Parameters | Characteristics |
---|---|---|
deepseek-r1:1.5b | 1.5B | Lightweight, suitable for low-end hardware, fast but limited performance |
deepseek-r1:7b | 7B | Balanced performance, good for most tasks, moderate hardware requirements |
deepseek-r1:8b | 8B | Slightly better than 7B, suitable for higher precision tasks |
deepseek-r1:14b | 14B | High-performance, suitable for complex tasks (math, coding), higher hardware demands |
deepseek-r1:32b | 32B | Professional-grade, for research and high-precision tasks, requires advanced hardware |
deepseek-r1:70b | 70B | Top-tier performance, for large-scale computation and complex tasks |
Quantized Versions: Further Optimization
Quantized models offer reduced memory footprints using techniques like 4-bit quantization, further optimizing them for resource-limited environments. Note that quantization can slightly reduce model accuracy.
Model Version | Parameters | Characteristics |
---|---|---|
deepseek-r1:1.5b-qwen-distill-q4_K_M | 1.5B | Lightweight, fast, limited performance, 4-bit quantization |
deepseek-r1:7b-qwen-distill-q4_K_M | 7B | Balanced, good for most tasks, moderate hardware, 4-bit quantization |
deepseek-r1:8b-llama-distill-q4_K_M | 8B | Slightly better than 7B, suitable for higher precision tasks, 4-bit quantization |
deepseek-r1:14b-qwen-distill-q4_K_M | 14B | High-performance, complex tasks (math, coding), higher hardware, 4-bit quantization |
deepseek-r1:32b-qwen-distill-q4_K_M | 32B | Professional, research, high-precision tasks, advanced hardware, 4-bit quantization |
deepseek-r1:70b-llama-distill-q4_K_M | 70B | Top-tier, large-scale computation, complex tasks, requires professional hardware, 4-bit quantization |
Distilled vs. Quantized: A Quick Comparison
Model Type | Characteristics |
---|---|
Distilled | Fine-tuned for reduced parameters and near-original performance on lower-end hardware |
Quantized | Reduced memory usage via lower precision (e.g., 4-bit quantization) |
Example: deepseek-r1:7b-qwen-distill-q4_K_M is a 7B model that is both distilled and quantized, reducing VRAM needs from roughly 5 GB to 3 GB.
For most local deployments, the distilled versions are sufficient.
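If a quantized distilled tag fits your hardware, downloading and running it with Ollama (covered in the deployment section below) uses the same commands as any other tag; the tag names are the ones listed in the tables above:

```bash
# Pull a 4-bit quantized distilled variant, then start an interactive chat with it
ollama pull deepseek-r1:7b-qwen-distill-q4_K_M
ollama run deepseek-r1:7b-qwen-distill-q4_K_M
```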
Choosing the right model hinges on your hardware. Here's a breakdown:
2.1 System Configuration
Requirements differ across Windows, Linux, and Mac, but in every case the deciding factors are system RAM and GPU VRAM. Check what your machine actually has (see the commands below), then match it against the table that follows.
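The commands below are standard OS utilities for checking RAM and VRAM (nothing DeepSeek-specific):

```bash
# Windows/Linux with an NVIDIA GPU: GPU model and total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv

# Linux: total system RAM
free -h

# macOS: total system RAM (in bytes) and GPU/chip details
sysctl hw.memsize
system_profiler SPDisplaysDataType
```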
Model Selection Guide:
Model Name | Parameters | Size | VRAM (Approx.) | Recommended Mac Config | Recommended Windows/Linux Config |
---|---|---|---|---|---|
deepseek-r1:1.5b | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB+ RAM) | GTX 1650 4GB / RX 5500 4GB (16GB+ RAM) |
deepseek-r1:7b | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB+ RAM) | RTX 3060 8GB / RX 6600 8GB (16GB+ RAM) |
deepseek-r1:8b | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB+ RAM) | RTX 3060 Ti 8GB / RX 6700 10GB (16GB+ RAM) |
deepseek-r1:14b | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB+ RAM) | RTX 3080 10GB / RX 6800 16GB (32GB+ RAM) |
deepseek-r1:32b | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio | RTX 3090 24GB / RX 7900 XTX 24GB (64GB+ RAM) |
deepseek-r1:70b | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio | A100 40GB / MI250X 128GB (128GB+ RAM) |
deepseek-r1:1.5b-qwen-distill-q4_K_M | 1.5B | 1.1 GB | ~2 GB | M2/M3 MacBook Air (8GB+ RAM) | GTX 1650 4GB / RX 5500 4GB (16GB+ RAM) |
deepseek-r1:7b-qwen-distill-q4_K_M | 7B | 4.7 GB | ~5 GB | M2/M3/M4 MacBook Pro (16GB+ RAM) | RTX 3060 8GB / RX 6600 8GB (16GB+ RAM) |
deepseek-r1:8b-llama-distill-q4_K_M | 8B | 4.9 GB | ~6 GB | M2/M3/M4 MacBook Pro (16GB+ RAM) | RTX 3060 Ti 8GB / RX 6700 10GB (16GB+ RAM) |
deepseek-r1:14b-qwen-distill-q4_K_M | 14B | 9.0 GB | ~10 GB | M2/M3/M4 Pro MacBook Pro (32GB+ RAM) | RTX 3080 10GB / RX 6800 16GB (32GB+ RAM) |
deepseek-r1:32b-qwen-distill-q4_K_M | 32B | 20 GB | ~22 GB | M2 Max/Ultra Mac Studio | RTX 3090 24GB / RX 7900 XTX 24GB (64GB+ RAM) |
deepseek-r1:70b-llama-distill-q4_K_M | 70B | 43 GB | ~45 GB | M2 Ultra Mac Studio | A100 40GB / MI250X 128GB (128GB+ RAM) |
This guide focuses on deploying DeepSeek R1 on a Mac environment:
Environment: M2/M3/M4 MacBook Pro (16GB RAM)
Model: deepseek-r1:8b
Local Run Benefits: your data stays on your machine, the model works offline, and there are no API usage fees.
While this tutorial covers macOS, stay tuned for future updates on Windows/Linux deployments.
3.1 Deployment Tools
Various tools simplify DeepSeek R1 deployment:
- Ollama: run `ollama run deepseek-r1:7b` to download and run a model. Refer to the official Ollama documentation for detailed usage. Ollama can also run inside Docker: `docker run -d --gpus=all -p 11434:11434 --name ollama ollama/ollama`.
- LM Studio: a desktop application with a graphical interface, worth a look if you only have integrated graphics and want to experiment.

This guide uses Ollama due to its flexibility and suitability for users seeking data privacy and customization. Please see our article on [getting started with Docker] for more info.
3.2 Installing Ollama
Install Ollama from the official website, then verify the installation in your terminal: `ollama --version`
Installing the Model:
Copy the installation command from the Ollama website and run it in your terminal:
ollama run deepseek-r1:8b
This command downloads and initiates the DeepSeek R1 8B model.
Post-installation, monitor your hardware usage. A saturated GPU with moderate CPU and memory usage indicates optimal performance.
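To watch utilization while a prompt is running, the usual OS tools are enough (these are general-purpose utilities, not part of Ollama):

```bash
# Windows/Linux with NVIDIA drivers: refresh GPU utilization and VRAM usage every second
nvidia-smi -l 1

# macOS (Apple silicon): sample GPU power/utilization, or simply use Activity Monitor
sudo powermetrics --samplers gpu_power -i 1000
```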
Essential Ollama Commands:
- `ollama list`: list the models installed locally.
- `ollama rm deepseek-r1:8b`: remove a downloaded model.
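Ollama also serves a local REST API on port 11434 (the same port mapped in the Docker command above), which the web interfaces in the next section talk to. A quick way to test it from the terminal, assuming the 8B model from this guide is installed:

```bash
# Send a one-off prompt to the local Ollama API and get a complete (non-streamed) response
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain distillation vs. quantization in one paragraph.",
  "stream": false
}'
```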
Enhance your DeepSeek R1 experience with these visual interfaces:
Open-WebUI: A self-hosted LLM web interface for seamless interaction with local models, built for personal LLM use.
Installation (Docker):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Access the interface at http://localhost:3000/.
Important: Disable the OpenAI API option in settings if offline to prevent errors.
Dify: An LLM application development platform for rapidly building applications such as RAG pipelines and AI agents. Suitable for AI SaaS, intelligent customer service, and retrieval-augmented (RAG) applications.
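If you have not installed Dify yet, it is normally started with Docker Compose; the steps below are a sketch based on the layout of the Dify repository (check the official docs for current instructions):

```bash
# Clone the Dify repository and start its Docker Compose stack
git clone https://github.com/langgenius/dify.git
cd dify/docker
cp .env.example .env      # adjust environment settings as needed
docker compose up -d
```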
After successful startup, access Dify through `localhost`. When adding Ollama as a model provider, use `http://host.docker.internal:11434` as the base URL if Dify and Ollama are on the same machine and Dify is Dockerized.

Note: Dify offers more than just conversations; explore its features for advanced applications.
During local testing, the distilled version of DeepSeek R1 showed some shortcomings in code generation but demonstrated impressive text processing and reasoning capabilities.
For a better experience with the complete DeepSeek model, consider using DeepSeek's official API services, which are competitively priced. When DeepSeek first gained popularity, users occasionally encountered server overload; the DeepSeek team hopes to resolve these issues in the near future.
DeepSeek's API can be integrated into VS Code using plugins like Continue, or into Open-WebUI.
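DeepSeek's hosted API uses the OpenAI-compatible chat-completions format, which is why tools like Continue and Open-WebUI can point at it with just a base URL and an API key. A minimal sketch (endpoint and model name per DeepSeek's public API documentation; the key is your own):

```bash
# Call the hosted DeepSeek reasoning model (create an API key on the DeepSeek platform first)
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
    "model": "deepseek-reasoner",
    "messages": [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
  }'
```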
DeepSeek R1 presents a compelling option for local AI experimentation. While DeepSeek R1 and OpenAI's o1 each have their own strengths, DeepSeek shines in certain applications. By leveraging the deployment strategies and tools outlined in this guide, you can easily integrate DeepSeek R1 into your workflow.