The world of Large Language Models (LLMs) is constantly evolving, and DeepSeek is making waves with its first-generation reasoning models. Designed for performance comparable to OpenAI's cutting-edge models, DeepSeek-R1 and its distilled versions offer intriguing possibilities for developers and researchers alike. This article will explore DeepSeek-R1, its architecture, its distilled versions, and how to use them with Ollama.
DeepSeek-R1 represents DeepSeek's initial foray into sophisticated reasoning models. It aims to rival the capabilities of models like OpenAI's offerings, particularly in areas such as mathematics, coding, and multi-step logical reasoning.
DeepSeek-R1 stands out due to its design as a foundation model from which smaller, more efficient models can be derived through a process called distillation.
Model distillation is a crucial concept in understanding DeepSeek's approach. It involves training smaller models to mimic the behavior of a larger, more powerful model, which offers several advantages: lower computational and memory requirements, faster inference, and easier deployment on modest hardware.
DeepSeek's approach demonstrates that knowledge and reasoning patterns can be effectively transferred from larger models to smaller ones, achieving better performance than training small models from scratch using reinforcement learning.
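To make the idea concrete, here is a minimal sketch of the core of knowledge distillation: the student is trained to match the teacher's temperature-softened output distribution, typically by minimizing the KL divergence between the two. This is an illustrative toy, not DeepSeek's actual training pipeline (which fine-tunes on curated samples generated by DeepSeek-R1); the logits and temperature below are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's distribution to the teacher's.

    Minimizing this pulls the student's predictions toward the
    teacher's softened "soft targets"; 0 means a perfect match.
    """
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A large teacher is confident; a small student starts off noisier.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 1.0]
print(round(distillation_loss(teacher, student), 4))  # small positive value
```

A higher temperature flattens both distributions, exposing the teacher's relative preferences among wrong answers, which is much of what makes soft targets more informative than hard labels.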
DeepSeek offers a range of distilled models based on the DeepSeek-R1 architecture, utilizing popular base models like Llama and Qwen. Here’s a breakdown of available options:
ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b
ollama run deepseek-r1:671b
Ollama makes it incredibly easy to run DeepSeek-R1 and its distilled models. Ollama packages models into a self-contained format, including all dependencies, making deployment straightforward. To get started, use the ollama run command, specifying the model you want (e.g., ollama run deepseek-r1:7b). Ollama downloads the model if it is not already present and prepares it for use.

A significant advantage of the DeepSeek-R1 series is its permissive MIT License. This license allows for free commercial use, modification, and redistribution, including using the models' outputs for distillation to train other models.
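Beyond the command line, Ollama exposes a local HTTP API (on port 11434 by default) that you can call from your own code. A minimal sketch using only the Python standard library, assuming the Ollama server is running and the model tag matches one of those listed above:

```python
import json
import urllib.request

# Ollama's local API endpoint (default port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON object instead of a
    stream of partial responses.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama server and return the reply text."""
    payload = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server with the model pulled):
#   print(ask("deepseek-r1:7b", "Why is the sky blue?"))
```

Because the reasoning models emit their chain of thought, the returned text may include the model's intermediate thinking before the final answer.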
Important Note: The Qwen distilled models are derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License, and fine-tuned with 800k samples curated with DeepSeek-R1. The Llama 8B distilled model is derived from Llama3.1-8B-Base and is originally licensed under the Llama 3.1 license. The Llama 70B distilled model is derived from Llama3.3-70B-Instruct and is originally licensed under the Llama 3.3 license.
DeepSeek-R1 and its distilled models represent a significant advancement in open-source reasoning models. Their performance, combined with the ease of use provided by Ollama and the permissive MIT license, makes them an attractive option for developers, researchers, and businesses looking to leverage the power of LLMs. Whether you need a compact model for edge deployment or a powerful model for complex tasks, the DeepSeek-R1 family has something to offer. Stay tuned to DeepSeek's ongoing developments and explore the possibilities these models unlock! Be sure to read DeepSeek's Terms of Use before using the models.