DeepSeek-V3 and DeepSeek-R1 are cutting-edge large language models (LLMs) developed by DeepSeek. DeepSeek-V3, with 671 billion parameters, uses a Mixture-of-Experts (MoE) architecture. DeepSeek-R1 is a high-performance reasoning model trained on top of DeepSeek-V3-Base. This article guides you through one-click deployment of these models on Alibaba Cloud's Platform for AI (PAI), leveraging the Model Gallery for streamlined deployment and acceleration options.
Deploying the full-size DeepSeek-R1 or DeepSeek-V3 model (671B parameters) demands significant resources (8 GPUs with at least 96 GB of VRAM each) and therefore higher costs. If you have limited resources or a cost-sensitive application, consider the distilled versions of the models.
According to tests, the DeepSeek-R1-Distill-Qwen-32B model offers a good balance between performance and cost, making it suitable for cloud deployment as an alternative to the full DeepSeek-R1. Other distilled models (7B, 8B, 14B) are also available. The Model Gallery provides model evaluation tools to assess the actual performance of each model, helping you make an informed decision.
The following table lists the minimum hardware configuration for each model and the maximum token count supported by each deployment method:
Model | Minimum Configuration | Max Tokens (BladeLLM) | Max Tokens (SGLang) | Max Tokens (vLLM) | Max Tokens (Standard) |
---|---|---|---|---|---|
DeepSeek-R1 | 8 * GU120 (8 * 96 GB VRAM) | 16384 | Not Supported | 4096 | Not Supported |
DeepSeek-V3 | 8 * GU120 (8 * 96 GB VRAM) | 16384 | Not Supported | 4096 | 2000 |
DeepSeek-R1-Distill-Qwen-1.5B | 1 * A10 (24 GB VRAM) | Not Supported | 131072 | 131072 | 131072 |
DeepSeek-R1-Distill-Qwen-7B | 1 * A10 (24 GB VRAM) | Not Supported | 32768 | 131072 | 131072 |
DeepSeek-R1-Distill-Llama-8B | 1 * A10 (24 GB VRAM) | Not Supported | 32768 | 131072 | 131072 |
DeepSeek-R1-Distill-Qwen-14B | 1 * GPU L (48 GB VRAM) | Not Supported | 32768 | 131072 | 131072 |
DeepSeek-R1-Distill-Qwen-32B | 2 * GPU L (2 * 48 GB VRAM) | Not Supported | 32768 | 131072 | 131072 |
DeepSeek-R1-Distill-Llama-70B | 2 * GU120 (2 * 96 GB VRAM) | Not Supported | 32768 | 131072 | 131072 |
For optimal performance and maximum token support, accelerated deployment (BladeLLM or SGLang) is recommended. Note that accelerated deployments only support API calls, while standard deployments support both API calls and WebUI chat interfaces.
If you are deploying DeepSeek-R1 or DeepSeek-V3, choose an instance type that meets the 8 * GU120 (8 * 96 GB VRAM) minimum configuration shown in the table above.
The method of calling the model varies depending on the deployment type:
Feature | BladeLLM | SGLang | vLLM | Standard |
---|---|---|---|---|
WebUI | Not supported; a local WebUI is required (BladeLLM GitHub, BladeLLM OSS) | Supported via online debugging (see the "Online Debugging" section) | Supported via online debugging (see the "Online Debugging" section) | Supported; in Model Gallery, go to Task Management > Deployment Tasks > Service Details and click "View WEB Application" (see "WebUI Usage") |
Online Debugging | Supported (see the "Online Debugging" section) | Supported (see the "Online Debugging" section) | Supported (see the "Online Debugging" section) | N/A |
API Call | Supported via the completions and chat/completions endpoints | Supported via the completions and chat/completions endpoints | Supported via the completions and chat/completions endpoints | Supported via the completions and chat/completions endpoints; direct calls are also supported (see "API Call Details") |
All deployment methods support HTTP POST requests to the following endpoints:
- `<EAS_ENDPOINT>/v1/completions`
- `<EAS_ENDPOINT>/v1/chat/completions`

Standard deployments also allow direct calls to `<EAS_ENDPOINT>` without any suffix.
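As a concrete illustration, the sketch below sends a chat/completions request with Python's `requests` library. The endpoint, token, and model name are placeholders, not values from this article: copy the real ones from the service's invocation information in the PAI console. EAS services generally expect the service token directly in the `Authorization` header, but verify the exact scheme on your service details page.

```python
import requests

# Placeholders (assumptions): copy the real endpoint and token from
# Model Gallery > Task Management > Deployment Tasks > Service Details.
EAS_ENDPOINT = "<EAS_ENDPOINT>"
EAS_TOKEN = "<EAS_TOKEN>"

response = requests.post(
    f"{EAS_ENDPOINT}/v1/chat/completions",
    headers={
        "Authorization": EAS_TOKEN,  # EAS service token; verify the expected scheme
        "Content-Type": "application/json",
    },
    json={
        "model": "DeepSeek-R1",  # use the model name shown on the service details page
        "messages": [{"role": "user", "content": "Briefly introduce DeepSeek-R1."}],
        "max_tokens": 1024,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

For a standard deployment, the same kind of request can also be POSTed directly to `<EAS_ENDPOINT>` with no suffix.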
When calling DeepSeek-R1, keep the following usage recommendations in mind:

- Set the temperature between 0.5 and 0.7 (0.6 is recommended) to avoid repetition or incoherence in the output.
- Avoid adding a system prompt; include all instructions within the user prompt.
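Because the endpoints are OpenAI-compatible, these recommendations can also be applied through the OpenAI Python SDK, as in the sketch below. The `base_url` must be the EAS endpoint with `/v1` appended (see the endpoint note later in this article), and passing the EAS token as `api_key` is an assumption to confirm against your service's invocation details.

```python
from openai import OpenAI

# Placeholders (assumptions): substitute your real EAS endpoint and token.
client = OpenAI(
    base_url="<EAS_ENDPOINT>/v1",  # note the /v1 suffix
    api_key="<EAS_TOKEN>",         # assumption: the EAS token is accepted as the API key
)

completion = client.chat.completions.create(
    model="DeepSeek-R1",
    # Per the recommendations above: no system message, all instructions in the user prompt.
    messages=[{"role": "user", "content": "Summarize the benefits of MoE models in three bullet points."}],
    temperature=0.6,  # recommended range: 0.5-0.7
    max_tokens=1024,
)
print(completion.choices[0].message.content)
```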
Due to the size of DeepSeek-V3 and DeepSeek-R1, deployment can be expensive. Distilled, lighter models can significantly reduce costs. Evaluate your options carefully before deploying to a production environment, and consider cloud cost management practices to keep spending under control.
When configuring an OpenAI-compatible client, use the EAS endpoint with `/v1` appended (note the `/v1` at the end) as the API endpoint URL.

By following this guide, you can efficiently deploy and utilize the powerful DeepSeek-V3 and DeepSeek-R1 models on Alibaba Cloud PAI, choosing the configurations and optimization techniques that best fit your needs and budget.