DeepSeek-R1 is a cutting-edge reasoning model designed to excel in complex tasks such as scientific reasoning, language understanding, and coding. With its impressive 671 billion parameters (37 billion active) and a 128k context length, DeepSeek-R1 is a valuable asset for developers and organizations looking to leverage advanced AI capabilities. This article provides a comprehensive guide on how to utilize DeepSeek-R1 with Azure AI Foundry, covering deployment, inference, and best practices.
DeepSeek-R1 builds upon the progress of earlier reasoning-focused models by incorporating a step-by-step training process. It refines the Chain-of-Thought (CoT) reasoning approach by combining reinforcement learning (RL) with fine-tuning on carefully curated datasets. The model evolved from DeepSeek-R1-Zero, which relied solely on RL and exhibited strong reasoning skills but suffered from output readability and language consistency issues.
To overcome these limitations, DeepSeek-R1 integrates a small amount of cold-start data and follows a refined training pipeline that blends reasoning-oriented RL with supervised fine-tuning. This results in a model that achieves state-of-the-art performance on reasoning benchmarks.
For more details, refer to the DeepSeek-R1 model card.
Before you start using DeepSeek-R1 with Azure AI Foundry, ensure you have the following:
- An Azure subscription with a valid payment method.
- An Azure AI Foundry project.
- Sufficient permissions to create and manage model deployments in that project.
DeepSeek-R1 can be deployed to serverless API endpoints with pay-as-you-go billing. This approach allows you to consume the model as an API without the need for hosting it on your subscription, while maintaining enterprise-grade security and compliance.
If the model isn't deployed already, use the Azure AI Foundry portal, the Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to deploy it as a serverless API.
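As an illustration, here's a minimal sketch of a deployment using the Azure Machine Learning SDK for Python (azure-ai-ml). The endpoint name is a placeholder, and the registry model ID shown is an assumption; check the model catalog entry for the exact ID to use:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ServerlessEndpoint
from azure.identity import DefaultAzureCredential

# Connect to your project; replace the placeholders with your own values.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-project-name>",
)

# Create a serverless endpoint backed by DeepSeek-R1 from the model catalog.
# The model ID below is an assumed catalog path; verify it in the portal.
endpoint = ServerlessEndpoint(
    name="deepseek-r1-endpoint",  # hypothetical endpoint name
    model_id="azureml://registries/azureml-deepseek/models/DeepSeek-R1",
)
created = ml_client.serverless_endpoints.begin_create_or_update(endpoint).result()
print("Scoring URI:", created.scoring_uri)

Once created, you can retrieve the endpoint keys with ml_client.serverless_endpoints.get_keys("deepseek-r1-endpoint") and use them for the inference calls that follow.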
You can consume predictions from DeepSeek-R1 using the azure-ai-inference package with Python. To use this package, make sure you have:
- Python installed, including pip.
- The endpoint URL, in the form https://your-host-name.your-azure-region.inference.ai.azure.com.
- The endpoint key to authenticate against it.

Install the Azure AI inference package using the following command:
pip install azure-ai-inference
For more information, refer to the Azure AI inference package and its reference documentation.
This section covers how to use the Azure AI model inference API with a chat completions model for chat interactions.
First, set up the client to consume the model. The example code below retrieves the endpoint URL and key from environment variables:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
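If your endpoint is configured for Microsoft Entra ID (keyless) authentication, you can pass a token credential instead of a key. This is a minimal sketch assuming the azure-identity package is installed; the token scope shown is an assumption and may differ for your deployment type:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

# Assumes keyless (Entra ID) auth is enabled on the endpoint; the scope
# below is an assumption for serverless endpoints and may need adjusting.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
    credential_scopes=["https://ml.azure.com/.default"],
)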
The /info route provides information about the model deployed to the endpoint. Here's how to retrieve this information:
model_info = client.get_model_info()
print("Model name:", model_info.model_name)
print("Model type:", model_info.model_type)
print("Model provider name:", model_info.model_provider_name)
This will output the model's name, type, and provider.
Here's an example of how to create a basic chat completions request:
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)
print("Response:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)
The response includes the model's answer and usage statistics, such as the number of tokens used.
DeepSeek-R1 is designed to provide the reasoning behind its answers. This reasoning is included within <think> and </think> tags in the response content.
import re

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

match = re.match(r"<think>(.*?)</think>(.*)", response.choices[0].message.content, re.DOTALL)

print("Response:")
if match:
    print("\tThinking:", match.group(1))
    print("\tAnswer:", match.group(2))
else:
    print("\tAnswer:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)
This code extracts the reasoning content from the response, allowing you to understand the model's thought process.
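In multi-turn conversations, a common pattern is to send only the final answer back as the assistant message, rather than the full reasoning trace. The helper below is a small sketch of that idea; the strip_reasoning name is ours:

import re
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage

def strip_reasoning(content: str) -> str:
    """Remove the <think>...</think> block, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()

# Feed only the answer back into the conversation history.
answer = strip_reasoning(response.choices[0].message.content)
follow_up = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
        AssistantMessage(content=answer),
        UserMessage(content="Which of them are the most widely spoken?"),
    ],
)
print(follow_up.choices[0].message.content)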
To improve the user experience, you can stream the generated content as it becomes available. This is especially useful for long completions. Enable streaming by setting stream=True when calling the model:
result = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    temperature=0,
    top_p=1,
    max_tokens=2048,
    stream=True,
)
def print_stream(result):
    """Prints the chat completion with streaming."""
    for update in result:
        # Some updates carry no content delta (for example, the final chunk), so guard against None.
        if update.choices and update.choices[0].delta.content:
            print(update.choices[0].delta.content, end="")

print_stream(result)
This approach returns content as data-only server-sent events, allowing you to process the completion as it's being generated.
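Streaming and reasoning extraction can be combined: accumulate the streamed deltas into a single string, then split out the <think> block once the stream finishes. A minimal sketch (note that a stream can only be consumed once, so this requests a fresh completion):

import re
from azure.ai.inference.models import SystemMessage, UserMessage

stream = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    stream=True,
)

# Accumulate the streamed deltas, then separate reasoning from the answer.
chunks = []
for update in stream:
    if update.choices and update.choices[0].delta.content:
        chunks.append(update.choices[0].delta.content)

full_text = "".join(chunks)
match = re.match(r"<think>(.*?)</think>(.*)", full_text, re.DOTALL)
if match:
    print("Thinking:", match.group(1).strip())
    print("Answer:", match.group(2).strip())
else:
    print("Answer:", full_text)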
The Azure AI model inference API supports Azure AI content safety. When enabled, inputs and outputs are processed through classification models to detect and prevent the output of harmful content.
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 400:
        response = ex.response.json()
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise
    else:
        raise
If harmful content is detected, the system takes action based on the configured content filtering settings.
DeepSeek-R1 is a powerful reasoning model that can significantly enhance your AI-driven applications through Azure AI Foundry. By following the steps outlined in this guide, you can deploy DeepSeek-R1, run inference against it, and manage it effectively, unlocking its full potential while maintaining enterprise-grade security and compliance.
Remember to explore the related Azure samples in various languages for more practical examples. Always monitor your resource usage and manage costs effectively by referring to the cost management documentation.