DeepSeek-R1 is a cutting-edge reasoning model designed to excel in complex tasks such as scientific reasoning, language understanding, and coding. With its impressive 671 billion parameters (37 billion active) and a 128k context length, DeepSeek-R1 is a valuable asset for developers and organizations looking to leverage advanced AI capabilities. This article provides a comprehensive guide on how to utilize DeepSeek-R1 with Azure AI Foundry, covering deployment, inference, and best practices.
DeepSeek-R1 builds upon the progress of earlier reasoning-focused models by incorporating a step-by-step training process. It refines the Chain-of-Thought (CoT) reasoning approach by combining reinforcement learning (RL) with fine-tuning on carefully curated datasets. The model evolved from DeepSeek-R1-Zero, which relied solely on RL and exhibited strong reasoning skills but suffered from output readability and language consistency issues.
To overcome these limitations, DeepSeek-R1 integrates a small amount of cold-start data and follows a refined training pipeline that blends reasoning-oriented RL with supervised fine-tuning. This results in a model that achieves state-of-the-art performance on reasoning benchmarks.
For more details, refer to the DeepSeek-R1 model card.
Before you start using DeepSeek-R1 with Azure AI Foundry, ensure you have the following:
- An Azure subscription with a valid payment method.
- An Azure AI Foundry project.
- Sufficient permissions to create and manage model deployments in that project.
DeepSeek-R1 can be deployed to serverless API endpoints with pay-as-you-go billing. This approach allows you to consume the model as an API without the need for hosting it on your subscription, while maintaining enterprise-grade security and compliance.
If the model isn't deployed already, use the Azure AI Foundry portal, the Azure Machine Learning SDK for Python, the Azure CLI, or ARM templates to deploy it as a serverless API.
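As an illustration, here's a minimal sketch of a deployment using the Azure Machine Learning SDK for Python (azure-ai-ml). The endpoint name is a placeholder, and the registry model ID shown is an assumption; check the model catalog entry for the exact ID to use:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import ServerlessEndpoint
from azure.identity import DefaultAzureCredential

# Connect to your project; replace the placeholders with your own values.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-resource-group>",
    workspace_name="<your-project-name>",
)

# Create a serverless endpoint backed by DeepSeek-R1 from the model catalog.
# The model ID below is an assumed catalog path; verify it in the portal.
endpoint = ServerlessEndpoint(
    name="deepseek-r1-endpoint",  # hypothetical endpoint name
    model_id="azureml://registries/azureml-deepseek/models/DeepSeek-R1",
)
created = ml_client.serverless_endpoints.begin_create_or_update(endpoint).result()
print("Scoring URI:", created.scoring_uri)

Once created, you can retrieve the endpoint keys with ml_client.serverless_endpoints.get_keys("deepseek-r1-endpoint") and use them for the inference calls that follow.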
You can consume predictions from DeepSeek-R1 using the azure-ai-inference package with Python. To use this package, make sure you have:
- Python installed, including pip.
- The endpoint URL, in the form https://your-host-name.your-azure-region.inference.ai.azure.com.
- The endpoint key to authenticate against it.

Install the Azure AI inference package using the following command:
pip install azure-ai-inference
For more information, refer to the Azure AI inference package and its reference documentation.
This section covers how to use the Azure AI model inference API with a chat completions model for chat interactions.
First, set up the client to consume the model. The example code below retrieves the endpoint URL and key from environment variables:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
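If your endpoint is configured for Microsoft Entra ID (keyless) authentication, you can pass a token credential instead of a key. This is a minimal sketch assuming the azure-identity package is installed; the token scope shown is an assumption and may differ for your deployment type:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

# Assumes keyless (Entra ID) auth is enabled on the endpoint; the scope
# below is an assumption for serverless endpoints and may need adjusting.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],
    credential=DefaultAzureCredential(),
    credential_scopes=["https://ml.azure.com/.default"],
)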
The /info route provides information about the model deployed to the endpoint. Here's how to retrieve this information:
model_info = client.get_model_info()
print("Model name:", model_info.model_name)
print("Model type:", model_info.model_type)
print("Model provider name:", model_info.model_provider_name)
This will output the model's name, type, and provider.
Here's an example of how to create a basic chat completions request:
from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)
print("Response:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)
The response includes the model's answer and usage statistics, such as the number of tokens used.
DeepSeek-R1 is designed to provide the reasoning behind its answers. This reasoning is included within <think> and </think> tags in the response content.
import re

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
)

match = re.match(r"<think>(.*?)</think>(.*)", response.choices[0].message.content, re.DOTALL)

print("Response:")
if match:
    print("\tThinking:", match.group(1))
    print("\tAnswer:", match.group(2))
else:
    print("\tAnswer:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)
This code extracts the reasoning content from the response, allowing you to understand the model's thought process.
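In multi-turn conversations, a common pattern is to send only the final answer back as the assistant message, rather than the full reasoning trace. The helper below is a small sketch of that idea; the strip_reasoning name is ours:

import re
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage

def strip_reasoning(content: str) -> str:
    """Remove the <think>...</think> block, keeping only the final answer."""
    return re.sub(r"<think>.*?</think>", "", content, flags=re.DOTALL).strip()

# Feed only the answer back into the conversation history.
answer = strip_reasoning(response.choices[0].message.content)
follow_up = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
        AssistantMessage(content=answer),
        UserMessage(content="Which of them are the most widely spoken?"),
    ],
)
print(follow_up.choices[0].message.content)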
To improve the user experience, you can stream the generated content as it becomes available. This is especially useful for long completions. Enable streaming by setting stream=True when calling the model:
result = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    temperature=0,
    top_p=1,
    max_tokens=2048,
    stream=True,
)
def print_stream(result):
    """Prints the chat completion with streaming."""
    for update in result:
        # Some updates carry no content delta (for example, the final chunk), so guard against None.
        if update.choices and update.choices[0].delta.content:
            print(update.choices[0].delta.content, end="")

print_stream(result)
This approach returns content as data-only server-sent events, allowing you to process the completion as it's being generated.
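Streaming and reasoning extraction can be combined: accumulate the streamed deltas into a single string, then split out the <think> block once the stream finishes. A minimal sketch (note that a stream can only be consumed once, so this requests a fresh completion):

import re
from azure.ai.inference.models import SystemMessage, UserMessage

stream = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="How many languages are in the world?"),
    ],
    stream=True,
)

# Accumulate the streamed deltas, then separate reasoning from the answer.
chunks = []
for update in stream:
    if update.choices and update.choices[0].delta.content:
        chunks.append(update.choices[0].delta.content)

full_text = "".join(chunks)
match = re.match(r"<think>(.*?)</think>(.*)", full_text, re.DOTALL)
if match:
    print("Thinking:", match.group(1).strip())
    print("Answer:", match.group(2).strip())
else:
    print("Answer:", full_text)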
The Azure AI model inference API supports Azure AI content safety. When enabled, inputs and outputs are processed through classification models to detect and prevent the output of harmful content.
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.exceptions import HttpResponseError

try:
    response = client.complete(
        messages=[
            SystemMessage(content="You are an AI assistant that helps people find information."),
            UserMessage(content="Chopping tomatoes and cutting them into cubes or wedges are great ways to practice your knife skills."),
        ]
    )
    print(response.choices[0].message.content)
except HttpResponseError as ex:
    if ex.status_code == 400:
        response = ex.response.json()
        if isinstance(response, dict) and "error" in response:
            print(f"Your request triggered an {response['error']['code']} error:\n\t {response['error']['message']}")
        else:
            raise
    else:
        raise
If harmful content is detected, the system takes action based on the configured content filtering settings.
DeepSeek-R1 is a powerful reasoning model that can significantly enhance your AI-driven applications through Azure AI Foundry. By following the steps outlined in this guide, you can deploy DeepSeek-R1, run inference against it, and manage it effectively, unlocking its full potential while maintaining enterprise-grade security and compliance.
Remember to explore the related Azure samples in various languages for more practical examples. Always monitor your resource usage and manage costs effectively by referring to the cost management documentation.