AMD is making waves in the AI world with the integration of the DeepSeek-V3 model on AMD Instinct™ GPUs. This powerful combination, optimized with SGLang, is poised to accelerate the development of advanced AI applications and experiences. Let's delve into what makes this integration so significant.
DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model designed to empower developers with exceptional performance and efficiency. With 671B total parameters, of which only 37B are activated per token, it delivers strong results at a fraction of the compute cost of a comparably capable dense model, setting a new standard for productivity and innovation.
AMD Instinct™ GPU accelerators are well suited to large MoE models like DeepSeek-V3, providing the memory capacity and bandwidth needed to hold the model's weights and KV cache and to serve inference at scale.
AMD ROCm™ software plays a crucial role in enabling DeepSeek-V3. This open software approach for AI strengthens the collaboration between AMD and the AI community, allowing developers to build powerful reasoning and language-understanding applications.
The extensive FP8 support in ROCm™ significantly improves the efficiency of running AI models, especially during inference. FP8 helps to alleviate memory bottlenecks and reduce latency issues, enabling the processing of larger models and batches within existing hardware constraints.
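To see why FP8 matters at this scale, here is a back-of-the-envelope sketch of the weight-only memory footprint for a model with DeepSeek-V3's published 671B total parameters. This is an illustrative calculation only: it ignores activation memory, the KV cache, and per-block scaling factors, and the helper function is our own, not part of ROCm or SGLang.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in GiB."""
    return num_params * bytes_per_param / 2**30

PARAMS = 671e9  # DeepSeek-V3 total parameter count

bf16 = weight_memory_gib(PARAMS, 2.0)  # 2 bytes per BF16 weight
fp8 = weight_memory_gib(PARAMS, 1.0)   # 1 byte per FP8 weight

print(f"BF16 weights: ~{bf16:.0f} GiB")  # roughly 1250 GiB
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # roughly 625 GiB
```

Halving bytes per weight halves the footprint, which is what makes serving the full model practical within a single multi-GPU node's memory budget.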
SGLang fully supports the DeepSeek-V3 model inference modes, offering developers the tools they need to get started quickly.
Here’s a simplified guide to getting started with SGLang on AMD Instinct™ GPUs. First, launch the prebuilt SGLang ROCm Docker container:
docker run -it --ipc=host --cap-add=SYS_PTRACE --network=host \
--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined \
--group-add video --privileged -w /workspace lmsysorg/sglang:v0.4.2.post3-rocm630
Inside the container, authenticate with Hugging Face so the model weights can be downloaded:

huggingface-cli login

Then launch the SGLang server, sharding DeepSeek-V3 across eight GPUs with tensor parallelism (--tp 8):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --port 30000 --tp 8 --trust-remote-code
Once the server is up, verify it with a simple generation request:

curl http://localhost:30000/generate \
-H "Content-Type: application/json" \
-d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0 } }'
To benchmark single-batch latency and throughput, set the recommended environment variable and run:

export HSA_NO_SCRATCH_RECLAIM=1
python3 -m sglang.bench_one_batch --batch-size 32 --input 128 --output 32 --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
To check accuracy on GSM8K, start the server in one terminal:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code

and run the benchmark client in another:

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --num-shots 8
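The --num-shots 8 flag means each GSM8K question is preceded by eight worked examples. The idea can be sketched as simple prompt concatenation; this is an illustrative format with toy stand-in examples, not the benchmark script's exact template.

```python
def build_few_shot_prompt(shots, question):
    """Concatenate worked (question, answer) examples ahead of the target question."""
    parts = []
    for q, a in shots:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

# Toy stand-ins for the 8 GSM8K exemplars used by --num-shots 8
shots = [("2 + 3 = ?", "5"), ("10 - 4 = ?", "6")]
prompt = build_few_shot_prompt(shots, "7 * 6 = ?")
```

The model sees the exemplars' question-answer pattern and continues it for the final, unanswered question.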
If BF16 weights are needed for experimentation, use the conversion script provided in the DeepSeek-V3 repository:
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
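Conceptually, converting FP8 weights back to BF16 means dequantizing: multiplying each block of quantized values by its stored scale factor. Here is a pure-Python sketch of that per-block idea in one dimension, assuming a blockwise-scaling scheme like the one DeepSeek-V3 publishes; the real script operates on 2-D tiles of safetensors weights.

```python
def dequantize_blockwise(quantized, scales, block_size=128):
    """Multiply each block of quantized values by its per-block scale factor."""
    out = []
    for i, q in enumerate(quantized):
        out.append(q * scales[i // block_size])
    return out

# Toy example with block_size=2: blocks [1, 2] and [3, 4], scales 0.5 and 2.0
vals = dequantize_blockwise([1, 2, 3, 4], [0.5, 2.0], block_size=2)
# vals == [0.5, 1.0, 6.0, 8.0]
```

Storing one scale per block, rather than per tensor, keeps quantization error low while adding only a small memory overhead.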
AMD's collaboration with the DeepSeek team ensures that developers can leverage the DeepSeek-V3 model on AMD Instinct™ GPUs from day one. This provides a broader choice of GPU hardware and an open software stack (ROCm™) for optimized performance and scalability.
In conclusion, the integration of DeepSeek-V3 with AMD Instinct™ GPUs, optimized by SGLang and powered by ROCm software, marks a significant step forward in AI development. This collaboration empowers developers with the tools and resources needed to create cutting-edge AI applications and experiences.