AMD is making waves in the AI world with the integration of the DeepSeek-V3 model on AMD Instinct™ GPUs. This powerful combination, optimized with SGLang, is poised to accelerate the development of advanced AI applications and experiences. Let's delve into what makes this integration so significant.
DeepSeek-V3 is an open-source Mixture-of-Experts (MoE) language model designed to empower developers with exceptional performance and efficiency. With 671B total parameters, of which only 37B are activated per token, it delivers strong results at a fraction of the compute cost of a comparably capable dense model, setting a new standard for productivity and innovation.
AMD Instinct™ GPU accelerators are well suited to large MoE models like DeepSeek-V3, providing the memory capacity and bandwidth needed to hold the model's weights and KV cache and to serve inference at scale.
AMD ROCm™ software plays a crucial role in enabling DeepSeek-V3. This open software approach for AI strengthens the collaboration between AMD and the AI community, allowing developers to build powerful reasoning and language-understanding applications.
The extensive FP8 support in ROCm™ significantly improves the efficiency of running AI models, especially during inference. FP8 helps to alleviate memory bottlenecks and reduce latency issues, enabling the processing of larger models and batches within existing hardware constraints.
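To see why FP8 matters at this scale, here is a back-of-the-envelope sketch of the weight-only memory footprint for a model with DeepSeek-V3's published 671B total parameters. This is an illustrative calculation only: it ignores activation memory, the KV cache, and per-block scaling factors, and the helper function is our own, not part of ROCm or SGLang.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight-only memory footprint in GiB."""
    return num_params * bytes_per_param / 2**30

PARAMS = 671e9  # DeepSeek-V3 total parameter count

bf16 = weight_memory_gib(PARAMS, 2.0)  # 2 bytes per BF16 weight
fp8 = weight_memory_gib(PARAMS, 1.0)   # 1 byte per FP8 weight

print(f"BF16 weights: ~{bf16:.0f} GiB")  # roughly 1250 GiB
print(f"FP8 weights:  ~{fp8:.0f} GiB")   # roughly 625 GiB
```

Halving bytes per weight halves the footprint, which is what makes serving the full model practical within a single multi-GPU node's memory budget.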
SGLang fully supports the DeepSeek-V3 model inference modes, offering developers the tools they need to get started quickly.
Here’s a simplified guide to getting started with SGLang on AMD Instinct™ GPUs. First, launch the prebuilt SGLang ROCm Docker container:
docker run -it --ipc=host --cap-add=SYS_PTRACE --network=host \
--device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined \
--group-add video --privileged -w /workspace lmsysorg/sglang:v0.4.2.post3-rocm630
Inside the container, authenticate with Hugging Face so the model weights can be downloaded:

huggingface-cli login

Then launch the SGLang server, sharding DeepSeek-V3 across eight GPUs with tensor parallelism (--tp 8):

python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --port 30000 --tp 8 --trust-remote-code
Once the server is up, verify it with a simple generation request:

curl http://localhost:30000/generate \
-H "Content-Type: application/json" \
-d '{ "text": "Once upon a time,", "sampling_params": { "max_new_tokens": 16, "temperature": 0 } }'
To benchmark single-batch latency and throughput, set the recommended environment variable and run:

export HSA_NO_SCRATCH_RECLAIM=1
python3 -m sglang.bench_one_batch --batch-size 32 --input 128 --output 32 --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code
To check accuracy on GSM8K, start the server in one terminal:

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code

and run the benchmark client in another:

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --num-shots 8
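The --num-shots 8 flag means each GSM8K question is preceded by eight worked examples. The idea can be sketched as simple prompt concatenation; this is an illustrative format with toy stand-in examples, not the benchmark script's exact template.

```python
def build_few_shot_prompt(shots, question):
    """Concatenate worked (question, answer) examples ahead of the target question."""
    parts = []
    for q, a in shots:
        parts.append(f"Question: {q}\nAnswer: {a}\n")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n".join(parts)

# Toy stand-ins for the 8 GSM8K exemplars used by --num-shots 8
shots = [("2 + 3 = ?", "5"), ("10 - 4 = ?", "6")]
prompt = build_few_shot_prompt(shots, "7 * 6 = ?")
```

The model sees the exemplars' question-answer pattern and continues it for the final, unanswered question.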
If BF16 weights are needed for experimentation, use the conversion script provided in the DeepSeek-V3 repository:
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
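Conceptually, converting FP8 weights back to BF16 means dequantizing: multiplying each block of quantized values by its stored scale factor. Here is a pure-Python sketch of that per-block idea in one dimension, assuming a blockwise-scaling scheme like the one DeepSeek-V3 publishes; the real script operates on 2-D tiles of safetensors weights.

```python
def dequantize_blockwise(quantized, scales, block_size=128):
    """Multiply each block of quantized values by its per-block scale factor."""
    out = []
    for i, q in enumerate(quantized):
        out.append(q * scales[i // block_size])
    return out

# Toy example with block_size=2: blocks [1, 2] and [3, 4], scales 0.5 and 2.0
vals = dequantize_blockwise([1, 2, 3, 4], [0.5, 2.0], block_size=2)
# vals == [0.5, 1.0, 6.0, 8.0]
```

Storing one scale per block, rather than per tensor, keeps quantization error low while adding only a small memory overhead.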
AMD's collaboration with the DeepSeek team ensures that developers can leverage the DeepSeek-V3 model on AMD Instinct™ GPUs from day one. This provides a broader choice of GPU hardware and an open software stack (ROCm™) for optimized performance and scalability.
In conclusion, the integration of DeepSeek-V3 with AMD Instinct™ GPUs, optimized by SGLang and powered by ROCm software, marks a significant step forward in AI development. This collaboration empowers developers with the tools and resources needed to create cutting-edge AI applications and experiences.