DeepSeek AI has recently unveiled its latest model series, DeepSeek-V3, marking a significant milestone in the advancement of open-source artificial intelligence. The initial version of DeepSeek-V3 is now available, boasting impressive performance benchmarks and a commitment to open accessibility. This article delves into the key features, capabilities, and implications of this groundbreaking release.
DeepSeek-V3 is a Mixture of Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated for each token. It has been pre-trained on a massive 14.8 trillion token dataset. The model is designed to compete with, and in some cases surpass, leading closed-source models across a range of benchmarks, offering a powerful tool for developers, researchers, and businesses.
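The gap between 671 billion total and 37 billion active parameters is the defining property of an MoE model: a gating network scores the experts for each token and only the top-k of them actually run. The toy sketch below (plain Python with illustrative names, not DeepSeek's implementation) shows the routing idea:

```python
import math

def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, w / total) for i, w in zip(top, exps)]

def moe_layer(x, experts, gate, k=2):
    """Route input x to the top-k experts only; the rest stay idle for this token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(gate(x), k):
        y = experts[idx](x)  # only k of len(experts) expert networks execute
        out = [o + weight * v for o, v in zip(out, y)]
    return out
```

Because only k experts execute per token, compute cost scales with the active parameter count rather than the total, which is how a 671B-parameter model can run with 37B-parameter inference cost.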
Initial evaluations reveal that DeepSeek-V3 surpasses other open-source models like Qwen2.5-72B and Llama-3.1-405B in numerous assessments. Remarkably, its performance closely rivals that of top-tier closed-source models such as GPT-4o and Claude-3.5-Sonnet.
Here’s a breakdown of DeepSeek-V3's performance across different categories:
These results highlight DeepSeek-V3's well-rounded capabilities and its potential to excel in diverse applications.
One of the standout features of DeepSeek-V3 is its significantly improved generation speed. Through algorithmic and engineering innovations, the model's output speed has tripled relative to DeepSeek-V2.5, from 20 tokens per second (TPS) to 60 TPS. This enhancement delivers a much smoother and faster user experience, making the model well suited to real-time applications.
The DeepSeek-V3 API is now live, providing developers with access to this powerful model. The pricing structure is as follows:
To encourage adoption, DeepSeek AI is offering a 45-day introductory pricing period, running until February 8, 2025. During this period, the API service will be available at the following discounted rates:
Both newly registered and existing users can take advantage of these lower prices during the promotional period. For more details on pricing and usage, refer to the DeepSeek API documentation.
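The DeepSeek API follows an OpenAI-compatible chat-completions format, so a call can be built with nothing but the standard library. The sketch below constructs such a request; the endpoint path, model name, and payload fields follow that convention and should be checked against the official API documentation:

```python
import json
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt, api_key, model="deepseek-chat"):
    """Build an OpenAI-compatible chat-completion request for the DeepSeek API."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# To actually send the request (requires a valid API key):
# req = build_request("Hello, DeepSeek-V3!", api_key="sk-...")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the format is OpenAI-compatible, existing OpenAI SDK clients can typically be pointed at the DeepSeek base URL instead of hand-rolling requests like this.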
DeepSeek AI emphasizes its dedication to the open-source community by releasing DeepSeek-V3 with open weights. The model is trained natively in FP8, and the FP8 weights are available for download. This allows researchers and developers to explore, customize, and deploy the model according to their specific requirements.
Furthermore, the open-source community has quickly embraced DeepSeek-V3, with SGLang and LMDeploy supporting native FP8 inference. TensorRT-LLM and MindIE have also implemented BF16 inference. To facilitate broader adoption, DeepSeek AI provides conversion scripts for FP8 to BF16.
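For intuition about what an FP8-to-BF16 conversion involves, the sketch below decodes a single FP8 byte in the E4M3 layout (1 sign, 4 exponent, 3 mantissa bits, exponent bias 7) that is commonly used for FP8 weights. This is only an illustration of the number format itself; the actual conversion scripts operate on full model checkpoints, not individual bytes:

```python
def fp8_e4m3_to_float(byte):
    """Decode one FP8 E4M3 byte (1 sign, 4 exponent, 3 mantissa bits, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    mant = byte & 0x07
    if exp == 0x0F and mant == 0x07:
        return float("nan")  # E4M3FN reserves this pattern for NaN (no infinities)
    if exp == 0:
        # Subnormal: no implicit leading 1, fixed exponent of 2**(1 - 7)
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)
```

Upcasting to BF16 is lossless in value terms, since every representable E4M3 number fits in BF16's wider exponent range; the trade-off is doubled memory per weight.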
Model weights and additional deployment information can be found on Hugging Face.
DeepSeek-V3's impressive capabilities make it suitable for a wide range of applications, including:
DeepSeek AI encourages users to engage with the community through various channels:
The release of DeepSeek-V3 marks a significant advancement in open-source AI, offering a powerful and versatile model that rivals top-tier closed-source alternatives. With its exceptional performance, enhanced speed, and open-source availability, DeepSeek-V3 empowers developers and researchers to explore new frontiers in artificial intelligence. DeepSeek AI's dedication to open-source principles and continuous improvement promises a bright future for the DeepSeek-V3 series and the broader AI community.