DeepSeek has officially launched its groundbreaking DeepSeek-V3 model, marking a significant advancement in open-source artificial intelligence. This new model aims to redefine the landscape of AI capabilities, challenging even leading closed-source models like GPT-4o and Claude-3.5-Sonnet.
DeepSeek-V3 is a self-developed Mixture-of-Experts (MoE) model featuring 671 billion parameters, with 37 billion parameters activated during use. It was pre-trained on a massive 14.8 trillion tokens. This extensive training has enabled the model to perform exceptionally well across various benchmarks.
DeepSeek-V3 has demonstrated impressive performance across a range of benchmarks, outperforming other open-source models and rivaling top-tier, closed-source models.
One of the most notable improvements in DeepSeek-V3 is its generation speed. Innovations in algorithms and engineering have boosted the token generation rate from 20 tokens per second (TPS) to an impressive 60 TPS, providing users with a much faster and smoother experience compared to DeepSeek-V2.5.
With the release of DeepSeek-V3, there have been adjustments to the API service pricing to reflect the enhanced performance and speed. Model API pricing is adjusted to ¥0.5/2 per million input tokens (cached/uncached) and ¥8 per million output tokens. To allow users to experience DeepSeek-V3 there is a promotional period until February 8, 2025. Until Feb 8, 2025, DeepSeek-V3 API service pricing is ¥0.1/1 per million input tokens (cached/uncached) and ¥2 per million output tokens.
DeepSeek-V3 is trained using FP8 and has open-sourced its native FP8 weights. This commitment to open source facilitates broader adoption and customization within the AI community.
SGLang and LMDeploy support native FP8 inference for V3 models, while TensorRT-LLM and MindIE have implemented BF16 inference.
For model weights and local deployment details, refer to the DeepSeek-V3-Base on Hugging Face.
DeepSeek's dedication to open-source principles and long-term vision aims to democratize AGI (Artificial General Intelligence) technology. The introduction of DeepSeek-V3 represents significant progress in narrowing the capability gap between open and closed-source models. DeepSeek plans to continue enhancing the DeepSeek-V3 base model with deeper reasoning and multimodal capabilities, sharing their advancements with the community.
To experience the DeepSeek-V3 model, visit chat.deepseek.com and begin interacting with the latest version. The API service has been updated, and no configuration changes are needed to start using the new model.
The launch of DeepSeek-V3 marks a significant milestone in AI development, showcasing the potential of open-source models to achieve and even surpass the capabilities of proprietary systems.
This article seeks to bring awareness to the launch of the DeepSeek-V3 model. For previous information about the DeepSeek models you can read about the DeepSeek-V2.5 Release.