In a groundbreaking move, DeepSeek has launched its latest model, DeepSeek-V3, marking a significant leap forward in the realm of open-source artificial intelligence. This newly released model is not only available for immediate use but also comes with its source code open to the public, fostering collaboration and innovation within the AI community.
Today marks the release of the first version of our new DeepSeek-V3 series model, which is also open source. You can interact with the latest V3 model by logging into the official website, chat.deepseek.com. The API service has been updated in step, and no changes to the interface configuration are required. Note that the current version of DeepSeek-V3 does not support multimodal input or output.
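Because the API is OpenAI-compatible, existing client code keeps working unchanged; only the model name matters. Below is a minimal stdlib sketch (no SDK) that assembles a chat-completion request. The endpoint and the model id "deepseek-chat" reflect DeepSeek's documented defaults at the time of writing, but verify them against the current API docs.

```python
import json
import os
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"

def build_chat_request(messages, api_key):
    """Assemble the HTTP request; actually sending it is left to the caller."""
    body = json.dumps({"model": "deepseek-chat", "messages": messages}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request(
    [{"role": "user", "content": "Hello, DeepSeek-V3!"}],
    os.environ.get("DEEPSEEK_API_KEY", "sk-..."),
)
# urllib.request.urlopen(req) would perform the actual call.
```

An existing OpenAI-style client can be pointed at the same endpoint simply by swapping the base URL and model name.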
DeepSeek-V3 is a self-developed MoE (mixture-of-experts) model with 671B total parameters, of which 37B are activated per token, pre-trained on 14.8T tokens.
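The gap between "671B parameters" and "37B activated" is the defining property of a mixture-of-experts layer: a cheap router scores many experts per token but only the top few are actually evaluated. The toy sketch below illustrates the mechanism with made-up sizes; it is not DeepSeek-V3's actual architecture or routing scheme.

```python
import math
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 4, 8, 2  # illustrative sizes only

# Each expert is a tiny linear map (DIM x DIM weight matrix).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(N_EXPERTS)]
# Router: one gate vector per expert.
gates = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_forward(x):
    # 1. The router scores every expert -- this part is cheap.
    scores = [dot(g, x) for g in gates]
    # 2. Keep only the TOP_K highest-scoring experts.
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # 3. Softmax over the selected scores gives mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]
    # 4. Only the selected experts' parameters are ever touched,
    #    so per-token compute scales with TOP_K, not N_EXPERTS.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for r in range(DIM):
            out[r] += w * dot(experts[i][r], x)
    return out, top

y, active = moe_forward([1.0, -0.5, 0.3, 0.7])
print(f"active experts: {sorted(active)} of {N_EXPERTS}")
```

Scaled up, the same idea lets total capacity (all experts) grow far faster than per-token compute (the activated subset).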
DeepSeek-V3 has demonstrated remarkable performance across a wide range of benchmarks, rivaling and even surpassing leading closed-source models such as GPT-4o and Claude-3.5-Sonnet. Here’s a breakdown of its key achievements:
The paper outlining the architecture and training of DeepSeek-V3 is available on GitHub, offering valuable insights into its capabilities.
One of the key highlights of DeepSeek-V3 is its significantly enhanced generation speed. Through algorithmic and engineering innovations, the model's text generation speed has tripled, from 20 to 60 tokens per second (TPS). This improvement ensures a more fluid and responsive experience for users, streamlining interactions and workflows.
As DeepSeek-V3 rolls out, there are some adjustments to the API service pricing:
However, DeepSeek is offering a promotional period until February 8, 2025, during which users can continue to enjoy the familiar pricing of:
This introductory offer is available to both new and existing users, making it an opportune time to explore the capabilities of the DeepSeek-V3 model.
DeepSeek-V3 is trained with FP8, and its open-source release provides native FP8 weights. Thanks to contributions from the open-source community, frameworks such as SGLang and LMDeploy already support native FP8 inference for V3 models. We also provide a script for converting the FP8 weights to BF16, to facilitate community adaptation and broaden application scenarios.
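To see what an FP8-to-BF16 conversion has to do, here is a from-scratch decoder for the E4M3 format (the FP8 variant commonly used for weights): it re-expands 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits into a wider float. This is an illustrative sketch, not DeepSeek's actual conversion script.

```python
def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3FN byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4 exponent bits, bias 7
    man = byte & 0x07          # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return float("nan")    # E4M3FN reserves only this code for NaN (no inf)
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

# 0b0_0111_000: sign 0, exponent 7 (unbiased 0), mantissa 0 -> +1.0
print(fp8_e4m3_to_float(0b00111000))
```

Widening E4M3 to BF16 is lossless, since BF16's 8-bit mantissa and wider exponent range cover every E4M3 value exactly; it is the reverse direction (BF16 to FP8) that quantizes.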
You can find more information about the model weights and local deployment on Hugging Face.
DeepSeek's commitment to open source principles aims to democratize access to advanced AI technologies. By releasing DeepSeek-V3, they are enabling developers, researchers, and organizations to leverage this powerful model and contribute to its further development.
This release represents a step forward in bridging the gap between open-source and closed-source AI capabilities, driving innovation and collaboration across the AI landscape. DeepSeek aims to continually enhance the DeepSeek-V3 base model with deep-thinking, multimodal, and many other capabilities, and to continue sharing its latest findings with the community.
DeepSeek continues to actively engage with the AI community through various channels:
By fostering these connections, DeepSeek encourages collaboration and accelerates progress toward AGI.