DeepSeek AI has officially launched its groundbreaking DeepSeek-V3 model, marking a significant milestone in the advancement of open-source artificial intelligence. This new model boasts impressive performance, enhanced speed, and a commitment to accessible AI for all.
DeepSeek-V3 represents a new generation of AI models developed by DeepSeek AI. The initial release is now available and open-source, allowing developers and researchers to explore its capabilities. Users can interact with the latest V3 model via the DeepSeek Chat platform. The API service has also been updated to incorporate the new model without requiring any configuration changes. Note that the current version of DeepSeek-V3 does not support multimodal input and output.
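Because the updated API keeps the existing interface, current integrations should continue to work without modification. Below is a minimal sketch of a chat completion call, assuming the `openai` Python SDK and DeepSeek's documented OpenAI-compatible endpoint; the API key string is a placeholder.

```python
# Minimal sketch: calling the updated API, which now serves DeepSeek-V3.
# Assumes the `openai` Python SDK (v1+) and DeepSeek's documented base URL;
# "YOUR_DEEPSEEK_API_KEY" is a placeholder, not a real credential.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # same model name as before; no configuration change needed
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what is new in DeepSeek-V3."},
    ],
)

print(response.choices[0].message.content)
```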
DeepSeek-V3 brings several significant improvements and features to the table:
DeepSeek-V3 achieves state-of-the-art (SOTA) performance across a range of benchmarks, rivaling top closed-source models.
Thanks to algorithmic and engineering innovations, DeepSeek-V3 significantly boosts generation speed, reaching 60 tokens per second (TPS), a threefold increase over the V2.5 model. This speedup translates into a noticeably more fluid and responsive user experience.
DeepSeek-V3 is a self-developed Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated for each token. It was pre-trained on 14.8 trillion tokens. The paper detailing the architecture and training process is available on GitHub.
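To make the distinction between total and active parameters concrete, here is a toy sketch of top-k expert routing, the core idea behind MoE layers. It is purely illustrative: the dimensions, expert count, and k value are assumptions and do not reflect DeepSeek-V3's actual implementation.

```python
# Toy Mixture-of-Experts layer: only the top-k experts run for each token,
# so far fewer parameters are active per token than the model contains in total.
# Illustrative only; sizes and routing details are not DeepSeek-V3's real ones.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)               # routing probabilities
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # choose k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(4, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 64])
```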
Alongside the release of DeepSeek-V3, DeepSeek AI has adjusted its API service pricing.
To encourage early adoption, DeepSeek is offering a promotional pricing period that runs until February 8, 2025.
Both new and existing users can take advantage of this discounted pricing.
DeepSeek-V3 is trained using FP8 and provides open-source native FP8 weights. This allows for efficient inference and deployment, supported by integrations with SGLang and LMDeploy for native FP8 inference, and TensorRT-LLM and MindIE for BF16 inference. Conversion scripts from FP8 to BF16 are also available to facilitate community adaptation and application development.
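Conceptually, converting the block-scaled FP8 weights to BF16 amounts to up-casting each weight block and applying its stored scale. The sketch below illustrates that idea only; the block size, tensor names, and the multiply-by-scale convention are assumptions, not the repository's actual conversion script.

```python
# Conceptual sketch of dequantizing block-scaled FP8 weights to BF16.
# Not DeepSeek's actual conversion script; block size and naming are assumptions.
import torch

def fp8_block_to_bf16(w_fp8: torch.Tensor, scale: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Up-cast an FP8 weight and multiply each (block x block) tile by its scale."""
    w = w_fp8.to(torch.float32)
    # expand each per-block scale over its tile, then crop to the weight shape
    scales = scale.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    scales = scales[: w.shape[0], : w.shape[1]]
    return (w * scales).to(torch.bfloat16)

# toy example: a 256x256 FP8 weight with a 2x2 grid of block scales
w = torch.randn(256, 256).to(torch.float8_e4m3fn)
s = torch.rand(2, 2)
print(fp8_block_to_bf16(w, s).dtype)  # torch.bfloat16
```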
Model weights and detailed deployment information are available on Hugging Face.
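For local deployment with one of the supported engines, the weights can be pulled directly from Hugging Face. A minimal sketch, assuming the repository id `deepseek-ai/DeepSeek-V3`, a hypothetical target directory, and sufficient disk space for a 671B-parameter checkpoint:

```python
# Sketch: fetching the published weights from Hugging Face for local deployment
# (e.g. with SGLang or LMDeploy). Repository id and target directory are assumptions.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3",
    local_dir="./DeepSeek-V3",  # hypothetical local path
)
print(f"Weights downloaded to {local_dir}")
```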
The release of DeepSeek-V3 reinforces DeepSeek's commitment to open-source AI. By open-sourcing its model, DeepSeek fosters collaboration and innovation and broadens access to advanced AI technology; by narrowing the gap between open-source and closed-source models, it accelerates the democratization of AI.
DeepSeek-V3 lays the groundwork for future models with more advanced capabilities. DeepSeek intends to build on the DeepSeek-V3 base model with enhanced functionalities such as deep reasoning and multimodal processing, and will continue sharing its findings with the community.
To stay up to date with DeepSeek AI's latest developments and contribute to the community, visit the DeepSeek GitHub page.
This article has provided an overview of the DeepSeek-V3 release, highlighting its features, performance improvements, and impact on the open-source AI landscape. By understanding the implications of this model, developers and researchers can leverage its capabilities and contribute to the evolution of AI technology.