DeepSeek AI has officially launched its groundbreaking DeepSeek-V3 model, marking a significant milestone in the advancement of open-source artificial intelligence. This new model boasts impressive performance, enhanced speed, and a commitment to accessible AI for all.
DeepSeek-V3 represents a new generation of AI models developed by DeepSeek AI. The initial release is now available and open-source, allowing developers and researchers to explore its capabilities. Users can interact with the latest V3 model via the DeepSeek Chat platform. The API service has also been updated to incorporate the new model without requiring any configuration changes. Note that the current version of DeepSeek-V3 does not support multimodal input and output.
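Because the updated API keeps the existing interface, current integrations should continue to work without modification. Below is a minimal sketch of a chat completion call, assuming the `openai` Python SDK and DeepSeek's documented OpenAI-compatible endpoint; the API key string is a placeholder.

```python
# Minimal sketch: calling the updated API, which now serves DeepSeek-V3.
# Assumes the `openai` Python SDK (v1+) and DeepSeek's documented base URL;
# "YOUR_DEEPSEEK_API_KEY" is a placeholder, not a real credential.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # same model name as before; no configuration change needed
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what is new in DeepSeek-V3."},
    ],
)

print(response.choices[0].message.content)
```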
DeepSeek-V3 brings several significant improvements and features to the table:
DeepSeek-V3 achieves state-of-the-art (SOTA) performance across a range of benchmarks, rivaling top closed-source models.
Thanks to algorithmic and engineering innovations, DeepSeek-V3 significantly boosts generation speed, reaching 60 tokens per second (TPS), a threefold increase over the V2.5 model. This speedup translates into a noticeably more fluid and responsive user experience.
DeepSeek-V3 is a self-developed Mixture-of-Experts (MoE) model with 671 billion total parameters, of which 37 billion are activated for each token. It was pre-trained on 14.8 trillion tokens. The paper detailing the architecture and training process is available on GitHub.
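To make the distinction between total and active parameters concrete, here is a toy sketch of top-k expert routing, the core idea behind MoE layers. It is purely illustrative: the dimensions, expert count, and k value are assumptions and do not reflect DeepSeek-V3's actual implementation.

```python
# Toy Mixture-of-Experts layer: only the top-k experts run for each token,
# so far fewer parameters are active per token than the model contains in total.
# Illustrative only; sizes and routing details are not DeepSeek-V3's real ones.
import torch
import torch.nn as nn


class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x).softmax(dim=-1)               # routing probabilities
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # choose k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += topk_scores[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(4, 64)
print(ToyMoELayer()(tokens).shape)  # torch.Size([4, 64])
```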
Alongside the release of DeepSeek-V3, DeepSeek AI has adjusted its API service pricing.
To encourage early adoption, DeepSeek is offering a promotional pricing period that runs until February 8, 2025.
Both new and existing users can take advantage of this discounted pricing.
DeepSeek-V3 is trained using FP8 and provides open-source native FP8 weights. This allows for efficient inference and deployment, supported by integrations with SGLang and LMDeploy for native FP8 inference, and TensorRT-LLM and MindIE for BF16 inference. Conversion scripts from FP8 to BF16 are also available to facilitate community adaptation and application development.
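Conceptually, converting the block-scaled FP8 weights to BF16 amounts to up-casting each weight block and applying its stored scale. The sketch below illustrates that idea only; the block size, tensor names, and the multiply-by-scale convention are assumptions, not the repository's actual conversion script.

```python
# Conceptual sketch of dequantizing block-scaled FP8 weights to BF16.
# Not DeepSeek's actual conversion script; block size and naming are assumptions.
import torch

def fp8_block_to_bf16(w_fp8: torch.Tensor, scale: torch.Tensor, block: int = 128) -> torch.Tensor:
    """Up-cast an FP8 weight and multiply each (block x block) tile by its scale."""
    w = w_fp8.to(torch.float32)
    # expand each per-block scale over its tile, then crop to the weight shape
    scales = scale.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)
    scales = scales[: w.shape[0], : w.shape[1]]
    return (w * scales).to(torch.bfloat16)

# toy example: a 256x256 FP8 weight with a 2x2 grid of block scales
w = torch.randn(256, 256).to(torch.float8_e4m3fn)
s = torch.rand(2, 2)
print(fp8_block_to_bf16(w, s).dtype)  # torch.bfloat16
```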
Model weights and detailed deployment information are available on Hugging Face.
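For local deployment with one of the supported engines, the weights can be pulled directly from Hugging Face. A minimal sketch, assuming the repository id `deepseek-ai/DeepSeek-V3`, a hypothetical target directory, and sufficient disk space for a 671B-parameter checkpoint:

```python
# Sketch: fetching the published weights from Hugging Face for local deployment
# (e.g. with SGLang or LMDeploy). Repository id and target directory are assumptions.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-V3",
    local_dir="./DeepSeek-V3",  # hypothetical local path
)
print(f"Weights downloaded to {local_dir}")
```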
The release of DeepSeek-V3 reinforces DeepSeek's commitment to open-source AI. By open-sourcing its model, DeepSeek fosters collaboration and innovation and broadens access to advanced AI technology; by narrowing the gap between open-source and closed-source models, it accelerates the democratization of AI.
DeepSeek-V3 lays the groundwork for future models with more advanced capabilities. DeepSeek intends to build on the DeepSeek-V3 base model with enhanced functionalities such as deep reasoning and multimodal processing, and will continue sharing its findings with the community.
To stay up to date with DeepSeek AI's latest developments and contribute to the community, visit the DeepSeek GitHub page.
This article has provided an overview of the DeepSeek-V3 release, highlighting its features, performance improvements, and impact on the open-source AI landscape. By understanding the implications of this model, developers and researchers can leverage its capabilities and contribute to the evolution of AI technology.