In a groundbreaking move, DeepSeek has launched its latest model, DeepSeek-V3, marking a significant leap forward in the realm of open-source artificial intelligence. This newly released model is not only available for immediate use but also comes with its source code open to the public, fostering collaboration and innovation within the AI community.
Today marks the release of the first version of our new DeepSeek-V3 series model, which is also open source. You can interact with the latest V3 model by logging into the official website, chat.deepseek.com. The API service has been updated in step, and no changes to the interface configuration are required. Note that the current version of DeepSeek-V3 does not support multimodal input or output.
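Because the API is OpenAI-compatible, existing client code keeps working unchanged; only the model name matters. Below is a minimal stdlib sketch (no SDK) that assembles a chat-completion request. The endpoint and the model id "deepseek-chat" reflect DeepSeek's documented defaults at the time of writing, but verify them against the current API docs.

```python
import json
import os
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"

def build_chat_request(messages, api_key):
    """Assemble the HTTP request; actually sending it is left to the caller."""
    body = json.dumps({"model": "deepseek-chat", "messages": messages}).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_chat_request(
    [{"role": "user", "content": "Hello, DeepSeek-V3!"}],
    os.environ.get("DEEPSEEK_API_KEY", "sk-..."),
)
# urllib.request.urlopen(req) would perform the actual call.
```

An existing OpenAI-style client can be pointed at the same endpoint simply by swapping the base URL and model name.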
DeepSeek-V3 is a self-developed MoE (mixture-of-experts) model with 671B total parameters, of which 37B are activated per token, pre-trained on 14.8T tokens.
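The gap between "671B parameters" and "37B activated" is the defining property of a mixture-of-experts layer: a cheap router scores many experts per token but only the top few are actually evaluated. The toy sketch below illustrates the mechanism with made-up sizes; it is not DeepSeek-V3's actual architecture or routing scheme.

```python
import math
import random

random.seed(0)
DIM, N_EXPERTS, TOP_K = 4, 8, 2  # illustrative sizes only

# Each expert is a tiny linear map (DIM x DIM weight matrix).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(N_EXPERTS)]
# Router: one gate vector per expert.
gates = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def moe_forward(x):
    # 1. The router scores every expert -- this part is cheap.
    scores = [dot(g, x) for g in gates]
    # 2. Keep only the TOP_K highest-scoring experts.
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # 3. Softmax over the selected scores gives mixing weights.
    exps = [math.exp(scores[i]) for i in top]
    weights = [e / sum(exps) for e in exps]
    # 4. Only the selected experts' parameters are ever touched,
    #    so per-token compute scales with TOP_K, not N_EXPERTS.
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for r in range(DIM):
            out[r] += w * dot(experts[i][r], x)
    return out, top

y, active = moe_forward([1.0, -0.5, 0.3, 0.7])
print(f"active experts: {sorted(active)} of {N_EXPERTS}")
```

Scaled up, the same idea lets total capacity (all experts) grow far faster than per-token compute (the activated subset).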
DeepSeek-V3 has demonstrated remarkable performance across a wide range of benchmarks, rivaling and even surpassing leading closed-source models such as GPT-4o and Claude-3.5-Sonnet. Here’s a breakdown of its key achievements:
The paper outlining the architecture and training of DeepSeek-V3 is available on GitHub, offering valuable insights into its capabilities.
One of the key highlights of DeepSeek-V3 is its significantly enhanced generation speed. Through algorithmic and engineering innovations, the model's text generation speed has tripled, from 20 to 60 tokens per second (TPS). This improvement ensures a more fluid and responsive experience for users, streamlining interactions and workflows.
As DeepSeek-V3 rolls out, there are some adjustments to the API service pricing:
However, DeepSeek is offering a promotional period until February 8, 2025, during which users can continue to enjoy the familiar pricing of:
This introductory offer is available to both new and existing users, making it an opportune time to explore the capabilities of the DeepSeek-V3 model.
DeepSeek-V3 is trained with FP8, and its open-source release provides native FP8 weights. Thanks to contributions from the open-source community, frameworks such as SGLang and LMDeploy already support native FP8 inference for V3 models. We also provide a script for converting the FP8 weights to BF16, to facilitate community adaptation and broaden application scenarios.
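To see what an FP8-to-BF16 conversion has to do, here is a from-scratch decoder for the E4M3 format (the FP8 variant commonly used for weights): it re-expands 1 sign bit, 4 exponent bits (bias 7), and 3 mantissa bits into a wider float. This is an illustrative sketch, not DeepSeek's actual conversion script.

```python
def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3FN byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F   # 4 exponent bits, bias 7
    man = byte & 0x07          # 3 mantissa bits
    if exp == 0x0F and man == 0x07:
        return float("nan")    # E4M3FN reserves only this code for NaN (no inf)
    if exp == 0:
        # Subnormal: no implicit leading 1.
        return sign * (man / 8) * 2.0 ** -6
    return sign * (1 + man / 8) * 2.0 ** (exp - 7)

# 0b0_0111_000: sign 0, exponent 7 (unbiased 0), mantissa 0 -> +1.0
print(fp8_e4m3_to_float(0b00111000))
```

Widening E4M3 to BF16 is lossless, since BF16's 8-bit mantissa and wider exponent range cover every E4M3 value exactly; it is the reverse direction (BF16 to FP8) that quantizes.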
You can find more information about the model weights and local deployment on Hugging Face.
DeepSeek's commitment to open source principles aims to democratize access to advanced AI technologies. By releasing DeepSeek-V3, they are enabling developers, researchers, and organizations to leverage this powerful model and contribute to its further development.
This release represents a step forward in bridging the gap between open-source and closed-source AI capabilities, driving innovation and collaboration across the AI landscape. DeepSeek aims to continually enhance the DeepSeek-V3 base model with deep-thinking, multimodal, and many other capabilities, and to continue sharing its latest findings with the community.
DeepSeek continues to actively engage with the AI community through various channels:
By fostering these connections, DeepSeek encourages collaboration and accelerates progress toward AGI.