In the rapidly evolving landscape of Artificial Intelligence, one company is making waves with its bold approach to large language models (LLMs). DeepSeek, a relatively new player founded in April 2023, has quickly gained recognition for its high-performance models offered at incredibly competitive prices, earning them the moniker "AI's Pinduoduo." This article delves into the background, strategies, and potential future of DeepSeek in the fiercely competitive AI market.
DeepSeek emerged from stealth mode with backing from the quantitative trading firm Fantasy Quant. Unlike many AI startups that rely on investments from established tech giants, DeepSeek operates independently, positioning itself as a potential disruptor of the existing market dynamics. In May 2024, DeepSeek Chat was approved as a generative AI service in Beijing.
The company's most notable achievement is the DeepSeek-V2 model, released in early May 2024. This second-generation Mixture of Experts (MoE) model boasts performance comparable to GPT-4 Turbo at a fraction of the cost – roughly one percent of GPT-4's price – making it accessible to a wider range of users and businesses looking to leverage AI capabilities.
DeepSeek's aggressive pricing strategy has been a major catalyst for the ongoing AI price war. Following DeepSeek's lead, major players like ByteDance (Doubao), Alibaba (Tongyi Qianwen), and Baidu (Wenxin) have all announced significant price reductions or even free access to some of their models. Even international AI powerhouse Mistral AI released Mistral Large with input and output prices roughly 20% cheaper than GPT-4 Turbo, and OpenAI has responded with multiple price cuts of its own.
While beneficial for consumers, this price war raises questions about sustainability and the ability of smaller players to compete with well-funded giants, given the substantial cost of developing, training, and deploying LLMs.
DeepSeek's ability to offer such low prices stems from its Mixture of Experts architecture, in which only a fraction of the model's parameters is activated for each token processed. This contrasts with dense models, which activate every parameter for every token and therefore incur far higher computational costs per response.
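DeepSeek-V2's actual routing mechanism is more elaborate, but the core MoE idea – a gate picks the top-k experts per token and only those experts run – can be sketched as follows. All function names, dimensions, and the toy experts here are illustrative assumptions, not DeepSeek's implementation:

```python
import numpy as np

def top_k_gating(token_emb, gate_w, k=2):
    """Score every expert for this token, keep only the k best.

    Returns the chosen expert indices and their softmax-normalized
    mixing weights (computed over the selected logits only).
    """
    logits = token_emb @ gate_w            # one gating logit per expert
    top = np.argsort(logits)[-k:]          # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()

def moe_layer(token_emb, gate_w, experts, k=2):
    """Run only the selected experts; the others stay inactive for this token."""
    idx, weights = top_k_gating(token_emb, gate_w, k)
    return sum(wi * experts[i](token_emb) for i, wi in zip(idx, weights))

# Toy setup: 8 experts, but only k=2 of them execute per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
gate_w = rng.normal(size=(d, n_experts))
experts = [lambda x, W=rng.normal(size=(d, d)): np.tanh(x @ W)
           for _ in range(n_experts)]
out = moe_layer(rng.normal(size=d), gate_w, experts, k=2)
```

With 8 experts and k=2, each token pays for only a quarter of the expert compute a dense layer of the same total parameter count would require – the essence of why sparse activation lowers serving costs.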
Jack Clark, co-founder of Anthropic and former policy lead at OpenAI, has praised DeepSeek's team and their understanding of the infrastructure required to train large, ambitious models.
Despite the industry's increasing focus on AI applications, DeepSeek remains committed to basic research. Liang Wenfeng, founder of Fantasy Quant, DeepSeek's parent company, has stated that they are prioritizing research on large models and the path towards Artificial General Intelligence (AGI) rather than rushing to develop specific applications.
This long-term vision, while potentially risky in the short term, gives DeepSeek a unique position in the market. Securing access to computing power early has also proved to be a shrewd move.
With major players joining the price war, DeepSeek's initial advantage may diminish. Its survival will depend on continued investment in innovative model architectures that deliver efficiency gains. It also remains to be seen whether DeepSeek's research-focused strategy will ultimately lead to long-term success, or whether the pressure to commercialize will force a shift in priorities.