DeepSeek, a Chinese artificial intelligence company, is making waves in the AI landscape with its development of open-weight large language models (LLMs). Founded in July 2023 by Liang Wenfeng, who is also the co-founder of the hedge fund High-Flyer, DeepSeek is rapidly gaining recognition for its innovative approach to AI development and its ability to deliver high-performance models at a fraction of its competitors' cost.
DeepSeek's journey began in 2016 with High-Flyer, a hedge fund co-founded by Liang Wenfeng. High-Flyer initially focused on stock trading using AI algorithms and deep learning models. By 2021, AI had become the sole driver of High-Flyer's trading strategies.
In April 2023, High-Flyer announced the creation of an artificial general intelligence (AGI) lab, separate from its financial operations. This lab was officially incorporated as DeepSeek in July 2023, backed by High-Flyer's investments.
Key Milestones:
- February 2016: Liang Wenfeng co-founds High-Flyer, which builds its trading business around deep-learning models.
- April 2023: High-Flyer announces an AGI lab separate from its trading operations.
- July 2023: The lab is incorporated as DeepSeek, funded by High-Flyer.
- November 2023: DeepSeek releases its first models, DeepSeek Coder and DeepSeek-LLM.
- May 2024: DeepSeek-V2 launches, sparking a price war among Chinese LLM providers.
- December 2024: DeepSeek-V3 is released.
- January 2025: DeepSeek-R1 and the DeepSeek chatbot app are released, drawing global attention.
Unlike many AI companies, DeepSeek prioritizes research over immediate commercialization. This strategy allows the company to navigate China's stringent AI regulations, particularly those concerning consumer-facing technologies and government control over information.
DeepSeek also adopts a unique hiring strategy, focusing on technical abilities and potential rather than extensive work experience. The company actively seeks out recent university graduates and developers with less established AI careers. This approach fosters a culture of innovation and allows DeepSeek to tap into fresh perspectives. Furthermore, DeepSeek recruits talent from non-computer science backgrounds, enriching its models with diverse knowledge areas such as poetry and complex academic subjects.
DeepSeek leverages its custom-built computing clusters, named Fire-Flyer and Fire-Flyer 2, to train its AI models. Fire-Flyer 2 features a co-designed software and hardware architecture optimized for asynchronous random reads, uses Nvidia GPUs with high-speed interconnects, and is divided into two zones. The software side includes components like 3FS, a distributed file system built for the random-read-heavy access patterns of training workloads; hfreduce, a library for asynchronous gradient communication; hfai.nn, a library of commonly used deep-learning operators; HaiScale, a set of tools for data-, tensor-, and pipeline-parallel training; and the HAI Platform, which handles task scheduling and fault tolerance.
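To make the idea of a storage layer tuned for asynchronous random reads concrete, here is a minimal, hypothetical Python sketch: many fixed-size records are fetched concurrently from a data shard at random offsets, so I/O can be issued out of order and overlap with computation. The file name, record size, and thread count are illustrative assumptions, not details of DeepSeek's actual stack.

```python
# Minimal sketch (not DeepSeek's actual stack): issue many random reads
# concurrently against a large training-data shard, the access pattern
# Fire-Flyer 2's storage layer is described as being optimized for.
import os
import random
from concurrent.futures import ThreadPoolExecutor

SHARD_PATH = "train_shard.bin"   # hypothetical pre-tokenized shard
RECORD_SIZE = 4096               # hypothetical fixed-size record, in bytes

def read_record(fd: int, index: int) -> bytes:
    # os.pread reads at an explicit offset without moving the file cursor,
    # so many threads can read independent records from one descriptor.
    return os.pread(fd, RECORD_SIZE, index * RECORD_SIZE)

def load_batch(path: str, indices: list[int]) -> list[bytes]:
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=32) as pool:
            return list(pool.map(lambda i: read_record(fd, i), indices))
    finally:
        os.close(fd)

if __name__ == "__main__":
    n_records = os.path.getsize(SHARD_PATH) // RECORD_SIZE
    batch = load_batch(SHARD_PATH, random.sample(range(n_records), k=64))
    print(f"Fetched {len(batch)} records")
```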
Since its inception, DeepSeek has released a series of cutting-edge language models, each building upon the previous advancements:
DeepSeek-R1 and DeepSeek-R1-Zero were initialized from DeepSeek-V3-Base, while the DeepSeek-R1-Distill models were initialized from other pretrained open-weight models, including LLaMA and Qwen, and then fine-tuned on synthetic data generated by R1. DeepSeek-R1-Zero was trained exclusively with GRPO reinforcement learning, using rule-based reward functions that score answer accuracy and output format. Because R1-Zero's outputs suffered from readability problems, DeepSeek-R1 added a supervised fine-tuning (SFT) stage on DeepSeek-V3-Base before further rounds of GRPO RL, which also incorporated a language-consistency reward.
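To illustrate what a rule-based reward with group-relative scoring can look like, here is a small, hypothetical Python sketch. The <think>/<answer> tag format, the reward weights, and the advantage normalization are illustrative of GRPO-style training, not DeepSeek's actual implementation.

```python
# Hedged sketch of a rule-based reward (accuracy + format) and the
# group-normalized advantages used by GRPO-style RL; all details here
# are illustrative, not DeepSeek's code.
import re
import statistics

def reward(completion: str, reference_answer: str) -> float:
    # Format check: reasoning wrapped in <think> tags, answer in <answer> tags.
    format_ok = bool(re.search(r"<think>.*</think>\s*<answer>.*</answer>",
                               completion, flags=re.DOTALL))
    # Accuracy check: the extracted answer matches the reference exactly.
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    answer_ok = match is not None and match.group(1).strip() == reference_answer
    return 1.0 * answer_ok + 0.5 * format_ok   # weights are made up

def grpo_advantages(rewards: list[float]) -> list[float]:
    # GRPO scores each sampled completion relative to its own group:
    # advantage = (r - mean(group)) / std(group), with no learned value model.
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(r - mu) / sigma for r in rewards]

if __name__ == "__main__":
    group = ["<think>2+2=4</think><answer>4</answer>", "<answer>5</answer>"]
    rs = [reward(c, "4") for c in group]
    print(rs, grpo_advantages(rs))
```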
DeepSeek's models are "open weight": the trained parameters are published and can be downloaded and run locally, but they offer less freedom to modify and build upon than true open-source software, since the training code and data are not released.
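In practice, open weights mean the checkpoints can be pulled and run locally with standard tooling. The sketch below uses the Hugging Face transformers library; the exact model identifier and generation settings are assumptions for illustration (the id matches one of the published R1 distills), not an official quickstart.

```python
# Sketch of loading and running one of the open-weight R1 distills locally.
# Model id and settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```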
DeepSeek has achieved notable success challenging larger competitors in the AI domain. The DeepSeek-R1 model produces outputs comparable to those of other contemporary LLMs, such as OpenAI's GPT-4o and o1, while the company reports training costs far below those of its rivals: roughly US$6 million, versus the $100 million reportedly spent to train OpenAI's GPT-4 in 2023, and about one tenth of the computing power used for Meta's comparable model, LLaMA 3.1.
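The headline figure can be roughly reconstructed from the numbers DeepSeek published for the final training run of its V3 base model, the run the claim is usually traced to. The $2-per-GPU-hour rental rate is DeepSeek's own assumption, and the total excludes prior research, ablation experiments, and data costs.

```python
# Back-of-envelope check of the reported training cost (a sketch, using
# DeepSeek's published figures for the DeepSeek-V3 final training run).
h800_gpu_hours = 2_788_000      # reported GPU-hours for the final run
rental_rate_usd = 2.00          # assumed rental cost per H800 GPU-hour
total_cost = h800_gpu_hours * rental_rate_usd
print(f"Estimated final-run cost: ${total_cost:,.0f}")  # ~$5.6 million
```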
DeepSeek's models offer strong performance at a low price, and they became the catalyst for a price war among Chinese AI providers. The company became known as the "Pinduoduo of AI" as other Chinese tech companies, such as ByteDance, Tencent, Baidu, and Alibaba, cut the prices of their own AI models in response. Despite its low prices, DeepSeek was reportedly profitable, in contrast to its money-losing rivals.
The company's ability to train cutting-edge models at a lower cost has been attributed to architectural choices such as its Mixture-of-Experts design and multi-head latent attention, which reduce the compute required per token; low-level training optimizations, including FP8 mixed-precision arithmetic and custom communication kernels; and its ability to work within the constraints of export-restricted Nvidia H800 GPUs. A toy sketch of the expert-routing idea appears below.
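The following is a minimal, illustrative PyTorch sketch of top-k expert routing, the core Mixture-of-Experts idea: each token is processed by only a couple of small expert networks, so total parameter count can grow without a proportional increase in per-token compute. The layer sizes, expert count, and top-k value are made up for the example and do not reflect DeepSeek-V3's configuration.

```python
# Toy top-k Mixture-of-Experts layer: a router picks a few experts per token,
# and only those experts run. Purely illustrative, not DeepSeek-V3's design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)   # renormalize
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # dispatch tokens to chosen experts
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = ToyMoE()
    print(layer(torch.randn(10, 64)).shape)        # torch.Size([10, 64])
```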
That said, some analysts caution that the reported training-cost figure covers only the final training run and excludes earlier research, ablation experiments, and the cost of the underlying hardware, so it may not reflect the full picture.
As a Chinese company, DeepSeek operates within a complex regulatory environment. There have been reports that DeepSeek models adhere to local content restrictions, limiting responses on sensitive topics like the Tiananmen Square massacre and Taiwan's political status.
Additionally, even locally hosted versions of DeepSeek models have displayed biases towards Chinese government viewpoints on controversial issues, including Xi Jinping's human rights record and Taiwan's political status. However, some users have reported that hosting the models on their own devices and servers removes much of the refusal behavior seen in the official service.
Due to its potential censorship and government-viewpoint bias, several countries and regions have considered banning DeepSeek or restricting its use.
DeepSeek's emergence as a key player has the potential to further democratize AI and accelerate its adoption across various industries. By continuing to push the boundaries of AI research and development, DeepSeek is poised to shape the future of artificial intelligence on a global scale.