The landscape of artificial intelligence is constantly evolving, with new innovations emerging from across the globe. Recently, the AI community has been buzzing about DeepSeek R1, a new open-source reasoning model developed by the Chinese AI startup DeepSeek. This model has demonstrated performance that rivals or surpasses OpenAI's ChatGPT o1 on several key benchmarks, all while operating at a fraction of the cost. What makes DeepSeek's achievement even more remarkable is that they accomplished this despite facing increasing US export controls on cutting-edge chips.
Instead of hindering China's AI development, the US sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.
To create R1, DeepSeek reworked its training process to reduce the strain on its GPUs. According to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University, the company trained on a variety of Nvidia GPUs released specifically for the Chinese market, whose performance is capped at roughly half that of Nvidia's top products.
DeepSeek R1 has been lauded by researchers for its ability to handle complex reasoning tasks, especially in mathematics and coding. Like ChatGPT o1, the model uses a "chain of thought" approach, breaking down problems step by step to arrive at solutions. Dimitris Papailiopoulos, principal researcher at Microsoft’s AI Frontiers research lab, noted that DeepSeek focused on accurate answers rather than detailing every logical step, which significantly reduced computing time while maintaining a high level of effectiveness.
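To make the chain-of-thought idea concrete, here is a minimal sketch of querying an R1-style reasoning model through an OpenAI-compatible chat endpoint. The base URL, the model name "deepseek-reasoner", and the placeholder API key are assumptions made for illustration, not details confirmed in this article.

```python
# Minimal sketch: asking a reasoning model to work through a problem step by step.
# The endpoint and model name below are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # hypothetical placeholder
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed identifier for the R1 reasoning model
    messages=[
        {
            "role": "user",
            "content": "A train travels 120 km in 1.5 hours. "
                       "What is its average speed in km/h?",
        }
    ],
)

# The reply contains the final answer; depending on the API version,
# the intermediate reasoning may be exposed as a separate field.
print(response.choices[0].message.content)
```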
Here are some key features of DeepSeek R1:

- Open-source availability, allowing researchers and developers to inspect and build on the model.
- Strong performance on complex reasoning tasks, particularly mathematics and coding, rivaling OpenAI's o1 on several benchmarks.
- A "chain of thought" approach that breaks problems down step by step.
- Training and operating costs at a fraction of those of comparable proprietary models.
- A family of six smaller distilled versions that can run locally on laptops.
DeepSeek has also released six smaller versions of R1 that can run locally on laptops. The company claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. Aravind Srinivas, CEO of Perplexity, tweeted that DeepSeek has largely replicated o1-mini and has open-sourced it.
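As an illustration of what running one of these distilled variants locally might look like, here is a minimal sketch using the Hugging Face transformers library. The checkpoint name below is an assumption; substitute whichever distilled model DeepSeek actually publishes.

```python
# Minimal sketch: loading a small distilled reasoning model on a laptop.
# The checkpoint name is an assumption, not confirmed by this article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "What is the sum of the first 100 positive integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a step-by-step answer; small distilled models can run on CPU,
# though a GPU will be considerably faster.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```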
Based in Hangzhou, China, DeepSeek was founded in July 2023 by Liang Wenfeng, an alumnus of Zhejiang University with a background in information and electronic engineering. The company was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like Sam Altman of OpenAI, Liang aims to build artificial general intelligence (AGI), a form of AI that can match or even beat humans on a range of tasks.
Liang's decision to venture into AI was directly related to US export controls on advanced semiconductors. Before the anticipated sanctions, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China. The Chinese media outlet 36Kr estimates that the company has over 10,000 units in stock, but Dylan Patel, founder of the AI research consultancy SemiAnalysis, estimates that it has at least 50,000. Liang recognized the potential of this stockpile for AI training, which led him to establish DeepSeek.
The Chinese AI space is dominated by tech giants like Alibaba and ByteDance, as well as startups with deep-pocketed investors. This makes it challenging for small or medium-sized enterprises to compete. DeepSeek, which has no plans to actively raise funds, is a rare exception.
Liang Wenfeng noted in an interview with the Chinese media outlet 36Kr in July 2024 that Chinese companies face an additional challenge: their AI engineering techniques tend to be less efficient. He stated that Chinese companies consume twice the computing power to achieve the same results. "Our goal is to continuously close these gaps," he said.
DeepSeek found ways to reduce memory usage and speed up computation without significantly sacrificing accuracy. According to Zihan Wang, the team embraced hardware challenges as opportunities for innovation. Liang himself remains deeply involved in DeepSeek's research process, running experiments alongside his team.
Chinese companies are increasingly embracing open-source principles. Alibaba Cloud has released over 100 new open-source AI models, supporting 29 languages and catering to various applications, including coding and mathematics. Startups like Minimax and 01.AI have also open-sourced their models.
According to a white paper released last year by the China Academy of Information and Communications Technology, the number of AI large language models worldwide has reached 1,328, with 36% originating in China, making the country the second-largest contributor of large language models after the United States. Thomas Qitong Cao, an assistant professor of technology policy at Tufts University, said that young Chinese researchers identify strongly with open-source culture because they benefit so much from it.
Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace, suggests that US export controls have forced Chinese companies to be far more efficient with their limited computing resources. The rapid evolution of AI demands agility from Chinese firms to survive. Recently, Alibaba Cloud partnered with the Beijing-based startup 01.AI to merge research teams and establish an "industrial large model laboratory."
DeepSeek's journey exemplifies how restrictions can spur innovation and efficiency. By focusing on resource optimization and embracing open-source collaboration, Chinese AI companies are making significant strides in the field. As the AI landscape continues to evolve, it will be interesting to see how these trends shape the future of AI development and deployment worldwide.