The AI landscape is evolving rapidly, and one of the companies driving that change is DeepSeek, a Chinese AI company making waves with its large language models (LLMs). What's particularly fascinating about DeepSeek is the team behind its success: a group of bright, young graduates, many from prestigious universities like Tsinghua and Peking University. This article delves into the inner workings of DeepSeek, exploring the individuals, innovations, and organizational structure that have propelled it to the forefront of AI research.
DeepSeek has garnered significant attention, especially after the release of DeepSeek-V3, an open-source model that rivals Llama 3 while reportedly using about one-eleventh of the computing resources. This achievement, coupled with rumors of tech giants like Xiaomi (led by Lei Jun) seeking to recruit DeepSeek's researchers, has sparked widespread curiosity about the company and its talent.
Several key individuals have been instrumental in DeepSeek's advancements:
Gao Huazuo: A graduate of Peking University's Department of Physics, Gao contributed significantly to the MLA (Multi-head Latent Attention) architecture.
Zeng Wangding: A graduate student at Beijing University of Posts and Telecommunications (BUPT) under the guidance of Professor Zhang Honggang, Zeng also played a key role in developing the MLA architecture.
Shao Zhihong: A PhD student in Tsinghua University's Interactive AI (CoAI) group, Shao specializes in natural language processing, deep learning, and robust AI systems, and has contributed to DeepSeekMath, DeepSeek-Prover, DeepSeek-Coder-V2, and DeepSeek-R1.
Zhu Qihao: A 2024 PhD graduate of Peking University and a winner of ACM SIGSOFT Distinguished Paper Awards, Zhu led the development of DeepSeek-Coder-V1, building upon his doctoral research in deep code learning.
Peiyi Wang: A PhD student at Peking University, advised by Professor Sui Zhifang.
Dai Damai: A 2024 PhD graduate of Peking University, Dai has won several awards, including the EMNLP 2023 Best Long Paper Award and the CCL 2021 Best Chinese Paper Award.
Wang Bingxuan: A Peking University graduate, Wang joined DeepSeek after obtaining his master's degree and made important contributions to the DeepSeek LLM v1 project.
Zhao Chenggang: A Tsinghua University graduate with experience at Nvidia, Zhao serves as a training and inference infrastructure engineer at DeepSeek.
These individuals, along with many others, exemplify the young and dynamic nature of DeepSeek's team. Their contributions span various domains, from algorithmic innovation to infrastructure optimization.
DeepSeek distinguishes itself by emphasizing the synergy between model algorithms and hardware engineering. Unlike many AI companies that primarily focus on algorithms or data, DeepSeek recognizes the importance of optimizing hardware to improve training efficiency and reduce costs.
The DeepSeek team published a paper titled "Fire-Flyer AI-HPC," which describes how they co-designed software and hardware to reduce training costs and to address the shortcomings of conventional supercomputing architectures for AI training workloads. By optimizing how Nvidia A100 GPUs are used within the Fire-Flyer AI-HPC cluster, the team achieved better cost and energy efficiency than Nvidia's official DGX-A100 servers.
DeepSeek's organizational structure may be a key ingredient in its success. Inspired by OpenAI's emphasis on talent and prospective technologies, DeepSeek prioritizes ability over experience and welcomes outstanding talent irrespective of educational background. The company maintains a flat organizational culture that encourages innovation and effective resource allocation. This approach fosters a dynamic atmosphere where creative ideas can surface and people are given the tools to start their own initiatives. With its success, DeepSeek has earned the title of "The OpenAI of China".
DeepSeek's rise in the AI world is a story of young talent, groundbreaking innovations, and a unique organizational approach. By empowering young graduates and emphasizing the collaboration between algorithms and hardware, DeepSeek has quickly become a force to be reckoned with!
(Disclaimer: This article is for informational purposes only and does not represent the views of all parties mentioned.)