The large language model (LLM) arena is heating up once again, this time fueled by DeepSeek's aggressive price cuts. Dubbed a "price butcher," DeepSeek has recently announced a significant reduction in its API costs, potentially reshaping the landscape of AI accessibility and adoption. But will other major players follow suit?
DeepSeek has slashed its API input costs to a mere 0.1 yuan per million tokens and output costs to 2 yuan per million tokens. According to a report by Sina Finance, this price reduction represents a decrease of an order of magnitude, signaling a new era of affordability in the LLM market.
DeepSeek attributes its ability to cut prices so sharply to its context hard drive caching technology, which stores frequently used or repeated content on a distributed hard drive array so it does not need to be recomputed. According to DeepSeek, a significant portion of user input in LLM API usage is repeated, ranging from boilerplate prompts to the inclusion of previous conversation turns. Because the repeated parts of a new input can simply be read from the cache, the solution reduces service latency and significantly cuts usage costs.
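The idea behind context caching can be sketched in a few lines of Python. This is a toy illustration of prefix caching under assumed semantics (find the longest previously seen prompt prefix by hash and reuse its precomputed state, so only the new suffix needs fresh compute); DeepSeek has not published implementation details beyond its announcement, and the names below are hypothetical:

```python
import hashlib

class ContextCache:
    """Toy prefix cache: maps a hash of a token prefix to its
    precomputed state, standing in for the cached KV data."""

    def __init__(self):
        self._store = {}  # prefix hash -> precomputed state

    def _key(self, tokens):
        return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

    def insert(self, tokens, state):
        self._store[self._key(tokens)] = state

    def lookup(self, tokens):
        """Return (cached state, uncached suffix) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = self._key(tokens[:end])
            if key in self._store:
                return self._store[key], tokens[end:]
        return None, tokens

cache = ContextCache()
history = ["system:", "You", "are", "helpful."]
cache.insert(history, state="kv-for-4-tokens")

# A follow-up request repeats the conversation history verbatim,
# so only the two new tokens would need fresh prefill compute.
new_request = history + ["user:", "Hi"]
state, suffix = cache.lookup(new_request)
```

In a multi-turn chat, each turn's request repeats everything before it, which is exactly the repetition pattern DeepSeek says dominates API traffic.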
DeepSeek stands out as the first global LLM provider to widely adopt hard drive caching in its API services, largely thanks to the MLA structure introduced in DeepSeek-V2. MLA improves model performance while drastically shrinking the size of the context KVCache, which in turn cuts the bandwidth and storage required, making caching on affordable hard drives a practical option. Designed to handle 1 trillion requests per day, the DeepSeek API also offers users unlimited rate and unlimited concurrency.
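To see why a smaller KVCache makes disk caching plausible, here is a back-of-the-envelope comparison. The layer count, head dimensions, and compressed latent size below are hypothetical round numbers chosen for illustration, not DeepSeek-V2's actual configuration:

```python
def kv_cache_bytes(num_layers, per_token_dim, seq_len, bytes_per_elem=2):
    # KV cache size grows linearly with context length and layer count
    # (2 bytes/element assumes fp16/bf16 storage)
    return num_layers * per_token_dim * seq_len * bytes_per_elem

layers, seq = 60, 128_000  # illustrative values only

# Conventional multi-head attention: cache full K and V
# (32 heads of dimension 128 each, for both K and V)
mha = kv_cache_bytes(layers, 2 * 128 * 32, seq)

# MLA-style compression: cache one small latent vector per token
mla = kv_cache_bytes(layers, 512, seq)

print(f"MHA: {mha / 2**30:.1f} GiB, MLA latent: {mla / 2**30:.1f} GiB")
```

Under these assumed dimensions the cached footprint drops by 16x, which is the kind of reduction that moves per-context storage from "needs expensive RAM or SSD" toward "cheap hard drives are fine."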
This isn't DeepSeek's first foray into price reductions. Since May, DeepSeek has been a leading force in initiating API price wars.
These moves triggered a wave of price cuts from industry giants like Zhipu AI, Volcano Engine, Baidu, Tencent, and Alibaba Cloud. Notably, Alibaba Cloud reduced the price of its core model, Qwen-Long, by a staggering 97% to a mere 0.0005 yuan per thousand tokens. Baidu and Tencent followed suit by offering some of their LLMs for free. Even OpenAI, on the international stage, announced free usage of GPT-4o and halved its API call prices.
Volcano Engine further intensified the price war with its Doubao model, priced at an astonishing 0.0008 yuan per thousand tokens. The company's president, Tan Dai, announced the pricing on May 15, 2024, at a Volcano Engine event; at the time, the general market price for a model of Doubao's specifications was 0.12 yuan per thousand tokens. Tan Dai stated that reducing costs is key to advancing to the "value creation stage" and added that the "price war" would accelerate business innovation at a lower cost. Volcano Engine's pricing thus came in 99.3% below the market average, driving the big model business into the "cent era".
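The 99.3% figure follows directly from the two quoted prices, as a quick check shows:

```python
market_price = 0.12    # yuan per thousand tokens, the cited market rate
doubao_price = 0.0008  # yuan per thousand tokens

discount = 1 - doubao_price / market_price
print(f"{discount:.1%}")  # 99.3%
```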
As one Volcano Engine insider told Jiemian News, "The real reason for Doubao's price cuts lies in the fact that the application of large models in the enterprise sector has not yet developed, and there are too few scenarios." He pointed out that although the industry is discussing using AI large models to reconstruct businesses, it is rare to feel the landing of large model capabilities in daily work and life. "Price reduction is essentially lowering the barrier to use."
The recent price cuts have largely focused on lightweight model versions, intended for short-term use by small and medium-sized enterprises and individual developers. Overall, the LLM market is still in the developmental stage.
This strategy aims to attract developers and partners, fostering a robust ecosystem and paving the way for innovative applications across various sectors. The goal is clear: wider adoption is crucial for the industry's overall development.
However, the sustainability of relying solely on API sales for LLM commercialization remains a concern. As one FA (financial advisor) noted, "No large model company lives by selling APIs."
Fu Sheng, Chairman and CEO of Cheetah Mobile, echoed this sentiment. He believes that significant price reductions indicate that LLM startups must explore new business models. He also claimed that the companies dropping prices most aggressively are big cloud service providers that bundle LLMs into their cloud services business and can absorb the shortfall elsewhere. Meanwhile, LLM startups lack such resources and must find alternative business models. The market is trending in that direction: [experts predict significant growth in AI-adjacent industries](Internal Link).
Unlike the previous round of price cuts, DeepSeek's latest move has been met with relative silence from its competitors. As of now, few LLM companies have announced similar reductions or issued public statements. It is possible that the current lack of response reflects a more cautious approach to evaluating the long-term implications of such aggressive pricing strategies.
Despite the lack of immediate reaction from competitors, DeepSeek's actions signal a crucial shift towards making advanced AI technology more accessible. This increased accessibility could potentially lead to a surge in vertical applications, fostering innovation and driving the widespread adoption of LLMs across diverse industries.