DeepSeek offers a suite of powerful AI models accessible through their API. Understanding their pricing structure is crucial for effectively utilizing these tools without unexpected costs. This guide will break down DeepSeek's pricing model, helping you estimate and manage your expenses.
DeepSeek API pricing is primarily based on tokens. A token represents the smallest unit of text processed by the model, which could be a word, a number, or even punctuation. You're billed based on the total number of input tokens (what you send to the API) and output tokens (what the API returns).
DeepSeek offers different models tailored for specific tasks, each with its own pricing structure.
Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price (Cache Hit) / 1M Tokens | Input Price (Cache Miss) / 1M Tokens | Output Price / 1M Tokens |
---|---|---|---|---|---|---|
deepseek-chat | 64K | - | 8K | $0.07 | $0.27 | $1.10 |
deepseek-reasoner | 64K | 32K | 8K | $0.14 | $0.55 | $2.19 |
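As a rough sketch, the table above can be encoded for quick per-call estimates. The prices are hardcoded from the table and may change, and the helper function here is illustrative, not part of DeepSeek's SDK:

```python
# Illustrative price table (USD per 1M tokens), copied from the table above.
# Prices may change -- always check the official Models & Pricing page.
PRICES = {
    "deepseek-chat":     {"input_cache_hit": 0.07, "input_cache_miss": 0.27, "output": 1.10},
    "deepseek-reasoner": {"input_cache_hit": 0.14, "input_cache_miss": 0.55, "output": 2.19},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  cache_hit: bool = False) -> float:
    """Estimate the USD cost of a single API call from its token counts."""
    p = PRICES[model]
    input_rate = p["input_cache_hit"] if cache_hit else p["input_cache_miss"]
    return (input_tokens * input_rate + output_tokens * p["output"]) / 1_000_000

# Example: 10,000 uncached input tokens and 2,000 output tokens on deepseek-chat.
print(round(estimate_cost("deepseek-chat", 10_000, 2_000), 6))  # -> 0.0049
```

Token counts for a real call are reported in the API response's usage fields, so you can feed actual numbers into an estimator like this after the fact.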
Important notes:

- `deepseek-chat` leverages the new DeepSeek-V3 model, indicating potential performance improvements.
- `deepseek-reasoner` utilizes the DeepSeek-R1 model.
- "Max CoT Tokens" is the budget for the chain-of-thought reasoning tokens that `deepseek-reasoner` uses to generate comprehensive answers. See the Reasoning Model documentation for more insight.
- If `max_tokens` is not specified in your API call, the default maximum output length is 4K tokens. Adjust this parameter to extend the possible response length.

Context caching allows you to reuse previously processed information, reducing the number of tokens required for subsequent API calls. This translates to lower costs when engaging in multi-turn conversations or using repetitive prompts. Check out the DeepSeek Context Caching announcement for more details.
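To see how caching changes the bill, here is a minimal sketch using the `deepseek-chat` input prices from the table above; the split between cached and uncached tokens is a hypothetical scenario:

```python
# deepseek-chat input prices from the table above (USD per 1M tokens).
CACHE_HIT_RATE = 0.07
CACHE_MISS_RATE = 0.27

def input_cost(cached_tokens: int, uncached_tokens: int) -> float:
    """Cached tokens bill at the hit rate; the remainder at the miss rate."""
    return (cached_tokens * CACHE_HIT_RATE + uncached_tokens * CACHE_MISS_RATE) / 1_000_000

# A 50,000-token conversation history served from cache plus a 500-token new turn
# costs far less than resending the whole prompt uncached:
with_cache = input_cost(50_000, 500)    # 0.003635
without_cache = input_cost(0, 50_500)   # 0.013635
print(f"{with_cache:.6f} vs {without_cache:.6f}")
```

The gap widens with every additional turn, which is why caching matters most for long multi-turn conversations and repeated system prompts.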
The table above displays two input prices: "Cache Hit" and "Cache Miss." When context caching is active:

- Input tokens that match previously cached content are billed at the lower "Cache Hit" rate.
- Input tokens not found in the cache are billed at the standard "Cache Miss" rate.

When using the `deepseek-reasoner` model, the chain-of-thought tokens are billed at the output rate, so the output price covers both the reasoning process and the final answer.
DeepSeek employs clear rules for deducting API usage fees:
- **Expense Calculation:** The cost is calculated by multiplying the number of tokens used by the corresponding price per token for both input and output.
- **Balance Deduction:** Fees are deducted directly from your topped-up balance or any granted balance. If both exist, the granted balance is used first.
- **Price Adjustments:** DeepSeek reserves the right to adjust product prices. Regularly monitor the Models & Pricing page for the latest information.
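The deduction order can be sketched as a toy model. This is not DeepSeek's actual billing code; the function and field names are hypothetical, illustrating only the "granted balance first" rule:

```python
def deduct(fee: float, granted: float, topped_up: float) -> tuple[float, float]:
    """Deduct a fee, drawing from the granted balance first, then the topped-up balance.

    Toy model of the stated rule; raises if the combined balances cannot cover the fee.
    """
    if fee > granted + topped_up:
        raise ValueError("insufficient balance")
    from_granted = min(fee, granted)
    from_topped_up = fee - from_granted
    return granted - from_granted, topped_up - from_topped_up

# A $3.00 fee against $2.00 granted and $10.00 topped-up drains the grant first:
print(deduct(3.0, granted=2.0, topped_up=10.0))  # -> (0.0, 9.0)
```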
Let's say you use the `deepseek-chat` model with context caching enabled (cache hit), sending 1,500,000 input tokens and receiving 800,000 output tokens:

- Input cost: 1.5M tokens × $0.07 per 1M tokens = $0.105
- Output cost: 0.8M tokens × $1.10 per 1M tokens = $0.88
- Total cost: $0.105 + $0.88 = $0.985
This simplified example highlights how to calculate the total cost based on token usage and the corresponding input/output prices.
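The arithmetic for this example can be checked directly, using the cache-hit input price and output price from the table above:

```python
# Worked example: deepseek-chat with a cache hit.
input_tokens = 1_500_000
output_tokens = 800_000

input_cost = input_tokens * 0.07 / 1_000_000    # cache-hit input rate, USD per 1M tokens
output_cost = output_tokens * 1.10 / 1_000_000  # output rate, USD per 1M tokens

total = input_cost + output_cost
print(f"${input_cost:.3f} + ${output_cost:.2f} = ${total:.3f}")  # prints "$0.105 + $0.88 = $0.985"
```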
To stay informed about DeepSeek's latest updates, pricing adjustments, and model releases, monitor the Models & Pricing page and DeepSeek's official announcements.
By understanding DeepSeek's token-based pricing, utilizing context caching effectively, and monitoring announcements, you can optimize your usage and manage your costs while leveraging powerful AI models.