Models & Pricing | DeepSeek API Docs

Understanding DeepSeek API Pricing: A Comprehensive Guide

DeepSeek is a platform offering powerful AI models, including deepseek-chat (now upgraded to DeepSeek-V3) and deepseek-reasoner (now DeepSeek-R1). Understanding the associated pricing is crucial for effective utilization and cost management. This article breaks down DeepSeek API's pricing structure, token usage, and important deduction rules.

Token-Based Pricing: What You Need to Know

DeepSeek API employs a token-based pricing model, common among many large language models (LLMs). You are billed based on the number of tokens processed by the model, both for input and output.

  • What is a Token? A token is the smallest unit of text recognized by the model. This could be a word, a number, a punctuation mark, or part of a word. It is essential to understand tokenization to estimate costs accurately. For deeper insight, refer to DeepSeek's documentation on Token & Token Usage.
  • Billing: You are billed for the total count of both input tokens (the data you send to the model) and output tokens (the model's response).
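Since billing is per token rather than per character or per word, a rough estimator can help budget before sending a request. The sketch below uses approximate per-character ratios (about 0.3 tokens per English character and 0.6 per Chinese character, figures DeepSeek's token documentation has cited as rules of thumb); the exact count depends on the model's tokenizer, so treat the result as a ballpark only.

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token estimate from character counts.

    Heuristic only: ~0.3 tokens per non-CJK character,
    ~0.6 tokens per CJK character. The model's actual
    tokenizer is the source of truth for billing.
    """
    total = 0.0
    for ch in text:
        # CJK Unified Ideographs block as a crude Chinese-character check
        if "\u4e00" <= ch <= "\u9fff":
            total += 0.6
        else:
            total += 0.3
    return round(total)

print(estimate_tokens("Hello, DeepSeek!"))  # ~5 tokens by this heuristic
```

For precise counts, inspect the `usage` field returned with each API response rather than relying on estimates.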

DeepSeek API Pricing Details

Here's a breakdown of the pricing for deepseek-chat and deepseek-reasoner models. Note that prices are listed per 1 million tokens.

USD Pricing:

| Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price / 1M Tokens (Cache Hit) | Input Price / 1M Tokens (Cache Miss) | Output Price / 1M Tokens |
|---|---|---|---|---|---|---|
| deepseek-chat | 64K | - | 8K | $0.07 | $0.27 | $1.10 |
| deepseek-reasoner | 64K | 32K | 8K | $0.14 | $0.55 | $2.19 |

CNY Pricing:

| Model | Context Length | Max CoT Tokens | Max Output Tokens | Input Price / 1M Tokens (Cache Hit) | Input Price / 1M Tokens (Cache Miss) | Output Price / 1M Tokens |
|---|---|---|---|---|---|---|
| deepseek-chat | 64K | - | 8K | ¥0.5 | ¥2 | ¥8 |
| deepseek-reasoner | 64K | 32K | 8K | ¥1 | ¥4 | ¥16 |

Key Considerations:

  • (1) Model Upgrades: The deepseek-chat model has been upgraded to DeepSeek-V3, and deepseek-reasoner now utilizes the DeepSeek-R1 model.
  • (2) Chain of Thought (CoT): deepseek-reasoner uses CoT, which involves reasoning steps before providing the final answer. See Reasoning Model documentation for more details.
  • (3) Output Length: If max_tokens is not specified, the default maximum output length is 4K tokens. Adjust max_tokens to allow for longer outputs when needed.
  • (4) Context Caching: DeepSeek offers context caching, which can significantly reduce input costs when there are repeated elements in input prompts. Please refer to DeepSeek Context Caching for the details of Context Caching.
  • (5) deepseek-reasoner Output: The output token count for deepseek-reasoner includes both the CoT tokens and the final-answer tokens, and both are billed at the same output-token rate.
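Putting rules (4) and (5) together with the USD price table, the cost of a single request can be sketched as below. Cached and uncached input tokens are billed at different rates, and CoT tokens count toward output. The token counts in the example are made up for illustration.

```python
# Prices per 1M tokens (USD), taken from the pricing table above.
PRICES = {
    "deepseek-chat":     {"hit": 0.07, "miss": 0.27, "out": 1.10},
    "deepseek-reasoner": {"hit": 0.14, "miss": 0.55, "out": 2.19},
}

def request_cost(model, cache_hit_tokens, cache_miss_tokens,
                 cot_tokens, answer_tokens):
    """USD cost of one request. Per rule (5), CoT tokens are billed
    at the same rate as final-answer output tokens."""
    p = PRICES[model]
    return (cache_hit_tokens  * p["hit"]
          + cache_miss_tokens * p["miss"]
          + (cot_tokens + answer_tokens) * p["out"]) / 1_000_000

# Hypothetical reasoner call: 10K cached input, 2K uncached,
# 5K CoT tokens, 1K final-answer tokens.
cost = request_cost("deepseek-reasoner", 10_000, 2_000, 5_000, 1_000)
print(f"${cost:.6f}")  # → $0.015640
```

Note how the CoT tokens dominate the bill here; for deepseek-reasoner, long reasoning chains are the main cost driver.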

DeepSeek API Deduction Rules

The cost calculation is straightforward:

  • Expense = Number of Tokens x Price per Token

The charges are deducted directly from your account balance. If you have both a granted balance and a topped-up balance available, the granted balance is consumed first.
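The deduction order can be modeled as follows. This is an illustrative sketch of the rule described above (granted balance first, then topped-up balance); the actual accounting is handled server-side by DeepSeek.

```python
def deduct(expense, granted_balance, topped_up_balance):
    """Simplified model of the deduction order: the granted balance
    is consumed first, then the topped-up balance.
    Returns the remaining (granted, topped_up) balances."""
    from_granted = min(expense, granted_balance)
    from_topped_up = expense - from_granted
    if from_topped_up > topped_up_balance:
        raise ValueError("insufficient balance")
    return (granted_balance - from_granted,
            topped_up_balance - from_topped_up)

# A $3.00 expense drains a $2.00 granted balance before
# touching the topped-up balance.
print(deduct(3.0, granted_balance=2.0, topped_up_balance=10.0))  # (0.0, 9.0)
```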

Important Note on Pricing Fluctuations

DeepSeek reserves the right to adjust product prices. It’s recommended to regularly check the Models & Pricing page for the most up-to-date pricing information and to top up your account based on your actual usage.

Optimizing Costs

  • Monitor Token Usage: Track your input and output token counts to identify areas for optimization.
  • Utilize Context Caching: Take advantage of context caching where applicable to reduce input costs, as described in the DeepSeek Context Caching documentation.
  • Optimize Prompts: Craft efficient prompts to minimize the number of input tokens required.
  • Control Output Length: Use the max_tokens parameter to limit the length of the generated text and prevent unnecessary costs.
  • Tune Sampling Settings: Setting an appropriate temperature also matters, since poorly calibrated outputs can force costly retries.
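For monitoring, the simplest data source is the `usage` object returned with each API response. A minimal summarizer is sketched below; the `prompt_cache_hit_tokens` and `prompt_cache_miss_tokens` fields are DeepSeek-specific extensions to the OpenAI-compatible format, so verify their names against the current API reference before relying on them.

```python
def summarize_usage(usage: dict) -> str:
    """Summarize a per-response `usage` object.

    Assumes OpenAI-compatible field names (prompt_tokens,
    completion_tokens) plus DeepSeek's cache-hit/miss extensions;
    missing cache fields default to 0.
    """
    hit = usage.get("prompt_cache_hit_tokens", 0)
    miss = usage.get("prompt_cache_miss_tokens", 0)
    return (f"input={usage['prompt_tokens']} "
            f"(cache hit {hit}, miss {miss}), "
            f"output={usage['completion_tokens']}")

# Hypothetical usage payload for illustration.
example = {"prompt_tokens": 120, "completion_tokens": 80,
           "prompt_cache_hit_tokens": 64, "prompt_cache_miss_tokens": 56}
print(summarize_usage(example))
```

Logging these summaries per request makes it easy to spot prompts with low cache-hit rates, which are the best candidates for restructuring.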

By understanding DeepSeek API's pricing model and deduction rules, you can effectively manage your expenses while leveraging the power of its AI models. Regular monitoring and optimization are key to maximizing your return on investment.
