The DeepSeek API offers powerful capabilities for developers, and one of its key features is context caching. This technology can significantly improve efficiency and reduce costs when working with the API. Let's delve into how DeepSeek's context caching works and how you can leverage it for your projects.
The DeepSeek API uses Context Caching on Disk technology, enabled by default for all users, so you benefit from this optimization without making any changes to your code. The core idea is that the API stores the recurring parts of your requests in a hard disk cache.
Here's how it works: when a new request begins with the same content as an earlier request, the API serves that shared portion from the disk cache instead of recomputing it, and only the remainder is processed from scratch.
Key takeaway: context caching focuses on reusing repeated prefixes between requests to minimize redundant processing.
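To make "repeated prefix" concrete, here is a toy, message-level sketch of what the matching amounts to. This is purely illustrative; the real cache runs server-side on the token sequence, so even a shared prefix inside a single long user message can produce a hit:

```python
# Toy illustration only: DeepSeek's cache matches token prefixes server-side;
# this message-level version just shows what "shared prefix" means.
def shared_prefix(messages_a, messages_b):
    """Return the leading messages that two requests have in common."""
    prefix = []
    for a, b in zip(messages_a, messages_b):
        if a != b:
            break
        prefix.append(a)
    return prefix
```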
To better grasp how context caching works, let's examine a few illustrative examples provided by DeepSeek:
Imagine you're using the DeepSeek API for financial report analysis:
First Request:
messages: [
{"role": "system", "content": "You are an experienced financial report analyst..."},
{"role": "user", "content": "<financial report content>\n\nPlease summarize the key information of this financial report."}
]
Second Request:
messages: [
{"role": "system", "content": "You are an experienced financial report analyst..."},
{"role": "user", "content": "<financial report content>\n\nPlease analyze the profitability of this financial report."}
]
In this scenario, both requests share the same prefix: the system message and the financial report content. The second request will experience a "cache hit" for this shared portion, saving computational resources.
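Here is a minimal Python sketch of this scenario, assuming DeepSeek's OpenAI-compatible endpoint (https://api.deepseek.com) and the deepseek-chat model, with placeholder content for the key and report:

```python
from openai import OpenAI

# Assumed endpoint and model; substitute your own API key.
client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")

SYSTEM = "You are an experienced financial report analyst..."
REPORT = "<financial report content>"

# First request: processed normally, and its prefix is written to the cache.
first = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": REPORT + "\n\nPlease summarize the key information of this financial report."},
    ],
)

# Second request: the system message and report form the same prefix -> cache hit.
second = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": REPORT + "\n\nPlease analyze the profitability of this financial report."},
    ],
)
```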
Consider a multi-turn conversation with the API:
First Request:
messages: [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of China?"}
]
Second Request:
messages: [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "What is the capital of China?"},
{"role": "assistant", "content": "The capital of China is Beijing."},
{"role": "user", "content": "What is the capital of the United States?"}
]
The second request reuses the initial system message and the first user message, triggering a cache hit and making the second request more efficient. See the official Multi-round Conversation documentation for further details.
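In code, this pattern is simply appending to the same messages list so that earlier turns form an identical prefix; a sketch under the same endpoint assumptions as above:

```python
from openai import OpenAI

client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is the capital of China?"},
]
first = client.chat.completions.create(model="deepseek-chat", messages=messages)

# Append the assistant's actual reply and the next question: the earlier
# turns are byte-identical, so the second request can hit the cache.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "What is the capital of the United States?"})
second = client.chat.completions.create(model="deepseek-chat", messages=messages)
```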
Context caching also greatly benefits few-shot learning, where you provide a few worked examples in the prompt to guide the model's responses.
First Request:
messages: [
{"role": "system", "content": "You are a history expert..."},
{"role": "user", "content": "In what year did Qin Shi Huang unify the six states?"},
{"role": "assistant", "content": "Answer: 221 BC"},
// ... more examples ...
{"role": "user", "content": "Who was the founding emperor of the Qing Dynasty?"}
]
Second Request:
messages: [
{"role": "system", "content": "You are a history expert..."},
{"role": "user", "content": "In what year did Qin Shi Huang unify the six states?"},
{"role": "assistant", "content": "Answer: 221 BC"},
// ... more examples ...
{"role": "user", "content": "When did the Shang Dynasty fall?"}
]
Since the first several turns (the "shots") are identical, the second request benefits from a significant cache hit, reducing the computational cost of using few-shot learning.
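One practical pattern, sketched below under the same endpoint assumptions as before: keep the few-shot examples in a fixed prefix list and append only the new question, so every call shares the cacheable prefix.

```python
from openai import OpenAI

client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")

# Fixed few-shot prefix: identical across requests, so it is cacheable.
FEW_SHOT_PREFIX = [
    {"role": "system", "content": "You are a history expert..."},
    {"role": "user", "content": "In what year did Qin Shi Huang unify the six states?"},
    {"role": "assistant", "content": "Answer: 221 BC"},
    # ... more examples ...
]

def ask(question):
    # Only the final user message differs between requests.
    return client.chat.completions.create(
        model="deepseek-chat",
        messages=FEW_SHOT_PREFIX + [{"role": "user", "content": question}],
    )

ask("Who was the founding emperor of the Qing Dynasty?")  # first call warms the cache
ask("When did the Shang Dynasty fall?")                   # prefix served from cache
```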
The DeepSeek API reports cache performance in the usage section of each API response:
prompt_cache_hit_tokens: the number of tokens served from the cache, billed at a lower rate (0.1 yuan per million tokens).
prompt_cache_miss_tokens: the number of tokens not found in the cache and processed normally, billed at 1 yuan per million tokens.
This data lets you evaluate the effectiveness of context caching and tune your prompts accordingly. See the Token & Token Usage guide to fully understand the pricing.
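For example, with the OpenAI Python SDK you can read these counters from the response. This is a sketch; model_dump() is used because the cache counters are DeepSeek-specific extra fields on the usage object:

```python
from openai import OpenAI

client = OpenAI(api_key="<your DeepSeek API key>", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "What is the capital of China?"}],
)

# DeepSeek returns the cache counters as extra fields in `usage`;
# model_dump() exposes them as a plain dict.
usage = response.usage.model_dump()
print("cache hit tokens:", usage.get("prompt_cache_hit_tokens"))
print("cache miss tokens:", usage.get("prompt_cache_miss_tokens"))
```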
DeepSeek API's context caching is a valuable built-in feature that optimizes API usage by reusing overlapping prefixes across requests. By understanding how it works and structuring your prompts to take advantage of it, you can significantly reduce costs and improve the performance of your applications. Check the Change Log regularly to keep up with the latest updates.