The DeepSeek API offers powerful tools for natural language processing, including the intriguing FIM (Fill-In-the-Middle) Completion API. This article dives deep into understanding and utilizing the FIM Completion API, which is currently in Beta. We'll explore its functionality, parameters, and how to make effective use of it for code completion, text generation, and more.
FIM, or Fill-In-the-Middle, completion is a specific type of language model task. Instead of generating text from the beginning or continuing a sequence from the end, FIM allows the model to intelligently complete text within an existing context. Provide a prefix and suffix, and the model will generate the missing piece. This is exceptionally useful for tasks like code completion, where a user might start writing a function and the model can fill in the body.
Before you begin, it's important to note that the FIM Completion API with DeepSeek requires setting a specific base URL:
base_url="https://api.deepseek.com/beta"
This ensures you're accessing the beta endpoint where FIM functionality resides.
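As a minimal sketch of assembling a FIM request (the API key, prefix, and suffix below are placeholders; the field names follow the parameters described later in this article):

```python
import json

# Placeholder credentials -- substitute your own DeepSeek API key.
API_KEY = "sk-..."
BASE_URL = "https://api.deepseek.com/beta"

def build_fim_request(prefix: str, suffix: str, max_tokens: int = 64) -> dict:
    """Assemble the JSON body for a FIM completion: the model fills in
    the text between `prefix` (sent as `prompt`) and `suffix`."""
    return {
        "model": "deepseek-chat",
        "prompt": prefix,    # text before the gap
        "suffix": suffix,    # text after the gap
        "max_tokens": max_tokens,
    }

body = build_fim_request("def fib(n):\n", "\nprint(fib(10))")
print(json.dumps(body, indent=2))

# An actual call would POST this body to f"{BASE_URL}/completions"
# with the header {"Authorization": f"Bearer {API_KEY}"} -- for
# example via the `requests` library or an OpenAI-compatible SDK.
```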
The FIM Completion API is accessed via a POST request to the /completions endpoint. Let's break down the request body parameters:
- model (string): Specifies the DeepSeek model to use. Currently, the possible value is deepseek-chat.
- prompt (string): The text preceding the point where the completion should occur. For example: "Once upon a time, ".
- echo (boolean, nullable): If true, the original prompt is included at the start of the generated response. This can be helpful for debugging and understanding the output.
- frequency_penalty (number, nullable): A value between -2.0 and 2.0 that influences the model's tendency to repeat tokens. Positive values decrease repetition, and negative values increase it. The default value is 0.
- logprobs (integer, nullable): Includes the log probabilities of the most likely output tokens. If set to 20, the API returns a list of the 20 most probable tokens. The maximum value for logprobs is 20.
- max_tokens
(integer, nullable): The maximum number of tokens the model can generate in its completion.
- presence_penalty (number, nullable): Similar to frequency_penalty, but penalizes tokens based on their presence in the text so far, encouraging the model to explore new topics. The default value is 0.
- stop (string or array of strings, nullable): Defines up to 16 sequences where the API should stop generating further tokens. The returned text will not include the stop sequence.
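As an illustrative sketch (the prompt, suffix, and stop values here are invented examples), a request body using stop sequences might look like this:

```python
# Illustrative FIM request body using stop sequences (values are examples):
payload = {
    "model": "deepseek-chat",
    "prompt": "def add(a, b):\n    return ",  # text before the gap
    "suffix": "\n",                            # text after the gap
    # Generation halts as soon as any of these sequences appear; the
    # matched sequence itself is not included in the returned text.
    "stop": ["\n\n", "def "],
    "max_tokens": 32,
}

# The API accepts a single string or an array of up to 16 strings:
assert isinstance(payload["stop"], list) and len(payload["stop"]) <= 16
```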
- stream (boolean, nullable): Enables streaming of the response. If true, tokens are sent as server-sent events. The stream is terminated with a data: [DONE] message.
- stream_options (object, nullable): Options specific to streaming responses. This should only be set if stream is set to true.
- suffix (string, nullable): The text that immediately follows the completion, providing more context for better fill-in generation.
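To sketch how a streamed response might be consumed on the client side, the snippet below parses server-sent event lines following the data: ... / data: [DONE] convention described above. The sample event lines are invented for illustration; a real stream would come from the HTTP response body.

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate completion text from server-sent event lines.
    Each event is `data: <json>`; the stream ends at `data: [DONE]`."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and keep-alive comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # terminal sentinel, not JSON
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("text", ""))
    return "".join(parts)

# Invented sample events for illustration:
sample = [
    'data: {"choices": [{"text": "Hello", "index": 0}]}',
    'data: {"choices": [{"text": ", world", "index": 0}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # -> Hello, world
```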
- temperature (number, nullable): Controls the randomness of the output. Values closer to 0 make the output more focused and deterministic, while values near 2 introduce more randomness. A common starting point is 0.8. It is generally recommended to adjust either temperature or top_p, but not both.
- top_p (number, nullable): Also known as "nucleus sampling," this parameter considers only the tokens comprising the top p probability mass. For instance, 0.1 means only the tokens within the top 10% probability mass are evaluated. As with temperature, adjust either top_p or temperature, but generally not both.
- include_usage (boolean, a field of stream_options): If set, an additional chunk is streamed before the final data: [DONE] message. This chunk contains token usage statistics for the entire request, with an empty choices array; all other chunks include a usage field with a null value.
A few practical tips:
- Use temperature or top_p to adjust the balance between predictable and creative outputs.
- Use frequency_penalty and presence_penalty to keep the model from repeating itself.
- Use the stop parameter to define where the generated content should end.

A successful FIM Completion request (HTTP status code 200 OK) returns a JSON response with the following structure:
- id (string): A unique identifier for the completion.
- choices (object[]): An array containing the possible completions generated by the model. Each choice includes:
  - finish_reason (string): Indicates why the model stopped generating tokens (e.g., "stop", "length", "content_filter", or "insufficient_system_resource").
  - index (integer): The index of the choice in the array.
  - logprobs (object, nullable): Contains log probability information about generated tokens, with keys such as text_offset, token_logprobs, tokens, and top_logprobs.
  - text (string): The generated completion text.
- created (integer): The Unix timestamp of when the completion was created.
- model (string): The model used for the completion.
- system_fingerprint (string): Represents the backend configuration used by the model.
- object (string): Always "text_completion".
- usage (object): Provides statistics on token usage:
  - completion_tokens (integer): Tokens used in the generated completion.
  - prompt_tokens (integer): Tokens used in the initial prompt. This is the sum of prompt_cache_hit_tokens and prompt_cache_miss_tokens.
  - prompt_cache_hit_tokens (integer): Number of tokens in the prompt that hit the context cache.
  - prompt_cache_miss_tokens (integer): Number of tokens in the prompt that miss the context cache.
  - total_tokens (integer): Total tokens used (prompt + completion).
  - completion_tokens_details (object): Additional breakdown of completion tokens, including:
    - reasoning_tokens (integer): Tokens generated by the model for reasoning purposes.

The DeepSeek API's FIM Completion functionality opens doors to a variety of applications.
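As a sketch of working with this structure, the snippet below extracts the completion text and usage statistics from a mocked response whose field names match the layout above (all values are invented for illustration):

```python
# Mocked response matching the documented structure (values invented):
response = {
    "id": "cmpl-123",
    "object": "text_completion",
    "created": 1718000000,
    "model": "deepseek-chat",
    "choices": [
        {"text": "    return a + b", "index": 0,
         "finish_reason": "stop", "logprobs": None}
    ],
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 6,
        "total_tokens": 18,
        "prompt_cache_hit_tokens": 0,
        "prompt_cache_miss_tokens": 12,
    },
}

completion = response["choices"][0]["text"]
usage = response["usage"]

# prompt_tokens is documented as the sum of cache hits and misses:
assert usage["prompt_tokens"] == (
    usage["prompt_cache_hit_tokens"] + usage["prompt_cache_miss_tokens"]
)
print(completion, usage["total_tokens"])
```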
The DeepSeek API's FIM Completion (Beta) offers a powerful tool for completing text segments surrounded by context. By understanding the API parameters and analyzing the responses, developers can create innovative applications in code completion, text generation, and more. Remember to use the beta base URL and carefully tune the parameters to achieve desired outcomes. As the FIM Completion API matures, it promises to be a valuable asset for developers leveraging the power of large language models. Also, you might want to explore other DeepSeek API capabilities, as described in the API Reference.