This article provides a step-by-step guide on how to deploy the DeepSeek-Coder-V2-Lite-Instruct WebDemo, enabling you to interact with this powerful coding LLM through a user-friendly interface. This guide is tailored for those familiar with Linux environments and aims to simplify the deployment process.
Before diving into the deployment, ensure you have the following environment set up:
It's assumed that you have already installed the necessary PyTorch (CUDA) environment. If not, please install it before proceeding.
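If you'd like to confirm the environment is ready, a quick check like the following (run in a Python shell) reports whether PyTorch can see your GPU; this is only a sanity check, not part of the deployment itself:

import torch
# Confirm that PyTorch is installed with CUDA support and a GPU is visible
print(torch.__version__)
print(torch.cuda.is_available())  # should print True on a CUDA-enabled machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the GPU that will host the model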
To begin, let's optimize your pip configuration and install the essential Python packages. This speeds up downloads and ensures all the required libraries are available.
# Change pypi source to accelerate library installation
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
# Upgrade pip
python -m pip install --upgrade pip
pip install modelscope==1.16.1
pip install langchain==0.2.3
pip install streamlit==1.37.0
pip install transformers==4.43.2
pip install accelerate==0.32.1
In this step, we accomplish a few things. First, we configure pip to use a faster mirror for package downloads, which significantly reduces installation times, especially for large packages. We then upgrade pip to its latest version. Finally, we install the necessary Python packages, including modelscope, langchain, streamlit, transformers, and accelerate. These libraries provide the foundation for running the DeepSeek-Coder-V2-Lite-Instruct model and creating the web interface.
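If you want to confirm that the pinned versions installed correctly, a minimal check such as the following prints each library's version (each of these packages exposes a __version__ attribute):

import modelscope, langchain, streamlit, transformers, accelerate
# Print the installed version of each dependency to confirm the environment
for pkg in (modelscope, langchain, streamlit, transformers, accelerate):
    print(pkg.__name__, pkg.__version__)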
For users who prefer a pre-configured environment, an AutoDL platform image with DeepSeek-Coder-V2-Lite-Instruct is available. Use the following link to create an AutoDL instance:
https://www.codewithgpu.com/i/datawhalechina/self-llm/Deepseek-coder-v2
Next, we'll download the DeepSeek-Coder-V2-Lite-Instruct model using the modelscope library. This involves calling the snapshot_download function and specifying the model name, download path, and version.
Create a file named download.py in the /root/autodl-tmp directory and add the following code to it:
import torch
from modelscope import snapshot_download, AutoModel, AutoTokenizer
import os

# Download the model weights from ModelScope into /root/autodl-tmp;
# snapshot_download returns the local directory containing the model files
model_dir = snapshot_download('deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct', cache_dir='/root/autodl-tmp', revision='master')
Then run the script from the terminal to start the download:
python /root/autodl-tmp/download.py
The model size is approximately 40 GB, and the download process may take around 20 minutes. A successful download will be indicated by a confirmation message in the terminal.
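If you want to double-check that the weights arrived intact before moving on, a small script like the one below sums the size of the downloaded directory (the path matches the cache_dir used above; adjust it if you downloaded elsewhere):

import os

model_dir = '/root/autodl-tmp/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct'

# Walk the downloaded model directory and report its total size on disk
total_bytes = 0
for root, _, files in os.walk(model_dir):
    for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))
print(f"{total_bytes / 1024**3:.1f} GB on disk")  # should be roughly the ~40 GB mentioned above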
Now that the model is downloaded, we'll set up the Streamlit chatbot interface. This involves creating a Python script that loads the model and defines the interaction logic.
Create a file named chatBot.py in the /root/autodl-tmp directory and add the following code to it:
# Import necessary libraries
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
import torch
import streamlit as st
# Sidebar title and link
with st.sidebar:
    st.markdown("## DeepSeek-Coder-V2-Lite-Instruct LLM")
    "[self-llm: Open-Source LLM Guide](https://github.com/datawhalechina/self-llm.git)"
    # Slider for maximum generation length
    max_length = st.slider("max_length", 0, 1024, 512, step=1)
# Main title and caption
st.title("💬 DeepSeek-Coder-V2-Lite-Instruct")
st.caption("🚀 A streamlit chatbot powered by Self-LLM")
# Model path
model_name_or_path = '/root/autodl-tmp/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct'
# Function to load model and tokenizer
@st.cache_resource
def get_model():
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)
    return tokenizer, model
# Load the model and tokenizer
tokenizer, model = get_model()
# Initialize messages in session state if not present
if "messages" not in st.session_state:
st.session_state["messages"] = [{"role": "assistant", "content": "有什么可以帮您的?"}]
# Display existing messages
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])
# Handle user input
if prompt := st.chat_input():
    # Add user message to session state
    st.session_state.messages.append({"role": "user", "content": prompt})
    # Display user message
    st.chat_message("user").write(prompt)

    # Build the chat prompt text from the full message history
    input_text = tokenizer.apply_chat_template(st.session_state.messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([input_text], return_tensors="pt").to('cuda')

    # Generate a response, honoring the max_length slider
    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=max_length)
    # Strip the prompt tokens so only the newly generated tokens remain
    generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    # Add assistant message to session state
    st.session_state.messages.append({"role": "assistant", "content": response})
    # Display assistant message
    st.chat_message("assistant").write(response)
This code sets up a Streamlit application with a chat interface. It loads the DeepSeek-Coder-V2-Lite-Instruct model and tokenizer, manages the chat history, and generates responses based on user input. The @st.cache_resource decorator ensures that the model is loaded only once, improving performance.
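Before launching the web app, you can optionally verify that the model loads and generates outside Streamlit. The standalone sketch below reuses the same path and generation logic as chatBot.py with a hard-coded single-turn prompt:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name_or_path = '/root/autodl-tmp/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct'

# Load the tokenizer and model exactly as chatBot.py does
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True)

# Build a single-turn chat prompt and generate a short reply
messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([input_text], return_tensors="pt").to('cuda')
generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=128)
# Keep only the newly generated tokens, then decode them to text
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])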
With the code prepared, you can now launch the Streamlit web interface. This will make the DeepSeek-Coder-V2-Lite-Instruct model accessible through a web browser.
streamlit run /root/autodl-tmp/chatBot.py --server.address 127.0.0.1 --server.port 6006 --server.enableCORS false
You should now see the DeepSeek-Coder-V2-Lite-Instruct chatbot interface in your browser. You can start interacting with the model by typing in the chatbox. Refer to official Streamlit documentation for advanced customization.
This guide has walked you through deploying the DeepSeek-Coder-V2-Lite-Instruct WebDemo, giving you a local web interface to this excellent coding LLM that you can now adapt to your specific use case.