Building Your Own Domain-Specific Knowledge Base with DeepSeek and AnythingLLM
Large Language Models (LLMs) like DeepSeek are powerful tools, but their general knowledge can be limited when it comes to specific domains. This article guides you through leveraging DeepSeek with AnythingLLM to create a custom knowledge base tailored to your specific industry or expertise. This allows you to build an AI assistant that truly understands and can assist with specialized tasks.
This approach can be transformative for roles that need quick access to specialized information, such as process engineering, IT, or equipment maintenance.
Why Build a Custom Knowledge Base?
- Enhanced Accuracy: LLMs trained on general data might provide inaccurate or irrelevant answers when dealing with niche topics. A custom knowledge base ensures that the model is grounded in reliable, domain-specific information.
- Improved Efficiency: Quickly access and synthesize information from your proprietary documents, research papers, and industry reports.
- AI-Powered Assistance: Empower experts and other professionals to leverage an AI assistant that truly understands their field.
Step-by-Step Guide: DeepSeek and AnythingLLM
This guide focuses on utilizing DeepSeek and AnythingLLM to create your custom knowledge base.
1. Installing Ollama
Ollama is a tool that allows you to easily run open-source LLMs locally.
- Download: Download Ollama from the official website.
- Install: To install to a custom path, run the installer from the command line: `OllamaSetup.exe /DIR=D:\Ollama`.
- Configure Model Path: Set the environment variable `OLLAMA_MODELS=D:\Ollama\models` so models are stored there rather than consuming space on the C: drive (see the PowerShell snippet below).
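On Windows, one way to do this is from PowerShell; a minimal sketch, assuming you want models stored under D:\Ollama\models (adjust the path to your setup):

```powershell
# Persist OLLAMA_MODELS for the current user so Ollama stores models on D:
[Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\Ollama\models", "User")

# Also set it for the current session so a terminal restart is not required
$env:OLLAMA_MODELS = "D:\Ollama\models"

# Confirm the value
$env:OLLAMA_MODELS
```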
2. Downloading the DeepSeek Model
Next, pull the DeepSeek model for usage.
- Open Windows PowerShell: Using PowerShell is recommended to avoid potential parameter errors.
- Select a Model: Choose a DeepSeek model based on your system's resources. Larger models (e.g., 7B, 8B, 14B) generally provide more comprehensive answers but require more memory and processing power. For instance, deepseek-r1:8b requires around 4.9GB.
- Download: Run `ollama run deepseek-r1:8b` (replace `8b` with your chosen model size). This pulls the model if it is not already present and then starts an interactive session.
- Verify: Check the `D:\Ollama\models` directory, or run `ollama list` as shown below, to confirm that the model has been downloaded.
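A quick way to confirm the download from PowerShell (the exact size reported will depend on the model you chose):

```powershell
# Pull the model explicitly without starting an interactive chat
ollama pull deepseek-r1:8b

# List locally available models; deepseek-r1:8b should appear with its size
ollama list
```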
3. Installing AnythingLLM
AnythingLLM provides a user-friendly interface for interacting with your LLM and feeding it custom data.
- Download: Download AnythingLLM from the official website.
- Install: Follow the installer prompts, choosing a custom path such as D:\AnythingLLM if desired. The installer downloads the necessary models and supporting files, so it may take a while.
- Potential Issue: The all-minilm-l6-v2 embedding model download can occasionally fail. If document uploads later fail, download the model manually from GitHub and extract it to `C:\Users\WXZZ\AppData\Roaming\anythingllm-desktop\storage\models` (replace WXZZ with your Windows username). A sketch of this is shown below.
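As a rough sketch, the manual fix might look like the following in PowerShell. The archive name is a placeholder, since it depends on the release you download from GitHub; the destination matches the storage path above:

```powershell
# Hypothetical archive name; use the actual file you downloaded from GitHub
$archive = "$env:USERPROFILE\Downloads\all-minilm-l6-v2.zip"

# AnythingLLM's model storage folder for the current user
$dest = "$env:APPDATA\anythingllm-desktop\storage\models"

# Create the folder if needed, then extract the embedding model into it
New-Item -ItemType Directory -Force -Path $dest | Out-Null
Expand-Archive -Path $archive -DestinationPath $dest -Force
```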
4. Basic Application
Now you can test the basic functionality:
- Launch: Open the AnythingLLM application.
- Set Language: Choose your preferred language (e.g. Chinese).
- Create Workspace: Create a new workspace, then set the "Workspace Chat Model" to `deepseek-r1:8b` in the "Chat Settings" section.
- Test: Ask the model a simple question. If everything is configured correctly, you should receive an answer. You can also confirm that Ollama itself is serving the model, as shown below.
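If AnythingLLM does not respond, it can help to check whether Ollama is serving the model at all. A minimal sketch using Ollama's local REST API, which listens on port 11434 by default:

```powershell
# Send a single prompt to the local Ollama server and print the response text
$body = @{
    model  = "deepseek-r1:8b"
    prompt = "In one sentence, what is a rare earth element?"
    stream = $false
} | ConvertTo-Json

$reply = Invoke-RestMethod -Uri "http://localhost:11434/api/generate" `
    -Method Post -Body $body -ContentType "application/json"
$reply.response
```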
5. Customizing with Your Knowledge Base
This is where you infuse DeepSeek with your specific domain knowledge:
- Upload Documents: In your workspace, upload relevant documents (PDFs, DOCs, TXTs, etc.).
- Move to Workspace: Select the documents and move them to your active workspace.
- Embed: Click "Save and Embed". This step uses the all-minilm-l6-v2 model to create vector embeddings of your documents, so that relevant passages can be retrieved and passed to DeepSeek as context when you ask a question. A document is successfully embedded once the button next to the file turns black.
- Query: Ask DeepSeek questions related to the content of the documents you uploaded.
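Beyond the desktop UI, AnythingLLM can also expose a developer API for querying a workspace programmatically. The sketch below is hedged: the port, the `my-workspace` slug, and the exact payload are assumptions based on the `/api/v1/workspace/{slug}/chat` route, so check the API documentation bundled with your version before relying on it:

```powershell
# Hypothetical values: generate an API key in AnythingLLM's settings and use
# your own workspace slug; the API port may differ per installation.
$apiKey    = "YOUR_ANYTHINGLLM_API_KEY"
$workspace = "my-workspace"

$body = @{
    message = "Plan the content of a rare earth production control system"
    mode    = "query"   # answer primarily from the embedded documents
} | ConvertTo-Json

Invoke-RestMethod -Uri "http://localhost:3001/api/v1/workspace/$workspace/chat" `
    -Method Post `
    -Headers @{ Authorization = "Bearer $apiKey" } `
    -Body $body -ContentType "application/json"
```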
Example: Planning a Rare Earth Production Control System
Without domain-specific material, DeepSeek's answers about rare earth production tend to be generic. In AnythingLLM, you can upload a document such as "Rare Earth Production Control System Integration Case.docx" and then ask the question again: "Plan the content of a rare earth production control system". DeepSeek will now provide a much more informed and relevant answer, drawing directly from the uploaded document.
Conclusion
By combining DeepSeek with AnythingLLM, you can create a powerful, domain-specific AI assistant. This approach allows you to leverage the power of LLMs while ensuring that the information is accurate, relevant, and tailored to your specific needs. This process of refining the AI through iterative data input can greatly enhance its value as an assistant in specialized roles.