Large Language Models (LLMs) are revolutionizing applications from chatbots to content creation. But what if you could harness this power locally, without relying on cloud-based APIs? This article walks you through setting up Ollama, a framework for running LLMs locally, and pairing it with DeepSeek R1, a powerful open-source reasoning model, to build a Retrieval-Augmented Generation (RAG) system. The result is an AI-powered application that is free to run, private, fast, and works offline.
Before diving into the setup process, let's clarify the core components of this system.
Ollama: This is the backbone, enabling you to run LLMs like DeepSeek R1 directly on your computer. Think of it as a local server for AI models.
LangChain: This powerful Python/JS framework acts as the bridge, connecting LLMs like DeepSeek R1 to external data sources, APIs, and memory.
RAG (Retrieval-Augmented Generation): This is the intelligence booster. RAG improves the LLM's responses by retrieving relevant content from external sources (like PDFs or databases) and feeding it into the generation process, so answers are grounded in your own documents rather than the model's training data alone.
DeepSeek R1: This is the brainpower. DeepSeek R1 is an open-source AI model specifically designed for reasoning, problem-solving, and factual retrieval.
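To make these roles concrete, here is a minimal sketch of the first two pieces working together: LangChain's Ollama wrapper sending a single prompt to a locally served model. It assumes Ollama is installed and the deepseek-r1:1.5b model has already been pulled (both steps are covered below); the example question is arbitrary.

```python
# Minimal sketch: LangChain talking to a model served locally by Ollama.
# Assumes Ollama is running and deepseek-r1:1.5b has been pulled (see below).
from langchain_community.llms import Ollama

llm = Ollama(model="deepseek-r1:1.5b")
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))
```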
Running DeepSeek R1 locally offers several advantages compared to relying on cloud-based models:
| Benefit | Cloud-Based Models | Local DeepSeek R1 |
|---|---|---|
| Privacy | Data sent to external servers | 100% local & secure |
| Speed | API latency & network delays | Fast on-device inference, no network delays |
| Cost | Pay per API request | Free after setup |
| Customization | Limited fine-tuning | Full model control |
| Deployment | Cloud-dependent | Works offline & on-premises |
Let's walk through the process of setting up Ollama, running DeepSeek R1, and building a basic RAG system using Streamlit.
Pull the DeepSeek R1 Model: First, install Ollama from https://ollama.com if you haven't already. Then open your terminal and run:

```bash
ollama pull deepseek-r1:1.5b
```

This command downloads the 1.5B-parameter DeepSeek R1 model and prepares it for use. You can confirm the download with `ollama list`.
Run DeepSeek R1: Once the model is downloaded, start interacting with it using:

```bash
ollama run deepseek-r1:1.5b
```

This command starts an interactive session so you can send queries to the model directly from your terminal.
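You can also query the model from Python instead of the terminal. Here is a quick sketch using the `ollama` Python package (it appears in the prerequisites below); the example question is arbitrary. Note that DeepSeek R1 typically includes its reasoning inside `<think>` tags in the response, which you may want to strip before display.

```python
# Quick programmatic check using the ollama Python package.
import ollama

response = ollama.chat(
    model="deepseek-r1:1.5b",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print(response["message"]["content"])
```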
Now, let's integrate DeepSeek R1 into a RAG system using Streamlit, a Python framework for creating interactive web applications.
Prerequisites: Ensure you have the following installed:
Python
Conda (recommended for package management)
Required Python Packages:

```bash
pip install -U langchain langchain-community langchain_experimental
pip install streamlit
pip install pdfplumber
pip install semantic-chunkers
pip install open-text-embeddings
pip install faiss-cpu
pip install ollama
pip install sentence-transformers
```
If you need help setting up a Conda environment, refer to this guide: Setting Up a Conda Environment for Python Projects
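Optionally, you can confirm that the libraries used by the app import cleanly before going further. A quick sanity check, using only packages from the list above:

```python
# Optional sanity check: confirm the libraries used by app.py are importable.
import faiss                      # provided by faiss-cpu
import pdfplumber
import streamlit
from langchain_community.llms import Ollama
from langchain_experimental.text_splitter import SemanticChunker
from sentence_transformers import SentenceTransformer

print("All key packages imported successfully.")
```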
Create a Project Directory:
```bash
mkdir rag-system && cd rag-system
```
Create a Python Script (app.py): Paste the following code into app.py:
```python
import streamlit as st
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains import RetrievalQA

# Streamlit UI
st.title("📄 RAG System with DeepSeek R1 & Ollama")

uploaded_file = st.file_uploader("Upload your PDF file here", type="pdf")

if uploaded_file:
    # Save the uploaded PDF to disk so PDFPlumberLoader can read it
    with open("temp.pdf", "wb") as f:
        f.write(uploaded_file.getvalue())

    # Load the PDF and split it into semantically coherent chunks
    loader = PDFPlumberLoader("temp.pdf")
    docs = loader.load()

    text_splitter = SemanticChunker(HuggingFaceEmbeddings())
    documents = text_splitter.split_documents(docs)

    # Embed the chunks and index them in a FAISS vector store
    embedder = HuggingFaceEmbeddings()
    vector = FAISS.from_documents(documents, embedder)
    retriever = vector.as_retriever(search_type="similarity", search_kwargs={"k": 3})

    # Connect to the locally running DeepSeek R1 model via Ollama
    llm = Ollama(model="deepseek-r1:1.5b")

    prompt = """Use the following context to answer the question.
Context: {context}
Question: {question}
Answer:"""
    QA_PROMPT = PromptTemplate.from_template(prompt)

    # Build the RAG chain: retrieved chunks are "stuffed" into the prompt
    # as {context} and the combined prompt is sent to the LLM
    llm_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=llm_chain, document_variable_name="context"
    )
    qa = RetrievalQA(combine_documents_chain=combine_documents_chain, retriever=retriever)

    user_input = st.text_input("Ask a question about your document:")

    if user_input:
        response = qa(user_input)["result"]
        st.write("**Response:**")
        st.write(response)
```
Start the Streamlit App: Open your terminal, navigate to your project directory (rag-system), and run:

```bash
streamlit run app.py
```
This command launches the Streamlit application in your web browser.
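If the answers look off, it often helps to test the retrieval step on its own, outside Streamlit. The following is a small sketch of that idea; the file name sample.pdf and the test question are placeholders.

```python
# Standalone retrieval check: index a PDF and print the chunks retrieved
# for a test question, without the Streamlit UI or the LLM.
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_experimental.text_splitter import SemanticChunker

docs = PDFPlumberLoader("sample.pdf").load()   # placeholder file name
embeddings = HuggingFaceEmbeddings()
chunks = SemanticChunker(embeddings).split_documents(docs)
retriever = FAISS.from_documents(chunks, embeddings).as_retriever(search_kwargs={"k": 3})

for doc in retriever.get_relevant_documents("What is this document about?"):
    print(doc.page_content[:200])
    print("---")
```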
Congratulations! You've successfully set up Ollama and DeepSeek R1 to build a local, AI-powered RAG system. This setup allows you to experiment with LLMs, process documents, and build intelligent applications without relying on external APIs or compromising your data privacy.
Feel free to explore the complete code on GitHub and continue learning to unlock the full potential of local LLMs! Consider following my Dev.to blog for more development tutorials.