Large Language Models (LLMs) like DeepSeek, Llama, and Qwen are revolutionizing AI, but running them can be challenging. This article guides you through using Ollama, a tool that simplifies running these models locally, even with slow download speeds. We'll cover installation, model selection, and configuration, ensuring you can harness the power of LLMs on your own machine.
Ollama offers a user-friendly way to run LLMs. It encapsulates the complexities of llama.cpp, providing a nearly one-click experience, and is ideal for everything from local experimentation to integration with development tools like VSCode.
Before diving into Ollama, ensure you have the necessary drivers installed, such as CUDA drivers for NVIDIA GPUs. Installation guides can be found in our other articles or through online searches.
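For NVIDIA GPUs, a quick way to confirm the driver is working before you continue:
nvidia-smi
If it prints a table listing your GPU and driver version, you're good to go.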
Here’s how to install Ollama on Linux. The process is similar for other platforms, but network issues can arise due to slow international download speeds. A proxy is highly recommended.
Set Up a Proxy (if needed):
If you're behind a firewall or have slow internet, set up a local HTTP proxy (most proxy software can expose one). Then set temporary environment variables in your terminal:
export HTTPS_PROXY=http://127.0.0.1:8123
export HTTP_PROXY=http://127.0.0.1:8123
Note: These settings are temporary and will be reset when you close the terminal.
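Before continuing, you can confirm the proxy is reachable; the command below fetches only the response headers through the 127.0.0.1:8123 proxy configured above (assuming curl is installed):
curl -x http://127.0.0.1:8123 -I https://ollama.com
An HTTP status line in the output means the proxy is working.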
Download and Install:
Follow the official instructions on the Ollama download page. The installation typically requires root privileges.
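For reference, the Linux install shown on that page is a one-line script at the time of writing; double-check the current command there before running it:
curl -fsSL https://ollama.com/install.sh | sh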
Choosing the right model is crucial. The Ollama model library offers a variety of options. Consider your hardware (especially GPU memory), the model's size and quantization level, and whether the model is available in the library at all: models that aren't, such as standalone GGUF files, require a modelfile for Ollama to use them. If possible, prioritize models available directly through Ollama for simpler setup.
Ollama provides several ways to download models:
The easiest method is to use the ollama run command. For example:
ollama run deepseek-r1:32b
This command automatically downloads and runs the deepseek-r1:32b model. You can also download the model separately using:
ollama pull deepseek-r1:32b
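After the download completes, you can confirm the model is available locally:
ollama list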
Troubleshooting Slow Downloads
If downloads are slow despite setting terminal proxies, Ollama might not be using them: because Ollama runs as a systemd service, it does not inherit your shell's environment variables. Configure the proxy settings directly within the service file:
Edit the Systemd Service File:
sudo vim /etc/systemd/system/ollama.service
Add Proxy Settings:
In the [Service] section, add the following lines:
[Service]
Environment="HTTPS_PROXY=http://127.0.0.1:8123"
Environment="HTTP_PROXY=http://127.0.0.1:8123"
Reload and Restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
Verify Configuration:
sudo systemctl show ollama | grep Environment
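If the variables were picked up, the output should include them, roughly like this (exact formatting may differ):
Environment=HTTPS_PROXY=http://127.0.0.1:8123 HTTP_PROXY=http://127.0.0.1:8123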
Ollama can also run GGUF models directly from Hugging Face:
ollama run hf.co/unsloth/DeepSeek-R1-GGUF:Q4_K_M
While less convenient, you can also load GGUF models manually. This involves creating a modelfile (similar to a Dockerfile) that specifies how Ollama should handle the model. Refer to the Ollama documentation for details.
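As a rough sketch (the file path, model name, and parameter below are placeholders, not recommendations), a minimal modelfile might contain just a FROM line pointing at the local GGUF file, plus optional parameters:
# Modelfile: tell Ollama which local GGUF file to load
FROM ./DeepSeek-R1-Q4_K_M.gguf
# optional sampling parameter
PARAMETER temperature 0.6
You would then register and run it with:
ollama create my-deepseek -f Modelfile
ollama run my-deepseek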
By default, Ollama only accepts local connections. To allow access from other machines on your network:
Edit the Systemd Service File:
sudo vim /etc/systemd/system/ollama.service
Add Environment Variables:
In the [Service] section, add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
OLLAMA_HOST=0.0.0.0 allows connections from any IP address, and OLLAMA_ORIGINS=* allows requests from any origin. As before, run sudo systemctl daemon-reload and sudo systemctl restart ollama.service for the changes to take effect.
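To verify remote access, query the Ollama API from another machine on the network (replace <server-ip> with the address of the machine running Ollama; 11434 is Ollama's default port):
curl http://<server-ip>:11434/api/tags
A JSON list of the installed models should come back.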
For a more interactive experience, use a web UI like Open WebUI.
Install Python and Open WebUI:
pip install open-webui
open-webui serve
You might need a specific Python version. If your system's version is too old, use miniconda to create a virtual environment (then rerun the pip install inside it):
conda create -n py3.11 python=3.11
conda activate py3.11
Access the UI:
Open your browser and navigate to http://localhost:8080.
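Open WebUI listens on port 8080 by default. If that port is already in use, the serve command accepts a port option (flag name assumed here; confirm with open-webui serve --help):
open-webui serve --port 3000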
Troubleshooting Open WebUI
Slow Downloads: Open WebUI might download additional models from Hugging Face. Use the same proxy settings as before to speed up the process.
UI Issues: If the interface appears incomplete or gets stuck at the get_model step, it's likely a network issue. Ensure your proxy is active and configured correctly. You might also need to exclude localhost addresses (127.0.0.1, 192.168.*) from your proxy settings to avoid conflicts; see the example below.
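With the environment-variable proxy approach, one common way to do this is the standard NO_PROXY variable (some tools only read the lowercase form, and not all of them honor wildcard patterns like 192.168.*, so explicit entries are safer):
export NO_PROXY=localhost,127.0.0.1
export no_proxy=localhost,127.0.0.1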
OpenAI Dependency: The UI may attempt to connect to OpenAI. Disable the OpenAI connection in the settings, or manually point Open WebUI at your Ollama address (e.g., http://localhost:11434).
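If you prefer configuring this before launch, Open WebUI also reads an OLLAMA_BASE_URL environment variable (documented primarily for its Docker setup; the pip-installed version should behave the same, but verify against the Open WebUI docs):
export OLLAMA_BASE_URL=http://127.0.0.1:11434
open-webui serve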
The Continue VSCode extension allows you to use Ollama models directly within your code editor. Add a custom Ollama model provider, specifying the address where Ollama is running (e.g., http://localhost:11434). Consult the Continue documentation for detailed instructions.
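As a sketch only (this follows Continue's older config.json format; newer releases use a YAML config, so check the current docs), an Ollama model entry might look like:
{
  "models": [
    {
      "title": "DeepSeek R1 32B (Ollama)",
      "provider": "ollama",
      "model": "deepseek-r1:32b",
      "apiBase": "http://localhost:11434"
    }
  ]
}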
Ollama simplifies running powerful LLMs like DeepSeek, Llama, and Qwen locally. By following this guide, you can overcome common hurdles like slow download speeds and configuration issues. With Ollama, you can unlock the potential of these models for various applications, from local experimentation to integration with development tools like VSCode.