Large Language Models (LLMs) like DeepSeek, Llama, and Qwen are revolutionizing AI, but running them can be challenging. This article guides you through using Ollama, a tool that simplifies running these models locally, even with slow download speeds. We'll cover installation, model selection, and configuration, ensuring you can harness the power of LLMs on your own machine.
Ollama offers a user-friendly way to run LLMs. It encapsulates the complexities of llama.cpp, providing a nearly one-click experience, and is ideal for everything from local experimentation to integration with development tools like VSCode.
Before diving into Ollama, ensure you have the necessary drivers installed, such as CUDA drivers for NVIDIA GPUs. Installation guides can be found in our other articles or through online searches.
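For NVIDIA GPUs, a quick way to confirm the driver is working before you continue:
nvidia-smi
If it prints a table listing your GPU and driver version, you're good to go.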
Here’s how to install Ollama on Linux. The process is similar for other platforms, but network issues can arise due to slow international download speeds. A proxy is highly recommended.
Set Up a Proxy (if needed):
If you're behind a firewall or have slow internet, set up a local HTTP proxy (most proxy software can expose one). Then set temporary environment variables in your terminal:
export HTTPS_PROXY=http://127.0.0.1:8123
export HTTP_PROXY=http://127.0.0.1:8123
Note: These settings are temporary and will be reset when you close the terminal.
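Before continuing, you can confirm the proxy is reachable; the command below fetches only the response headers through the 127.0.0.1:8123 proxy configured above (assuming curl is installed):
curl -x http://127.0.0.1:8123 -I https://ollama.com
An HTTP status line in the output means the proxy is working.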
Download and Install:
Follow the official instructions on the Ollama download page. The installation typically requires root privileges.
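For reference, the Linux install shown on that page is a one-line script at the time of writing; double-check the current command there before running it:
curl -fsSL https://ollama.com/install.sh | sh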
Choosing the right model is crucial. The Ollama model library offers a variety of options. Consider your hardware (especially GPU memory), the model's size and quantization level, and whether the model is available in the library at all: models that aren't, such as standalone GGUF files, require a modelfile for Ollama to use them. If possible, prioritize models available directly through Ollama for simpler setup.
Ollama provides several ways to download models:
The easiest method is to use the ollama run command. For example:
ollama run deepseek-r1:32b
This command automatically downloads and runs the deepseek-r1:32b model. You can also download the model separately using:
ollama pull deepseek-r1:32b
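After the download completes, you can confirm the model is available locally:
ollama list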
Troubleshooting Slow Downloads
If downloads are slow despite setting terminal proxies, Ollama might not be using them: because Ollama runs as a systemd service, it does not inherit your shell's environment variables. Configure the proxy settings directly within the service file:
Edit the Systemd Service File:
sudo vim /etc/systemd/system/ollama.service
Add Proxy Settings:
In the [Service] section, add the following lines:
[Service]
Environment="HTTPS_PROXY=http://127.0.0.1:8123"
Environment="HTTP_PROXY=http://127.0.0.1:8123"
Reload and Restart Ollama:
sudo systemctl daemon-reload
sudo systemctl restart ollama.service
Verify Configuration:
sudo systemctl show ollama | grep Environment
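If the variables were picked up, the output should include them, roughly like this (exact formatting may differ):
Environment=HTTPS_PROXY=http://127.0.0.1:8123 HTTP_PROXY=http://127.0.0.1:8123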
Ollama can also run GGUF models directly from Hugging Face:
ollama run hf.co/unsloth/DeepSeek-R1-GGUF:Q4_K_M
While less convenient, you can also load GGUF models manually. This involves creating a modelfile (similar to a Dockerfile) that specifies how Ollama should handle the model. Refer to the Ollama documentation for details.
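As a rough sketch (the file path, model name, and parameter below are placeholders, not recommendations), a minimal modelfile might contain just a FROM line pointing at the local GGUF file, plus optional parameters:
# Modelfile: tell Ollama which local GGUF file to load
FROM ./DeepSeek-R1-Q4_K_M.gguf
# optional sampling parameter
PARAMETER temperature 0.6
You would then register and run it with:
ollama create my-deepseek -f Modelfile
ollama run my-deepseek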
By default, Ollama only accepts local connections. To allow access from other machines on your network:
Edit the Systemd Service File:
sudo vim /etc/systemd/system/ollama.service
Add Environment Variables:
In the [Service] section, add:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
OLLAMA_HOST=0.0.0.0 allows connections from any IP address, and OLLAMA_ORIGINS=* allows requests from any origin. As before, run sudo systemctl daemon-reload and sudo systemctl restart ollama.service for the changes to take effect.
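To verify remote access, query the Ollama API from another machine on the network (replace <server-ip> with the address of the machine running Ollama; 11434 is Ollama's default port):
curl http://<server-ip>:11434/api/tags
A JSON list of the installed models should come back.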
For a more interactive experience, use a web UI like Open WebUI.
Install Python and Open WebUI:
pip install open-webui
open-webui serve
You might need a specific Python version. If your system's version is too old, use miniconda to create a virtual environment (then rerun the pip install inside it):
conda create -n py3.11 python=3.11
conda activate py3.11
Access the UI:
Open your browser and navigate to http://localhost:8080.
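Open WebUI listens on port 8080 by default. If that port is already in use, the serve command accepts a port option (flag name assumed here; confirm with open-webui serve --help):
open-webui serve --port 3000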
Troubleshooting Open WebUI
Slow Downloads: Open WebUI might download additional models from Hugging Face. Use the same proxy settings as before to speed up the process.
UI Issues: If the interface appears incomplete or gets stuck at the get_model step, it's likely a network issue. Ensure your proxy is active and configured correctly. You might also need to exclude localhost addresses (127.0.0.1, 192.168.*) from your proxy settings to avoid conflicts; see the example below.
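With the environment-variable proxy approach, one common way to do this is the standard NO_PROXY variable (some tools only read the lowercase form, and not all of them honor wildcard patterns like 192.168.*, so explicit entries are safer):
export NO_PROXY=localhost,127.0.0.1
export no_proxy=localhost,127.0.0.1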
OpenAI Dependency: The UI may attempt to connect to OpenAI. Disable the OpenAI connection in the settings, or manually point Open WebUI at your Ollama address (e.g., http://localhost:11434).
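If you prefer configuring this before launch, Open WebUI also reads an OLLAMA_BASE_URL environment variable (documented primarily for its Docker setup; the pip-installed version should behave the same, but verify against the Open WebUI docs):
export OLLAMA_BASE_URL=http://127.0.0.1:11434
open-webui serve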
The Continue VSCode extension allows you to use Ollama models directly within your code editor. Add a custom Ollama model provider, specifying the address where Ollama is running (e.g., http://localhost:11434). Consult the Continue documentation for detailed instructions.
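As a sketch only (this follows Continue's older config.json format; newer releases use a YAML config, so check the current docs), an Ollama model entry might look like:
{
  "models": [
    {
      "title": "DeepSeek R1 32B (Ollama)",
      "provider": "ollama",
      "model": "deepseek-r1:32b",
      "apiBase": "http://localhost:11434"
    }
  ]
}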
Ollama simplifies running powerful LLMs like DeepSeek, Llama, and Qwen locally. By following this guide, you can overcome common hurdles like slow download speeds and configuration issues. With Ollama, you can unlock the potential of these models for various applications, from local experimentation to integration with development tools like VSCode.