Best VPS for Running Ollama and Self-Hosted LLMs in 2026

What You Need to Know First

Running a large language model on your own VPS with Ollama is useful for building private AI APIs, testing prompts without paying per token, running automation with n8n or custom scripts, and keeping sensitive data off third-party infrastructure.

The tradeoff is clear: CPU inference is slower than a cloud API, and RAM requirements are steep. But for the right use case, a self-hosted LLM on an $8-15/month VPS can cost far less than per-token API pricing at scale.

This guide covers the minimum specs, the best provider picks, and a working Ollama setup on Ubuntu.

RAM Requirements by Model Size

| Model | RAM required | Speed on VPS CPU | Good for |
|---|---|---|---|
| Phi-3 Mini (3.8B) | 4GB | Fast (~2-4s/resp) | Simple tasks, low cost |
| Llama 3.2 3B | 4GB | Fast | Summarization, classification |
| Mistral 7B | 8GB | Moderate (~5-10s/resp) | General purpose, code |
| Llama 3.1 8B | 8GB | Moderate | General purpose |
| Llama 3 70B | 48GB+ | Very slow on CPU | Not practical on standard VPS |

The practical minimum for a useful self-hosted LLM is 8GB RAM, which runs Mistral 7B or Llama 3.1 8B at acceptable speed for non-interactive use cases.
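
As a rough sanity check on the table above, the sketch below estimates memory from parameter count. It assumes about 4.5 bits per weight (typical of 4-bit quantized builds) plus a flat 1.5 GB allowance for the KV cache, runtime, and OS. Both figures and the function name `estimate_ram_gb` are illustrative assumptions, not numbers published by Ollama.

```shell
# estimate_ram_gb PARAMS_BILLIONS BITS_PER_WEIGHT OVERHEAD_GB
# Weights take params * bits / 8 bytes; OVERHEAD_GB is an assumed allowance
# for KV cache, runtime, and OS, not a measured value.
estimate_ram_gb() {
    awk -v p="$1" -v b="$2" -v o="$3" \
        'BEGIN { printf "%.1f\n", p * b / 8 + o }'
}

estimate_ram_gb 3.8 4.5 1.5   # Phi-3 Mini
estimate_ram_gb 7 4.5 1.5     # Mistral 7B
estimate_ram_gb 70 4.5 1.5    # Llama 3 70B
```

Under these assumptions a 7B model lands near 5.4 GB, which is why an 8GB plan is a workable floor for it while a 4GB plan is not.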

Best VPS Providers for Ollama in 2026

1. Contabo CLOUD VPS S — Best value for RAM

| Spec | Value |
|---|---|
| vCPU | 4 shared |
| RAM | 8GB |
| Storage | 50GB NVMe |
| Price | ~$7.99/month |

Contabo gives you the most RAM per dollar of any major provider. 8GB RAM at ~$8/month is enough to run Mistral 7B or Llama 3.1 8B. The CPU is shared and weaker than Hetzner's, but for offline or async inference workloads this rarely matters.

Best for: Cost-conscious setups, dev environments, batch inference.

2. Hetzner CX32 — Best performance-to-price

| Spec | Value |
|---|---|
| vCPU | 4 shared AMD EPYC |
| RAM | 8GB |
| Storage | 80GB NVMe |
| Price | ~€8.30/month |

Hetzner CX32 costs slightly more than Contabo’s 8GB plan but delivers noticeably faster CPU performance. AMD EPYC cores handle Ollama inference faster than Contabo’s shared hardware. If you plan to run a real API endpoint that needs reasonable response times, CX32 is the better pick.

Best for: API endpoints, production use, better CPU throughput.

3. Hetzner CX42 — For 13B+ models

| Spec | Value |
|---|---|
| vCPU | 8 shared AMD EPYC |
| RAM | 16GB |
| Storage | 160GB NVMe |
| Price | ~€17.30/month |

If you want to run 13B models or keep memory pressure low while serving a real application alongside Ollama, CX42 is the right step up. 16GB RAM lets you run Llama 3.1 8B comfortably with room for your application stack.

Best for: 13B models, production APIs with concurrent requests.

4. DigitalOcean General Purpose — Best developer experience

| Spec | Value |
|---|---|
| vCPU | 2 dedicated |
| RAM | 8GB |
| Price | ~$63/month |

DigitalOcean’s general-purpose Droplets use dedicated (not shared) vCPUs and have better sustained CPU performance — but the price jump is dramatic. This only makes sense if your team relies on DigitalOcean’s ecosystem (managed databases, team workflows) and Ollama is one component of a larger stack.

Best for: Teams already on DigitalOcean with strict performance SLAs.

Setting Up Ollama on Ubuntu (Hetzner CX32)

1. Provision a server

Create a Hetzner CX32 (or Contabo CLOUD VPS S) running Ubuntu 24.04 LTS. SSH in as root.

# Create a non-root user
adduser deploy
usermod -aG sudo deploy
# Switch to that user for the rest of the setup
su - deploy

2. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Ollama installs as a systemd service and starts automatically.

3. Pull a model

# Mistral 7B (~4.1GB download)
ollama pull mistral

# Llama 3.1 8B (~4.7GB download)
ollama pull llama3.1

# Phi-3 Mini for light use (~2.2GB)
ollama pull phi3:mini

4. Run a quick test

ollama run mistral "Explain what a VPS is in one sentence."
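
To put a number on "acceptable speed" for your own VPS, one option is to compute tokens per second from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns when `stream` is false. The helper name `tokens_per_second` and the sample numbers below are made up for illustration, and the commented command assumes `jq` is installed.

```shell
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# Converts a token count and a duration in nanoseconds to tokens per second.
tokens_per_second() {
    awk -v n="$1" -v d="$2" 'BEGIN {
        if (d <= 0) { print "duration must be positive" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", n * 1e9 / d
    }'
}

# Illustrative numbers only: 120 tokens generated in 20 seconds.
tokens_per_second 120 20000000000

# Against a live server (assumes jq is installed):
# set -- $(curl -s http://localhost:11434/api/generate \
#     -d '{"model":"mistral","prompt":"Explain what a VPS is in one sentence.","stream":false}' \
#     | jq -r '"\(.eval_count) \(.eval_duration)"')
# tokens_per_second "$1" "$2"
```

Running the same prompt on two candidate plans gives a like-for-like comparison that the "Fast" and "Moderate" labels cannot.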

5. Expose as an API (with Nginx + auth)

By default Ollama binds to localhost:11434. To expose it securely:

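sudo apt install nginx -y
sudo nano /etc/nginx/sites-available/ollama

Paste in this server block:

server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        # Ollama's FAQ forwards a localhost Host header; a public hostname
        # may be rejected with 403 while Ollama is bound to localhost
        proxy_set_header Host localhost:11434;
        # CPU inference can outlast Nginx's default 60s read timeout
        proxy_read_timeout 300s;

        # Basic auth to prevent open access
        auth_basic "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}

Then create the credentials, enable the site, and add TLS:

# Create auth credentials (prompts for a password)
sudo apt install apache2-utils -y
sudo htpasswd -c /etc/nginx/.htpasswd yourusername

# Enable the site and check the config before requesting a certificate
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Get SSL (certbot updates the server block for HTTPS and reloads Nginx)
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d your-domain.com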

Your Ollama API is now accessible at https://your-domain.com/api/generate.
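
A minimal sketch of calling the proxied endpoint with basic auth follows. The function names and the `OLLAMA_URL`, `OLLAMA_USER`, and `OLLAMA_PASS` environment variables are hypothetical and exist only for this example. The payload builder escapes backslashes and double quotes only, so prompts containing newlines or control characters would need a real JSON encoder.

```shell
# generate_payload MODEL PROMPT
# Builds a non-streaming /api/generate request body.
# Escapes backslashes and double quotes only (not newlines or control characters).
generate_payload() {
    escaped=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$escaped"
}

# ollama_generate MODEL PROMPT
# OLLAMA_URL, OLLAMA_USER and OLLAMA_PASS are hypothetical variables for this sketch.
ollama_generate() {
    curl -fsS -u "${OLLAMA_USER}:${OLLAMA_PASS}" \
        -H 'Content-Type: application/json' \
        -d "$(generate_payload "$1" "$2")" \
        "${OLLAMA_URL:-https://your-domain.com}/api/generate"
}

# Usage, once the credentials are exported:
# ollama_generate mistral "Explain what a VPS is in one sentence."
```

Keeping the payload builder separate from the network call makes it easy to inspect the JSON before anything is sent.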

6. Keep it running with systemd

Ollama’s installer already sets up a systemd service. Verify it’s enabled:

sudo systemctl status ollama
sudo systemctl enable ollama
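
A service can show as active in systemd while the HTTP API is not yet answering, so a small health check can complement the commands above. This sketch assumes `curl` is installed and uses Ollama's `/api/version` endpoint; the function name `ollama_is_up` is made up for the example.

```shell
# ollama_is_up [BASE_URL]
# Succeeds if the Ollama HTTP server answers within 2 seconds.
ollama_is_up() {
    curl -fsS --max-time 2 "${1:-http://localhost:11434}/api/version" >/dev/null 2>&1
}

if ollama_is_up; then
    echo "Ollama is responding"
else
    echo "Ollama is not responding; check: sudo journalctl -u ollama -e"
fi
```

The same function can run from cron or a monitoring script to flag an unresponsive server.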

Integrating with n8n and Other Tools

If you are also running n8n on the same VPS, you can connect it to Ollama directly via http://localhost:11434 — no external network hop needed. This is one of the best cost arguments for self-hosting: one VPS runs both the workflow engine and the LLM backend.

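In n8n, add an Ollama Chat Model node and set the base URL to http://localhost:11434. That address works when n8n runs directly on the host or in a container with host networking; inside a default Docker bridge network, localhost refers to the n8n container itself, so Ollama has to be reachable from the Docker network instead.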

CPU vs GPU VPS: When to Upgrade

Stay on CPU VPS when:

  • Response latency of 5-15 seconds is acceptable (async pipelines, batch jobs)
  • You run Ollama at off-hours or low concurrency
  • Budget is the primary constraint

Consider GPU VPS when:

  • Real-time chat interfaces need sub-2 second responses
  • You need to run 13B+ models at usable speed
  • Concurrent user load exceeds 2-3 simultaneous requests

GPU cloud instances (Lambda Labs, Vast.ai, RunPod) start at ~$0.40-0.80/hour for a T4 and are often more practical than paying for a dedicated GPU VPS.
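
The hourly-versus-monthly tradeoff can be worked through with simple arithmetic. The sketch below uses the ~$8/month and $0.40/hour figures quoted in this guide; the function names and the 730-hour month are assumptions made for the example.

```shell
# breakeven_hours MONTHLY_VPS_FEE GPU_HOURLY_RATE
# GPU hours per month that cost the same as the flat VPS fee.
breakeven_hours() {
    awk -v m="$1" -v h="$2" 'BEGIN { printf "%.1f\n", m / h }'
}

# gpu_monthly_cost GPU_HOURLY_RATE HOURS
# Cost of keeping an hourly GPU instance running for HOURS.
gpu_monthly_cost() {
    awk -v h="$1" -v t="$2" 'BEGIN { printf "%.2f\n", h * t }'
}

breakeven_hours 8 0.40      # hours of T4 time equal to an ~$8/month VPS
gpu_monthly_cost 0.40 730   # the same T4 left running all month
```

At these prices, about 20 GPU hours cost the same as a month of the CPU VPS, while leaving the GPU on all month costs roughly $292. Hourly GPU rental suits short bursts of work, and a flat-rate VPS suits an always-on endpoint.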

Monthly Cost Comparison

| Setup | Provider | Monthly cost | Inference speed |
|---|---|---|---|
| Phi-3 Mini only | Contabo 4GB | ~$5 | Fast |
| Mistral 7B (minimum) | Contabo 8GB | ~$8 | Moderate |
| Mistral 7B (better CPU) | Hetzner CX32 | ~€8.30 | Moderate+ |
| Llama 3.1 8B + app stack | Hetzner CX42 | ~€17 | Moderate |
| 13B models | Hetzner CCX23 (dedicated) | ~€35+ | Slow-Moderate |
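
To compare a flat monthly fee with per-token API pricing, it helps to convert the fee into an effective cost per million generated tokens. The sketch below does that for a 30-day month. The 5 tokens/s throughput is an assumed figure rather than a benchmark, and `vps_cost_per_mtok` is a name invented for this example; substitute the throughput you measure on your own server.

```shell
# vps_cost_per_mtok MONTHLY_FEE TOKENS_PER_SEC UTILIZATION
# UTILIZATION is the fraction of the month spent generating (0 < UTILIZATION <= 1).
vps_cost_per_mtok() {
    awk -v fee="$1" -v tps="$2" -v util="$3" 'BEGIN {
        seconds_per_month = 30 * 24 * 3600
        mtok = tps * seconds_per_month * util / 1e6   # millions of tokens per month
        printf "%.2f\n", fee / mtok
    }'
}

vps_cost_per_mtok 8 5 1      # server busy around the clock
vps_cost_per_mtok 8 5 0.1    # server busy 10% of the time
```

With these assumed inputs the effective cost is about $0.62 per million tokens at full utilization and about $6.17 at 10%, so the savings depend heavily on how busy the server is.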

What to Run Alongside Ollama

A single CX32 can comfortably run Ollama plus a lightweight application stack:

  • n8n (Docker) — workflow automation that calls Ollama for AI steps
  • Open WebUI — a ChatGPT-like interface for Ollama
  • Flowise — visual LLM app builder
  • A small Node.js or Python API that wraps Ollama responses

All four can coexist on 8GB RAM if only one model is loaded at a time, although a 7B model leaves limited headroom, so keep an eye on memory with free -h.

Self-Hosted LLM FAQ

Can I run Ollama on a shared VPS? Yes, but shared CPU VPS plans (Hetzner CX, Contabo Cloud VPS) will be slower than dedicated instances. For development and async use cases this is perfectly fine.

Does Ollama support GPU acceleration on VPS? Ollama itself supports GPU acceleration, but standard cloud VPS plans do not include a GPU, so you need a GPU-enabled instance. Hetzner sells GPU hardware as dedicated servers (its GEX line) rather than as regular cloud plans, and these are significantly more expensive.

Will running Ollama 24/7 increase costs? On a flat-rate VPS, no — you pay the same monthly fee regardless of CPU usage. There are no per-request charges like with API providers.