How much RAM do I need to run Ollama on a VPS?

You need at least 8GB of RAM to run a 7B-8B model such as Mistral 7B or Llama 3.1 8B at acceptable speed. 16GB is recommended if you want headroom for longer context windows. 4GB is only sufficient for very small models like Phi-3 Mini.

Can I run Ollama on a $5 VPS?

Not reliably. On a VPS with 1-2GB of RAM, the kernel will OOM-kill Ollama when it tries to load a 7B model. The minimum practical setup is a VPS with 8GB RAM, which typically costs $7-15/month depending on provider.

Is a CPU VPS fast enough for Ollama, or do I need a GPU?

A CPU-only VPS is fine for development, API testing, and low-traffic use cases. A short response from a 7B model takes roughly 5-10 seconds on a modern VPS CPU, and longer outputs take proportionally longer. For real-time chat or high-concurrency workloads, a GPU server is needed.

What is the cheapest VPS for self-hosting an LLM?

Contabo's CLOUD VPS S (8GB RAM, ~$8/month) is the cheapest option that can realistically run a 7B model. Hetzner CX32 (8GB RAM, ~€8.30/month) is faster and more reliable but slightly more expensive.

Best VPS for Running Ollama and Self-Hosted LLMs in 2026

What You Need to Know First

Self-hosting a large language model on a VPS with Ollama is useful for building private AI APIs, testing prompts without paying per token, running automation with n8n or custom scripts, and keeping sensitive data off third-party infrastructure.

The tradeoff is clear: CPU inference is slower than a cloud API, and RAM requirements are steep. But for the right use case, a self-hosted LLM on an $8-15/month VPS is far cheaper than API costs at scale.
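To make "cheaper at scale" concrete, the break-even volume can be estimated with simple arithmetic. The sketch below is illustrative only: `breakeven_mtok` is a hypothetical helper name, and the API price of $0.50 per million tokens is a placeholder, not a quote from any provider.

```shell
#!/bin/sh
# Millions of tokens per month at which a flat-rate VPS costs the same as a
# pay-per-token API. Both inputs are assumptions you supply.
#   $1 = VPS cost per month (USD)
#   $2 = API price per million tokens (USD), hypothetical
breakeven_mtok() {
    awk -v vps="$1" -v api="$2" 'BEGIN { printf "%.1f\n", vps / api }'
}

# Example: an $8/month VPS against a placeholder price of $0.50 per million tokens
breakeven_mtok 8 0.50
```

With these placeholder numbers the break-even is 16 million tokens per month. Below that volume the API is cheaper on price alone. Above it the VPS wins, provided the CPU can actually generate that many tokens in a month.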

This guide covers the minimum specs, the best provider picks, and a working Ollama setup on Ubuntu.

RAM Requirements by Model Size

Model	RAM required	Speed on VPS CPU	Good for
Phi-3 Mini (3.8B)	4GB	Fast (~2-4s/resp)	Simple tasks, low cost
Llama 3.2 3B	4GB	Fast	Summarization, classification
Mistral 7B	8GB	Moderate (~5-10s/resp)	General purpose, code
Llama 3.1 8B	8GB	Moderate	General purpose
Llama 3 70B	48GB+	Very slow on CPU	Not practical on standard VPS

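The practical minimum for a useful self-hosted LLM is 8GB RAM, which runs Mistral 7B or Llama 3.1 8B at acceptable speed for non-interactive use cases. These figures assume the 4-bit quantized builds that Ollama pulls by default; higher-precision variants of the same models need considerably more memory.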
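As a rough aid, the table above can be turned into a small check that runs on the server itself. This is a sketch under stated assumptions: `recommend_model` and `total_ram_gb` are hypothetical helper names, the thresholds simply mirror the table, and the RAM lookup reads `/proc/meminfo`, so it is Linux-only.

```shell
#!/bin/sh
# Suggest an Ollama model tag for a given amount of RAM (whole GB).
# Thresholds follow the table above; treat them as guidance, not hard limits.
recommend_model() {
    ram_gb=$1
    if [ "$ram_gb" -ge 8 ]; then
        echo "mistral"        # llama3.1 fits the same tier
    elif [ "$ram_gb" -ge 4 ]; then
        echo "phi3:mini"
    else
        echo "none"           # too small for any model in the table
    fi
}

# Total RAM rounded to the nearest GB, read from /proc/meminfo (Linux only).
total_ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 / 1048576) + 0.5 }' /proc/meminfo
}

if [ -r /proc/meminfo ]; then
    echo "Suggested model: $(recommend_model "$(total_ram_gb)")"
fi
```

The rounding matters because a plan sold as 8GB usually reports slightly less than 8GB of usable memory to the operating system.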

Best VPS Providers for Ollama in 2026

1. Contabo CLOUD VPS S — Best value for RAM

Spec	Value
vCPU	4 shared
RAM	8GB
Storage	50GB NVMe
Price	~$7.99/month

Contabo gives you the most RAM per dollar of any major provider. 8GB RAM at ~$8/month means you can comfortably run Mistral 7B or Llama 3.1 8B. CPU is shared and weaker than Hetzner, but for offline/async inference workloads this rarely matters.

Best for: Cost-conscious setups, dev environments, batch inference.

2. Hetzner CX32 — Best performance-to-price

Spec	Value
vCPU	4 shared AMD EPYC
RAM	8GB
Storage	80GB NVMe
Price	~€8.30/month

Hetzner CX32 costs slightly more than Contabo’s 8GB plan but delivers noticeably faster CPU performance. AMD EPYC cores handle Ollama inference faster than Contabo’s shared hardware. If you plan to run a real API endpoint that needs reasonable response times, CX32 is the better pick. Full cost breakdown in the Hetzner CX32 pricing guide.

Best for: API endpoints, production use, better CPU throughput.

3. Hetzner CX42 — For 13B+ models

Spec	Value
vCPU	8 shared AMD EPYC
RAM	16GB
Storage	160GB NVMe
Price	~€17.30/month

If you want to run 13B models or keep memory pressure low while serving a real application alongside Ollama, CX42 is the right step up. 16GB RAM lets you run Llama 3.1 8B comfortably with room for your application stack.

Best for: 13B models, production APIs with concurrent requests.

4. DigitalOcean General Purpose — Best developer experience

Spec	Value
vCPU	2 dedicated
RAM	8GB
Price	~$63/month

DigitalOcean’s general-purpose Droplets use dedicated (not shared) vCPUs and have better sustained CPU performance — but the price jump is dramatic. This only makes sense if your team relies on DigitalOcean’s ecosystem (managed databases, team workflows) and Ollama is one component of a larger stack.

Best for: Teams already on DigitalOcean with strict performance SLAs.

Setting Up Ollama on Ubuntu (Hetzner CX32)

1. Provision a server

Create a Hetzner CX32 (or Contabo CLOUD VPS S) running Ubuntu 24.04 LTS. SSH in as root.

# Create a non-root user
adduser deploy
usermod -aG sudo deploy
# Switch to that user for the rest of the setup
su - deploy

2. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Ollama installs as a systemd service and starts automatically.

3. Pull a model

# Mistral 7B (~4.1GB download)
ollama pull mistral

# Llama 3.1 8B (~4.7GB download)
ollama pull llama3.1

# Phi-3 Mini for light use (~2.2GB)
ollama pull phi3:mini

4. Run a quick test

ollama run mistral "Explain what a VPS is in one sentence."
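To put a number on "acceptable speed" for your own server, one option is to compute tokens per second from the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` response reports (the duration is in nanoseconds). `tokens_per_sec` is an illustrative helper name, and the sample values below are made up.

```shell
#!/bin/sh
# Tokens generated per second, from two fields of an /api/generate response:
#   $1 = eval_count     (number of tokens generated)
#   $2 = eval_duration  (generation time in nanoseconds)
tokens_per_sec() {
    awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

# Made-up sample: 120 tokens generated in 20 seconds
tokens_per_sec 120 20000000000
```

The sample prints 6.0. Running `ollama run mistral --verbose` prints comparable timing statistics after each response without any scripting.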

5. Expose as an API (with Nginx + auth)

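By default Ollama listens only on 127.0.0.1:11434. To expose it securely, put Nginx in front of it with basic auth and TLS. Point your domain's DNS record at the server first, since certbot needs it to issue the certificate: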

sudo apt install nginx -y
sudo nano /etc/nginx/sites-available/ollama

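server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_http_version 1.1;
        # Ollama rejects unfamiliar Host headers when bound to localhost,
        # so present the upstream address rather than the public domain
        proxy_set_header Host localhost:11434;
        # CPU inference can be slow; allow long responses and stream them
        proxy_read_timeout 300s;
        proxy_buffering off;

        # Basic auth to prevent open access
        auth_basic "Ollama API";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }
}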

# Create auth credentials
sudo apt install apache2-utils -y
sudo htpasswd -c /etc/nginx/.htpasswd yourusername

# Enable the site and check the config before requesting a certificate
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Get SSL (certbot edits the Nginx config and reloads it)
sudo apt install certbot python3-certbot-nginx -y
sudo certbot --nginx -d your-domain.com

Your Ollama API is now accessible at https://your-domain.com/api/generate, protected by the basic-auth credentials you just created.
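A request through the proxy might look like the sketch below. It assumes the `your-domain.com` and `yourusername` placeholders from the steps above; `build_payload` and `ask_ollama` are hypothetical helper names, and the JSON escaping is deliberately naive.

```shell
#!/bin/sh
# Build a JSON body for /api/generate. Escaping is deliberately naive:
# it handles backslashes and double quotes only, not newlines or control chars.
build_payload() {
    model=$1
    prompt=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Send a prompt through the Nginx proxy using HTTP basic auth.
# curl prompts for the password because none is given after -u.
ask_ollama() {
    curl -s -u yourusername \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1" "$2")" \
        https://your-domain.com/api/generate
}

# Show the request body without contacting the server
build_payload mistral 'Say "hi"'
```

Calling `ask_ollama mistral "Explain what a VPS is in one sentence."` returns a JSON object whose `response` field holds the generated text. Setting `stream` to false makes Ollama return one JSON object instead of a stream of partial results, which is easier to handle in scripts.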

6. Keep it running with systemd

Ollama’s installer already sets up a systemd service. Verify it’s enabled:

sudo systemctl status ollama
sudo systemctl enable ollama
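Checking the service state does not confirm that the API is answering, so a small probe can help. This is a minimal sketch: `ollama_ready` is a hypothetical helper name, and it assumes `curl` is installed and that Ollama is on its default local address.

```shell
#!/bin/sh
# Succeeds (exit status 0) if an Ollama server answers at the given base URL.
# Defaults to the local address the installer configures.
ollama_ready() {
    curl -sf --max-time 5 "${1:-http://127.0.0.1:11434}/api/version" >/dev/null
}

if ollama_ready; then
    echo "Ollama is responding"
else
    echo "Ollama is not responding; check: sudo systemctl status ollama"
fi
```

The same function can be reused in a cron job or a deployment script to wait until the service is up before sending requests.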

Integrating with n8n and Other Tools

If you are also running n8n on the same VPS, you can connect it to Ollama directly via http://localhost:11434 — no external network hop needed. This is one of the best cost arguments for self-hosting: one VPS runs both the workflow engine and the LLM backend.

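In n8n, add an Ollama Chat Model node and set the base URL to http://localhost:11434. If n8n runs in Docker, localhost inside the container refers to the container itself, so use an address that reaches the host instead and make sure Ollama listens on an interface the container can reach.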

CPU vs GPU VPS: When to Upgrade

Stay on CPU VPS when:

Response latency of 5-15 seconds is acceptable (async pipelines, batch jobs)
You run Ollama at off-hours or low concurrency
Budget is the primary constraint

Consider GPU VPS when:

Real-time chat interfaces need sub-2 second responses
You need to run 13B+ models at usable speed
Concurrent user load exceeds 2-3 simultaneous requests
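The two lists above can be condensed into a single rule of thumb. The sketch below is one possible encoding, not a definitive rule: `choose_backend` is a hypothetical helper name, and the thresholds are taken directly from the criteria listed above.

```shell
#!/bin/sh
# Print "gpu" if any of the upgrade criteria listed above is met, else "cpu".
#   $1 = required response latency in seconds (whole number)
#   $2 = model size in billions of parameters (whole number)
#   $3 = expected simultaneous requests
choose_backend() {
    if [ "$1" -lt 2 ] || [ "$2" -ge 13 ] || [ "$3" -gt 3 ]; then
        echo "gpu"
    else
        echo "cpu"
    fi
}

# A batch job that tolerates 10s latency, uses a 7B model, one request at a time
choose_backend 10 7 1
```

The example prints cpu. Changing any one argument past its threshold, such as a 13B model or four simultaneous requests, flips the result to gpu.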

GPU cloud instances (Lambda Labs, Vast.ai, RunPod) start at ~$0.40-0.80/hour for a T4 and are often more practical than paying for a dedicated GPU VPS.
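Whether hourly GPU billing is practical depends mostly on how many hours you use. A quick calculation, using the low end of the hourly range quoted above, shows the spread. `gpu_monthly_cost` is a hypothetical helper name, and the usage patterns are examples only.

```shell
#!/bin/sh
# Monthly cost of an hourly-billed GPU instance.
#   $1 = hourly rate (USD), $2 = hours used per day, $3 = days per month
gpu_monthly_cost() {
    awk -v rate="$1" -v hours="$2" -v days="$3" \
        'BEGIN { printf "%.2f\n", rate * hours * days }'
}

gpu_monthly_cost 0.40 24 30   # always on
gpu_monthly_cost 0.40 2 30    # two hours a day
```

At $0.40/hour, an always-on instance comes to $288.00 per month, while two hours a day comes to $24.00. Hourly GPU billing therefore suits bursty workloads, and a flat-rate CPU VPS suits anything that must stay up around the clock on a small budget.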

Monthly Cost Comparison

Setup	Provider	Monthly cost	Inference speed
Phi-3 Mini only	Contabo 4GB	~$5	Fast
Mistral 7B (minimum)	Contabo 8GB	~$8	Moderate
Mistral 7B (better CPU)	Hetzner CX32	~€8.30	Moderate+
Llama 3.1 8B + app stack	Hetzner CX42	~€17	Moderate
13B models	Hetzner CCX23 (dedicated)	~€35+	Slow-Moderate

What to Run Alongside Ollama

A single CX32 can comfortably run Ollama plus a lightweight application stack:

n8n (Docker) — workflow automation that calls Ollama for AI steps
Open WebUI — a ChatGPT-like interface for Ollama
Flowise — visual LLM app builder
A small Node.js or Python API that wraps Ollama responses

All four can coexist on 8GB RAM if you load only one model at a time, though memory will be tight with a 7B model loaded. A smaller model such as Phi-3 Mini leaves more headroom.

Self-Hosted LLM FAQ

Can I run Ollama on a shared VPS? Yes, but shared CPU VPS plans (Hetzner CX, Contabo Cloud VPS) will be slower than dedicated instances. For development and async use cases this is perfectly fine.

Does Ollama support GPU acceleration on VPS? Ollama can use a GPU when one is present, but standard cloud VPS plans do not include one, so you need a GPU-enabled instance. Hetzner sells GPU machines as dedicated servers (its GEX line) rather than as cloud VPS plans, and they are significantly more expensive.

Will running Ollama 24/7 increase costs? On a flat-rate VPS, no — you pay the same monthly fee regardless of CPU usage. There are no per-request charges like with API providers.

Ready to install? Follow the complete Ollama VPS setup guide — install, pull a model, systemd service, and Nginx API proxy with key authentication.