Running local LLMs isn't just about having the right hardware; it's about keeping that hardware happy. High-end GPUs draw serious power and shed serious heat. This section covers the practical side of operations.

Power Requirements

GPU Power Draw

GPU          TDP     Typical Load   Power Connectors
RTX 4090     450W    300-400W       1× 16-pin or 3× 8-pin
RTX 4080     320W    250-300W       1× 16-pin or 2× 8-pin
RTX 3090     350W    300-350W       2× 8-pin
RTX 3080     320W    280-320W       2× 8-pin
A100 80GB    400W    350-400W       Datacenter power

PSU Sizing

Rule of thumb: GPU TDP × 2 + 200W for the rest of the system.

  • Single RTX 4090: 450 × 2 + 200 = 1100W → 1100W+ PSU
  • Dual RTX 3090: 700 × 2 + 200 = 1600W → 1600W+ PSU
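The sizing rule can be wrapped in a throwaway shell helper (`psu_watts` is a name invented here, not a standard tool):

```shell
# psu_watts: minimal sketch of the sizing rule above (helper name invented here).
# $1 = combined GPU TDP in watts; doubles it and adds 200W for CPU, board, and drives.
psu_watts() {
  echo $(( $1 * 2 + 200 ))
}

psu_watts 450   # single RTX 4090 -> 1100
psu_watts 700   # dual RTX 3090  -> 1600
```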

Transient Power Spikes

Modern GPUs can briefly spike well above TDP; a 450W card may draw 600W+ for milliseconds. This is why the 2× rule exists: an undersized PSU can trip its over-current protection and shut the system down, or in the worst case be damaged.
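As a rough illustration, you can budget for transient peaks with a factor of ~1.5× TDP (the factor is an assumption consistent with the 450W → 600W+ example above, not a spec; real excursions vary by card and PSU sensing window):

```shell
# spike_watts: illustrative transient-peak estimate.
# The helper name and the 1.5x factor are assumptions for this sketch.
spike_watts() {
  echo $(( $1 * 3 / 2 ))
}

spike_watts 450   # -> 675, i.e. "600W+" territory for a 450W card
```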

Electrical Considerations

  • A standard US 15A/120V circuit delivers 1800W peak and 1440W continuous (80% rule); a dual-GPU box plus peripherals can approach that
  • Put the machine on its own circuit if you can, and avoid sharing with space heaters or AC units
  • A quality surge protector or UPS is cheap insurance against brownouts mid-write

Cooling

Temperature Targets

Component       Ideal    Acceptable   Thermal Throttle
GPU Core        <70°C    <80°C        ~83°C
VRAM (GDDR6X)   <90°C    <100°C       ~110°C
CPU             <70°C    <85°C        ~100°C

Cooling Strategies

Air Cooling

  • Simpler, cheaper
  • Good case airflow essential
  • Intake at front/bottom, exhaust at rear/top
  • Keep GPU slots spaced if possible

Water Cooling

  • Better thermals, quieter
  • Higher cost and complexity
  • Maintenance required
  • Consider for multi-GPU builds

Multi-GPU Cooling Challenges

Two or more GPUs in adjacent slots create cooling challenges:

  • The sandwiched card's intake fans sit right against its neighbor's hot backplate, so it breathes pre-heated air
  • That card typically runs 10-15°C hotter and is the first to throttle
  • Mitigations: wider slot spacing, PCIe risers, direct case airflow across the cards, or blower-style / water-cooled cards

Monitoring

NVIDIA GPUs

# Basic monitoring
nvidia-smi

# Continuous monitoring (1 second refresh)
watch -n 1 nvidia-smi

# Detailed metrics
nvidia-smi dmon

# Query specific metrics
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used --format=csv -l 1

Key Metrics to Watch

Metric            What It Tells You     Warning Signs
GPU Utilization   How busy the GPU is   Low during inference = bottleneck elsewhere
Memory Used       VRAM consumption      Near 100% = risk of OOM
Temperature       Thermal state         >80°C sustained = cooling issue
Power Draw        Current consumption   At/near TDP constantly = check thermals
Clock Speed       Current frequency     Dropping = thermal/power throttling
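The temperature warning sign is easy to script on top of nvidia-smi's CSV output. A minimal sketch (`check_temp` is a helper invented here; it reads one temperature per line, as produced by `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits`):

```shell
# check_temp: print a warning for each reading above a limit (invented helper).
check_temp() {
  limit=$1
  while read -r t; do
    if [ "$t" -gt "$limit" ]; then
      echo "WARNING: GPU at ${t}C (limit ${limit}C)"
    fi
  done
}

# Live use:
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | check_temp 80
# Demo with canned readings instead of a live GPU:
printf '72\n85\n' | check_temp 80   # warns only about the 85C reading
```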

Monitoring Tools

  • nvidia-smi: ships with the driver, scriptable
  • nvtop: htop-style interactive GPU monitor
  • gpustat: compact one-line status, handy in scripts
  • Prometheus + Grafana (via NVIDIA's DCGM exporter): dashboards and alerting for always-on servers

Noise

High-end GPUs under load are LOUD:

Scenario                      Typical Noise
Single GPU, idle              Near silent (~30 dB)
Single GPU, load              Noticeable (~40-45 dB)
Dual GPU, load                Loud (~50+ dB)
Blower-style datacenter GPU   Very loud (~60+ dB)

Noise Reduction Strategies

  • Cap power (nvidia-smi -pl) or undervolt: a small performance loss buys a big drop in fan noise
  • Tune fan curves to ramp later and more gradually
  • Choose a case with larger, slower-spinning fans
  • Move the box to another room and reach it over the network

Running as a Service

systemd Service (Linux)

# /etc/systemd/system/ollama.service
[Unit]
Description=Ollama LLM Service
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
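To change settings without editing the unit file, a systemd drop-in override works. As one example, Ollama reads its bind address from the OLLAMA_HOST environment variable (a sketch; adjust to your setup):

```
# /etc/systemd/system/ollama.service.d/override.conf
# Create with `sudo systemctl edit ollama`, then `sudo systemctl restart ollama`.
[Service]
# Listen on all interfaces instead of localhost only:
Environment="OLLAMA_HOST=0.0.0.0:11434"
```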

Docker

# Ollama with GPU
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama
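If you prefer declarative config, the same container can be described in a Compose file. A sketch (the GPU reservation block requires the NVIDIA Container Toolkit, same as `--gpus all`):

```yaml
# docker-compose.yml -- equivalent of the `docker run` command above;
# start with `docker compose up -d`
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama:
```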

Cost of Operation

Electricity cost example (US average ~$0.12/kWh):

Single RTX 4090 at 350W average, 8 hours/day:
  350W × 8h × 30 days = 84 kWh/month
  84 kWh × $0.12 ≈ $10/month

Dual RTX 3090 at 600W average, 8 hours/day:
  600W × 8h × 30 days = 144 kWh/month
  144 kWh × $0.12 ≈ $17/month
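The same arithmetic as a reusable helper (`monthly_cost` is invented here; the $0.12/kWh rate is hard-coded to match the example):

```shell
# monthly_cost: kWh/month and dollars at $0.12/kWh (helper invented for this sketch).
# $1 = average draw in watts, $2 = hours of use per day.
monthly_cost() {
  awk -v w="$1" -v h="$2" 'BEGIN {
    kwh = w * h * 30 / 1000
    printf "%.0f kWh/month, $%.2f\n", kwh, kwh * 0.12
  }'
}

monthly_cost 350 8   # single RTX 4090 example -> 84 kWh/month, $10.08
monthly_cost 600 8   # dual RTX 3090 example  -> 144 kWh/month, $17.28
```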

Total Cost of Ownership

Don't forget: electricity, cooling (AC in summer), potential electrical upgrades, and replacement parts (fans wear out). A cheap used GPU might cost more in electricity than a newer, more efficient one.