Running local LLMs isn't just about having the right hardware; it's about keeping that hardware happy. High-end GPUs draw serious power and shed serious heat. This section covers the practical side of operations.

Power Requirements

GPU Power Draw

GPU          TDP     Typical Load   Power Connectors
RTX 4090     450W    300-400W       1× 16-pin or 3× 8-pin
RTX 4080     320W    250-300W       1× 16-pin or 2× 8-pin
RTX 3090     350W    300-350W       2× 8-pin
RTX 3080     320W    280-320W       2× 8-pin
A100 80GB    400W    350-400W       Datacenter power

PSU Sizing

Rule of thumb: GPU TDP × 2 + 200W for the rest of the system.

  • Single RTX 4090: 450 × 2 + 200 = 1100W → 1100W+ PSU
  • Dual RTX 3090: 700 × 2 + 200 = 1600W → 1600W+ PSU
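The sizing rule can be wrapped in a throwaway shell helper (`psu_watts` is a name invented here, not a standard tool):

```shell
# psu_watts: minimal sketch of the sizing rule above (helper name invented here).
# $1 = combined GPU TDP in watts; doubles it and adds 200W for CPU, board, and drives.
psu_watts() {
  echo $(( $1 * 2 + 200 ))
}

psu_watts 450   # single RTX 4090 -> 1100
psu_watts 700   # dual RTX 3090  -> 1600
```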

Transient Power Spikes

Modern GPUs can briefly spike well above TDP; a 450W card may draw 600W+ for milliseconds. This is why the 2× rule exists: an undersized PSU can trip its over-current protection and shut the system down, or in the worst case be damaged.
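As a rough illustration, you can budget for transient peaks with a factor of ~1.5× TDP (the factor is an assumption consistent with the 450W → 600W+ example above, not a spec; real excursions vary by card and PSU sensing window):

```shell
# spike_watts: illustrative transient-peak estimate.
# The helper name and the 1.5x factor are assumptions for this sketch.
spike_watts() {
  echo $(( $1 * 3 / 2 ))
}

spike_watts 450   # -> 675, i.e. "600W+" territory for a 450W card
```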

Electrical Considerations

  • A standard US 15A/120V circuit delivers 1800W peak and 1440W continuous (80% rule); a dual-GPU box plus peripherals can approach that
  • Put the machine on its own circuit if you can, and avoid sharing with space heaters or AC units
  • A quality surge protector or UPS is cheap insurance against brownouts mid-write

Cooling

Temperature Targets

Component       Ideal    Acceptable   Thermal Throttle
GPU Core        <70°C    <80°C        ~83°C
VRAM (GDDR6X)   <90°C    <100°C       ~110°C
CPU             <70°C    <85°C        ~100°C

Cooling Strategies

Air Cooling

  • Simpler, cheaper
  • Good case airflow essential
  • Intake at front/bottom, exhaust at rear/top
  • Keep GPU slots spaced if possible

Water Cooling

  • Better thermals, quieter
  • Higher cost and complexity
  • Maintenance required
  • Consider for multi-GPU builds

Multi-GPU Cooling Challenges

Two or more GPUs in adjacent slots create cooling challenges:

  • The sandwiched card's intake fans sit right against its neighbor's hot backplate, so it breathes pre-heated air
  • That card typically runs 10-15°C hotter and is the first to throttle
  • Mitigations: wider slot spacing, PCIe risers, direct case airflow across the cards, or blower-style / water-cooled cards

Monitoring

NVIDIA GPUs

# Basic monitoring
nvidia-smi

# Continuous monitoring (1 second refresh)
watch -n 1 nvidia-smi

# Detailed metrics
nvidia-smi dmon

# Query specific metrics
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used --format=csv -l 1

Key Metrics to Watch

Metric            What It Tells You     Warning Signs
GPU Utilization   How busy the GPU is   Low during inference = bottleneck elsewhere
Memory Used       VRAM consumption      Near 100% = risk of OOM
Temperature       Thermal state         >80°C sustained = cooling issue
Power Draw        Current consumption   At/near TDP constantly = check thermals
Clock Speed       Current frequency     Dropping = thermal/power throttling
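The temperature warning sign is easy to script on top of nvidia-smi's CSV output. A minimal sketch (`check_temp` is a helper invented here; it reads one temperature per line, as produced by `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits`):

```shell
# check_temp: print a warning for each reading above a limit (invented helper).
check_temp() {
  limit=$1
  while read -r t; do
    if [ "$t" -gt "$limit" ]; then
      echo "WARNING: GPU at ${t}C (limit ${limit}C)"
    fi
  done
}

# Live use:
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits | check_temp 80
# Demo with canned readings instead of a live GPU:
printf '72\n85\n' | check_temp 80   # warns only about the 85C reading
```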

Monitoring Tools

  • nvidia-smi: ships with the driver, scriptable
  • nvtop: htop-style interactive GPU monitor
  • gpustat: compact one-line status, handy in scripts
  • Prometheus + Grafana (via NVIDIA's DCGM exporter): dashboards and alerting for always-on servers

Noise

High-end GPUs under load are LOUD:

Scenario                      Typical Noise
Single GPU, idle              Near silent (~30 dB)
Single GPU, load              Noticeable (~40-45 dB)
Dual GPU, load                Loud (~50+ dB)
Blower-style datacenter GPU   Very loud (~60+ dB)

Noise Reduction Strategies

  • Cap power (nvidia-smi -pl) or undervolt: a small performance loss buys a big drop in fan noise
  • Tune fan curves to ramp later and more gradually
  • Choose a case with larger, slower-spinning fans
  • Move the box to another room and reach it over the network

Running as a Service

systemd Service (Linux)

# /etc/systemd/system/ollama.service
[Unit]
Description=Ollama LLM Service
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

# Enable and start
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
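To change settings without editing the unit file, a systemd drop-in override works. As one example, Ollama reads its bind address from the OLLAMA_HOST environment variable (a sketch; adjust to your setup):

```
# /etc/systemd/system/ollama.service.d/override.conf
# Create with `sudo systemctl edit ollama`, then `sudo systemctl restart ollama`.
[Service]
# Listen on all interfaces instead of localhost only:
Environment="OLLAMA_HOST=0.0.0.0:11434"
```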

Docker

# Ollama with GPU
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama
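If you prefer declarative config, the same container can be described in a Compose file. A sketch (the GPU reservation block requires the NVIDIA Container Toolkit, same as `--gpus all`):

```yaml
# docker-compose.yml -- equivalent of the `docker run` command above;
# start with `docker compose up -d`
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama:
```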

Cost of Operation

Electricity cost example (US average ~$0.12/kWh):

Single RTX 4090 at 350W average, 8 hours/day:
  350W × 8h × 30 days = 84 kWh/month
  84 kWh × $0.12 ≈ $10/month

Dual RTX 3090 at 600W average, 8 hours/day:
  600W × 8h × 30 days = 144 kWh/month
  144 kWh × $0.12 ≈ $17/month
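The same arithmetic as a reusable helper (`monthly_cost` is invented here; the $0.12/kWh rate is hard-coded to match the example):

```shell
# monthly_cost: kWh/month and dollars at $0.12/kWh (helper invented for this sketch).
# $1 = average draw in watts, $2 = hours of use per day.
monthly_cost() {
  awk -v w="$1" -v h="$2" 'BEGIN {
    kwh = w * h * 30 / 1000
    printf "%.0f kWh/month, $%.2f\n", kwh, kwh * 0.12
  }'
}

monthly_cost 350 8   # single RTX 4090 example -> 84 kWh/month, $10.08
monthly_cost 600 8   # dual RTX 3090 example  -> 144 kWh/month, $17.28
```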

Total Cost of Ownership

Don't forget: electricity, cooling (AC in summer), potential electrical upgrades, and replacement parts (fans wear out). A cheap used GPU might cost more in electricity than a newer, more efficient one.