Operations
Power, cooling, monitoring, and keeping things running
Running local LLMs isn't just about having the right hardware — it's about keeping that hardware happy. High-end GPUs are power-hungry and heat-generating. This section covers the practical side of operations.
Power Requirements
GPU Power Draw
| GPU | TDP | Typical Load | Power Connectors |
|---|---|---|---|
| RTX 4090 | 450W | 300-400W | 1× 16-pin or 3× 8-pin |
| RTX 4080 | 320W | 250-300W | 1× 16-pin or 2× 8-pin |
| RTX 3090 | 350W | 300-350W | 2× 8-pin |
| RTX 3080 | 320W | 280-320W | 2× 8-pin |
| A100 80GB | 400W | 350-400W | Datacenter power |
PSU Sizing
Rule of thumb: GPU TDP × 2 + 200W for the rest of the system.
- Single RTX 4090: 450 × 2 + 200 = 1100W → 1000W+ PSU
- Dual RTX 3090: 700 × 2 + 200 = 1600W → 1500W+ PSU
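The rule of thumb is easy to script. A minimal sketch; the `psu_watts` helper name is my own, not from any existing tool:

```shell
# Rule-of-thumb PSU sizing: combined GPU TDP x 2, plus 200 W for the system.
# `psu_watts` is a hypothetical helper, not an existing tool.
psu_watts() {
  total_tdp=$1                     # combined TDP of all GPUs, in watts
  echo $(( total_tdp * 2 + 200 ))
}

psu_watts 450   # single RTX 4090 -> 1100
psu_watts 700   # dual RTX 3090  -> 1600
```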
Transient Power Spikes
Modern GPUs can briefly spike well above TDP. A 450W GPU might spike to 600W+ momentarily. This is why the 2× rule exists — undersized PSUs can trigger shutdowns or damage.
Electrical Considerations
- Standard US outlet (15A/120V): ~1,800W max, realistically ~1,400W sustained
- Single high-end GPU: Usually fine on standard circuit
- Dual high-end GPUs: May need dedicated 20A circuit
- 4+ GPUs: Likely needs 240V or multiple circuits
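For a quick sanity check on circuit load: watts ÷ volts gives amps, and sustained draw should stay under roughly 80% of the breaker rating (about 12A on a 15A circuit). A sketch; `circuit_amps` is a made-up helper name:

```shell
# Approximate circuit load in amps (integer division is close enough here).
# `circuit_amps` is a hypothetical helper, not an existing tool.
circuit_amps() {
  watts=$1; volts=$2
  echo $(( watts / volts ))
}

circuit_amps 1100 120   # single RTX 4090 build -> 9 A, fine on a 15 A circuit
circuit_amps 1600 120   # dual RTX 3090 build -> 13 A, over the ~12 A sustained limit
```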
Cooling
Temperature Targets
| Component | Ideal | Acceptable | Thermal Throttle |
|---|---|---|---|
| GPU Core | <70°C | <80°C | ~83°C |
| VRAM (GDDR6X) | <90°C | <100°C | ~110°C |
| CPU | <70°C | <85°C | ~100°C |
Cooling Strategies
Air Cooling
- Simpler, cheaper
- Good case airflow essential
- Intake at front/bottom, exhaust at rear/top
- Keep GPU slots spaced if possible
Water Cooling
- Better thermals, quieter
- Higher cost and complexity
- Maintenance required
- Consider for multi-GPU builds
Multi-GPU Cooling Challenges
Two or more GPUs in adjacent slots create cooling challenges:
- Top card exhausts hot air onto bottom card
- Restricted airflow between cards
- Consider blower-style cards (exhaust out back)
- Water cooling significantly helps multi-GPU
Monitoring
NVIDIA GPUs
```shell
# Basic monitoring
nvidia-smi

# Continuous monitoring (1 second refresh)
watch -n 1 nvidia-smi

# Detailed metrics
nvidia-smi dmon

# Query specific metrics
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used --format=csv -l 1
```
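Those query flags make simple alerting scripts possible. A sketch of a temperature check that parses per-GPU temperatures, one per line; the `check_temps` function name is my own, and the real invocation requires an NVIDIA GPU and driver:

```shell
# Warn when any GPU temperature (one per line on stdin) exceeds a threshold.
# `check_temps` is a hypothetical helper built on real nvidia-smi query flags.
check_temps() {
  threshold=$1
  while read -r temp; do
    if [ "$temp" -gt "$threshold" ]; then
      echo "WARNING: GPU at ${temp}C exceeds ${threshold}C"
    fi
  done
}

# Real usage (requires an NVIDIA GPU):
#   nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader | check_temps 80

# Simulated input for illustration:
printf '72\n85\n' | check_temps 80   # -> WARNING: GPU at 85C exceeds 80C
```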
Key Metrics to Watch
| Metric | What It Tells You | Warning Signs |
|---|---|---|
| GPU Utilization | How busy the GPU is | Low during inference = bottleneck elsewhere |
| Memory Used | VRAM consumption | Near 100% = risk of OOM |
| Temperature | Thermal state | >80°C sustained = cooling issue |
| Power Draw | Current consumption | At/near TDP constantly = check thermals |
| Clock Speed | Current frequency | Dropping = thermal/power throttling |
Monitoring Tools
- nvidia-smi: Built-in, command line
- nvtop: htop-style GPU monitor
- GPU-Z: Windows GUI monitoring
- HWiNFO: Detailed Windows monitoring
Noise
High-end GPUs under load are LOUD:
| Scenario | Typical Noise |
|---|---|
| Single GPU, idle | Near silent (~30 dB) |
| Single GPU, load | Noticeable (~40-45 dB) |
| Dual GPU, load | Loud (~50+ dB) |
| Blower-style datacenter GPU | Very loud (~60+ dB) |
Noise Reduction Strategies
- Undervolt: Reduce power/heat with minimal performance loss
- Custom fan curves: Trade thermals for noise
- Water cooling: Dramatically quieter
- Different location: Closet, basement, separate room
- Apple Silicon: Near-silent alternative
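A related lever to undervolting is a driver-level power cap, which trims heat and noise for a modest performance cost (not identical to undervolting, but similar in effect). A command sketch using the NVIDIA driver's power-limit option; exact savings vary by card and workload:

```shell
# Cap board power to 350 W (requires root; resets on reboot unless
# persistence mode is enabled).
sudo nvidia-smi -pm 1      # enable persistence mode
sudo nvidia-smi -pl 350    # set power limit in watts
nvidia-smi -q -d POWER     # verify current and default limits
```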
Running as a Service
systemd Service (Linux)
```ini
# /etc/systemd/system/ollama.service
[Unit]
Description=Ollama LLM Service
After=network.target

[Service]
Type=simple
User=ollama
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
```

```shell
# Enable and start
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
```
Docker
```shell
# Ollama with GPU
docker run -d --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  ollama/ollama
```
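Once the container (or the systemd service) is up, a quick check against Ollama's default port confirms the API is answering; these assume the default port 11434 and the container name used above:

```shell
# List installed models as JSON; any valid response means the service is up.
curl -s http://localhost:11434/api/tags

# Recent container logs, useful if the request above hangs or fails:
docker logs ollama --tail 20
```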
Cost of Operation
Electricity cost example (US average ~$0.12/kWh):
Single RTX 4090 at 350W average, 8 hours/day:
350W × 8h × 30 days = 84 kWh/month
84 kWh × $0.12 = ~$10/month
Dual RTX 3090 at 600W average, 8 hours/day:
600W × 8h × 30 days = 144 kWh/month
144 kWh × $0.12 = ~$17/month
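The arithmetic above generalizes to any rig. A small sketch; `monthly_cost` is a made-up helper, and integer math rounds results down:

```shell
# Estimate monthly electricity use and cost.
# Args: average watts, hours per day, electricity price in cents/kWh.
# `monthly_cost` is a hypothetical helper; integer arithmetic rounds down.
monthly_cost() {
  watts=$1; hours=$2; cents_kwh=$3
  kwh=$(( watts * hours * 30 / 1000 ))
  echo "$kwh kWh, \$$(( kwh * cents_kwh / 100 ))"
}

monthly_cost 350 8 12   # single RTX 4090 -> 84 kWh, $10
monthly_cost 600 8 12   # dual RTX 3090  -> 144 kWh, $17
```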
Total Cost of Ownership
Don't forget: electricity, cooling (AC in summer), potential electrical upgrades, replacement parts (fans wear out). A cheap used GPU might cost more in electricity than a newer efficient one.