Why a Raspberry Pi 5 in Production
Status: Decided. Pi 5 is production. Issue #190.
Context
The system needs a production host that is always on, physically close to the network, quiet, low-power, and affordable enough that the cost of it failing and being replaced does not threaten the project. The host runs 37 Docker containers serving a single user on a home network.
Three classes of host were considered: cloud VMs, a used server or NUC, and the Raspberry Pi 5.
Decision
Raspberry Pi 5, 16GB RAM variant, with a 1TB NVMe SSD. It lives in the network closet on the same UniFi switch as the router. No UPS, no RAID, no redundant power.
The 16GB variant replaced an earlier 8GB unit after container memory pressure became measurable. The 1TB NVMe replaced the original 256GB drive (issue #190, 0406-A) after it reached 80% capacity.
What Was Rejected and Why
Cloud VMs (DigitalOcean, Hetzner, Linode): Cloud VMs add per-month cost that scales with storage and compute. More importantly, they add latency to every Home Assistant integration, every voice pipeline call, and every local network query. The system processes health data, messages, and location data that should not leave the local network. A cloud VM makes that architectural guarantee harder to enforce. Local deployment on private hardware is a stronger privacy guarantee than policy compliance with a cloud provider.
Used server or NUC: A used Intel NUC or Dell Optiplex provides more compute headroom and x86_64 compatibility. The tradeoffs: significantly higher idle power draw (15-65W vs 3-5W for the Pi), fan noise, and larger physical footprint. The network closet has limited space and no active cooling. A system that runs continuously for years optimizes for power efficiency over peak compute. The Pi 5 at 3-5W idle handles the full container workload.
Mac Mini or Mac Studio: Considered briefly. Too expensive for a dedicated production host, and macOS complicates Docker networking (Docker Desktop runs in a Linux VM with NAT, which breaks host-network services like Home Assistant’s mDNS).
Consequences
What works well:
- 3-5W idle power draw. Running year-round costs roughly $3-5/year in electricity at typical residential rates.
- Network closet placement means sub-millisecond LAN latency to all clients.
- ARM64 compatibility is mature. Every service in the stack has an ARM64 image or builds cleanly from source on ARM64. The one exception (Piston, the code execution sandbox) is arm64-incompatible and disabled on Pi.
- 16GB RAM with per-container limits handles 37 containers without OOM events. Postgres gets 6GB, n8n gets 2560MB, and everything else is bounded.
- 1TB NVMe provides room for Prometheus metrics (2-year retention), Loki logs (permanent retention), and Tempo traces without aggressive pruning.
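The per-container memory bounds can be expressed in Compose. A hedged sketch using the two figures quoted above; the service names are assumed and every other setting (images, volumes, networks) is omitted:

```yaml
# Sketch only: the real stack's compose files are not shown here.
services:
  postgres:
    mem_limit: 6g        # Postgres ceiling from the text
  n8n:
    mem_limit: 2560m     # n8n ceiling from the text
  # every other service gets its own bound the same way
```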
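The power-cost bullet above can be sanity-checked with a little arithmetic. A sketch, assuming a $0.12/kWh residential rate (the rate is an assumption, not from this document):

```shell
#!/bin/sh
# Watts -> kWh/year -> dollars/year for an always-on host.
# $0.12/kWh is an assumed typical residential rate.
watts=5
rate=0.12
awk -v w="$watts" -v r="$rate" 'BEGIN {
  kwh = w * 8760 / 1000              # 8760 hours in a year; Wh -> kWh
  printf "%.1f kWh/yr, $%.2f/yr\n", kwh, kwh * r
}'
```

At 3 W the same arithmetic gives roughly $3/year, which brackets the figure quoted above.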
Tradeoffs accepted:
- Less compute headroom than an x86 server. CPU-intensive tasks (Whisper transcription, container builds) are slower. Whisper `small-int8` runs in real time at a 2.0-CPU allocation; `medium` would be too slow. Container builds happen on the self-hosted runner on Atlas (the M4 Pro), not on Caroline.
- No hardware redundancy. One Pi, one NVMe, no RAID. The backup strategy (hourly local dumps, daily NAS sync, encrypted R2 offsite) is the redundancy layer.
- ARM64 compatibility must be verified before adding any new service. Most images have ARM64 variants now, but some do not.
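A pre-adoption check along these lines can be scripted: `docker manifest inspect` prints the manifest list as JSON, and the platforms it publishes can be grepped out. A minimal sketch, with the image name purely illustrative:

```shell
#!/bin/sh
# has_arm64 reads `docker manifest inspect <image>` JSON on stdin and
# succeeds only if a linux/arm64 variant is listed.
has_arm64() {
  grep -q '"architecture": *"arm64"'
}

# Usage (requires docker; image name is an example):
#   docker manifest inspect grafana/grafana:latest | has_arm64 \
#     && echo "arm64 available" || echo "no arm64 variant"
```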
Operational Notes
Caroline is accessed via SSH using a host alias configured in ~/.ssh/config. Public key auth only; password auth is disabled.
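A minimal sketch of such a host alias; the alias, address, user, and key path here are illustrative, not the real values:

```
Host caroline
    HostName 192.168.1.50
    User pi
    IdentityFile ~/.ssh/id_ed25519
    IdentitiesOnly yes
```

Disabling password auth is a server-side setting (`PasswordAuthentication no` in sshd_config on the Pi), independent of this client-side alias.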
After a Pi reboot, scripts/power/post-boot-verify.sh runs via systemd to confirm all containers are healthy before the system is considered ready. A Grafana alert fires if Caroline becomes unreachable.
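The health gate could look like the following sketch, written in the spirit of scripts/power/post-boot-verify.sh (the real script's contents are not shown in this document):

```shell
#!/bin/sh
# list_unhealthy reads lines shaped like the output of
#   docker ps --format '{{.Names}} {{.Status}}'
# and prints the names of containers not reporting "(healthy)".
list_unhealthy() {
  awk '$0 !~ /\(healthy\)/ { print $1 }'
}

# Usage after boot (requires docker); a non-empty result means not ready:
#   docker ps --format '{{.Names}} {{.Status}}' | list_unhealthy
```

Note this sketch also flags containers that define no healthcheck at all, which may or may not match the real script's policy.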
The NVMe drive was upgraded from 256GB after reaching 80% capacity with 10 months of Prometheus and Loki data. Plan for storage growth: at current ingest rates, the 1TB NVMe will need pruning or expansion at approximately the 2-year mark.
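Under a linear-growth assumption, the 2-year estimate can be reproduced from the numbers above. A sketch, assuming ~931 GB usable on the 1 TB drive and reusing the 80% threshold that triggered the last upgrade:

```shell
#!/bin/sh
# Linear projection: ~205 GB (80% of 256 GB) accrued over 10 months.
# When does a 1 TB drive (~931 GB usable, assumed) cross 80%?
awk 'BEGIN {
  used = 0.8 * 256                 # GB at upgrade time
  rate = used / 10                 # GB/month, assumed linear
  cap  = 931                       # usable GB on the 1 TB NVMe (approx.)
  months = (0.8 * cap - used) / rate
  printf "%.1f GB/mo; 80%% full in ~%.0f months\n", rate, months
}'
```

The projection lands in the mid-20s of months, consistent with the 2-year mark stated above.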