
Deployment Pipeline

Every push to main triggers an automated deployment to Caroline (the Pi 5). No manual SSH, no manual docker compose up. The pipeline handles change detection, CI gating, migration, rollback, and notification.

All GitHub Actions jobs run on Atlas (the M4 Pro development machine), except:

  • ci-gate — runs on ubuntu-latest to avoid a deadlock while polling the GitHub API
  • systemd-validate in lint.yml — runs on ubuntu-latest for systemd-analyze

Fork PRs are blocked from all self-hosted jobs.

Before any deploy job can proceed, ci-gate polls the GitHub API until both lint.yml and test.yml complete for the same commit. If either failed, the deploy is blocked. “Not triggered” counts as a pass (path-filtered workflows that didn’t fire don’t block deploy).
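The gate's decision rule can be sketched as a pure function evaluated once per polling round. This is only an illustration of the rule described above, not the actual job: the function name `gate_decision` and the status strings are assumptions, and `None` stands in for a workflow that was never triggered for the commit.

```python
from typing import Optional


def gate_decision(lint: Optional[str], test: Optional[str]) -> str:
    """Return 'wait', 'block', or 'pass' for one polling round.

    Each argument is a hypothetical status string ('in_progress',
    'success', 'failure') or None when the workflow was not triggered.
    """
    statuses = [lint, test]
    # Keep polling until every triggered workflow has completed.
    if any(s == "in_progress" for s in statuses):
        return "wait"
    # Once everything has completed, any failure blocks the deploy.
    if any(s == "failure" for s in statuses):
        return "block"
    # Remaining cases: each workflow succeeded or was never triggered.
    return "pass"
```

With this rule, a docs-only push where neither workflow fires yields `pass` on the first round, which matches the "not triggered counts as a pass" behavior.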

Lint covers: ruff (Python), yamllint, shellcheck, actionlint, systemd-validate.

Test covers: Vitest across all modules (dashboard API, dashboard web, auth middleware, worker OAuth, webhooks, therapy API, therapy web, n8n workflow structure, MCP proxy patches) plus pytest for Python utilities.

git push to main
  |
  +-- lint.yml        (parallel, path-filtered)
  +-- test.yml        (parallel, path-filtered)
  +-- secret-scan.yml (parallel, gitleaks)
  +-- deploy.yml
        |
        +-- ci-gate (ubuntu-latest)
        |     Polls every 15s; blocks on lint/test failure
        |
        +-- detect-changes (self-hosted)
        |     SSH reads last-successful-deploy-sha from Caroline
        |     Diffs to HEAD; emits 17 service flags
        |
        +-- deploy (self-hosted) [needs: ci-gate + detect-changes]
        |     Supersession check (skip if newer deploy queued)
        |     Pause GitOps timer
        |     Pre-pull cleanup (reset runtime-mutated files, fix ownership)
        |     git fetch + reset --hard to exact commit SHA
        |     scripts/migrate.sh --mode deploy (always runs, idempotent)
        |     Conditional service restarts per changed paths
        |
        +-- verify (self-hosted) [needs: deploy]
        |     Container count, pg_isready, Caddy admin, Dashboard version
        |     n8n, Authelia, Cloudflared
        |     Write last-successful-deploy-sha stamp
        |
        +-- PASS: notify (Discord embed, resume GitOps)
        +-- FAIL: rollback -> git reset --hard <pre_deploy_sha>
                  Rebuild affected services; post-rollback smoke test
                  Discord failure embed; GitOps remains paused

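Pull mechanism: the deploy runs git fetch origin main && git reset --hard <sha> rather than git pull. This guarantees that the working tree on Caroline matches the exact commit SHA being deployed, even if main has advanced since the workflow started.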

detect-changes reads the last successful deploy SHA from Caroline and diffs to HEAD. It emits 17 boolean service flags (dashboard_api, dashboard_web, migrations, compose, caddy, authelia, n8n_hooks, and more) plus file counts and diff stats.

Workflow-only changes (lint.yml, test.yml, Makefile) do not set any_deploy. Those pushes skip the deploy entirely.
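The flag computation can be pictured as a prefix match over the changed paths. The sketch below is a simplified assumption of how that might look: it covers only four of the 17 flags, the prefix-to-flag mapping is hypothetical, and `service_flags` is not a name from the repository.

```python
from typing import Dict, List

# Hypothetical mapping from path prefixes to flag names. The flag names
# appear in the text; which paths set which flag is an assumption.
FLAG_PREFIXES: Dict[str, str] = {
    "dashboard/api/": "dashboard_api",
    "dashboard/web/": "dashboard_web",
    "caddy/": "caddy",
    "authelia/": "authelia",
}


def service_flags(changed_paths: List[str]) -> Dict[str, bool]:
    """Map a list of changed file paths to boolean service flags."""
    flags = {flag: False for flag in FLAG_PREFIXES.values()}
    for path in changed_paths:
        for prefix, flag in FLAG_PREFIXES.items():
            if path.startswith(prefix):
                flags[flag] = True
    # any_deploy is true only when at least one service flag fired, so
    # paths outside every prefix (workflow files, Makefile) leave it false.
    flags["any_deploy"] = any(flags.values())
    return flags
```

For example, `service_flags(["Makefile"])` leaves every flag false, including `any_deploy`, which is the case where the deploy is skipped.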

| Changed paths | Action |
| --- | --- |
| dashboard/api/ or dashboard/web/ | Build commit-cache.json, docker compose build --no-cache dashboard-api dashboard-nginx, up -d |
| mcp/ | Build all mcp-auth-* and mcp-proxy-* images, up -d |
| caddy/ | Validate Caddyfile, restart caddy |
| authelia/ | Recreate authelia (--no-deps) |
| cloudflared/ | Recreate cloudflared |
| grafana/ | up -d full grafana stack |
| postgres/init | Recreate postgres, wait for pg_isready |
| n8n/hooks.js | Recreate n8n |
| discord-bot/ | Build and deploy discord-bot |
| docker-compose*.yml | docker compose up -d all core services |

Migrations always run regardless of which paths changed. They are idempotent.

Rollback fires when verify fails after a successful deploy, or when deploy itself fails after the git fetch and reset step has moved the checkout. It runs git reset --hard <pre_deploy_sha>, rebuilds affected services, runs a post-rollback smoke test, and writes a blocked-SHA sentinel to prevent GitOps from re-pulling the bad commit.
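The trigger condition reduces to a small predicate. The sketch below is an assumption about how the two cases combine. The parameter names are hypothetical, and `checkout_moved` stands for "the deploy job had already changed the working tree on Caroline when it failed".

```python
def should_rollback(deploy_ok: bool, verify_ok: bool, checkout_moved: bool) -> bool:
    """Return True when the pipeline should revert to the pre-deploy SHA."""
    if deploy_ok:
        # Deploy finished, so the outcome depends on verification alone.
        return not verify_ok
    # Deploy failed: there is something to revert only if the checkout
    # had already moved off the pre-deploy SHA.
    return checkout_moved
```

A deploy that fails before touching the checkout (for instance during the supersession check) returns `False`, since Caroline is still on the previous commit.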

Independent of GitHub Actions, Caroline runs scripts/maintenance/gitops-converge.sh every 15 minutes via a systemd timer. It fetches origin/main, checks CI status, pulls with --ff-only, detects image drift for pulled-image services, and auto-recreates containers that are running stale images.

The deploy workflow pauses GitOps (via a sentinel file) before any git operations. The notify job removes the sentinel when the deploy succeeds. If the deploy fails while GitOps is paused, the sentinel stays in place until it is cleared manually or its 4-hour TTL expires.

The blocked-SHA sentinel prevents GitOps from pulling a rolled-back commit. It clears automatically when origin/main advances past the blocked SHA.
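Taken together, the pause sentinel and the blocked-SHA sentinel act as two guards in front of the GitOps pull. The following is a minimal sketch under stated assumptions: `gitops_may_pull` and its parameters are hypothetical, the sentinel's age is passed in rather than read from a file, and "advances past the blocked SHA" is simplified to "origin/main no longer points at it" (a real check would more likely test ancestry).

```python
from typing import Optional

PAUSE_TTL_SECONDS = 4 * 60 * 60  # the 4-hour TTL of the pause sentinel


def gitops_may_pull(
    pause_age_seconds: Optional[float],
    remote_head: str,
    blocked_sha: Optional[str],
) -> bool:
    """Decide whether a converge run is allowed to pull origin/main.

    pause_age_seconds is None when no pause sentinel exists;
    blocked_sha is None when no commit is currently blocked.
    """
    # A pause sentinel younger than its TTL stops the run outright.
    if pause_age_seconds is not None and pause_age_seconds < PAUSE_TTL_SECONDS:
        return False
    # A blocked SHA stops the run only while origin/main still points at it.
    if blocked_sha is not None and remote_head == blocked_sha:
        return False
    return True
```

Treating an expired pause sentinel as absent means a failed deploy cannot hold GitOps off indefinitely, while the blocked-SHA guard still keeps the rolled-back commit from being pulled.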

If multiple pushes land in rapid succession, a newer deploy may already be queued by the time an earlier one reaches the deploy step. The deploy job checks for a newer successful run via the GitHub API and skips itself if one exists. This prevents redundant back-to-back deploys from fighting over Caroline.
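The supersession check can be sketched as a filter over the workflow's recent runs. This is an illustration only: it assumes each run is a dict carrying the `run_number` and `conclusion` fields that the GitHub REST API returns for workflow runs, and `is_superseded` is a hypothetical name.

```python
from typing import List


def is_superseded(current_run: int, runs: List[dict]) -> bool:
    """True if any later deploy run has already concluded successfully.

    Each entry in runs is assumed to have 'run_number' (int) and
    'conclusion' (str, or None while the run is still in progress).
    """
    return any(
        run["run_number"] > current_run and run["conclusion"] == "success"
        for run in runs
    )
```

Runs that are still queued or in progress have no conclusion yet, so under this sketch they do not cause the current run to skip itself.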

The notify job sends a Discord embed with one of four states:

| State | Trigger |
| --- | --- |
| FAILURE | Deploy or verify failed |
| FULL | Recovery, migrations, or compose changes (green embed) |
| CONDENSED | Routine code deploy (green embed) |
| SUPPRESS | Only n8n workflows, docs, or tests changed |
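One way to read the four states is as a precedence chain. The sketch below assumes an ordering (FAILURE first, then FULL, then SUPPRESS, with CONDENSED as the default) that the text does not state explicitly. The category labels in `QUIET` and `LOUD` and the `embed_state` name are hypothetical.

```python
from typing import Set

# Hypothetical labels for kinds of change; the real job works from the
# flags emitted by detect-changes.
QUIET = {"n8n_workflows", "docs", "tests"}
LOUD = {"migrations", "compose"}


def embed_state(failed: bool, recovery: bool, changed: Set[str]) -> str:
    """Pick the Discord embed state for a finished pipeline run."""
    if failed:
        return "FAILURE"
    # Recovery deploys and migration or compose changes get the full embed.
    if recovery or changed & LOUD:
        return "FULL"
    # Nothing but quiet categories changed: no embed is worth sending.
    if changed and changed <= QUIET:
        return "SUPPRESS"
    return "CONDENSED"
```

Checking FULL before SUPPRESS means a push that touches both docs and migrations still produces the detailed embed.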

make pi-status      # Container status on Caroline
make pi-logs        # Tail Docker logs on Caroline
make pi-restart     # Restart all services on Caroline
make pi-db-migrate  # Apply migrations to production (Pi)
make pi-shell       # SSH bash shell on Caroline

| File | Trigger | Purpose |
| --- | --- | --- |
| deploy.yml | push to main, workflow_dispatch | Production deploy to Caroline |
| lint.yml | push (py/sh/yaml), PR, workflow_dispatch | ruff, yamllint, shellcheck, actionlint, systemd |
| test.yml | push (dashboard/mcp/etc), PR, workflow_dispatch | Vitest + pytest |
| audit.yml | Sunday 9 AM UTC, PR, workflow_dispatch | Security + quality audit, GitHub Pages report |
| secret-scan.yml | push, PR, Monday 4 AM UTC | Gitleaks full-history credential scan |
| deploy-site.yml | push to sites/ataraxis-dev/**, workflow_dispatch | Deploy ataraxis.dev to Cloudflare Pages |
| deploy-ataraxis-software.yml | push to sites/ataraxis-software/**, workflow_dispatch | Deploy ataraxis.software to Cloudflare Pages |