
# MCP Layer

The Model Context Protocol (MCP) is the bridge between AI assistants (Claude Code, claude.ai, Claude mobile) and the infrastructure. Instead of pasting log output or querying services manually, the AI connects directly to live systems: it can search semantic memory, trigger n8n workflows, inspect containers, query the database, and control Home Assistant — all through a unified protocol.

chris-os runs 14 MCP servers exposing over 600 tools in total.

*Diagram: MCP architecture — 14 servers grouped by transport, auth proxy layer, stdio bridge, 3-tier memory cache.*

| Server | Transport | Purpose | Tools |
|---|---|---|---|
| memory | stdio (bridge) | Persistent semantic memory — store, search, relate, and analyze knowledge across sessions | 28 |
| home-assistant | stdio (uvx) | Home Assistant control — devices, automations, entities, scenes | 89 |
| github | stdio | Repo management, issues, PRs, GitHub Projects | ~30 |
| n8n | SSE | n8n workflow management — list, create, update, test workflows | ~20 |
| unifi | SSE | UniFi network control — device inventory, VLAN inspection, client management | ~15 |
| docker | stdio | Docker management on the production host — containers, images, networks | ~15 |
| google-workspace | stdio (uvx) | Google Docs, Sheets, Gmail, Calendar, Drive, Tasks | ~60 |
| docs-mcp-server | SSE (local) | Local documentation index — 159 libraries, full-text + semantic search | ~10 |
| ollama | stdio (npx) | Ollama model management on the inference host | ~10 |
| discord | stdio (npx) | Discord server management — messages, channels, roles | ~50 |
| context7 | stdio (npx) | Live library documentation via Upstash | ~5 |
| d2 | stdio | Diagram creation and export using D2 language | ~8 |
| brewersfriend | stdio | Brewing data — recipes, fermentation sessions | ~5 |
| gcloud / google-workspace | stdio | Google Cloud and Google Workspace operations | varies |

Servers connect via one of two transports: stdio (a locally spawned process communicating over stdin/stdout) or SSE (Server-Sent Events, a persistent HTTP stream that supports server push).
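For stdio, each message is a newline-delimited JSON-RPC object written to the child process's stdin; a minimal sketch of that framing (the spawn command in the comment is illustrative, not a documented invocation):

```python
import json

def make_request(id_, method, params=None):
    """Build one JSON-RPC 2.0 request as a newline-terminated line,
    the framing a stdio MCP server reads from its stdin."""
    msg = {"jsonrpc": "2.0", "id": id_, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

# A client would spawn the server process and write requests to it, e.g.:
# proc = subprocess.Popen([...], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# proc.stdin.write(make_request(1, "tools/list").encode())
```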

```
Claude Code
├── memory           stdio → bridge script → HTTP/SSE → production host (MCP auth)
├── n8n              SSE   → production host (MCP auth)
├── unifi            SSE   → production host (SSH tunnel, loopback only)
├── home-assistant   stdio → uvx ha-mcp → HA API via long-lived token
├── github           stdio → GitHub API via PAT
├── docker           stdio → uvx mcp-server-docker (SSH DOCKER_HOST to prod host)
├── ollama           stdio → npx ollama-mcp → inference host (Ollama API)
├── docs-mcp-server  SSE   → localhost (local process only)
└── d2 / discord / context7 / brewersfriend / gcloud / workspace  stdio → local or cloud APIs
```

For n8n and unifi, the SSE transport connects directly to the production host over LAN. Each connection passes through a Caddy listener that routes to the appropriate auth middleware container.

```
claude.ai
→ Cloudflare Worker (OAuth provider)
    GitHub OAuth → issues scoped Bearer tokens
    Routes to:
      /api/db     → Cloudflare Tunnel → mcp-auth-postgres
      /api/n8n    → Cloudflare Tunnel → mcp-auth-n8n
      /api/memory → Cloudflare Tunnel → mcp-auth-memory
      /api/ha     → Cloudflare Tunnel → mcp-auth-ha
```

The OAuth Worker uses dedicated Cloudflare Tunnel hostnames to avoid a Cloudflare same-zone fetch loop — a Worker cannot fetch a hostname on the same zone directly.

Every MCP service on the production host follows the same four-layer chain:

```
Internet / LAN
→ Caddy (net-frontend)
→ mcp-auth-{service}   (net-mcp + net-frontend)     ← validates credentials
→ mcp-proxy-{service}  (net-mcp + net-data/net-app) ← protocol bridge
→ upstream service (postgres / n8n / mcp-ai-memory)
```

This separation means the upstream services (postgres, n8n) are never directly reachable from outside net-data or net-app. The auth containers are the only point of credential validation.
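In Compose terms the network split might look like the following sketch (service and network attachments follow the pattern above; the exact file layout and any names beyond those in the text are assumptions):

```yaml
# Illustrative excerpt only. The auth container is the sole service on
# both net-frontend and net-mcp; the upstream sits on net-data alone,
# so nothing outside net-data can reach it directly.
services:
  mcp-auth-postgres:
    networks: [net-frontend, net-mcp]
  mcp-proxy-postgres:
    networks: [net-mcp, net-data]
  postgres:
    networks: [net-data]
networks:
  net-frontend:
  net-mcp:
  net-data:
```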

Two credential types are accepted at each MCP auth middleware container:

API key (LAN / Claude Desktop / scripts / n8n):

  • Per-service keys for blast-radius isolation
  • Multiple keys per service supported for zero-downtime rotation
  • On LAN: the auth middleware rewrites SSE endpoint event URLs to carry credentials automatically on subsequent MCP SDK POST calls

Cloudflare Access JWT (claude.ai / remote):

  • Issued by the OAuth Worker after GitHub OAuth completes
  • Validated against the Cloudflare Access team and per-service audience claims
  • Only the GitHub account on the allowlist can obtain tokens

On LAN, the auth middleware terminates the MCP SDK’s OAuth discovery flow and forces immediate fallback to API key authentication.

The memory server is the most architecturally complex component. It provides persistent, searchable semantic memory across all AI sessions — storing decisions, insights, project state, and conversation context.

```
Claude Code / claude.ai
→ stdio bridge (LAN) or OAuth Worker + Tunnel (remote)
→ mcp-auth-memory
→ mcp-proxy-memory
→ mcp-ai-memory (upstream MCP server, stdio child of proxy)
   ├── PostgreSQL (memory schema, app role)
   ├── Redis (BullMQ job queue + search cache)
   └── Ollama on inference host (embeddings)
```

mcp-ai-memory is a heavily patched fork of an upstream MCP memory server — 48 patches applied against the original. The proxy wraps it as a stdio child process and exposes an HTTP/SSE interface to the auth middleware.

| Tier | Technology | Contents | Invalidation |
|---|---|---|---|
| L1 | JS Map (in-process) | Search result cache, keyed by query hash | On every successful memory_store call |
| L2 | Redis | BullMQ job queue (embedding + clustering), search result cache | TTL-based (default 300s), flushed on embedding model change |
| L3 | PostgreSQL | All memory records, vector embeddings, entity graph, relationships | Never expired (soft-delete via decay scoring) |
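The L1 tier amounts to a map keyed by a query hash and cleared wholesale on every successful store; a sketch (class and method names are mine, not the server's):

```python
import hashlib

class L1SearchCache:
    """In-process search result cache: keyed by query hash,
    invalidated entirely whenever a memory_store succeeds."""

    def __init__(self):
        self._map = {}

    @staticmethod
    def key(query: str) -> str:
        return hashlib.sha256(query.encode()).hexdigest()

    def get(self, query):
        return self._map.get(self.key(query))

    def put(self, query, results):
        self._map[self.key(query)] = results

    def invalidate(self):
        # Called after every successful memory_store
        self._map.clear()
```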

Embeddings use qwen3-embedding:8b, a 4096-dimensional model served via Ollama on the dedicated inference host.

Vectors are stored using binary quantization with an HNSW bit index (migration 261). This enables approximate nearest-neighbor search on 4096-dimensional vectors without full float32 storage overhead.

Retrieval uses a two-stage approach:

  1. Stage 1 (ANN): HNSW bit index retrieves a broad candidate set (ef_search=100)
  2. Stage 2 (rerank): Candidates reranked by full float32 cosine similarity fused with BM25 full-text score via Reciprocal Rank Fusion (RRF_K=60)
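The fusion step in Stage 2 can be sketched as plain Reciprocal Rank Fusion, where each document scores the sum of 1/(k + rank) across the two ranked lists (list contents here are illustrative):

```python
def rrf_fuse(ann_ranked, bm25_ranked, k=60):
    """Fuse two ranked lists with RRF: score(d) = sum of 1/(k + rank_i(d))
    over every list that contains d. Returns docs by descending score."""
    scores = {}
    for ranked in (ann_ranked, bm25_ranked):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With RRF_K=60, a document ranked first in both lists scores 2/61, while one appearing in only a single list tops out at 1/61, so agreement between the vector and BM25 rankings dominates.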

Additional search features: AutoCut (dynamic tail trimming), MMR diversity (deduplication by word overlap), date range filtering, scope/source filtering, and a federated bridge search for structured data fallback.

Similarity threshold: 0.45 (trajectory: 0.25 → 0.30 → 0.45 as the embedding model improved).

Memories decay over time unless preserved. The decay system runs hourly via BullMQ cron:

| State | Score | Behavior |
|---|---|---|
| Active | ≥ 0.50 | Normal retrieval weight |
| Dormant | ≥ 0.10 | Reduced retrieval weight |
| Archived | ≥ 0.01 | Near-expiry |
| Expired | < 0.01 | Soft-deleted |

Memories tagged with permanent, important, decision, architecture, or preference bypass decay entirely.

Two memory types: episode (subject to decay, access-dependent) and knowledge (permanent).
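The thresholds and the preserve-tag bypass can be sketched as a small classifier (function names are mine; thresholds and tag names come from the table and list above):

```python
def decay_state(score: float) -> str:
    """Map a decay score to its lifecycle state."""
    if score >= 0.50:
        return "active"
    if score >= 0.10:
        return "dormant"
    if score >= 0.01:
        return "archived"
    return "expired"

PRESERVE_TAGS = {"permanent", "important", "decision", "architecture", "preference"}

def subject_to_decay(memory_type: str, tags) -> bool:
    """knowledge memories and preserve-tagged memories never decay."""
    if memory_type == "knowledge":
        return False
    return not (set(tags) & PRESERVE_TAGS)
```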

| Category | Tools |
|---|---|
| Core | memory_store, memory_search, memory_update, memory_delete, memory_list, memory_batch, memory_batch_delete |
| Relationships | memory_relate, memory_unrelate, memory_get_relations, memory_traverse |
| Entity | memory_entity_search, memory_entity_graph, memory_find_similar |
| Analysis | memory_pattern_search, memory_graph_search, memory_graph_analysis, memory_counter_narrative, memory_synthesis_status |
| Identity | memory_identity_claim |
| Lifecycle | memory_preserve, memory_supersede, memory_consolidate, memory_decay_status |
| Core scratch | core_memory_read, core_memory_write, core_memory_trim |
| Stats | memory_stats |

The memory server uses a custom stdio bridge (scripts/mcp-memory-bridge.cjs) instead of a direct SSE connection. This exists to work around a bug in the Claude Code MCP SDK — when SSE transport is used for memory, the SDK triggers an OAuth discovery flow that fails on LAN. The stdio bridge bypasses this entirely.

Protocol translation:

```
Claude Code (stdio JSON-RPC)
→ bridge buffers messages in memory
→ HTTP GET /sse (SSE connection to mcp-auth-memory, with credentials)
→ on "endpoint" event: receives POST endpoint URL for this session
→ on stdin message: HTTP POST to endpoint URL
→ on "message" SSE event: write JSON-RPC response to stdout
→ Claude Code reads response from stdout
```
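The buffering half of this handshake can be sketched as a small state machine: stdin messages are held until the `endpoint` event supplies a per-session POST URL, then flushed. Class and field names are illustrative, and a list stands in for the actual HTTP calls:

```python
import json

class BridgeSession:
    """Hold stdin messages until the SSE `endpoint` event arrives,
    then flush them (and all later messages) to the session URL."""

    def __init__(self):
        self.endpoint = None
        self.pending = []  # buffered JSON-RPC messages, pre-endpoint
        self.sent = []     # (url, msg) pairs; stand-in for HTTP POSTs

    def on_stdin(self, line: str):
        msg = json.loads(line)
        if self.endpoint is None:
            self.pending.append(msg)          # no endpoint yet: buffer
        else:
            self.sent.append((self.endpoint, msg))

    def on_sse_event(self, event: str, data: str):
        if event == "endpoint":
            self.endpoint = data              # per-session POST URL
            for msg in self.pending:          # replay the backlog in order
                self.sent.append((self.endpoint, msg))
            self.pending.clear()
```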

Reliability features (bridge v3):

  • Durable spool (on-disk) — messages spooled during a crash are replayed on restart
  • Map-based retry tracking keyed by JSON-RPC id — retries survive requeue round-trips (max 3, exponential backoff)
  • Stale endpoint handling — 404/410 response triggers reconnect and re-queues the message
  • Graceful shutdown — spool is persisted on SIGINT/SIGTERM

SSE header forwarding: Claude Code’s SSE transport does not forward custom headers on POST calls after the initial SSE connection. The auth middleware rewrites the stream’s endpoint event to carry the credentials forward in the URL.
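A sketch of that rewrite, assuming the validated credential is carried as a query parameter on the session endpoint URL (the `api_key` parameter name is an assumption):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rewrite_endpoint(url: str, api_key: str) -> str:
    """Append the validated credential to the endpoint URL from the SSE
    `endpoint` event, so later header-less POSTs still authenticate."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```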

mcp-proxy-memory child death: If the mcp-ai-memory stdio child crashes inside the proxy, the proxy returns 5xx. The auth middleware’s health check probes the upstream; if the upstream is unhealthy, Docker restarts the auth container, clearing the stale connection pool. Always restart proxy before auth when recovering.

HNSW bypass via CTE materialization (fixed): A shared CTE in knowledge_search() caused PostgreSQL to materialize the CTE before the HNSW scan, bypassing the index. Fixed in migration 263 by rewriting to inline subqueries. Symptom: slow search + full sequential scan in EXPLAIN.

Embedding model change: After changing the embedding model, manually flush Redis keys matching mcp:embeddings:*. The JS Map cache clears on container restart; the HNSW index must be rebuilt.

Memory tags: Tags cannot contain / or : — slashes are stripped silently. Use hyphens.