
# MCP Layer

The Model Context Protocol (MCP) is the bridge between AI assistants (Claude Code, claude.ai, Claude mobile) and the infrastructure. Instead of pasting log output or querying services manually, the AI connects directly to live systems: it can search semantic memory, trigger n8n workflows, inspect containers, query the database, and control Home Assistant — all through a unified protocol.

chris-os runs 14 MCP servers exposing over 600 tools in total.

*Diagram: MCP architecture — 14 servers grouped by transport, auth proxy layer, stdio bridge, 3-tier memory cache.*

| Server | Transport | Purpose | Tools |
|---|---|---|---|
| memory | stdio (bridge) | Persistent semantic memory — store, search, relate, and analyze knowledge across sessions | 28 |
| home-assistant | stdio (uvx) | Home Assistant control — devices, automations, entities, scenes | 89 |
| github | stdio | Repo management, issues, PRs, GitHub Projects | ~30 |
| n8n | SSE | n8n workflow management — list, create, update, test workflows | ~20 |
| unifi | SSE | UniFi network control — device inventory, VLAN inspection, client management | ~15 |
| docker | stdio | Docker management on the production host — containers, images, networks | ~15 |
| google-workspace | stdio (uvx) | Google Docs, Sheets, Gmail, Calendar, Drive, Tasks | ~60 |
| docs-mcp-server | SSE (local) | Local documentation index — 159 libraries, full-text + semantic search | ~10 |
| ollama | stdio (npx) | Ollama model management on the inference host | ~10 |
| discord | stdio (npx) | Discord server management — messages, channels, roles | ~50 |
| context7 | stdio (npx) | Live library documentation via Upstash | ~5 |
| d2 | stdio | Diagram creation and export using D2 language | ~8 |
| brewersfriend | stdio | Brewing data — recipes, fermentation sessions | ~5 |
| gcloud / google-workspace | stdio | Google Cloud and Google Workspace operations | varies |

Servers connect via one of two transports: stdio (a locally spawned process communicating over stdin/stdout) or SSE (Server-Sent Events, a persistent HTTP stream that supports server push).
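For stdio, each message is a newline-delimited JSON-RPC object written to the child process's stdin; a minimal sketch of that framing (the spawn command in the comment is illustrative, not a documented invocation):

```python
import json

def make_request(id_, method, params=None):
    """Build one JSON-RPC 2.0 request as a newline-terminated line,
    the framing a stdio MCP server reads from its stdin."""
    msg = {"jsonrpc": "2.0", "id": id_, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

# A client would spawn the server process and write requests to it, e.g.:
# proc = subprocess.Popen([...], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# proc.stdin.write(make_request(1, "tools/list").encode())
```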

```
Claude Code
├── memory           stdio → bridge script → HTTP/SSE → production host (MCP auth)
├── n8n              SSE   → production host (MCP auth)
├── unifi            SSE   → production host (SSH tunnel, loopback only)
├── home-assistant   stdio → uvx ha-mcp → HA API via long-lived token
├── github           stdio → GitHub API via PAT
├── docker           stdio → uvx mcp-server-docker (SSH DOCKER_HOST to prod host)
├── ollama           stdio → npx ollama-mcp → inference host (Ollama API)
├── docs-mcp-server  SSE   → localhost (local process only)
└── d2 / discord / context7 / brewersfriend / gcloud / workspace  stdio → local or cloud APIs
```

For n8n and unifi, the SSE transport connects directly to the production host over LAN. Each connection passes through a Caddy listener that routes to the appropriate auth middleware container.

```
claude.ai
→ Cloudflare Worker (OAuth provider)
    GitHub OAuth → issues scoped Bearer tokens
    Routes to:
      /api/db     → Cloudflare Tunnel → mcp-auth-postgres
      /api/n8n    → Cloudflare Tunnel → mcp-auth-n8n
      /api/memory → Cloudflare Tunnel → mcp-auth-memory
      /api/ha     → Cloudflare Tunnel → mcp-auth-ha
```

The OAuth Worker uses dedicated Cloudflare Tunnel hostnames to avoid a Cloudflare same-zone fetch loop — a Worker cannot fetch a hostname on the same zone directly.

Every MCP service on the production host follows the same four-layer chain:

```
Internet / LAN
→ Caddy (net-frontend)
→ mcp-auth-{service}   (net-mcp + net-frontend)     ← validates credentials
→ mcp-proxy-{service}  (net-mcp + net-data/net-app) ← protocol bridge
→ upstream service (postgres / n8n / mcp-ai-memory)
```

This separation means the upstream services (postgres, n8n) are never directly reachable from outside net-data or net-app. The auth containers are the only point of credential validation.
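In Compose terms the network split might look like the following sketch (service and network attachments follow the pattern above; the exact file layout and any names beyond those in the text are assumptions):

```yaml
# Illustrative excerpt only. The auth container is the sole service on
# both net-frontend and net-mcp; the upstream sits on net-data alone,
# so nothing outside net-data can reach it directly.
services:
  mcp-auth-postgres:
    networks: [net-frontend, net-mcp]
  mcp-proxy-postgres:
    networks: [net-mcp, net-data]
  postgres:
    networks: [net-data]
networks:
  net-frontend:
  net-mcp:
  net-data:
```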

Two credential types are accepted at each MCP auth middleware container:

API key (LAN / Claude Desktop / scripts / n8n):

  • Per-service keys for blast-radius isolation
  • Multiple keys per service supported for zero-downtime rotation
  • On LAN: the auth middleware rewrites SSE endpoint event URLs to carry credentials automatically on subsequent MCP SDK POST calls

Cloudflare Access JWT (claude.ai / remote):

  • Issued by the OAuth Worker after GitHub OAuth completes
  • Validated against the Cloudflare Access team and per-service audience claims
  • Only the GitHub account on the allowlist can obtain tokens

On LAN, the auth middleware terminates the MCP SDK’s OAuth discovery flow and forces immediate fallback to API key authentication.

The memory server is the most architecturally complex component. It provides persistent, searchable semantic memory across all AI sessions — storing decisions, insights, project state, and conversation context.

```
Claude Code / claude.ai
→ stdio bridge (LAN) or OAuth Worker + Tunnel (remote)
→ mcp-auth-memory
→ mcp-proxy-memory
→ mcp-ai-memory (upstream MCP server, stdio child of proxy)
   ├── PostgreSQL (memory schema, app role)
   ├── Redis (BullMQ job queue + search cache)
   └── Ollama on inference host (embeddings)
```

mcp-ai-memory is a heavily patched fork of an upstream MCP memory server — 48 patches applied against the original. The proxy wraps it as a stdio child process and exposes an HTTP/SSE interface to the auth middleware.

| Tier | Technology | Contents | Invalidation |
|---|---|---|---|
| L1 | JS Map (in-process) | Search result cache, keyed by query hash | On every successful memory_store call |
| L2 | Redis | BullMQ job queue (embedding + clustering), search result cache | TTL-based (default 300s), flushed on embedding model change |
| L3 | PostgreSQL | All memory records, vector embeddings, entity graph, relationships | Never expired (soft-delete via decay scoring) |
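The L1 tier amounts to a map keyed by a query hash and cleared wholesale on every successful store; a sketch (class and method names are mine, not the server's):

```python
import hashlib

class L1SearchCache:
    """In-process search result cache: keyed by query hash,
    invalidated entirely whenever a memory_store succeeds."""

    def __init__(self):
        self._map = {}

    @staticmethod
    def key(query: str) -> str:
        return hashlib.sha256(query.encode()).hexdigest()

    def get(self, query):
        return self._map.get(self.key(query))

    def put(self, query, results):
        self._map[self.key(query)] = results

    def invalidate(self):
        # Called after every successful memory_store
        self._map.clear()
```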

Embeddings use qwen3-embedding:8b, a 4096-dimensional model served via Ollama on the dedicated inference host.

Vectors are stored using binary quantization with an HNSW bit index (migration 261). This enables approximate nearest-neighbor search on 4096-dimensional vectors without full float32 storage overhead.

Retrieval uses a two-stage approach:

  1. Stage 1 (ANN): HNSW bit index retrieves a broad candidate set (ef_search=100)
  2. Stage 2 (rerank): Candidates reranked by full float32 cosine similarity fused with BM25 full-text score via Reciprocal Rank Fusion (RRF_K=60)
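The fusion step in Stage 2 can be sketched as plain Reciprocal Rank Fusion, where each document scores the sum of 1/(k + rank) across the two ranked lists (list contents here are illustrative):

```python
def rrf_fuse(ann_ranked, bm25_ranked, k=60):
    """Fuse two ranked lists with RRF: score(d) = sum of 1/(k + rank_i(d))
    over every list that contains d. Returns docs by descending score."""
    scores = {}
    for ranked in (ann_ranked, bm25_ranked):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

With RRF_K=60, a document ranked first in both lists scores 2/61, while one appearing in only a single list tops out at 1/61, so agreement between the vector and BM25 rankings dominates.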

Additional search features: AutoCut (dynamic tail trimming), MMR diversity (deduplication by word overlap), date range filtering, scope/source filtering, and a federated bridge search for structured data fallback.

Similarity threshold: 0.45 (trajectory: 0.25 → 0.30 → 0.45 as the embedding model improved).

Memories decay over time unless preserved. The decay system runs hourly via BullMQ cron:

| State | Score | Behavior |
|---|---|---|
| Active | ≥ 0.50 | Normal retrieval weight |
| Dormant | ≥ 0.10 | Reduced retrieval weight |
| Archived | ≥ 0.01 | Near-expiry |
| Expired | < 0.01 | Soft-deleted |

Memories tagged with permanent, important, decision, architecture, or preference bypass decay entirely.

Two memory types: episode (subject to decay, access-dependent) and knowledge (permanent).
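The thresholds and the preserve-tag bypass can be sketched as a small classifier (function names are mine; thresholds and tag names come from the table and list above):

```python
def decay_state(score: float) -> str:
    """Map a decay score to its lifecycle state."""
    if score >= 0.50:
        return "active"
    if score >= 0.10:
        return "dormant"
    if score >= 0.01:
        return "archived"
    return "expired"

PRESERVE_TAGS = {"permanent", "important", "decision", "architecture", "preference"}

def subject_to_decay(memory_type: str, tags) -> bool:
    """knowledge memories and preserve-tagged memories never decay."""
    if memory_type == "knowledge":
        return False
    return not (set(tags) & PRESERVE_TAGS)
```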

| Category | Tools |
|---|---|
| Core | memory_store, memory_search, memory_update, memory_delete, memory_list, memory_batch, memory_batch_delete |
| Relationships | memory_relate, memory_unrelate, memory_get_relations, memory_traverse |
| Entity | memory_entity_search, memory_entity_graph, memory_find_similar |
| Analysis | memory_pattern_search, memory_graph_search, memory_graph_analysis, memory_counter_narrative, memory_synthesis_status |
| Identity | memory_identity_claim |
| Lifecycle | memory_preserve, memory_supersede, memory_consolidate, memory_decay_status |
| Core scratch | core_memory_read, core_memory_write, core_memory_trim |
| Stats | memory_stats |

The memory server uses a custom stdio bridge (scripts/mcp-memory-bridge.cjs) instead of a direct SSE connection. This exists to work around a bug in the Claude Code MCP SDK — when SSE transport is used for memory, the SDK triggers an OAuth discovery flow that fails on LAN. The stdio bridge bypasses this entirely.

Protocol translation:

```
Claude Code (stdio JSON-RPC)
→ bridge buffers messages in memory
→ HTTP GET /sse (SSE connection to mcp-auth-memory, with credentials)
→ on "endpoint" event: receives POST endpoint URL for this session
→ on stdin message: HTTP POST to endpoint URL
→ on "message" SSE event: write JSON-RPC response to stdout
→ Claude Code reads response from stdout
```
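The buffering half of this handshake can be sketched as a small state machine: stdin messages are held until the `endpoint` event supplies a per-session POST URL, then flushed. Class and field names are illustrative, and a list stands in for the actual HTTP calls:

```python
import json

class BridgeSession:
    """Hold stdin messages until the SSE `endpoint` event arrives,
    then flush them (and all later messages) to the session URL."""

    def __init__(self):
        self.endpoint = None
        self.pending = []  # buffered JSON-RPC messages, pre-endpoint
        self.sent = []     # (url, msg) pairs; stand-in for HTTP POSTs

    def on_stdin(self, line: str):
        msg = json.loads(line)
        if self.endpoint is None:
            self.pending.append(msg)          # no endpoint yet: buffer
        else:
            self.sent.append((self.endpoint, msg))

    def on_sse_event(self, event: str, data: str):
        if event == "endpoint":
            self.endpoint = data              # per-session POST URL
            for msg in self.pending:          # replay the backlog in order
                self.sent.append((self.endpoint, msg))
            self.pending.clear()
```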

Reliability features (bridge v3):

  • Durable spool (on-disk) — messages spooled during a crash are replayed on restart
  • Map-based retry tracking keyed by JSON-RPC id — retries survive requeue round-trips (max 3, exponential backoff)
  • Stale endpoint handling — 404/410 response triggers reconnect and re-queues the message
  • Graceful shutdown — spool is persisted on SIGINT/SIGTERM

SSE header forwarding: Claude Code’s SSE transport does not forward custom headers on POST calls after the initial SSE connection. The auth middleware rewrites the stream’s endpoint event to carry the credentials forward in the URL.
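A sketch of that rewrite, assuming the validated credential is carried as a query parameter on the session endpoint URL (the `api_key` parameter name is an assumption):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def rewrite_endpoint(url: str, api_key: str) -> str:
    """Append the validated credential to the endpoint URL from the SSE
    `endpoint` event, so later header-less POSTs still authenticate."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query.append(("api_key", api_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```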

mcp-proxy-memory child death: If the mcp-ai-memory stdio child crashes inside the proxy, the proxy returns 5xx. The auth middleware’s health check probes the upstream; if the upstream is unhealthy, Docker restarts the auth container, clearing the stale connection pool. Always restart proxy before auth when recovering.

HNSW bypass via CTE materialization (fixed): A shared CTE in knowledge_search() caused PostgreSQL to materialize the CTE before the HNSW scan, bypassing the index. Fixed in migration 263 by rewriting to inline subqueries. Symptom: slow search + full sequential scan in EXPLAIN.

Embedding model change: After changing the embedding model, manually flush Redis keys matching mcp:embeddings:*. The JS Map cache clears on container restart; the HNSW index must be rebuilt.

Memory tags: Tags cannot contain / or : — slashes are stripped silently. Use hyphens.