
Voice Pipeline

The voice pipeline turns “Hey GLaDOS” into a response played through the living room speakers — entirely on local hardware, with no cloud services in the loop. Three distinct pipelines handle different invocation contexts: the hardware satellite, the Mac terminal, and the web dashboard.

All voice processing runs on two machines: the Pi 5 (Caroline) hosts the Wyoming STT/TTS/wake-word containers and orchestrates the HA Assist pipeline; Atlas (the M4 Pro) runs Ollama for LLM inference. nightwatch (the AMD GPU machine) provides specialty TTS backends on demand, woken via Wake-on-LAN when needed.

Voice pipeline — Satellite wake word detection through STT, n8n processing, LLM inference, TTS, and Sonos output

[Satellite hardware]
|
[openWakeWord: "Hey GLaDOS"]
|
[Whisper: speech to text]
|
[n8n webhook → Ollama on Atlas]
|
[Piper TTS: text to speech]
|
[Sonos speakers: audio output]

The Satellite1 is a custom ESPHome device with a microphone array that listens continuously for wake words. It runs server-side wake word detection, meaning the raw audio stream is forwarded to the Pi’s openWakeWord container rather than running detection on-device. This keeps the hardware simple and the models upgradeable.
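On the wire, that forwarding uses the Wyoming protocol: each message is a single JSON header line, optionally followed by binary audio. A wake-word hit comes back as a detection event; the sketch below assumes a minimal field set (the real event carries more metadata):

```shell
# Hedged sketch of a Wyoming detection event as openWakeWord would emit it;
# the exact field set here is an assumption.
detection_event() {
  printf '{"type":"detection","data":{"name":"%s"}}\n' "$1"
}

detection_event "hey_glados"
```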

Once the wake word fires:

  1. The audio stream goes to Whisper (faster-whisper small-int8, English) for transcription
  2. The transcript reaches Home Assistant’s Assist pipeline, which routes it through the m_agent custom component to n8n via a local webhook
  3. n8n calls Ollama on Atlas for response generation
  4. The response goes to Piper TTS for synthesis
  5. Audio is returned to the Satellite, which has no built-in speaker — playback routes to the nearest Sonos
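Step 3 above is an ordinary HTTP POST from HA's conversation agent to n8n. A hedged sketch, where the webhook path and JSON field names are assumptions rather than the repo's actual contract:

```shell
# Assumed local n8n webhook; the real path is defined in the n8n workflow.
N8N_WEBHOOK="http://localhost:5678/webhook/voice-assist"

build_payload() {
  # Assemble the JSON body for the webhook; $1 is the Whisper transcript.
  printf '{"transcript":"%s","source":"satellite"}' "$1"
}

# Fire the request (sketch):
# curl -s -X POST -H 'Content-Type: application/json' \
#      -d "$(build_payload "turn off the lab lights")" "$N8N_WEBHOOK"
```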

The satellite is on the IoT network VLAN, isolated from the main LAN. The wake word detection, STT, and TTS containers listen only on loopback ports; HA reaches them via localhost because it runs in host network mode.
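The loopback-only binding comes down to the compose file's port mappings. A minimal sketch; 10300 is the conventional wyoming-whisper port, but this is illustrative, not copied from pi/docker-compose.ha.yml:

```yaml
# Sketch of a loopback-only port mapping; not the repo's actual file.
services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper
    ports:
      - "127.0.0.1:10300:10300"  # bound to loopback, unreachable from the LAN
```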

scripts/glados-say.sh is a command-line script that sends text to any of the voice backends on nightwatch and plays the audio locally. It selects the backend by name and logs the interaction to the dashboard API.

| Backend | Technology | Approximate latency |
|---|---|---|
| glados | Forward Tacotron + HiFiGAN (Wyoming) | ~1s |
| kokoro | Kokoro-82M (OpenAI-compat HTTP) | ~0.2s |
| xtts | XTTS v2, GLaDOS fine-tune (Wyoming) | 2-5s |
| m | Chatterbox Turbo (Judi Dench voice, Wyoming) | varies |
| peter | Peter Griffin RVC v2 (HTTP) | varies |
| peter2 | Peter Griffin GPT-SoVITS (HTTP) | varies |
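The script's backend selection can be pictured as a name-to-endpoint lookup. The hostnames and ports below are assumptions for illustration, not glados-say.sh's actual values:

```shell
# Hypothetical backend registry; ports are placeholders.
backend_url() {
  case "$1" in
    kokoro) echo "http://nightwatch:8880/v1/audio/speech" ;;  # OpenAI-compat HTTP
    peter)  echo "http://nightwatch:9001/tts" ;;              # plain HTTP backend
    *)      echo "" ;;  # Wyoming backends need a Wyoming client, not plain HTTP
  esac
}

# Usage sketch: synthesize with kokoro and play locally.
# curl -s -X POST "$(backend_url kokoro)" \
#      -H 'Content-Type: application/json' \
#      -d '{"input":"Hello there."}' | mpv -
```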

The dashboard’s /chat page uses the Web Speech API for voice input and useTTS for synthesis output. Transcribed speech goes to n8n, which routes to Ollama or Claude depending on the request, and the response plays back in-browser via Web Audio.

Five containers make up the on-Pi voice stack, co-deployed with Home Assistant.

| Container | Image | Role |
|---|---|---|
| wyoming-whisper | rhasspy/wyoming-whisper | STT — faster-whisper small-int8 (loopback only) |
| wyoming-piper | rhasspy/wyoming-piper | TTS — Piper en_US-lessac-medium (loopback only) |
| wyoming-openwakeword | rhasspy/wyoming-openwakeword | Wake word detection, TFLite (loopback only) |
| homeassistant | ghcr.io/home-assistant/home-assistant | Pipeline orchestrator (port 8123) |
| esphome | ghcr.io/esphome/esphome | Satellite firmware management |

Custom wake word models are TFLite format, trained on nightwatch’s AMD GPU using tools/wake-words/train_all.sh. They live in ha-data/openwakeword-custom/.

| Model | Type |
|---|---|
| hey_glados | Custom (primary active wake word) |
| glados | Custom |
| claude | Custom |
| hudson | Custom |
| maude / hey_maude | Custom |
| jarvis | Community |
| computer | Community |
| ok_computer | Community |
| okay_nabu, hey_jarvis, hey_mycroft, alexa, hey_rhasspy | Built-in (always available) |

The Satellite1 is a FutureProofHomes ESPHome device with a microphone array. It connects to the IoT VLAN and communicates with the Pi over the Wyoming protocol.

Key properties:

  • Wake word processing: server-side (audio streamed to Pi; no on-device inference)
  • Speaker: none built in — all TTS audio routes to Sonos
  • Active wake word: hey_glados
  • Firmware config: ha-config/esphome/satellite1-voice-patch.yaml and device config
  • OTA flashing: via ESPHome dashboard (on-demand only, not always running)
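The server-side arrangement corresponds, roughly, to an ESPHome voice_assistant block like the following. This is a hedged sketch of the kind of override satellite1-voice-patch.yaml applies, not its actual contents:

```yaml
# Sketch only: stream mic audio to HA so wake word detection runs
# in the Pi's openWakeWord container instead of on-device.
voice_assistant:
  use_wake_word: true
```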

The ESPHome repository also manages three BLE proxy devices (bathroom and kitchen) and three MTR1 presence and temperature sensors (bedroom, garage, living room).

nightwatch (the AMD Radeon 7900 XTX machine) hosts all the specialty TTS backends. It is not running continuously — a Wake-on-LAN automation pre-wakes it when the satellite detects a wake word, with a 12-second budget from suspend to service-ready.
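The wake itself is just a magic packet. Here is a sketch of its construction with a placeholder MAC address; the real automation sends it from Home Assistant rather than a shell script:

```shell
# Build a Wake-on-LAN magic packet as a hex string: 6 bytes of 0xFF,
# then the target MAC repeated 16 times. The MAC below is a placeholder.
NIGHTWATCH_MAC="aa:bb:cc:dd:ee:ff"

wol_packet_hex() {
  mac_hex=$(printf '%s' "$1" | tr -d ':')
  printf 'ffffffffffff'
  i=0
  while [ "$i" -lt 16 ]; do
    printf '%s' "$mac_hex"
    i=$((i + 1))
  done
}

# One way to actually send it: wakeonlan "$NIGHTWATCH_MAC"
```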

Idle management:

  • An activity monitor script stamps input_datetime.nightwatch_last_active every 60 seconds via a systemd timer
  • If idle for 5 minutes with no session active, HA fires the idle shutdown automation
  • A nightwatch_keep_alive boolean overrides the idle timeout when needed
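The activity stamp is a standard HA REST service call. A hedged sketch of what nightwatch/scripts/activity-monitor.sh might send; the base URL and token handling are assumptions, while the service endpoint is HA's documented REST API:

```shell
# Assumed HA base URL; HA_TOKEN would be a long-lived access token.
HA_URL="http://caroline.local:8123"

stamp_payload() {
  # JSON body for input_datetime.set_datetime; $1 is a "YYYY-MM-DD HH:MM:SS" stamp.
  printf '{"entity_id":"input_datetime.nightwatch_last_active","datetime":"%s"}' "$1"
}

# curl -s -X POST "$HA_URL/api/services/input_datetime/set_datetime" \
#      -H "Authorization: Bearer $HA_TOKEN" -H 'Content-Type: application/json' \
#      -d "$(stamp_payload "$(date '+%Y-%m-%d %H:%M:%S')")"
```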

Every satellite voice interaction is automatically archived. The push_voice_interaction HA automation sends the dialogue to an n8n webhook, which writes it to two PostgreSQL tables: voice_interactions (session metadata) and dialogue (the full transcript). This creates a permanent record of all voice interactions for analysis and memory retrieval.

The M voice is a custom voice clone built with Chatterbox Turbo, targeting a Judi Dench-inspired voice for a personalized assistant experience. A dataset of 1,137 audio clips is prepared and ready. The M voice backend is already deployed on nightwatch and accessible from glados-say.sh; full integration into the satellite pipeline is the next milestone.

| File | Purpose |
|---|---|
| pi/docker-compose.ha.yml | Pi HA stack: HA + all Wyoming containers |
| docker-compose.voice.yml | Mac dev voice stack (Wyoming only) |
| scripts/glados-say.sh | Terminal TTS script, all backends |
| ha-config/custom_components/m_agent/ | Custom n8n-routing conversation agent |
| ha-config/esphome/satellite1-voice-patch.yaml | Server-side wake word patch for Satellite1 |
| nightwatch/scripts/activity-monitor.sh | Idle detection for nightwatch power management |
| tools/wake-words/train_all.sh | Wake word model training runner |