Voice
Voice is the flagship surface: a family talks to the Pi in their home and hears a reply. This page covers the pipeline, the wake word, and the proactive path. For the hop-by-hop trace see A request, end to end.
The pipeline
The Pi runs sudoedge, which handles only audio I/O and wake detection. All STT, LLM,
and TTS processing happens in the cloud, in voice-bridge (a livekit-agents worker).
The Pi and voice-bridge meet inside a LiveKit room named room_<user_id>.
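The room-naming convention above can be sketched as a pair of helpers. This is a minimal illustration, not the project's code: the function names `room_name_for` and `user_id_from_room` are hypothetical, and only the `room_<user_id>` format comes from this page.

```python
ROOM_PREFIX = "room_"  # prefix taken from the room_<user_id> convention


def room_name_for(user_id: str) -> str:
    """Build the LiveKit room name shared by the Pi and voice-bridge.

    Hypothetical helper: the real code may build this string elsewhere.
    """
    if not user_id:
        raise ValueError("user_id must be non-empty")
    return f"{ROOM_PREFIX}{user_id}"


def user_id_from_room(room_name: str) -> str:
    """Recover the user id from a per-user room name (inverse of the above)."""
    if not room_name.startswith(ROOM_PREFIX) or len(room_name) == len(ROOM_PREFIX):
        raise ValueError(f"not a per-user room: {room_name!r}")
    return room_name[len(ROOM_PREFIX):]
```

Keeping both directions in one place means the edge and the cloud cannot drift apart on the format.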
Wake word: "hey sudo"
- Detected on-device by a small ONNX model at sudoedge/models/hey_sudo.onnx.
- A wake un-gates the mic for one turn; the turn ends on a cloud lifecycle event (the cloud owns turn-taking; the edge holds no silence timers). There is no mid-speech "say hey sudo to interrupt" barge-in: the user waits for the short reply and wakes again.
- Training the model is its own topic — see Wake-word training.
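The gating behaviour in the list above can be pictured as a two-state machine. The sketch below is illustrative only: `MicGate`, `GateState`, and the method names are hypothetical and are not taken from sudoedge/voice_link.py. It assumes the cloud signals the end of a turn through some callback, and it deliberately holds no timers on the edge.

```python
from enum import Enum


class GateState(Enum):
    CLOSED = "closed"  # mic audio is not forwarded; only the wake detector listens
    OPEN = "open"      # mic audio is forwarded to the cloud for exactly one turn


class MicGate:
    """Hypothetical one-turn mic gate driven by wake and cloud events."""

    def __init__(self) -> None:
        self.state = GateState.CLOSED

    def on_wake(self) -> None:
        # A wake during an open turn is a no-op: there is no barge-in.
        if self.state is GateState.CLOSED:
            self.state = GateState.OPEN

    def on_cloud_turn_ended(self) -> None:
        # Only the cloud ends a turn; the edge never closes on silence.
        self.state = GateState.CLOSED

    def should_forward(self) -> bool:
        return self.state is GateState.OPEN
```

A typical sequence is `on_wake()`, audio forwarded while `should_forward()` is true, then `on_cloud_turn_ended()`; the user must wake again for the next turn.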
Tuned for a noisy family home
The device is for families — adults, ageing parents, and children — in real living rooms, not a quiet developer's desk. Wake sensitivity and turn-taking are tuned for that, not for clean studio speech.
The voice is Indic by default
In production the voice stack is Hindi via Sarvam (STT saaras:v3, TTS bulbul:v3),
not English. This matters: any language-specific component (endpointing models,
turn-detection, wake tuning) must match the configured language.
Don't default-enable English turn-detection/endpointing
Turn-detection and endpointing models are opt-in and language-matched. An English end-of-utterance model was once enabled by default and broke the live Indic setup. If you touch turn-taking, gate it behind config and match the language.
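One way to express "opt-in and language-matched" is a guard evaluated before any turn-detection or endpointing model is loaded. This is a sketch under assumptions: `TurnDetectionConfig`, its fields, and the language codes `"hi"` and `"en"` are hypothetical and do not reflect the project's actual config schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TurnDetectionConfig:
    """Hypothetical config block; field names are assumptions."""
    enabled: bool = False                 # opt-in: off unless explicitly set
    model_language: Optional[str] = None  # language the model was built for


def should_enable_turn_detection(cfg: TurnDetectionConfig, voice_language: str) -> bool:
    """Enable only when explicitly turned on AND the model matches the voice language."""
    if not cfg.enabled:
        return False
    if cfg.model_language is None:
        return False  # an unlabelled model is never assumed to match
    return cfg.model_language.lower() == voice_language.lower()
```

With a guard like this, an English model configured against a Hindi voice stack is skipped rather than silently applied.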
Proactive voice
The agent can speak unprompted, for example for a cron reminder or a send_message.
That goes through the sudo_voice plugin. voice-bridge looks up the active AgentSession
by room name and dispatches session.say(text) across threads onto the agents loop. If
the device is offline there is no session, so voice-bridge returns 404 and the agent
can choose WhatsApp instead.
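The lookup-and-dispatch step can be sketched with a small registry. This is not the code in cloud/voice_bridge/main.py: `SessionRegistry`, `SpeakingSession`, and the 200 success code are assumptions made for illustration, and the session is modelled as any object with a `say(text)` method. Only the 404-when-absent behaviour and the cross-thread hand-off come from this page.

```python
import asyncio
import threading
from typing import Dict, Protocol


class SpeakingSession(Protocol):
    """Stand-in for the real session type; only say() is assumed."""
    def say(self, text: str) -> object: ...


class SessionRegistry:
    """Hypothetical map of room name -> active session, safe to call from any thread."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop  # the event loop that owns the sessions
        self._sessions: Dict[str, SpeakingSession] = {}
        self._lock = threading.Lock()

    def register(self, room_name: str, session: SpeakingSession) -> None:
        with self._lock:
            self._sessions[room_name] = session

    def unregister(self, room_name: str) -> None:
        with self._lock:
            self._sessions.pop(room_name, None)

    def speak(self, room_name: str, text: str) -> int:
        """Return an HTTP-style status: 404 if no session, else 200 (assumed)."""
        with self._lock:
            session = self._sessions.get(room_name)
        if session is None:
            return 404  # device offline: caller can pick another channel
        # Schedule say() on the owning loop instead of calling it from this thread.
        self._loop.call_soon_threadsafe(session.say, text)
        return 200
```

`call_soon_threadsafe` is used because the caller (for example an HTTP handler thread) does not own the loop that the session runs on; the call only schedules the speech and does not wait for it to finish.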
Where to look
| Concern | File |
|---|---|
| Voice worker (STT/hermes/TTS, session mgmt) | cloud/voice_bridge/main.py |
| Pi-side persistent link + turn loop | sudoedge/voice_link.py |
| Pi-side LiveKit speaker + token | sudoedge/lk_client.py |
| Earcons / cues (incl. "still thinking") | sudoedge/announce.py |
| Wake detection | sudoedge/wake.py, sudoedge/models/hey_sudo.onnx |
| Audio device selection | sudoedge/audio_devices.py (see Audio devices) |
For the original engineering notes, see docs/livekit-setup.md and
docs/voice-bridge-sse.md in the repo.