The two detection modes
type | Who runs VAD | When to use |
|---|---|---|
silero | Gateway, locally (libfvad / Silero ONNX) | Cascaded ASR → LLM → TTS. Default. |
server_vad | The AI provider (e.g. OpenAI Realtime) | Realtime models that already emit speech_started/speech_stopped. |
silero is the default because most pipelines are cascaded
(Deepgram → Anthropic → ElevenLabs, for example) and need a local trigger.
The runtime auto-promotes to server_vad only when the agent’s entry
node is REALTIME AND the operator didn’t supply an explicit type.
End-to-end interrupt flow
When local VAD trips, this chain fires:source: server_vad when the AI provider tells
us the user spoke first (e.g. OpenAI Realtime’s
input_audio_buffer.speech_started). Both sources land in the same
handler, so node code is detection-mode-agnostic.
Configuring turn detection per agent
Drop aTurnDetection block into your agent config:
server_vad:
type: silero explicitly even with a
realtime entry if you want gateway-side detection.
Per-vendor support matrix
Honest scorecard. Where this says “no” the agent will keep talking over the user; treat as a known limitation, not a configuration error.Agent runtime LLM/ASR/TTS providers
| Provider | Cancel mid-flight? | Barge trigger |
|---|---|---|
| OpenAI Realtime | ✅ | Local VAD or speech_started → truncate |
| OpenAI HTTP / Anthropic / Gemini / Ollama | ❌ (timeout-only) | Gateway silences playback; LLM completes server-side |
| Deepgram ASR streaming | ✅ (forward cancel) | Cancel propagates; partial transcript discarded |
| Deepgram TTS / ElevenLabs TTS | ✅ (clear buffer) | Outbound PCM buffer cleared at PushAudio node |
WebRTC / streaming vendor adapters
| Vendor | Path | Interrupt signal | Status |
|---|---|---|---|
| SIP/RTP | Native PSTN trunk | libfvad → bridge control | ✅ End-to-end |
| Twilio | Media Streams | clear event | ⚠️ Adapter doesn’t yet emit it on barge |
| Vapi | WebSocket transport | stop message | ⚠️ Adapter doesn’t yet emit it on barge |
| LiveKit | Room participant | None (no clean media-plane interrupt) | ❌ Client-side action required |
| Daily | Room participant | None | ❌ Client-side action required |
| Chime | Meeting attendee | None | ❌ Client-side action required |
| Browser | WebTransport | Caller mic → server VAD | ✅ Same as SIP path |
❌ vendors, the AI’s TTS keeps streaming; ClutchCall silences the
playback locally but the AI is unaware. For latency-sensitive UX, prefer
SIP/RTP, Browser, or — once the adapter work lands — Twilio/Vapi.
Tuning parameters
| Field | Default | What it does | When to change |
|---|---|---|---|
silence_threshold_ms | 500 | Min trailing silence before EOU fires | Lower (300) for snappier replies; higher (800) for thinkers |
min_speech_ms | 300 | Discard candidate utterances shorter than this | Raise to 500 if line noise causes false triggers |
prefix_padding_ms | 200 | Audio kept before the speech-start marker | Raise if first syllables are getting clipped on the ASR |
silence_floor_dbfs | -45 | Signal level below which is treated as silence | Quieter trunks need lower (-50); noisy ones higher (-40) |
Common gotchas
- Realtime entry node + explicit
type: silero: works, but you’re doing VAD twice. The OpenAI server-side VAD will still fire and may producespeech_startedevents the runtime ignores. Pick one. silence_threshold_mstoo low (< 300): end-of-utterance fires on inter-word pauses; agent interrupts the user mid-sentence.- Cascaded pipeline with
server_vad: only the realtime model can produce server-VAD events. A Deepgram → Anthropic → ElevenLabs pipeline withserver_vadwill never emit a turn boundary. Misconfiguration — usesileroinstead. - HTTP LLM “barge” feels delayed: Anthropic/Gemini/OpenAI HTTP run to completion server-side. TeleQuick silences the audio playback but the server still bills you for the full response. For tight barge-in budgets, use a Realtime model.
Where to dig further
get_page rpc/audio-frames— wire format for the PCM/PCMU stream that feeds the gateway VAD.get_page rpc/method-ids—Bargemethod ID for operator-initiated interrupts.get_page admin/agent-dags— fullTurnDetectionschema in the agent config.