Turn detection is the voice-agent term for “who has the floor.” A working agent stops talking the moment the caller starts (barge-in), and starts its response the moment the caller’s utterance ends (end-of-utterance). Get this wrong and the call feels broken: the agent talks over the user, or sits silent for awkward seconds while waiting for “real” silence. TeleQuick handles turn detection in two layers — a local VAD on the gateway and a configurable detection mode per agent — and signals interrupts end-to-end via a control channel between the gateway, the agent runtime, and (where supported) vendor adapters.

The two detection modes

┌─────────────────────────────────────────────────────────────────┐
│  TurnDetection {                                                │
│    type: "silero" | "server_vad",                               │
│    silence_threshold_ms?, min_speech_ms?, prefix_padding_ms?    │
│  }                                                              │
└─────────────────────────────────────────────────────────────────┘
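type       | Who runs VAD                             | When to use
-----------+------------------------------------------+-----------------------------------------------------------------
silero     | Gateway, locally (libfvad / Silero ONNX) | Cascaded ASR → LLM → TTS. Default.
server_vad | The AI provider (e.g. OpenAI Realtime)   | Realtime models that already emit speech_started/speech_stopped.
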
silero is the default because most pipelines are cascaded (Deepgram → Anthropic → ElevenLabs, for example) and need a local trigger. The runtime auto-promotes to server_vad only when the agent’s entry node is REALTIME AND the operator didn’t supply an explicit type.
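
The auto-promotion rule can be sketched in a few lines. This is an illustrative sketch only, not the runtime's actual code: the function name `resolve_detection_type` and its arguments are hypothetical, and the only behavior assumed is what the paragraph above states.

```python
from typing import Optional

DEFAULT_TYPE = "silero"
VALID_TYPES = ("silero", "server_vad")


def resolve_detection_type(entry_node: str, explicit_type: Optional[str] = None) -> str:
    """Return the effective turn-detection type for an agent (hypothetical helper)."""
    if explicit_type is not None:
        if explicit_type not in VALID_TYPES:
            raise ValueError(f"unknown turn detection type: {explicit_type!r}")
        return explicit_type      # an explicit operator choice always wins
    if entry_node == "REALTIME":
        return "server_vad"       # auto-promotion for realtime entry nodes
    return DEFAULT_TYPE           # cascaded pipelines get the local VAD


print(resolve_detection_type("ASR"))                 # silero
print(resolve_detection_type("REALTIME"))            # server_vad
print(resolve_detection_type("REALTIME", "silero"))  # silero
```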

End-to-end interrupt flow

When local VAD trips, this chain fires:
caller starts speaking
       │
       ▼
VAD source: libfvad on the gateway (SIP/RTP path)
            OR vendor server-VAD via the agent runtime (direct-media path)
       │
       │  control frame: [0xFFFFFFFF sentinel][type=BARGE_IN][session_id]
       ▼
agent_runtime_bridge → runtime
       │
       ▼
DagExecutor::on_cancel(CancelEvent{source: user_vad})
       │
       ├──▶ ASR / LLM / TTS nodes: forward cancel
       ├──▶ PushAudio node: clear outbound buffer
       └──▶ RealtimeNode: emit `conversation.item.truncate` to OpenAI,
                          close session
The same chain fires with source: server_vad when the AI provider reports that the user has started speaking (e.g. OpenAI Realtime’s input_audio_buffer.speech_started). Both sources land in the same handler, so node code is detection-mode-agnostic.
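
To make the flow concrete, here is a minimal sketch of decoding a barge-in control frame and fanning the resulting cancel out to nodes. Everything beyond the frame's field order is an assumption for illustration: the field widths (u32 sentinel, u8 type, u64 session id, big-endian), the numeric value of `BARGE_IN`, and the `RecordingNode` class are hypothetical and may not match the real wire format or runtime classes.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

SENTINEL = 0xFFFFFFFF
BARGE_IN = 1                      # hypothetical type code
FRAME = struct.Struct(">IBQ")     # assumed layout: sentinel, type, session_id


@dataclass
class CancelEvent:
    source: str                   # "user_vad" or "server_vad"
    session_id: int


def encode_barge_in(session_id: int) -> bytes:
    """Build a barge-in control frame under the assumed layout."""
    return FRAME.pack(SENTINEL, BARGE_IN, session_id)


def decode_control_frame(data: bytes) -> Optional[CancelEvent]:
    """Return a CancelEvent for a BARGE_IN frame, or None for anything else."""
    if len(data) != FRAME.size:
        return None
    sentinel, frame_type, session_id = FRAME.unpack(data)
    if sentinel != SENTINEL or frame_type != BARGE_IN:
        return None
    # Frames from the gateway originate from the local VAD.
    return CancelEvent(source="user_vad", session_id=session_id)


class RecordingNode:
    """Stand-in for an ASR/LLM/TTS node; it only records who cancelled it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cancelled_by: List[str] = []

    def on_cancel(self, event: CancelEvent) -> None:
        # The node reacts the same way regardless of event.source.
        self.cancelled_by.append(event.source)


def fan_out_cancel(nodes: List[RecordingNode], event: CancelEvent) -> None:
    """Deliver one cancel event to every node in the DAG."""
    for node in nodes:
        node.on_cancel(event)
```

A gateway-originated frame and a provider-originated event would then share one path: `fan_out_cancel(nodes, decode_control_frame(frame))` for the first, and `fan_out_cancel(nodes, CancelEvent("server_vad", session_id))` for the second.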

Configuring turn detection per agent

Drop a TurnDetection block into your agent config:
# agent-config.yml
entry_node: ASR
turn_detection:
  type: silero
  silence_threshold_ms: 500    # min silence before EOU fires
  min_speech_ms: 300           # filter out clicks / noise
  prefix_padding_ms: 200       # include this much pre-speech audio
nodes:
  ASR: { provider: deepgram, ... }
  LLM: { provider: anthropic, ... }
  TTS: { provider: elevenlabs, ... }
For a realtime entry node, omit the block entirely and the runtime defaults to server_vad:
entry_node: REALTIME
nodes:
  REALTIME: { provider: openai-realtime, model: gpt-4o-realtime-preview }
# turn_detection auto-promotes to {type: server_vad} based on entry_node
An explicit type is always honored. Set type: silero even with a realtime entry node if you want gateway-side detection.

Per-vendor support matrix

Honest scorecard. Where a row shows ❌, the agent will keep talking over the user; treat it as a known limitation, not a configuration error.

Agent runtime LLM/ASR/TTS providers

Provider                                  | Cancel mid-flight? | Barge trigger / behavior
------------------------------------------+--------------------+------------------------------------------------------
OpenAI Realtime                           | ✅ (truncate)       | Local VAD or speech_started
OpenAI HTTP / Anthropic / Gemini / Ollama | ❌ (timeout-only)   | Gateway silences playback; LLM completes server-side
Deepgram ASR streaming                    | ✅ (forward cancel) | Cancel propagates; partial transcript discarded
Deepgram TTS / ElevenLabs TTS             | ✅ (clear buffer)   | Outbound PCM buffer cleared at PushAudio node

WebRTC / streaming vendor adapters

Vendor  | Path                | Interrupt signal                       | Status
--------+---------------------+----------------------------------------+------------------------------------------
SIP/RTP | Native PSTN trunk   | libfvad → bridge control               | ✅ End-to-end
Twilio  | Media Streams       | clear event                            | ⚠️ Adapter doesn’t yet emit it on barge
Vapi    | WebSocket transport | stop message                           | ⚠️ Adapter doesn’t yet emit it on barge
LiveKit | Room participant    | None (no clean media-plane interrupt)  | ❌ Client-side action required
Daily   | Room participant    | None                                   | ❌ Client-side action required
Chime   | Meeting attendee    | None                                   | ❌ Client-side action required
Browser | WebTransport        | Caller mic → server VAD                | ✅ Same as SIP path

For the ⚠️ and ❌ vendors, the AI’s TTS keeps streaming: TeleQuick silences the playback locally, but the AI is unaware of the interruption. For latency-sensitive UX, prefer SIP/RTP, Browser, or (once the adapter work lands) Twilio/Vapi.
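
If you need to branch on this matrix in your own tooling, it can be encoded as a small lookup. The sketch below is illustrative only: the dictionary keys and status labels are made-up names, not identifiers from the product, and the values simply mirror the Status column above.

```python
# Hypothetical encoding of the vendor matrix; keys and labels are illustrative.
INTERRUPT_SUPPORT = {
    "sip_rtp": "end_to_end",
    "browser": "end_to_end",
    "twilio": "adapter_pending",   # "clear" event exists, adapter does not emit it yet
    "vapi": "adapter_pending",     # "stop" message exists, adapter does not emit it yet
    "livekit": "client_side",
    "daily": "client_side",
    "chime": "client_side",
}


def ai_learns_of_barge_in(vendor: str) -> bool:
    """True only when the interrupt reaches the AI side end-to-end."""
    return INTERRUPT_SUPPORT[vendor] == "end_to_end"


def latency_sensitive_choices() -> list:
    """Vendors that are safe to pick today for tight barge-in UX."""
    return sorted(v for v in INTERRUPT_SUPPORT if ai_learns_of_barge_in(v))


print(latency_sensitive_choices())  # ['browser', 'sip_rtp']
```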

Tuning parameters

Field                | Default | What it does                                        | When to change
---------------------+---------+-----------------------------------------------------+--------------------------------------------------------------------------------
silence_threshold_ms | 500     | Min trailing silence before EOU fires               | Lower (300) for snappier replies; higher (800) for callers who pause to think
min_speech_ms        | 300     | Discard candidate utterances shorter than this      | Raise to 500 if line noise causes false triggers
prefix_padding_ms    | 200     | Audio kept before the speech-start marker           | Raise if first syllables are getting clipped on the ASR
silence_floor_dbfs   | -45     | Signal level below which audio is treated as silence | Quieter trunks need lower (-50); noisy ones higher (-40)
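
To see how the four parameters interact, here is a toy end-of-utterance detector over per-frame signal levels. It is a simplified energy-threshold sketch written for this page, not the gateway's libfvad/Silero logic; the `find_utterances` function, the 20 ms frame size, and the input format (one dBFS value per frame) are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TurnDetection:
    silence_threshold_ms: int = 500
    min_speech_ms: int = 300
    prefix_padding_ms: int = 200
    silence_floor_dbfs: float = -45.0


def find_utterances(levels_dbfs: Sequence[float], cfg: TurnDetection,
                    frame_ms: int = 20) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) per utterance; start includes prefix padding.

    An utterance still open when the input ends is not reported.
    """
    utterances = []
    start_ms = None        # time of the first voiced frame of the candidate
    last_voiced_end = 0    # end time of the most recent voiced frame
    voiced_ms = 0          # total voiced audio in the candidate
    for i, level in enumerate(levels_dbfs):
        t = i * frame_ms
        if level >= cfg.silence_floor_dbfs:          # frame counts as speech
            if start_ms is None:
                start_ms, voiced_ms = t, 0
            voiced_ms += frame_ms
            last_voiced_end = t + frame_ms
        elif start_ms is not None:                   # silence inside a candidate
            trailing = t + frame_ms - last_voiced_end
            if trailing >= cfg.silence_threshold_ms:  # EOU fires
                if voiced_ms >= cfg.min_speech_ms:    # drop clicks / noise
                    padded = max(0, start_ms - cfg.prefix_padding_ms)
                    utterances.append((padded, last_voiced_end))
                start_ms = None
    return utterances


speech, quiet = [-20.0], [-60.0]
# Two 400 ms words separated by a 400 ms pause, then 600 ms of silence.
levels = speech * 20 + quiet * 20 + speech * 20 + quiet * 30
print(find_utterances(levels, TurnDetection()))
# [(0, 1200)]
print(find_utterances(levels, TurnDetection(silence_threshold_ms=300)))
# [(0, 400), (600, 1200)]
```

With the default 500 ms threshold the pause is bridged and one turn is reported; at 300 ms the same audio splits into two turns, which is the mid-sentence interruption risk that comes with lowering silence_threshold_ms.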

Common gotchas

  • Realtime entry node + explicit type: silero: works, but you’re doing VAD twice. The OpenAI server-side VAD will still fire and may produce speech_started events the runtime ignores. Pick one.
  • silence_threshold_ms too low (< 300): end-of-utterance fires on inter-word pauses; agent interrupts the user mid-sentence.
  • Cascaded pipeline with server_vad: only the realtime model can produce server-VAD events. A Deepgram → Anthropic → ElevenLabs pipeline with server_vad will never emit a turn boundary. Misconfiguration — use silero instead.
  • HTTP LLM “barge” feels delayed: Anthropic/Gemini/OpenAI HTTP run to completion server-side. TeleQuick silences the audio playback but the server still bills you for the full response. For tight barge-in budgets, use a Realtime model.
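
The first three gotchas are mechanical enough to check before deploying a config. The linter below is a hypothetical sketch (the function `lint_turn_detection` is not part of the product); it assumes that any entry node other than REALTIME means a cascaded pipeline.

```python
from typing import List, Optional


def lint_turn_detection(entry_node: str,
                        detection_type: Optional[str] = None,
                        silence_threshold_ms: int = 500) -> List[str]:
    """Return human-readable warnings for the misconfigurations listed above."""
    warnings = []
    realtime = entry_node == "REALTIME"
    if realtime and detection_type == "silero":
        warnings.append("realtime entry node with explicit silero: VAD runs twice")
    if not realtime and detection_type == "server_vad":
        warnings.append("cascaded pipeline with server_vad: no turn boundary will be emitted")
    if silence_threshold_ms < 300:
        warnings.append("silence_threshold_ms below 300: EOU may fire on inter-word pauses")
    return warnings


for warning in lint_turn_detection("ASR", "server_vad", silence_threshold_ms=250):
    print("warning:", warning)
```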

Where to dig further

  • get_page rpc/audio-frames: wire format for the PCM/PCMU stream that feeds the gateway VAD.
  • get_page rpc/method-ids: Barge method ID for operator-initiated interrupts.
  • get_page admin/agent-dags: full TurnDetection schema in the agent config.