Turn Detection & Barge-In

Turn detection is the voice-agent term for “who has the floor.” A working agent stops talking the moment the caller starts (barge-in), and starts its response the moment the caller’s utterance ends (end-of-utterance). Get this wrong and the call feels broken: the agent talks over the user, or sits silent for awkward seconds while waiting for “real” silence. TeleQuick handles turn detection in two layers — a local VAD on the gateway and a configurable detection mode per agent — and signals interrupts end-to-end via a control channel between the gateway, the agent runtime, and (where supported) vendor adapters.

The two detection modes

┌─────────────────────────────────────────────────────────────────┐
│  TurnDetection {                                                │
│    type: "silero" | "server_vad",                               │
│    silence_threshold_ms?, min_speech_ms?, prefix_padding_ms?    │
│  }                                                              │
└─────────────────────────────────────────────────────────────────┘

`type`	Who runs VAD	When to use
`silero`	Gateway, locally (libfvad / Silero ONNX)	Cascaded ASR → LLM → TTS. Default.
`server_vad`	The AI provider (e.g. OpenAI Realtime)	Realtime models that already emit `speech_started`/`speech_stopped`.

`silero` is the default because most pipelines are cascaded (Deepgram → Anthropic → ElevenLabs, for example) and need a local trigger. The runtime auto-promotes to `server_vad` only when both conditions hold: the agent’s entry node is `REALTIME`, and the operator didn’t supply an explicit `type`.
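
The selection rule above can be sketched as a small resolver. This is a minimal illustration rather than the runtime’s actual code: the `TurnDetection` dataclass mirrors the schema box, while `resolve_turn_detection` and its argument names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnDetection:
    type: str  # "silero" or "server_vad"
    silence_threshold_ms: Optional[int] = None
    min_speech_ms: Optional[int] = None
    prefix_padding_ms: Optional[int] = None


def resolve_turn_detection(entry_node: str,
                           configured: Optional[TurnDetection]) -> TurnDetection:
    """Pick the effective detection mode for an agent (hypothetical helper)."""
    # An explicit block always wins, even on a realtime entry node.
    if configured is not None:
        return configured
    # No explicit block: promote to server_vad only for a REALTIME entry node.
    if entry_node == "REALTIME":
        return TurnDetection(type="server_vad")
    # Everything else falls back to the default gateway-side VAD.
    return TurnDetection(type="silero")
```

With no block configured, `resolve_turn_detection("REALTIME", None)` yields `server_vad` and `resolve_turn_detection("ASR", None)` yields `silero`; passing an explicit block returns it unchanged.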

End-to-end interrupt flow

When local VAD trips, this chain fires:

caller starts speaking
       │
       ▼
VAD source: libfvad on the gateway (SIP/RTP path)
            OR vendor server-VAD via the agent runtime (direct-media path)
       │
       │  control frame: [0xFFFFFFFF sentinel][type=BARGE_IN][session_id]
       ▼
agent_runtime_bridge → runtime
       │
       ▼
DagExecutor::on_cancel(CancelEvent{source: user_vad})
       │
       ├──▶ ASR / LLM / TTS nodes: forward cancel
       ├──▶ PushAudio node: clear outbound buffer
       └──▶ RealtimeNode: emit `conversation.item.truncate` to OpenAI,
                          close session

The same chain fires with `source: server_vad` when the AI provider reports that the user has started speaking (e.g. OpenAI Realtime’s `input_audio_buffer.speech_started`). Both sources land in the same handler, so node code is detection-mode-agnostic.
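
To make the flow concrete, here is a rough sketch of both sources converging on one cancel handler. The field widths, byte order, and the `BARGE_IN` type code are assumptions made for illustration (the diagram only gives the field order), and `handle_gateway_bytes`, `handle_vendor_event`, and the node class are hypothetical stand-ins for the real bridge and DAG code.

```python
import struct
from dataclasses import dataclass

SENTINEL = 0xFFFFFFFF          # sentinel value from the diagram
BARGE_IN = 0x01                # hypothetical type code
FRAME = struct.Struct(">IBQ")  # assumed: u32 sentinel, u8 type, u64 session id


@dataclass
class CancelEvent:
    source: str  # "user_vad" or "server_vad"


class PushAudioNode:
    """Stand-in for the node that owns the outbound PCM buffer."""

    def __init__(self) -> None:
        self.outbound = bytearray(b"\x00" * 320)  # pretend audio is queued

    def on_cancel(self, event: CancelEvent) -> None:
        self.outbound.clear()


class DagExecutor:
    def __init__(self, nodes) -> None:
        self.nodes = nodes
        self.sources = []

    def on_cancel(self, event: CancelEvent) -> None:
        # One handler for both sources, so nodes never branch on detection mode.
        self.sources.append(event.source)
        for node in self.nodes:
            node.on_cancel(event)


def handle_gateway_bytes(data: bytes, executor: DagExecutor) -> bool:
    """Gateway path: dispatch a cancel if `data` is a BARGE_IN control frame."""
    if len(data) < FRAME.size:
        return False
    sentinel, frame_type, _session_id = FRAME.unpack_from(data)
    if sentinel != SENTINEL or frame_type != BARGE_IN:
        return False  # ordinary audio or another control type
    executor.on_cancel(CancelEvent(source="user_vad"))
    return True


def handle_vendor_event(event_type: str, executor: DagExecutor) -> bool:
    """Vendor path: a speech_started event maps to the same cancel."""
    if event_type != "input_audio_buffer.speech_started":
        return False
    executor.on_cancel(CancelEvent(source="server_vad"))
    return True
```

Either entry point ends in `DagExecutor.on_cancel`, so a node such as `PushAudioNode` clears its buffer the same way regardless of which side detected the speech.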

Configuring turn detection per agent

Drop a TurnDetection block into your agent config:

# agent-config.yml
entry_node: ASR
turn_detection:
  type: silero
  silence_threshold_ms: 500    # min silence before EOU fires
  min_speech_ms: 300           # filter out clicks / noise
  prefix_padding_ms: 200       # include this much pre-speech audio
nodes:
  ASR: { provider: deepgram, ... }
  LLM: { provider: anthropic, ... }
  TTS: { provider: elevenlabs, ... }

For a realtime entry node, omit the block entirely and the runtime defaults to server_vad:

entry_node: REALTIME
nodes:
  REALTIME: { provider: openai-realtime, model: gpt-4o-realtime-preview }
# turn_detection auto-promotes to {type: server_vad} based on entry_node

An explicit override is always honored: set `type: silero` even with a realtime entry node if you want gateway-side detection.

Per-vendor support matrix

Honest scorecard. Where this says “no”, the agent will keep talking over the user; treat it as a known limitation, not a configuration error.

Agent runtime LLM/ASR/TTS providers

Provider	Cancel mid-flight?	Barge trigger
OpenAI Realtime	✅	Local VAD or `speech_started` → `truncate`
OpenAI HTTP / Anthropic / Gemini / Ollama	❌ (timeout-only)	Gateway silences playback; LLM completes server-side
Deepgram ASR streaming	✅ (forward cancel)	Cancel propagates; partial transcript discarded
Deepgram TTS / ElevenLabs TTS	✅ (clear buffer)	Outbound PCM buffer cleared at PushAudio node

WebRTC / streaming vendor adapters

Vendor	Path	Interrupt signal	Status
SIP/RTP	Native PSTN trunk	libfvad → bridge control	✅ End-to-end
Twilio	Media Streams	`clear` event	⚠️ Adapter doesn’t yet emit it on barge
Vapi	WebSocket transport	`stop` message	⚠️ Adapter doesn’t yet emit it on barge
LiveKit	Room participant	None (no clean media-plane interrupt)	❌ Client-side action required
Daily	Room participant	None	❌ Client-side action required
Chime	Meeting attendee	None	❌ Client-side action required
Browser	WebTransport	Caller mic → server VAD	✅ Same as SIP path

For ❌ vendors, the AI’s TTS keeps streaming; TeleQuick silences the playback locally, but the AI is unaware. For latency-sensitive UX, prefer SIP/RTP or Browser, or Twilio/Vapi once the adapter work lands.

Tuning parameters

Field	Default	What it does	When to change
`silence_threshold_ms`	500	Min trailing silence before EOU fires	Lower (300) for snappier replies; higher (800) for thinkers
`min_speech_ms`	300	Discard candidate utterances shorter than this	Raise to 500 if line noise causes false triggers
`prefix_padding_ms`	200	Audio kept before the speech-start marker	Raise if first syllables are getting clipped on the ASR
`silence_floor_dbfs`	-45	Signal level below which is treated as silence	Quieter trunks need lower (-50); noisy ones higher (-40)
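
To see how the four fields interact, the sketch below runs a simplified energy-threshold endpointer over per-frame levels. It is a stand-in for illustration only, not the libfvad or Silero logic the gateway actually runs; the 20 ms frame size, the `Tuning` class, and `find_utterances` are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

FRAME_MS = 20  # assumed frame duration


@dataclass
class Tuning:
    silence_threshold_ms: int = 500
    min_speech_ms: int = 300
    prefix_padding_ms: int = 200
    silence_floor_dbfs: float = -45.0


def find_utterances(levels_dbfs: List[float], t: Tuning) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) per utterance; start includes prefix padding."""
    utterances = []
    run_start = None   # frame index where the current speech run began
    confirmed = False  # True once the run has lasted min_speech_ms
    silence_ms = 0     # trailing silence since the last speech frame
    for i, level in enumerate(levels_dbfs):
        if level > t.silence_floor_dbfs:
            if run_start is None:
                run_start = i
            silence_ms = 0
            if (i - run_start + 1) * FRAME_MS >= t.min_speech_ms:
                confirmed = True
        elif run_start is not None:
            silence_ms += FRAME_MS
            if not confirmed:
                # Shorter than min_speech_ms: treat as a click or noise burst.
                run_start, silence_ms = None, 0
            elif silence_ms >= t.silence_threshold_ms:
                # End of utterance: enough trailing silence has accumulated.
                start_ms = max(0, run_start * FRAME_MS - t.prefix_padding_ms)
                end_ms = (i + 1) * FRAME_MS - silence_ms
                utterances.append((start_ms, end_ms))
                run_start, confirmed, silence_ms = None, False, 0
    return utterances
```

Feeding it two 400 ms bursts of speech separated by a 400 ms pause gives one utterance at the default 500 ms threshold and two utterances at 300 ms, which is the mid-sentence split that an overly low `silence_threshold_ms` produces.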

Common gotchas

Realtime entry node + explicit type: silero: works, but you’re doing VAD twice. The OpenAI server-side VAD will still fire and may produce speech_started events the runtime ignores. Pick one.
silence_threshold_ms too low (< 300): end-of-utterance fires on inter-word pauses; agent interrupts the user mid-sentence.
Cascaded pipeline with server_vad: only the realtime model can produce server-VAD events. A Deepgram → Anthropic → ElevenLabs pipeline with server_vad will never emit a turn boundary. Misconfiguration — use silero instead.
HTTP LLM “barge” feels delayed: Anthropic/Gemini/OpenAI HTTP run to completion server-side. TeleQuick silences the audio playback but the server still bills you for the full response. For tight barge-in budgets, use a Realtime model.
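
The configuration gotchas above are mechanical enough to check before deploying an agent. The following is a hypothetical lint helper, not a TeleQuick API; `lint_turn_detection` and its arguments are names invented for this sketch.

```python
from typing import List, Optional


def lint_turn_detection(entry_node: str,
                        td_type: Optional[str],
                        silence_threshold_ms: int = 500) -> List[str]:
    """Return human-readable warnings for the gotchas listed above."""
    warnings = []
    realtime = entry_node == "REALTIME"
    if td_type == "server_vad" and not realtime:
        # Only a realtime model can produce server-VAD events.
        warnings.append("server_vad on a cascaded pipeline never emits a "
                        "turn boundary; use silero")
    if td_type == "silero" and realtime:
        # Works, but the provider's own VAD still fires in parallel.
        warnings.append("explicit silero on a realtime entry node runs VAD twice")
    if silence_threshold_ms < 300:
        warnings.append("silence_threshold_ms below 300 may fire on "
                        "inter-word pauses")
    return warnings
```

A default cascaded config (`lint_turn_detection("ASR", "silero")`) returns an empty list; each misconfiguration adds one warning.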

Where to dig further

get_page rpc/audio-frames — wire format for the PCM/PCMU stream that feeds the gateway VAD.
get_page rpc/method-ids — Barge method ID for operator-initiated interrupts.
get_page admin/agent-dags — full TurnDetection schema in the agent config.
