Audio Frames

AudioFrame carries a single packet of voice, typically 20 ms of µ-law or PCM. It uses the same envelope as every other RPC but travels on unidirectional streams (uni-streams), which keeps latency low and avoids pairing every send with an ack.

`AudioFrame` schema

| Field | Type | Notes |
| --- | --- | --- |
| `call_sid` | `string` | Identifies the call this packet belongs to. |
| `payload` | `string` | Raw codec bytes (length-prefixed). Treated as opaque. |
| `codec` | `string` | `"PCMU"` (G.711 µ-law), `"PCMA"` (G.711 A-law), or `"PCM16"`. |
| `sequence_number` | `uint64` | Monotonic per `(call_sid, direction)`. Used to detect loss. |
| `end_of_stream` | `bool` | Final frame; the gateway will close the audio uni-stream. |

The method_id for an audio frame is always 2991054320 (0xb247_ddf0).
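As a rough illustration, the schema above could be mirrored on the client as a Python dataclass. The class itself, the `VALID_CODECS` set, and the validation rules are assumptions made for this sketch, not part of any SDK. `payload` is held as `bytes` locally even though the schema lists it as `string`; how it is serialized into the envelope is not shown here.

```python
from dataclasses import dataclass

# Method ID quoted in this section for AudioFrame.
AUDIO_FRAME_METHOD_ID = 2991054320

# Codec names listed in the schema table.
VALID_CODECS = {"PCMU", "PCMA", "PCM16"}


@dataclass(frozen=True)
class AudioFrame:
    """Local mirror of the AudioFrame schema (hypothetical helper type)."""

    call_sid: str
    payload: bytes          # raw codec bytes, treated as opaque
    codec: str
    sequence_number: int    # uint64 on the wire
    end_of_stream: bool = False

    def __post_init__(self) -> None:
        # Reject values the schema table does not allow.
        if self.codec not in VALID_CODECS:
            raise ValueError(f"unknown codec: {self.codec!r}")
        if not 0 <= self.sequence_number < 2**64:
            raise ValueError("sequence_number must fit in uint64")
```

Validating at construction time keeps a bad codec name or an out-of-range sequence number from reaching the wire.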

Frame layout on the wire

+----------------+----------------+----------------------+
| u32 length LE  | method_id      | AudioFrame envelope  |
+----------------+----------------+----------------------+

This is the standard envelope format — audio is not special.
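The layout above can be sketched with Python's `struct` module. Two details are assumptions here, because this page does not state them: that `method_id` is encoded as a little-endian u32 like the length field, and that the length counts the bytes that follow it (`method_id` plus envelope). Check the Raw RPC Format page before relying on either. The function names are hypothetical.

```python
import struct

AUDIO_FRAME_METHOD_ID = 2991054320


def encode_frame(method_id: int, envelope: bytes) -> bytes:
    """Prefix an already-serialized envelope with length and method_id.

    Assumed, not confirmed by this page: method_id is a little-endian u32,
    and the length field counts method_id plus envelope bytes.
    """
    body = struct.pack("<I", method_id) + envelope
    return struct.pack("<I", len(body)) + body


def decode_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    """Split one frame off the front of buf.

    Returns (method_id, envelope, remaining_bytes). Raises ValueError when
    buf does not hold a complete frame.
    """
    if len(buf) < 4:
        raise ValueError("need at least 4 bytes for the length prefix")
    (length,) = struct.unpack_from("<I", buf, 0)
    if length < 4 or len(buf) < 4 + length:
        raise ValueError("incomplete or malformed frame")
    (method_id,) = struct.unpack_from("<I", buf, 4)
    envelope = buf[8:4 + length]
    return method_id, envelope, buf[4 + length:]
```

Because `decode_frame` returns the unconsumed remainder, it can be called in a loop over a buffer holding several back-to-back frames.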

Outbound (your mic → trunk)

Open exactly one client-initiated unidirectional stream per call and write framed AudioFrames back-to-back. Don’t open a stream per packet — that overwhelms the gateway’s flow-control budget within seconds.

client uni-stream  ─────[frame][frame][frame] … [frame eos=true]─────▶

After sending the frame with end_of_stream = true, close the stream. The gateway will not accept more frames on it.
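One way to prepare the frames for a single call's stream is sketched below. `OutboundFrame` and `frames_for_call` are hypothetical names, and starting the sequence at 0 is an assumption (this page only requires the numbers to be monotonic). The sketch covers chunking and flagging only; serializing each frame and writing it to the stream is left to the caller.

```python
from typing import Iterator, NamedTuple


class OutboundFrame(NamedTuple):
    sequence_number: int
    payload: bytes
    end_of_stream: bool


def frames_for_call(audio: bytes, frame_bytes: int = 160) -> Iterator[OutboundFrame]:
    """Split audio into fixed-size payloads for one call's uni-stream.

    Sequence numbers start at 0 (an assumption). The last chunk may be
    shorter than frame_bytes and is flagged end_of_stream=True.
    """
    if frame_bytes <= 0:
        raise ValueError("frame_bytes must be positive")
    chunks = [audio[i:i + frame_bytes] for i in range(0, len(audio), frame_bytes)]
    for seq, chunk in enumerate(chunks):
        yield OutboundFrame(seq, chunk, end_of_stream=(seq == len(chunks) - 1))
```

All frames produced for a call go onto the same stream in order; the stream is closed once the frame flagged `end_of_stream` has been written.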

Pacing

For µ-law @ 8 kHz with a 20 ms ptime, payload = 160 bytes (8,000 samples/s × 0.020 s × 1 byte per sample). Send one frame every 20 ms (50 frames per second). Faster pacing is buffered by the trunk and arrives late on the far end; slower pacing causes audible gaps.
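The arithmetic and the pacing rule can be sketched as follows. `payload_bytes` derives the frame size from a codec bitrate and ptime, and `send_paced` schedules sends against absolute deadlines so that small delays do not accumulate over a long call. Both function names, and the `write` callable, are placeholders for this sketch rather than part of any client library.

```python
import asyncio
import time


def payload_bytes(bitrate_bps: int, ptime_ms: int) -> int:
    """Bytes of codec data in one frame: bitrate * ptime / 8."""
    return bitrate_bps * ptime_ms // 8000


async def send_paced(frames, write, ptime_ms: int = 20) -> None:
    """Call write(frame) once per ptime, scheduled against absolute deadlines.

    `frames` is any iterable of ready-to-send frames and `write` is an async
    callable supplied by the caller; both are placeholders.
    """
    interval = ptime_ms / 1000
    start = time.monotonic()
    for i, frame in enumerate(frames):
        # Sleep until the i-th deadline so per-send jitter does not accumulate.
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            await asyncio.sleep(delay)
        await write(frame)
```

For µ-law at 64 kbit/s, `payload_bytes(64_000, 20)` gives the 160 bytes quoted above. Sleeping a fixed 20 ms after each send would drift, because the time spent inside `write` is added to every interval.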

Inbound (trunk → your speaker)

The gateway opens server-initiated uni-streams. After your EventStreamRequest subscription, every uni-stream is multiplexed: each frame’s method_id determines whether it’s audio (2991054320) or a CallEvent (959835745). A typical demuxer:

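AUDIO_FRAME_METHOD_ID = 2991054320  # AudioFrame
CALL_EVENT_METHOD_ID = 959835745    # CallEvent

async def demux(incoming_uni_streams, speaker, on_event):
    # read_frame, parse_audio_frame and parse_call_event are your own helpers.
    async for frame in incoming_uni_streams:
        method_id, body = read_frame(frame)
        if method_id == AUDIO_FRAME_METHOD_ID:
            af = parse_audio_frame(body)
            speaker.play(af.payload)
        elif method_id == CALL_EVENT_METHOD_ID:
            ev = parse_call_event(body)
            on_event(ev)
        # Frames with any other method_id are ignored here.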

Codec choices

| Codec | Bitrate | Use when |
| --- | --- | --- |
| `PCMU` | 64 kbit/s | Talking to PSTN/SIP. The default for trunks. |
| `PCMA` | 64 kbit/s | EU/PSTN. Same wire shape, different table. |
| `PCM16` | 256 kbit/s | Sending TTS or studio-quality content into the gateway. The gateway re-encodes to the trunk’s codec. |

The gateway transcodes for you on ingress; on egress it sends whatever the far-end negotiated. If you need a specific egress codec, set OriginateRequest.default_app_args accordingly.

Loss handling

Use sequence_number to detect dropped packets. The gateway does not retransmit audio, because retransmission would defeat the low-latency goal. For PSTN calls, each missing sequence number on egress is heard as 20 ms of silence. That is fine for voice but catastrophic for DTMF, so use INFO-method DTMF rather than in-band tones when possible.
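A minimal gap detector along these lines could look like the sketch below. It assumes sequence numbers increase by exactly 1 per frame, which is stricter than the "monotonic" guarantee stated in the schema, and `LossDetector` is a hypothetical helper rather than part of any client library. Keep one instance per `(call_sid, direction)`.

```python
class LossDetector:
    """Track sequence_number gaps for one (call_sid, direction) pair."""

    def __init__(self) -> None:
        self.expected = None   # next sequence number we expect to see
        self.lost = 0          # total frames inferred missing

    def observe(self, sequence_number: int) -> int:
        """Record a received frame; return how many frames were skipped."""
        if self.expected is None:
            # First frame: accept whatever starting value the sender uses.
            self.expected = sequence_number + 1
            return 0
        if sequence_number < self.expected:
            # Duplicate or late frame; ignored in this sketch.
            return 0
        gap = sequence_number - self.expected
        self.lost += gap
        self.expected = sequence_number + 1
        return gap
```

With a 20 ms ptime, a return value of `n` from `observe` corresponds to roughly `n * 20` ms of missing audio.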
