Three layers

                ┌──────────────────────────────────────────────┐
                │           Your application code              │
                │   (Python / TS / Go / Rust / Java / .NET)    │
                └─────────────────────┬────────────────────────┘
                                      │ idiomatic methods
                ┌─────────────────────▼────────────────────────┐
                │           TeleQuick language SDK             │
                │     (uses the C++ core via FFI / WASM)       │
                └─────────────────────┬────────────────────────┘
                                      │ C ABI: telequick_buffer
                ┌─────────────────────▼────────────────────────┐
                │   C++23 core (one binary, all languages)     │
                │  - serde envelope encoder/decoder            │
                │  - APM pipeline (AEC, AGC2, resampler)       │
                │  - method-id table                           │
                └─────────────────────┬────────────────────────┘
                                      │ QUIC stream / datagram / WebRTC
                ┌─────────────────────▼────────────────────────┐
                │              TeleQuick gateway               │
                │  (Seastar shards, SIP, WebRTC, AI bridge)    │
                └──────────────────────────────────────────────┘
The C++ core is the source of truth: every SDK is a thin shim that calls telequick_rpc_<method> to produce a wire-ready buffer, then writes that buffer onto a QUIC stream.

Why one core

Two reasons.

  • Identical envelopes. The gateway only ever sees the same little-endian serde encoding regardless of which SDK produced it. There is no per-language parser to keep in sync — only the C++ core, plus the auto-generated method-id table that gets emitted into each language at build time.
  • Audio without a copy. µ-law / A-law to 16-bit PCM conversion is done in SIMD inside the core. Browsers and Node alike call into the same WASM build of that core, so a barge-in from a browser tab and a barge-in from a Python process traverse the same code path with the same latency profile.
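The µ-law half of that conversion can be written scalar-style in a few lines. This is the standard G.711 expansion shown in Python as a reference for what the core's SIMD path computes per sample — a sketch, not the core's actual code:

```python
def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    b = ~byte & 0xFF              # mu-law bytes are transmitted complemented
    sign = b & 0x80               # top bit: sample sign
    exponent = (b >> 4) & 0x07    # 3-bit segment (chord) number
    mantissa = b & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

# Expanding a whole frame is an elementwise map over the bytes, which is
# exactly why the conversion is a natural fit for SIMD:
def ulaw_frame_to_pcm16(frame: bytes) -> list[int]:
    return [ulaw_to_pcm16(b) for b in frame]
```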

QUIC framing

Streams are typed by direction:
Direction               Carries
─────────────────────   ───────────────────────────────────────────────
Client → server, bidi   RPC request envelopes (Originate, Terminate, …)
Client → server, uni    Outbound AudioFrame (microphone → trunk)
Server → client, uni    CallEvent and inbound AudioFrame (trunk → app)
Each frame on the wire is:
+----------------+----------------+----------------------+
| u32 length LE  | u32 method_id  | serde envelope body  |
+----------------+----------------+----------------------+
Here length = 4 + len(body): the length field counts the method id and the body, but not itself. See Envelope Format for full detail.
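The framing rule is easy to state in code. A minimal Python sketch — in the real SDKs the body comes from the C++ core's telequick_rpc_* encoders; here it is just bytes:

```python
import struct

def pack_frame(method_id: int, body: bytes) -> bytes:
    """Prefix a serde envelope body with the u32 LE length and method id.

    length = 4 + len(body): it counts the method id and the body,
    but not the length field itself.
    """
    return struct.pack("<II", 4 + len(body), method_id) + body

def unpack_frame(frame: bytes) -> tuple[int, bytes]:
    """Inverse of pack_frame for a single complete frame."""
    length, method_id = struct.unpack_from("<II", frame)
    return method_id, frame[8 : 4 + length]
```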

Connection lifecycle

  1. Connect. SDK dials quic://engine.telequick.dev:9090 with ALPN h3.
  2. Handshake. SDK signs a JWT with the service-account credentials (TELEQUICK_CREDENTIALS) and embeds it in the first envelope.
  3. Subscribe. SDK sends EventStreamRequest with a per-process client_id so the gateway knows which uni-stream to direct events at.
  4. RPCs. Each call (Originate, Terminate, Barge, …) opens a fresh bidirectional stream, writes one envelope, and closes the write half.
  5. Streaming. Audio and events arrive on server-initiated uni-streams demuxed by method_id.
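The demultiplexing in step 5 reduces to a dispatch table keyed on method_id. A sketch of that loop, using hypothetical method ids — the real values come from the generated method-id table:

```python
import struct

# Hypothetical ids for illustration; real values are emitted by codegen.
CALL_EVENT, AUDIO_FRAME = 0x10, 0x20

events: list[bytes] = []
audio: list[bytes] = []
HANDLERS = {CALL_EVENT: events.append, AUDIO_FRAME: audio.append}

def demux(stream: bytes) -> int:
    """Walk the bytes of a server-initiated uni-stream and dispatch each
    frame's body to the handler registered for its method_id."""
    offset = dispatched = 0
    while offset + 8 <= len(stream):
        length, method_id = struct.unpack_from("<II", stream, offset)
        body = stream[offset + 8 : offset + 4 + length]
        HANDLERS[method_id](body)
        offset += 4 + length      # advance past length prefix + frame
        dispatched += 1
    return dispatched
```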

Where things run

  • Browsers. TypeScript SDK → WebTransport over HTTP/3 for control RPCs; WebRTC (DTLS-SRTP) for media when a low-latency mic/speaker pipe is needed. The C++ core is compiled to WebAssembly via Emscripten.
  • Servers. Native SDKs load telequick_core_ffi.so / .dll and reach the gateway over plain QUIC. JNI for Java, P/Invoke for .NET, CGO for Go, libloading for Rust, ctypes for Python.
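For a native SDK the load step is ordinary dynamic linking. A Python/ctypes illustration — the symbol name and signature below are assumptions for the sketch; the real declarations come from the generated bindings:

```python
import ctypes

def load_core(path: str = "./telequick_core_ffi.so") -> ctypes.CDLL:
    """Load the shared C++ core and declare one encoder's C signature.

    telequick_rpc_originate here is illustrative: each SDK declares the
    generated telequick_rpc_<method> symbols it needs, then treats the
    returned pointer as a wire-ready buffer.
    """
    lib = ctypes.CDLL(path)  # raises OSError if the library is absent
    lib.telequick_rpc_originate.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
    lib.telequick_rpc_originate.restype = ctypes.c_void_p
    return lib
```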

Transport surfaces

The gateway accepts media on three transports — pick whichever your runtime supports best:
Transport      Best for
────────────   ─────────────────────────────────────────────────────────
QUIC           Native SDKs (Python, Go, Rust, Java, .NET); lowest overhead.
WebTransport   Browsers, when control-plane multiplexing matters more than peer-to-peer media.
WebRTC         Browsers and any client that needs ICE/STUN traversal, congestion control, or interop with existing WebRTC stacks (LiveKit, Daily, Chime).

Direct media

For server-side AI calls (default_app=AI_BIDIRECTIONAL_STREAM), the gateway and the agent runtime now use a direct-media path. The gateway negotiates SIP signalling with the carrier, then publishes the SDP answer pointing to the agent runtime’s RTP socket. RTP flows straight from the carrier to the runtime — the gateway is signalling-only on this path. PR1 of the redesign (the pool primitive + env-gated dispatch splice) shipped 2026-05-05 and the first end-to-end audible OpenAI Realtime call landed 2026-05-06. For SIP/RTP-only calls (no AI bridge), the gateway still terminates RTP and runs libfvad locally. So the same call_sid can take either RTP path depending on default_app.

Code generation

Method IDs and DTO definitions in every language come from one IDL file (api/telequick.json) compiled by apirpc_compiler.py. To add a new RPC, edit the IDL and rerun the compiler — never edit the per-language method tables by hand.
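As a rough picture of the workflow, an IDL entry might look something like the fragment below. Every field name here is invented purely for illustration — consult api/telequick.json itself for the real schema:

```json
{
  "Originate": {
    "id": 42,
    "request": "OriginateRequest",
    "response": "OriginateResponse"
  }
}
```

Rerunning apirpc_compiler.py then regenerates the method-id tables and DTOs in every SDK at once.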