
Welcome to LLMRTC

LLMRTC is a TypeScript SDK for building real-time voice and vision AI applications. It combines WebRTC for low-latency audio/video streaming with LLMs, speech-to-text, and text-to-speech—all through a unified, provider-agnostic API.


What is LLMRTC?

LLMRTC handles the complex infrastructure needed for conversational AI, so you can focus on your application logic. It takes care of:

  • Real-time audio/video streaming via WebRTC
  • Voice activity detection and barge-in
  • Provider orchestration and streaming pipelines
  • Session management and reconnection

Key Features

Real-Time Voice

Stream audio bidirectionally with sub-second latency. Server-side VAD detects speech boundaries, and barge-in lets users interrupt the assistant naturally.
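
A minimal sketch of how these behaviors might be tuned; the vad and bargeIn option names below are illustrative assumptions, not confirmed API (see Backend for the real configuration surface):

// Illustrative only: `vad` and `bargeIn` are assumed option names,
// not confirmed LLMRTC API. Providers are configured as in the Quick Example.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  vad: { silenceMs: 600 },  // assumed knob: silence gap that ends a user turn
  bargeIn: true             // assumed knob: user speech interrupts TTS playback
});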

Vision Support

Send camera frames or screen captures alongside speech. Vision-capable models can see what users see.
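
For example, sharing the user's screen might look like the following, assuming a connected client (see the Quick Example below); shareVideo is an assumed method, named by analogy with shareAudio:

// Capture the screen and stream frames alongside the voice channel.
// `client.shareVideo` is an assumed method (by analogy with `shareAudio`);
// check the web client reference for the actual name.
const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
await client.shareVideo(screen);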

Provider Agnostic

Switch between OpenAI, Anthropic, Google Gemini, AWS Bedrock, or local models without changing your code. Mix providers freely (e.g., Claude for LLM, Whisper for STT, ElevenLabs for TTS).
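
A sketch of that exact mix. OpenAIWhisperProvider appears in the Quick Example below; AnthropicLLMProvider and ElevenLabsTTSProvider are assumed exports following the same naming convention:

import {
  LLMRTCServer,
  OpenAIWhisperProvider,    // shown in the Quick Example below
  AnthropicLLMProvider,     // assumed export, named by convention
  ElevenLabsTTSProvider     // assumed export, named by convention
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new AnthropicLLMProvider({ apiKey: process.env.ANTHROPIC_API_KEY! }),
    stt: new OpenAIWhisperProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    tts: new ElevenLabsTTSProvider({ apiKey: process.env.ELEVENLABS_API_KEY! })
  }
});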

Tool Calling

Define tools with JSON Schema. The model calls them, you execute them, and the conversation continues seamlessly.
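
A hedged sketch of what a tool definition could look like; the tools option and handler signature are assumptions, but the parameters field uses standard JSON Schema:

// Illustrative only: the `tools` option and handler signature are assumed,
// not confirmed API. `parameters` is standard JSON Schema.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  tools: [{
    name: 'check_order',
    description: 'Look up the status of a customer order',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId']
    },
    handler: async ({ orderId }: { orderId: string }) => {
      // Your application logic; the return value goes back to the model.
      return { status: 'shipped', orderId };
    }
  }]
});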

Playbooks

Build multi-stage conversations with per-stage prompts, tools, and automatic transitions. Two-phase execution separates tool work from responses. Six transition types (tool calls, intents, keywords, LLM decision, timeouts, custom) give precise control over conversation flow.
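
A hypothetical two-stage playbook to make this concrete; the stage and transition field names are illustrative assumptions, not the documented schema (see Playbooks):

// Hypothetical playbook shape: field names are assumptions meant to
// illustrate per-stage prompts, tools, and transitions.
const playbook = {
  stages: [
    {
      name: 'authenticate',
      prompt: 'Verify the caller before discussing account details.',
      tools: ['verify_identity'],
      transitions: [{ type: 'toolCall', tool: 'verify_identity', to: 'triage' }]
    },
    {
      name: 'triage',
      prompt: 'Identify the customer problem and route it.',
      transitions: [{ type: 'timeout', afterMs: 120_000, to: 'wrap_up' }]
    }
  ]
};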

Streaming Pipeline

Responses start playing before generation completes. The STT → LLM → TTS pipeline streams end-to-end, and sentence-boundary detection starts TTS at natural pause points, reducing perceived latency.
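
On the client, this streaming surfaces as incremental events, the same transcript and llmChunk events used in the Quick Example below (render here is a placeholder for your own UI update function):

// Chunks arrive while the model is still generating; audio playback
// begins once the first sentence boundary is detected server-side.
client.on('transcript', (text) => render('user', text));       // user speech transcript
client.on('llmChunk', (chunk) => render('assistant', chunk));  // partial assistant response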

Hooks & Observability

20+ hook points for logging, debugging, and custom behavior. Built-in metrics track TTFT, token counts, and durations. Plug into your existing monitoring stack.
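
As a sketch, wiring a time-to-first-token measurement might look like this; the hooks option and the specific hook names are assumptions, stand-ins for the real hook points documented in Concepts:

// Illustrative only: `hooks`, `onLLMStart`, and `onLLMFirstToken` are
// assumed names; consult the hooks reference for the actual hook points.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  hooks: {
    onLLMStart: ({ sessionId }: { sessionId: string }) => console.time(`ttft:${sessionId}`),
    onLLMFirstToken: ({ sessionId }: { sessionId: string }) => console.timeEnd(`ttft:${sessionId}`)
  }
});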

Session Resilience

Automatic reconnection with exponential backoff. Conversation history survives network interruptions. Graceful degradation when providers fail.
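
A sketch of what tuning reconnection could look like on the client; the reconnect option and its field names are assumptions:

// Assumed client-side knobs for reconnection; names are illustrative.
const client = new LLMRTCWebClient({
  signallingUrl: 'ws://localhost:8787',
  reconnect: {
    maxAttempts: 10,
    initialDelayMs: 500   // doubles on each failed attempt (exponential backoff)
  }
});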


Architecture

LLMRTC consists of three packages:

Package                     Purpose
@llmrtc/llmrtc-core         Types, orchestrators, tools, hooks—shared foundation
@llmrtc/llmrtc-backend      Node.js server with WebRTC, VAD, and all providers
@llmrtc/llmrtc-web-client   Browser SDK for audio/video capture and playback

Supported Providers

Cloud Providers

Provider        LLM                       STT       TTS               Vision
OpenAI          GPT-4o, GPT-4             Whisper   TTS-1, TTS-1-HD   GPT-4o
Anthropic       Claude 3.5, Claude 3      -         -                 Claude 3
Google Gemini   Gemini 1.5, Gemini Pro    -         -                 Gemini Vision
AWS Bedrock     Claude, Llama, etc.       -         -                 varies
OpenRouter      100+ models               -         -                 varies
ElevenLabs      -                         -         Multilingual v2   -

Local Providers

Provider         LLM                    STT              TTS           Vision
Ollama           Llama, Mistral, etc.   -                -             LLaVA
LM Studio        Any GGUF model         -                -             -
Faster-Whisper   -                      Whisper (fast)   -             -
Piper            -                      -                Many voices   -

Use Cases

Voice Assistants

Build Siri/Alexa-style assistants with custom capabilities. Add tools for your domain—check orders, book appointments, control devices.

Customer Support

Multi-stage playbooks guide conversations through authentication, triage, and resolution. Tools integrate with your CRM and ticketing systems.

Multimodal Agents

Combine voice with vision for screen-aware assistants. Users can share their screen or camera and ask questions about what they see.

On-Device AI

Run entirely locally with Ollama, Faster-Whisper, and Piper. No cloud dependencies, no API costs, full privacy.
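
A sketch of the all-local stack; the provider class names are assumptions modeled on the cloud provider names in the Quick Example:

import {
  LLMRTCServer,
  OllamaLLMProvider,         // assumed export, named by convention
  FasterWhisperSTTProvider,  // assumed export, named by convention
  PiperTTSProvider           // assumed export, named by convention
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new OllamaLLMProvider({ model: 'llama3' }),
    stt: new FasterWhisperSTTProvider(),
    tts: new PiperTTSProvider({ voice: 'en_US-amy-medium' })
  }
});

await server.start(); // everything runs on this machine; no cloud calls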


Developer Experience

  • TypeScript-First: Full type safety with IntelliSense support across all APIs
  • Tool Validation: JSON Schema validation catches malformed LLM arguments before execution
  • Smart Error Handling: Automatic retry with error classification (retryable vs non-retryable)
  • Comprehensive Types: Every provider, hook, and event is fully typed

Production Deployment

For production use, WebRTC requires a TURN server to ensure reliable connections for users behind NAT/firewalls.

Recommended: the OpenRelay Project by Metered provides a global TURN server network with 20GB of monthly TURN usage at no cost, sufficient for most applications.

import { LLMRTCServer } from '@llmrtc/llmrtc-backend';

// llm, stt, tts: provider instances as configured in the Quick Example below
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  metered: {
    appName: 'your-app-name',
    apiKey: 'your-api-key'
  }
});

See Networking & TURN for detailed configuration options.


Quick Example

Backend (Node.js):

import {
  LLMRTCServer,
  OpenAILLMProvider,
  OpenAIWhisperProvider,
  OpenAITTSProvider
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new OpenAILLMProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    stt: new OpenAIWhisperProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    tts: new OpenAITTSProvider({ apiKey: process.env.OPENAI_API_KEY! })
  },
  systemPrompt: 'You are a helpful voice assistant.'
});

await server.start();

Frontend (Browser):

import { LLMRTCWebClient } from '@llmrtc/llmrtc-web-client';

const client = new LLMRTCWebClient({
  signallingUrl: 'ws://localhost:8787'
});

client.on('transcript', (text) => console.log('User:', text));
client.on('llmChunk', (chunk) => console.log('Assistant:', chunk));

await client.start();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
await client.shareAudio(stream);

Getting Started

Ready to build? Follow our quickstart guides:

  1. Installation - Set up packages and dependencies
  2. Backend Quickstart - Run your first server
  3. Web Client Quickstart - Connect from the browser
  4. Tool Calling - Add custom capabilities
  5. Local-Only Stack - Run without cloud APIs

Documentation Structure

Section           Contents
Getting Started   Installation, quickstarts, first application
Concepts          Architecture, streaming, VAD, playbooks, tools
Backend           Server configuration, deployment, security
Web Client        Browser SDK, audio/video, UI patterns
Playbooks         Multi-stage conversations, text and voice agents
Providers         Provider-specific configuration and features
Recipes           Complete examples for common use cases
Operations        Monitoring, troubleshooting, scaling
Protocol          Wire protocol for custom clients
