
Welcome to LLMRTC

LLMRTC is a TypeScript SDK for building real-time voice and vision AI applications. It combines WebRTC for low-latency audio/video streaming with LLMs, speech-to-text, and text-to-speech—all through a unified, provider-agnostic API.


What is LLMRTC?

LLMRTC handles the complex infrastructure needed for conversational AI, so you can focus on your application logic. It takes care of:

  • Real-time audio/video streaming via WebRTC
  • Voice activity detection and barge-in
  • Provider orchestration and streaming pipelines
  • Session management and reconnection

Key Features

Real-Time Voice

Stream audio bidirectionally with sub-second latency. Server-side VAD detects speech boundaries, and barge-in lets users interrupt the assistant naturally.
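
A minimal sketch of how these behaviors might be tuned; the vad and bargeIn option names below are illustrative assumptions, not confirmed API (see Backend for the real configuration surface):

// Illustrative only: `vad` and `bargeIn` are assumed option names,
// not confirmed LLMRTC API. Providers are configured as in the Quick Example.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  vad: { silenceMs: 600 },  // assumed knob: silence gap that ends a user turn
  bargeIn: true             // assumed knob: user speech interrupts TTS playback
});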

Vision Support

Send camera frames or screen captures alongside speech. Vision-capable models can see what users see.
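
For example, sharing the user's screen might look like the following, assuming a connected client (see the Quick Example below); shareVideo is an assumed method, named by analogy with shareAudio:

// Capture the screen and stream frames alongside the voice channel.
// `client.shareVideo` is an assumed method (by analogy with `shareAudio`);
// check the web client reference for the actual name.
const screen = await navigator.mediaDevices.getDisplayMedia({ video: true });
await client.shareVideo(screen);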

Provider Agnostic

Switch between OpenAI, Anthropic, Google Gemini, AWS Bedrock, or local models without changing your code. Mix providers freely (e.g., Claude for LLM, Whisper for STT, ElevenLabs for TTS).
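
A sketch of that exact mix. OpenAIWhisperProvider appears in the Quick Example below; AnthropicLLMProvider and ElevenLabsTTSProvider are assumed exports following the same naming convention:

import {
  LLMRTCServer,
  OpenAIWhisperProvider,    // shown in the Quick Example below
  AnthropicLLMProvider,     // assumed export, named by convention
  ElevenLabsTTSProvider     // assumed export, named by convention
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new AnthropicLLMProvider({ apiKey: process.env.ANTHROPIC_API_KEY! }),
    stt: new OpenAIWhisperProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    tts: new ElevenLabsTTSProvider({ apiKey: process.env.ELEVENLABS_API_KEY! })
  }
});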

Tool Calling

Define tools with JSON Schema. The model calls them, you execute them, and the conversation continues seamlessly.
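
A hedged sketch of what a tool definition could look like; the tools option and handler signature are assumptions, but the parameters field uses standard JSON Schema:

// Illustrative only: the `tools` option and handler signature are assumed,
// not confirmed API. `parameters` is standard JSON Schema.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  tools: [{
    name: 'check_order',
    description: 'Look up the status of a customer order',
    parameters: {
      type: 'object',
      properties: { orderId: { type: 'string' } },
      required: ['orderId']
    },
    handler: async ({ orderId }: { orderId: string }) => {
      // Your application logic; the return value goes back to the model.
      return { status: 'shipped', orderId };
    }
  }]
});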

Playbooks

Build multi-stage conversations with per-stage prompts, tools, and automatic transitions. Two-phase execution separates tool work from responses. Six transition types (tool calls, intents, keywords, LLM decision, timeouts, custom) give precise control over conversation flow.
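
A hypothetical two-stage playbook to make this concrete; the stage and transition field names are illustrative assumptions, not the documented schema (see Playbooks):

// Hypothetical playbook shape: field names are assumptions meant to
// illustrate per-stage prompts, tools, and transitions.
const playbook = {
  stages: [
    {
      name: 'authenticate',
      prompt: 'Verify the caller before discussing account details.',
      tools: ['verify_identity'],
      transitions: [{ type: 'toolCall', tool: 'verify_identity', to: 'triage' }]
    },
    {
      name: 'triage',
      prompt: 'Identify the customer problem and route it.',
      transitions: [{ type: 'timeout', afterMs: 120_000, to: 'wrap_up' }]
    }
  ]
};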

Streaming Pipeline

Responses start playing before generation completes. The STT → LLM → TTS pipeline streams end-to-end, and sentence-boundary detection starts TTS at natural pause points, reducing perceived latency.
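
On the client, this streaming surfaces as incremental events, the same transcript and llmChunk events used in the Quick Example below (render here is a placeholder for your own UI update function):

// Chunks arrive while the model is still generating; audio playback
// begins once the first sentence boundary is detected server-side.
client.on('transcript', (text) => render('user', text));       // user speech transcript
client.on('llmChunk', (chunk) => render('assistant', chunk));  // partial assistant response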

Hooks & Observability

20+ hook points for logging, debugging, and custom behavior. Built-in metrics track TTFT, token counts, and durations. Plug into your existing monitoring stack.
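
As a sketch, wiring a time-to-first-token measurement might look like this; the hooks option and the specific hook names are assumptions, stand-ins for the real hook points documented in Concepts:

// Illustrative only: `hooks`, `onLLMStart`, and `onLLMFirstToken` are
// assumed names; consult the hooks reference for the actual hook points.
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  hooks: {
    onLLMStart: ({ sessionId }: { sessionId: string }) => console.time(`ttft:${sessionId}`),
    onLLMFirstToken: ({ sessionId }: { sessionId: string }) => console.timeEnd(`ttft:${sessionId}`)
  }
});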

Session Resilience

Automatic reconnection with exponential backoff. Conversation history survives network interruptions. Graceful degradation when providers fail.
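
A sketch of what tuning reconnection could look like on the client; the reconnect option and its field names are assumptions:

// Assumed client-side knobs for reconnection; names are illustrative.
const client = new LLMRTCWebClient({
  signallingUrl: 'ws://localhost:8787',
  reconnect: {
    maxAttempts: 10,
    initialDelayMs: 500   // doubles on each failed attempt (exponential backoff)
  }
});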


Architecture

LLMRTC consists of three packages:

Package                     Purpose
@llmrtc/llmrtc-core         Types, orchestrators, tools, hooks—shared foundation
@llmrtc/llmrtc-backend      Node.js server with WebRTC, VAD, and all providers
@llmrtc/llmrtc-web-client   Browser SDK for audio/video capture and playback

Supported Providers

Cloud Providers

Provider        LLM                       STT       TTS               Vision
OpenAI          GPT-4o, GPT-4             Whisper   TTS-1, TTS-1-HD   GPT-4o
Anthropic       Claude 3.5, Claude 3      -         -                 Claude 3
Google Gemini   Gemini 1.5, Gemini Pro    -         -                 Gemini Vision
AWS Bedrock     Claude, Llama, etc.       -         -                 varies
OpenRouter      100+ models               -         -                 varies
ElevenLabs      -                         -         Multilingual v2   -

Local Providers

Provider         LLM                    STT              TTS           Vision
Ollama           Llama, Mistral, etc.   -                -             LLaVA
LM Studio        Any GGUF model         -                -             -
Faster-Whisper   -                      Whisper (fast)   -             -
Piper            -                      -                Many voices   -

Use Cases

Voice Assistants

Build Siri/Alexa-style assistants with custom capabilities. Add tools for your domain—check orders, book appointments, control devices.

Customer Support

Multi-stage playbooks guide conversations through authentication, triage, and resolution. Tools integrate with your CRM and ticketing systems.

Multimodal Agents

Combine voice with vision for screen-aware assistants. Users can share their screen or camera and ask questions about what they see.

On-Device AI

Run entirely locally with Ollama, Faster-Whisper, and Piper. No cloud dependencies, no API costs, full privacy.
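
A sketch of the all-local stack; the provider class names are assumptions modeled on the cloud provider names in the Quick Example:

import {
  LLMRTCServer,
  OllamaLLMProvider,         // assumed export, named by convention
  FasterWhisperSTTProvider,  // assumed export, named by convention
  PiperTTSProvider           // assumed export, named by convention
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new OllamaLLMProvider({ model: 'llama3' }),
    stt: new FasterWhisperSTTProvider(),
    tts: new PiperTTSProvider({ voice: 'en_US-amy-medium' })
  }
});

await server.start(); // everything runs on this machine; no cloud calls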


Developer Experience

  • TypeScript-First: Full type safety with IntelliSense support across all APIs
  • Tool Validation: JSON Schema validation catches malformed LLM arguments before execution
  • Smart Error Handling: Automatic retry with error classification (retryable vs non-retryable)
  • Comprehensive Types: Every provider, hook, and event is fully typed

Production Deployment

For production use, WebRTC requires a TURN server to ensure reliable connections for users behind NAT/firewalls.

Recommended: the OpenRelay Project by Metered provides a global TURN server network with 20GB of monthly TURN usage at no cost, sufficient for most applications.

import { LLMRTCServer } from '@llmrtc/llmrtc-backend';

// llm, stt, tts: provider instances as configured in the Quick Example below
const server = new LLMRTCServer({
  providers: { llm, stt, tts },
  metered: {
    appName: 'your-app-name',
    apiKey: 'your-api-key'
  }
});

See Networking & TURN for detailed configuration options.


Quick Example

Backend (Node.js):

import {
  LLMRTCServer,
  OpenAILLMProvider,
  OpenAIWhisperProvider,
  OpenAITTSProvider
} from '@llmrtc/llmrtc-backend';

const server = new LLMRTCServer({
  providers: {
    llm: new OpenAILLMProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    stt: new OpenAIWhisperProvider({ apiKey: process.env.OPENAI_API_KEY! }),
    tts: new OpenAITTSProvider({ apiKey: process.env.OPENAI_API_KEY! })
  },
  systemPrompt: 'You are a helpful voice assistant.'
});

await server.start();

Frontend (Browser):

import { LLMRTCWebClient } from '@llmrtc/llmrtc-web-client';

const client = new LLMRTCWebClient({
  signallingUrl: 'ws://localhost:8787'
});

client.on('transcript', (text) => console.log('User:', text));
client.on('llmChunk', (chunk) => console.log('Assistant:', chunk));

await client.start();
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
await client.shareAudio(stream);

Getting Started

Ready to build? Follow our quickstart guides:

  1. Installation - Set up packages and dependencies
  2. Backend Quickstart - Run your first server
  3. Web Client Quickstart - Connect from the browser
  4. Tool Calling - Add custom capabilities
  5. Local-Only Stack - Run without cloud APIs

Documentation Structure

Section           Contents
Getting Started   Installation, quickstarts, first application
Concepts          Architecture, streaming, VAD, playbooks, tools
Backend           Server configuration, deployment, security
Web Client        Browser SDK, audio/video, UI patterns
Playbooks         Multi-stage conversations, text and voice agents
Providers         Provider-specific configuration and features
Recipes           Complete examples for common use cases
Operations        Monitoring, troubleshooting, scaling
Protocol          Wire protocol for custom clients
