How to Build a Real-Time AI Phone System
Step-by-step tutorial for building an AI voicemail using Twilio Media Streams, FastAPI, OpenAI Realtime API, and Supabase
Why AI Voicemail Systems Matter
In today’s fast-paced world, our phones are constantly busy with customer inquiries, urgent requests, and follow-ups. Traditionally, businesses relied on call centers for large enterprises, receptionists for mid-sized companies, and personal assistants for busy individuals, but AI-powered phone services now offer a reliable and cost-effective alternative. These systems can handle high call volumes, provide immediate customer service, and operate 24/7, ensuring no call goes unanswered even outside office hours or while traveling internationally.
Use Cases:
- AI front desk
- Customer support
- Sales qualification
- Automated voicemail with transcript and summary
High-Level Architecture Overview
The system consists of four main components:
- Twilio Media Streams – Streams live call audio to your server.
- FastAPI WebSocket Bridge – Connects Twilio ↔ OpenAI and handles audio conversion.
- OpenAI Realtime API – Processes live AI conversation and generates responses.
- Supabase – Stores call transcripts, AI summaries, and voicemail data.
Flow Diagram:
Phone Call → Twilio → Webhook → WebSocket → Audio Bridge → OpenAI Realtime API
↓
Supabase (DB)
Key Components and Technologies
| Component | Purpose |
|---|---|
| Twilio Media Streams | Real-time call streaming |
| FastAPI | WebSocket server and bridge |
| OpenAI Realtime API | AI voice conversation and transcription |
| audioop-lts | Audio format conversion |
| Supabase | Database for transcripts and RAG knowledge base |
| RAG (Retrieval Augmented Generation) | Personalized AI instructions and context |
Prerequisites
- Accounts & Services
  - Twilio account + phone number
  - OpenAI API key (Realtime API access)
  - Supabase project with tables: calls, call_transcripts, user_settings, agent_prompts, knowledge_base
- Python Environment
  - Python 3.8+
  - pip package manager
- Development Tools
  - ngrok for local testing
  - Terminal/command line access
Step-by-Step Setup Instructions
Step 1: Install Dependencies
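pip install fastapi uvicorn websockets openai supabase python-dotenv "audioop-lts; python_version >= '3.13'"
The environment marker installs audioop-lts only on Python 3.13+, where the standard-library audioop module no longer exists. Earlier Python versions use the built-in audioop module instead.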
Step 2: Configure Environment Variables
Create a .env file:
# Twilio
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
# OpenAI
OPENAI_API_KEY=sk-...
# Supabase
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
# RAG Settings
RAG_CHUNK_LIMIT=5
RAG_SIMILARITY_THRESHOLD=0.7
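One way to read these values at startup is sketched below. It assumes the variables are already in the process environment (for example after calling `load_dotenv()` from python-dotenv). The `Settings` class and `load_settings` function are hypothetical names used for illustration, and the RAG defaults simply mirror the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Keys the tutorial's .env file defines and the app cannot run without.
REQUIRED_KEYS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "OPENAI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
)


@dataclass(frozen=True)
class Settings:  # hypothetical container, not part of any library
    twilio_account_sid: str
    twilio_auth_token: str
    openai_api_key: str
    supabase_url: str
    supabase_service_role_key: str
    rag_chunk_limit: int
    rag_similarity_threshold: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Validate required keys and convert the RAG settings to numbers."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        twilio_account_sid=env["TWILIO_ACCOUNT_SID"],
        twilio_auth_token=env["TWILIO_AUTH_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        supabase_url=env["SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        # Defaults mirror the example .env values (an assumption).
        rag_chunk_limit=int(env.get("RAG_CHUNK_LIMIT", "5")),
        rag_similarity_threshold=float(env.get("RAG_SIMILARITY_THRESHOLD", "0.7")),
    )
```

Failing fast on a missing key at startup tends to be easier to debug than a failed API call in the middle of a live phone call.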
Step 3: Set Up Supabase Tables
Create the following tables:
- calls: call metadata
- call_transcripts: transcripts and summaries
- user_settings: phone number → user_id
- agent_prompts: custom prompts for AI
- knowledge_base: optional RAG chunks
Step 4: Create the Twilio Webhook
- Endpoint: /api/v1/incoming-call-realtime
- Returns TwiML connecting the call to the WebSocket
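The TwiML returned by the webhook could be built roughly as follows. This is a minimal sketch using only the standard library. `build_stream_twiml` is a hypothetical helper, and the WebSocket URL in the usage note is a placeholder for your own public host and path.

```python
from xml.etree import ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call to a bidirectional Media Stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # ElementTree escapes the attribute value, so query strings are safe.
    ET.SubElement(connect, "Stream", {"url": websocket_url})
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

In a FastAPI route, the string would typically be returned as `Response(content=build_stream_twiml("wss://your-host/your-websocket-path"), media_type="application/xml")`.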
Step 5: Write the WebSocket Bridge
- Accepts Twilio WebSocket
- Connects to OpenAI Realtime API
- Handles media events: connected, start, media, stop
- Collects transcripts and forwards audio
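The event handling listed above can be sketched as a small dispatcher over the JSON messages Twilio sends on the WebSocket. `CallState` and `handle_twilio_message` are hypothetical names. The message fields used (`event`, `start.streamSid`, `start.callSid`, `media.payload`) follow the Twilio Media Streams message format, but treat the rest as an illustration rather than the tutorial's actual bridge.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallState:  # hypothetical per-call state holder
    stream_sid: Optional[str] = None
    call_sid: Optional[str] = None
    finished: bool = False
    transcript: List[str] = field(default_factory=list)


def handle_twilio_message(raw: str, state: CallState) -> Optional[bytes]:
    """Update call state; return raw mu-law audio bytes for media events."""
    message = json.loads(raw)
    event = message.get("event")
    if event == "start":
        # The stream SID is needed later to address audio sent back to Twilio.
        state.stream_sid = message["start"]["streamSid"]
        state.call_sid = message["start"].get("callSid")
    elif event == "media":
        # Payload is base64-encoded 8 kHz mu-law audio.
        return base64.b64decode(message["media"]["payload"])
    elif event == "stop":
        state.finished = True
    # "connected" and any unknown events carry no audio.
    return None
```

Inside a FastAPI WebSocket endpoint, each text frame received from Twilio would be passed through this function, and any returned bytes handed to the audio conversion step.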
[Figure: RAG schema]
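The RAG settings from Step 2 suggest that retrieved knowledge_base chunks are filtered by similarity and capped in number before being added to the AI instructions. The sketch below works under that assumption. `select_chunks` and `build_instructions` are hypothetical helpers, and the input is assumed to be a list of `(text, similarity)` pairs already returned by whatever similarity search you run against Supabase.

```python
from typing import List, Sequence, Tuple


def select_chunks(
    scored_chunks: Sequence[Tuple[str, float]],
    limit: int = 5,          # mirrors RAG_CHUNK_LIMIT
    threshold: float = 0.7,  # mirrors RAG_SIMILARITY_THRESHOLD
) -> List[str]:
    """Keep chunks at or above the threshold, best first, capped at limit."""
    relevant = [(score, text) for text, score in scored_chunks if score >= threshold]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in relevant[:limit]]


def build_instructions(base_prompt: str, chunks: List[str]) -> str:
    """Append the selected chunks to the agent prompt as bullet points."""
    if not chunks:
        return base_prompt
    context = "\n".join("- " + chunk for chunk in chunks)
    return base_prompt + "\n\nRelevant context:\n" + context
```

The resulting string would then be sent to the model as its session instructions before the conversation starts.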

Audio Format Conversion
- Twilio → OpenAI: μ-law 8kHz → PCM16 24kHz
- OpenAI → Twilio: PCM16 24kHz → μ-law 8kHz
- Conversion handled by the audioop-lts library
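The two conversions above can be sketched with the `audioop` functions `ulaw2lin`, `lin2ulaw`, and `ratecv`. The function names `twilio_to_openai` and `openai_to_twilio` are hypothetical, and the sketch assumes mono audio. The `state` value returned by `ratecv` should be passed back in on the next chunk so resampling stays continuous across chunk boundaries.

```python
try:
    # Standard library through Python 3.12; provided by audioop-lts on 3.13+.
    import audioop
except ImportError:
    audioop = None  # the functions below will fail until audioop-lts is installed

SAMPLE_WIDTH = 2     # bytes per PCM16 sample
CHANNELS = 1         # phone audio is mono
TWILIO_RATE = 8000   # Hz, mu-law side
OPENAI_RATE = 24000  # Hz, PCM16 side


def twilio_to_openai(ulaw_8k: bytes, state=None):
    """mu-law 8 kHz -> PCM16 24 kHz. Returns (audio, resampler_state)."""
    pcm_8k = audioop.ulaw2lin(ulaw_8k, SAMPLE_WIDTH)
    pcm_24k, state = audioop.ratecv(
        pcm_8k, SAMPLE_WIDTH, CHANNELS, TWILIO_RATE, OPENAI_RATE, state
    )
    return pcm_24k, state


def openai_to_twilio(pcm_24k: bytes, state=None):
    """PCM16 24 kHz -> mu-law 8 kHz. Returns (audio, resampler_state)."""
    pcm_8k, state = audioop.ratecv(
        pcm_24k, SAMPLE_WIDTH, CHANNELS, OPENAI_RATE, TWILIO_RATE, state
    )
    return audioop.lin2ulaw(pcm_8k, SAMPLE_WIDTH), state
```

Keep one resampler state per direction per call. Sharing a state between calls or directions would blend unrelated audio at chunk boundaries.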
Streaming Conversation Logic
- Twilio sends media events → converted → OpenAI
- OpenAI sends response.audio.delta → converted → Twilio
- AI transcription events collected in real-time
- Async tasks handle bidirectional streaming
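The bidirectional flow above can be sketched as two concurrent pumps, one per direction. All function names here are hypothetical. The sources are assumed to be async iterators of JSON text frames, the senders are assumed to be async callables, and the converters are injected so that any audio conversion routine can be plugged in. The message shapes follow Twilio's `media` event and the Realtime API's `input_audio_buffer.append` and `response.audio.delta` events.

```python
import asyncio
import base64
import json
from typing import AsyncIterator, Awaitable, Callable, Optional

Convert = Callable[[bytes], bytes]
Send = Callable[[str], Awaitable[None]]


def twilio_to_openai_event(message: dict, convert: Convert) -> Optional[str]:
    """Translate a Twilio media event into an OpenAI audio-append event."""
    if message.get("event") != "media":
        return None
    audio = convert(base64.b64decode(message["media"]["payload"]))
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(audio).decode("ascii"),
    })


def openai_to_twilio_event(event: dict, stream_sid: str, convert: Convert) -> Optional[str]:
    """Translate an OpenAI audio delta into a Twilio media message."""
    if event.get("type") != "response.audio.delta":
        return None
    audio = convert(base64.b64decode(event["delta"]))
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


async def pump(source: AsyncIterator[str], send: Send,
               translate: Callable[[dict], Optional[str]]) -> None:
    """Read frames from one side, translate, and forward to the other."""
    async for raw in source:
        outgoing = translate(json.loads(raw))
        if outgoing is not None:
            await send(outgoing)


async def bridge(twilio_in: AsyncIterator[str], openai_in: AsyncIterator[str],
                 send_to_openai: Send, send_to_twilio: Send,
                 stream_sid: str, upsample: Convert, downsample: Convert) -> None:
    """Run both directions concurrently until both sources are exhausted."""
    await asyncio.gather(
        pump(twilio_in, send_to_openai,
             lambda m: twilio_to_openai_event(m, upsample)),
        pump(openai_in, send_to_twilio,
             lambda e: openai_to_twilio_event(e, stream_sid, downsample)),
    )
```

In a real bridge, the stream SID only becomes known once Twilio's start event arrives, so it would need to be captured before the OpenAI-to-Twilio pump sends audio. Transcription events could be collected in the same `translate` callbacks before returning `None`.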
Saving Transcripts to Supabase
- Collect conversation lines (user + AI)
- Generate AI summary
- Save to calls and call_transcripts tables
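The steps above might be prepared as follows before writing to Supabase. The column names (`call_sid`, `user_id`, `ended_at`, `transcript`, `summary`) are assumptions, since the article does not show the table schemas, and `format_transcript` and `build_rows` are hypothetical helpers. The summary is taken as an input because how it is generated is outside this sketch.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def format_transcript(lines: List[Tuple[str, str]]) -> str:
    """Join (speaker, text) pairs into one readable transcript string."""
    return "\n".join(
        speaker + ": " + text.strip() for speaker, text in lines if text.strip()
    )


def build_rows(call_sid: str, user_id: str,
               lines: List[Tuple[str, str]], summary: str) -> Tuple[Dict, Dict]:
    """Build one row for each table; column names are assumed, not confirmed."""
    call_row = {
        "call_sid": call_sid,
        "user_id": user_id,
        "ended_at": datetime.now(timezone.utc).isoformat(),
    }
    transcript_row = {
        "call_sid": call_sid,
        "transcript": format_transcript(lines),
        "summary": summary,
    }
    return call_row, transcript_row


# With a supabase-py client, the rows could then be written with:
#   client.table("calls").insert(call_row).execute()
#   client.table("call_transcripts").insert(transcript_row).execute()
```

Building the rows as plain dictionaries keeps this step easy to unit test without a database connection.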
Testing the System
- Start FastAPI: uvicorn app.main:app --reload --port 8000
- Expose with ngrok: ngrok http 8000
- Update Twilio webhook to ngrok URL
- Make test calls, verify audio streaming, transcription, and RAG-enhanced responses
What You Can Build With This System
- AI Receptionist – Answer calls automatically
- Customer Support Bot – Live issue resolution
- Sales Qualification Agent – Collect leads
- AI Voicemail System – Automated greeting, recording, transcript, summary
- Multi-Tenant SaaS – Custom AI agents per business
- Internal Helpdesk – HR, IT support
- Workflow Automation – Trigger notifications or CRM actions
TL;DR
Either read this article or feed it to the IDE of your choice as context and let it build the system for you.
By following this guide, you can build and launch an AI voice agent capable of handling calls, transcribing them, and generating voicemail summaries automatically. Happy vibing!
