Why AI Voicemail Systems Matter
Business phones ring constantly with customer inquiries, urgent requests, and follow-ups. Traditionally, large enterprises relied on call centers, mid-sized companies on receptionists, and busy individuals on personal assistants. AI-powered phone services now offer a reliable and cost-effective alternative: they can handle high call volumes, respond to callers immediately, and operate 24/7, so calls still get answered outside office hours or while you are traveling internationally.
Use Cases:
- AI front desk
- Customer support
- Sales qualification
- Automated voicemail with transcript and summary
High-Level Architecture Overview
The system consists of four main components:
- Twilio Media Streams – Streams live call audio to your server.
- FastAPI WebSocket Bridge – Connects Twilio ↔ OpenAI and handles audio conversion.
- OpenAI Realtime API – Processes live AI conversation and generates responses.
- Supabase – Stores call transcripts, AI summaries, and voicemail data.
Flow Diagram:
Phone Call → Twilio → Webhook → WebSocket → Audio Bridge → OpenAI Realtime API
↓
Supabase (DB)
Key Components and Technologies
| Component | Purpose |
|---|---|
| Twilio Media Streams | Real-time call streaming |
| FastAPI | WebSocket server and bridge |
| OpenAI Realtime API | AI voice conversation and transcription |
| audioop-lts | Audio format conversion |
| Supabase | Database for transcripts and RAG knowledge base |
| RAG (Retrieval Augmented Generation) | Personalized AI instructions and context |
Prerequisites
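- Accounts & Services
  - Twilio account + phone number
  - OpenAI API key (Realtime API access)
  - Supabase project with tables: calls, call_transcripts, user_settings, agent_prompts, knowledge_base (see Step 3)
- Python Environment
  - Python 3.8+ (note: audioop-lts targets Python 3.13+, where the standard-library audioop module was removed; on older versions the built-in audioop module provides the same functions)
  - pip package manager
- Development Tools
  - ngrok for local testing
  - Terminal/command line access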
Step-by-Step Setup Instructions
Step 1: Install Dependencies
pip install fastapi uvicorn websockets audioop-lts openai supabase python-dotenv
Step 2: Configure Environment Variables
Create a .env file:
# Twilio
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
# OpenAI
OPENAI_API_KEY=sk-...
# Supabase
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
# RAG Settings
RAG_CHUNK_LIMIT=5
RAG_SIMILARITY_THRESHOLD=0.7
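To illustrate how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object and fails fast when a secret is missing. The `Settings` class and `load_settings` function are hypothetical names, not part of any library, and the defaults simply mirror the values shown above. In the real app, `load_dotenv()` from python-dotenv would typically be called first so that the .env file is copied into `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical helper: names and structure are assumptions for illustration.
REQUIRED = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "OPENAI_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
)


@dataclass(frozen=True)
class Settings:
    twilio_account_sid: str
    twilio_auth_token: str
    openai_api_key: str
    supabase_url: str
    supabase_service_role_key: str
    rag_chunk_limit: int
    rag_similarity_threshold: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, raising if a required value is absent."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        twilio_account_sid=env["TWILIO_ACCOUNT_SID"],
        twilio_auth_token=env["TWILIO_AUTH_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        supabase_url=env["SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        # The RAG settings are optional here; defaults match the .env example.
        rag_chunk_limit=int(env.get("RAG_CHUNK_LIMIT", "5")),
        rag_similarity_threshold=float(env.get("RAG_SIMILARITY_THRESHOLD", "0.7")),
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dict instead of the process environment.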
Step 3: Set Up Supabase Tables
Create the following tables:
- calls: call metadata
- call_transcripts: transcripts and summaries
- user_settings: phone number → user_id
- agent_prompts: custom prompts for AI
- knowledge_base: optional RAG chunks
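As a rough illustration of how the RAG_CHUNK_LIMIT and RAG_SIMILARITY_THRESHOLD settings from Step 2 could apply to knowledge_base chunks, the sketch below ranks chunks by cosine similarity in memory. This is an assumption about the retrieval step, not the article's actual implementation: `select_chunks` is a hypothetical function, the embeddings are toy vectors, and a real deployment may well run the similarity search inside the database instead.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    limit: int = 5,          # mirrors RAG_CHUNK_LIMIT
    threshold: float = 0.7,  # mirrors RAG_SIMILARITY_THRESHOLD
) -> List[str]:
    """Return up to `limit` chunk texts whose similarity meets the threshold."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [text for _, text in kept[:limit]]
```

The selected texts would then be appended to the AI's instructions before the call session starts.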
Step 4: Create the Twilio Webhook
- Endpoint: /api/v1/incoming-call-realtime
- Returns TwiML that connects the call to the WebSocket
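The TwiML returned by the webhook can be sketched as follows. The `<Connect>` and `<Stream>` verbs are standard TwiML for bidirectional Media Streams; the function name `build_stream_twiml` and the WebSocket URL in the usage note are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that tells Twilio to stream call audio to a WebSocket."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # ElementTree escapes the attribute value, so query strings are safe.
    ET.SubElement(connect, "Stream", url=websocket_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

In a FastAPI handler, the string would be returned with an XML media type, for example `Response(content=build_stream_twiml("wss://your-ngrok-host/ws"), media_type="application/xml")`, where the host and path are placeholders.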
Step 5: Write the WebSocket Bridge
- Accepts Twilio WebSocket
- Connects to OpenAI Realtime API
- Handles media events: connected, start, media, stop
- Collects transcripts and forwards audio
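The event handling above can be sketched as a small dispatcher over the JSON messages Twilio sends on the Media Streams WebSocket. The message fields (`event`, `start.streamSid`, `start.callSid`, `media.payload`) follow Twilio's message format; `CallState` and `handle_twilio_message` are hypothetical names, and a real bridge would call this from inside the FastAPI WebSocket receive loop.

```python
import base64
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallState:
    """Per-call bookkeeping (illustrative, not the article's actual class)."""
    stream_sid: Optional[str] = None
    call_sid: Optional[str] = None
    finished: bool = False


def handle_twilio_message(raw: str, state: CallState) -> Optional[bytes]:
    """Update state from one Twilio message; return audio bytes for media events."""
    message = json.loads(raw)
    event = message.get("event")
    if event == "start":
        # The start event carries the identifiers needed to send audio back.
        state.stream_sid = message["start"]["streamSid"]
        state.call_sid = message["start"]["callSid"]
    elif event == "media":
        # Payload is base64-encoded mu-law audio at 8 kHz.
        return base64.b64decode(message["media"]["payload"])
    elif event == "stop":
        state.finished = True
    # "connected" and unknown events need no action in this sketch.
    return None
```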
RAG schema

Audio Format Conversion
- Twilio → OpenAI: μ-law 8kHz → PCM16 24kHz
- OpenAI → Twilio: PCM16 24kHz → μ-law 8kHz
- Conversion handled by the audioop-lts library
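To show what the Twilio → OpenAI direction involves, here is a pure-Python stand-in for the conversion. It is not the article's code: in the actual bridge the same work would be done by `audioop.ulaw2lin(data, 2)` followed by `audioop.ratecv(pcm, 2, 1, 8000, 24000, state)`. The decoder follows the standard G.711 μ-law expansion, and the resampler uses simple linear interpolation, which is an assumption chosen for readability rather than audio quality.

```python
import struct


def ulaw_byte_to_pcm16(value: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    value = ~value & 0xFF              # mu-law bytes are stored inverted
    sign = value & 0x80
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(ulaw: bytes) -> bytes:
    """Decode mu-law bytes to little-endian PCM16 at the same sample rate."""
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw]
    return struct.pack("<%dh" % len(samples), *samples)


def upsample_3x(pcm16: bytes) -> bytes:
    """Go from 8 kHz to 24 kHz by inserting two interpolated samples."""
    count = len(pcm16) // 2
    samples = struct.unpack("<%dh" % count, pcm16[: count * 2])
    out = []
    for i, current in enumerate(samples):
        # Hold the last sample, since it has no successor to interpolate to.
        nxt = samples[i + 1] if i + 1 < count else current
        out.append(current)
        out.append(current + (nxt - current) // 3)
        out.append(current + 2 * (nxt - current) // 3)
    return struct.pack("<%dh" % len(out), *out)
```

A 20 ms Twilio frame of 160 μ-law bytes becomes 320 bytes of PCM16 at 8 kHz and 960 bytes at 24 kHz. The reverse direction (downsample, then μ-law encode) mirrors these steps.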
Streaming Conversation Logic
- Twilio sends media events → converted → OpenAI
- OpenAI sends response.audio.delta → converted → Twilio
- AI transcription events collected in real time
- Async tasks handle bidirectional streaming
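The bidirectional flow can be sketched with two asyncio tasks, one per direction. In this illustration `asyncio.Queue` objects stand in for the two WebSocket connections so the example runs on its own; `pump` and `run_bridge` are hypothetical names. The message shapes are the `input_audio_buffer.append` client event for OpenAI and the outbound `media` message for Twilio, and audio conversion is assumed to happen before these helpers are called.

```python
import asyncio
import base64
import json
from typing import Callable, Optional


def to_openai_append(pcm16_24k: bytes) -> str:
    """Wrap caller audio in an OpenAI Realtime input_audio_buffer.append event."""
    audio = base64.b64encode(pcm16_24k).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio})


def to_twilio_media(stream_sid: str, ulaw_8k: bytes) -> str:
    """Wrap AI audio in a Twilio media message for the given stream."""
    payload = base64.b64encode(ulaw_8k).decode("ascii")
    return json.dumps(
        {"event": "media", "streamSid": stream_sid, "media": {"payload": payload}}
    )


async def pump(
    source: asyncio.Queue,
    sink: asyncio.Queue,
    transform: Callable[[bytes], Optional[str]],
) -> None:
    """Forward items from source to sink until a None sentinel arrives."""
    while True:
        item = await source.get()
        if item is None:
            await sink.put(None)  # propagate end-of-stream
            return
        out = transform(item)
        if out is not None:
            await sink.put(out)


async def run_bridge(twilio_in, openai_out, openai_in, twilio_out, stream_sid):
    """Run both directions concurrently, as the async tasks in the bridge do."""
    await asyncio.gather(
        pump(twilio_in, openai_out, to_openai_append),
        pump(openai_in, twilio_out, lambda chunk: to_twilio_media(stream_sid, chunk)),
    )
```

Running the two pumps with `asyncio.gather` means a slow response in one direction does not block audio flowing in the other.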
Saving Transcripts to Supabase
- Collect conversation lines (user + AI)
- Generate AI summary
- Save to calls and call_transcripts tables
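A minimal sketch of the save step is shown below. The column names (`call_sid`, `user_id`, `ended_at`, `transcript`, `summary`) are assumptions, since the article does not list the table schemas, and `build_call_rows` is a hypothetical helper. The summary is taken as an input string here; generating it is a separate step.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Tuple


@dataclass
class TranscriptLine:
    speaker: str  # "user" or "ai"
    text: str


def build_call_rows(
    call_sid: str,
    user_id: str,
    lines: List[TranscriptLine],
    summary: str,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build one row for calls and one for call_transcripts (assumed columns)."""
    transcript = "\n".join(f"{line.speaker}: {line.text}" for line in lines)
    call_row = {
        "call_sid": call_sid,
        "user_id": user_id,
        "ended_at": datetime.now(timezone.utc).isoformat(),
    }
    transcript_row = {
        "call_sid": call_sid,
        "transcript": transcript,
        "summary": summary,
    }
    return call_row, transcript_row


# With the supabase-py client, each row could then be written with:
#   client.table("calls").insert(call_row).execute()
#   client.table("call_transcripts").insert(transcript_row).execute()
```

Keeping row construction separate from the database call makes the transcript formatting easy to check without a live Supabase project.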
Testing the System
- Start FastAPI: uvicorn app.main:app --reload --port 8000
- Expose with ngrok: ngrok http 8000
- Update the Twilio webhook to the ngrok URL
- Make test calls and verify audio streaming, transcription, and RAG-enhanced responses
What You Can Build With This System
- AI Receptionist – Answer calls automatically
- Customer Support Bot – Live issue resolution
- Sales Qualification Agent – Collect leads
- AI Voicemail System – Automated greeting, recording, transcript, summary
- Multi-Tenant SaaS – Custom AI agents per business
- Internal Helpdesk – HR, IT support
- Workflow Automation – Trigger notifications or CRM actions
TL;DR
Either read this article, or feed this context to the IDE of your choice and let it build the system for you. Download ready-to-use prompt
By following this guide, you can build an AI voice agent capable of handling calls, transcribing them, and generating voicemail summaries automatically. Happy vibing!



