CURSOR PLAN MODE PROMPT — REALTIME PHONE SYSTEM (Twilio ↔ FastAPI ↔ OpenAI Realtime ↔ Supabase)

Goal: Create a production-ready real-time AI phone system that connects Twilio Media Streams ↔ FastAPI WebSocket ↔ OpenAI Realtime API ↔ Supabase, with bidirectional audio streaming, RAG-enhanced instructions, transcript collection, and full call logging.

==============================
HIGH-LEVEL REQUIREMENTS
==============================

Implement a backend service with:

1. API Endpoints
   - POST /api/v1/incoming-call-realtime — Twilio voice webhook that returns TwiML instructing Twilio to connect the call to the WebSocket stream below, passing the caller's From and To numbers as <Parameter> values.
   - WS /api/v1/media-stream — WebSocket bridge that:
     - Accepts the Twilio Media Streams WebSocket connection
     - Connects to the OpenAI Realtime API
     - Converts audio formats (μ-law ↔ PCM16)
     - Streams audio bidirectionally
     - Collects transcripts
     - Runs RAG
     - Saves the call and its transcript to Supabase

==============================
AUDIO PROCESSING
==============================

Implement audio conversion utilities using audioop-lts, imported as audioop (audioop was removed from the standard library in Python 3.13; audioop-lts provides it under the same module name).

Twilio → OpenAI:
- base64-decode the Twilio media payload
- μ-law 8kHz → PCM16 8kHz → PCM16 24kHz
- base64-encode the result for OpenAI

OpenAI → Twilio:
- base64-decode the OpenAI audio delta
- PCM16 24kHz → PCM16 8kHz → μ-law 8kHz
- base64-encode the result for Twilio

==============================
REALTIME EVENT HANDLING
==============================

Twilio Events:
- connected
- start (contains streamSid and customParameters: From, To, as set by the <Parameter> elements in the TwiML)
- media (base64-encoded μ-law audio chunks)
- stop

OpenAI Events:
- response.audio.delta
- response.audio_transcript.delta
- response.audio_transcript.done
- conversation.item.input_audio_transcription.completed (only emitted when input_audio_transcription is enabled in session.update)

==============================
RAG IMPLEMENTATION
==============================

Supabase tables required:
- calls
- call_transcripts
- user_settings
- agent_prompts
- knowledge_base (with a pgvector embedding column; text-embedding-3-small returns 1536-dimensional vectors by default)

Supabase RPC required:
match_knowledge_chunks(query_embedding vector, match_user_id uuid, match_count int)

RAG steps:
1. Resolve user_id from the Twilio "To" number using user_settings.
2. Fetch the custom agent prompt from agent_prompts.
3. Generate an embedding using text-embedding-3-small.
4. Query the knowledge base using the RPC.
5. Apply the similarity threshold and chunk limit (env-configurable).
6. Assemble RAG-enhanced instructions.
7. Send session.update to OpenAI before streaming begins.

==============================
TRANSCRIPT COLLECTION
==============================

Implement a TranscriptCollector that:
- Stores user speech
- Stores AI speech
- Builds the final transcript when the call ends
- Generates an AI summary (using an OpenAI completion)
- Saves the calls row
- Saves the call_transcripts row

==============================
ENVIRONMENT VARIABLES
==============================

Create .env support with:

TWILIO_ACCOUNT_SID=
TWILIO_AUTH_TOKEN=
OPENAI_API_KEY=
LEMONFOX_API_KEY=
SUPABASE_URL=
SUPABASE_SERVICE_ROLE_KEY=
RAG_CHUNK_LIMIT=5
RAG_SIMILARITY_THRESHOLD=0.7
LOG_LEVEL=INFO

==============================
PROJECT STRUCTURE (CREATE EXACTLY THIS)
==============================

app/
  main.py
  api/
    v1/
      realtime_stream.py
      calls.py
  services/
    knowledge_base_service.py
  repositories/
    knowledge_base_repo.py
    conversations_repo.py
  core/
    config.py
    logging_config.py

==============================
IMPLEMENTATION INSTRUCTIONS FOR CURSOR
==============================

Build the entire project end-to-end, with:

Frameworks:
- FastAPI
- websockets (client)
- audioop-lts
- openai (latest official)
- supabase-py
- python-dotenv
- uvicorn

Behavior:
- Fully implement both endpoints
- Fully implement the WebSocket bridge
- Fully implement audio conversion
- Fully implement the RAG pipeline
- Fully implement transcript saving
- Fully implement logging

Code Quality:
- Type hinted
- Modular
- Async everywhere
- Robust error handling
- Clear logging
- No unused code
- Production-ready structure

==============================
WHEN BUILDING CODE
==============================

Cursor should:
- Generate missing files
- Update imports automatically
- Handle all async operations
- Create helper classes for clarity
- Follow the functional requirements exactly
- Build the system as described in the long specification

==============================
FINAL ASSETS TO DELIVER
==============================

Cursor should output:
- Complete FastAPI backend
- Complete audio processing utilities
- Complete WebSocket bridge
- Complete RAG pipeline
- Complete transcript collector
- Complete Supabase integration
- Complete Twilio webhook handler
- Ready-to-run system using uvicorn
- Fully working local dev workflow with ngrok

==============================
START NOW
==============================

Begin by generating the full project structure and boilerplate code, then implement each file in detail. Ensure the final output is a fully working system matching the design above. If you need more context, request clarification; otherwise, proceed with generation.
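For reference while implementing POST /api/v1/incoming-call-realtime, here is a minimal sketch of the TwiML body. It assumes the stream URL is the public wss:// address of WS /api/v1/media-stream (for example, the ngrok host) and that From and To are forwarded as <Parameter> elements so they show up in start.customParameters. The function name build_stream_twiml is hypothetical.

```python
from xml.etree import ElementTree as ET


def build_stream_twiml(stream_url: str, from_number: str, to_number: str) -> str:
    """Return TwiML that connects the call to a bidirectional Media Stream.

    Each <Parameter> is delivered in the Twilio "start" event under
    start.customParameters, keyed by its name attribute.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    stream = ET.SubElement(connect, "Stream", url=stream_url)
    ET.SubElement(stream, "Parameter", name="From", value=from_number)
    ET.SubElement(stream, "Parameter", name="To", value=to_number)
    # ElementTree escapes attribute values, e.g. "&" in a query string.
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

In the FastAPI route, the string could be returned with fastapi.Response(content=..., media_type="application/xml"); reading Twilio's form-encoded From and To through await request.form() requires the python-multipart package.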
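The AUDIO PROCESSING chain can be sketched as a small per-call converter, assuming audioop-lts is installed (or Python 3.12 or earlier, where audioop is still in the standard library) and that payloads are base64 strings in both directions. AudioConverter and its method names are hypothetical. Keeping one instance per call preserves the audioop.ratecv state between chunks, so resampling stays continuous across chunk boundaries.

```python
import audioop  # provided by the audioop-lts package on Python 3.13+
import base64
from typing import Any, Optional

TWILIO_RATE = 8000   # Twilio Media Streams: 8 kHz mono μ-law
OPENAI_RATE = 24000  # OpenAI Realtime pcm16: 24 kHz mono
SAMPLE_WIDTH = 2     # PCM16 = 2 bytes per sample
CHANNELS = 1


class AudioConverter:
    """Per-call converter that carries ratecv state between chunks."""

    def __init__(self) -> None:
        self._up_state: Optional[Any] = None    # 8 kHz -> 24 kHz resampler state
        self._down_state: Optional[Any] = None  # 24 kHz -> 8 kHz resampler state

    def twilio_to_openai(self, payload_b64: str) -> str:
        ulaw = base64.b64decode(payload_b64)
        pcm_8k = audioop.ulaw2lin(ulaw, SAMPLE_WIDTH)
        pcm_24k, self._up_state = audioop.ratecv(
            pcm_8k, SAMPLE_WIDTH, CHANNELS, TWILIO_RATE, OPENAI_RATE, self._up_state
        )
        return base64.b64encode(pcm_24k).decode("ascii")

    def openai_to_twilio(self, delta_b64: str) -> str:
        pcm_24k = base64.b64decode(delta_b64)
        pcm_8k, self._down_state = audioop.ratecv(
            pcm_24k, SAMPLE_WIDTH, CHANNELS, OPENAI_RATE, TWILIO_RATE, self._down_state
        )
        ulaw = audioop.lin2ulaw(pcm_8k, SAMPLE_WIDTH)
        return base64.b64encode(ulaw).decode("ascii")
```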
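The Twilio side of the bridge reduces to a small dispatcher over the four events listed under REALTIME EVENT HANDLING. This sketch assumes Twilio's JSON frame layout for bidirectional Media Streams (streamSid and customParameters inside start, base64 audio in media.payload, and the same media shape for audio sent back to the caller); TwilioStreamState, handle_twilio_message, and build_twilio_media_frame are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TwilioStreamState:
    stream_sid: Optional[str] = None
    custom_parameters: Dict[str, str] = field(default_factory=dict)
    stopped: bool = False


def handle_twilio_message(raw: str, state: TwilioStreamState) -> Optional[str]:
    """Update state from one Twilio frame; return the base64 audio payload, if any."""
    message: Dict[str, Any] = json.loads(raw)
    event = message.get("event")
    if event == "start":
        start = message["start"]
        state.stream_sid = start["streamSid"]
        # From/To arrive here only if the TwiML <Stream> declared them as <Parameter>s.
        state.custom_parameters = dict(start.get("customParameters", {}))
    elif event == "media":
        return message["media"]["payload"]
    elif event == "stop":
        state.stopped = True
    # "connected" carries no audio, so nothing to do.
    return None


def build_twilio_media_frame(stream_sid: str, payload_b64: str) -> str:
    """Frame for sending base64 μ-law audio back to the caller."""
    return json.dumps({"event": "media", "streamSid": stream_sid, "media": {"payload": payload_b64}})
```

Keeping parsing free of I/O lets the WebSocket loop stay thin: read a frame, call the handler, and forward any returned payload through the audio converter to OpenAI.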
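Steps 5 to 7 of the RAG pipeline can be sketched as pure functions. This assumes the match_knowledge_chunks RPC returns rows with content and similarity columns (hypothetical names; align them with the actual SQL function) and that the session uses the beta Realtime event schema implied by the event names above. The voice value and the whisper-1 transcription model are assumptions; enabling input_audio_transcription is what makes conversation.item.input_audio_transcription.completed events arrive.

```python
import json
import os
from typing import Any, Dict, List, Tuple


def rag_settings() -> Tuple[int, float]:
    """Read RAG_CHUNK_LIMIT and RAG_SIMILARITY_THRESHOLD with the .env defaults."""
    return int(os.getenv("RAG_CHUNK_LIMIT", "5")), float(os.getenv("RAG_SIMILARITY_THRESHOLD", "0.7"))


def select_chunks(rows: List[Dict[str, Any]], limit: int, threshold: float) -> List[str]:
    # Hypothetical column names "content" and "similarity" from the RPC result.
    kept = [r for r in rows if float(r.get("similarity", 0.0)) >= threshold]
    kept.sort(key=lambda r: float(r["similarity"]), reverse=True)
    return [str(r["content"]) for r in kept[:limit]]


def build_instructions(base_prompt: str, chunks: List[str]) -> str:
    if not chunks:
        return base_prompt
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return f"{base_prompt}\n\nUse the following knowledge base excerpts when relevant:\n{context}"


def build_session_update(instructions: str, voice: str = "alloy") -> str:
    # Beta Realtime session fields; pcm16 here means 24 kHz mono PCM16.
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "modalities": ["audio", "text"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad"},
            "input_audio_transcription": {"model": "whisper-1"},
        },
    })
```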
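A minimal sketch of the in-memory part of TranscriptCollector, assuming user text comes from the transcript field of conversation.item.input_audio_transcription.completed and AI text from response.audio_transcript.delta and response.audio_transcript.done. The summary call and the Supabase inserts are left out because the calls and call_transcripts column layouts are not specified; the method names here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Literal, Optional

Speaker = Literal["user", "assistant"]


@dataclass
class TranscriptEntry:
    speaker: Speaker
    text: str
    at: datetime


@dataclass
class TranscriptCollector:
    entries: List[TranscriptEntry] = field(default_factory=list)
    _ai_buffer: List[str] = field(default_factory=list, init=False, repr=False)

    def add_user(self, text: str) -> None:
        # Called on conversation.item.input_audio_transcription.completed.
        if text.strip():
            self.entries.append(TranscriptEntry("user", text.strip(), datetime.now(timezone.utc)))

    def add_ai_delta(self, delta: str) -> None:
        # Called on response.audio_transcript.delta.
        self._ai_buffer.append(delta)

    def finish_ai(self, final_text: Optional[str] = None) -> None:
        # Called on response.audio_transcript.done; prefers the event's full text if given.
        text = (final_text if final_text is not None else "".join(self._ai_buffer)).strip()
        self._ai_buffer.clear()
        if text:
            self.entries.append(TranscriptEntry("assistant", text, datetime.now(timezone.utc)))

    def build_transcript(self) -> str:
        label = {"user": "Caller", "assistant": "AI"}
        return "\n".join(f"{label[e.speaker]}: {e.text}" for e in self.entries)
```

Entries are kept in arrival order. Input transcription can complete after the AI has started replying, so a production version may need a different ordering rule before building the final transcript and summary.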