Twilio Voice AI Integration for Phone AI Bots

We design and deploy artificial intelligence systems: from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering and MLOps to make AI work not in the lab, but in real business.
Medium
from 1 week to 3 months

Twilio is a widely used cloud telephony platform for developers. Its Media Streams API lets you receive live call audio over a WebSocket and send synthesized speech back, which is the foundation for an AI voice bot.

### Integration Architecture

```
Caller → Twilio PSTN → TwiML → Media Streams WebSocket → Your AI Server
                                                               ↓
                                                        STT → LLM → TTS
                                                               ↓
                                            Synthesized Audio → Twilio → Caller
```

### TwiML webhook for incoming calls

```python
from fastapi import FastAPI, Request, Response
from twilio.twiml.voice_response import Connect, VoiceResponse

app = FastAPI()


@app.post("/incoming-call")
async def handle_incoming_call(request: Request):
    response = VoiceResponse()

    # Speak the greeting first: TwiML placed after <Connect> does not run
    # until the stream ends.
    # Russian for: "Hello! I am a voice assistant. How can I help?"
    response.say(
        "Здравствуйте! Я голосовой ассистент. Как могу помочь?",
        voice="alice",
        language="ru-RU",
    )

    # Open a bidirectional Media Stream. <Connect><Stream> is used instead of
    # <Start><Stream> because <Start> streams are one-way (Twilio → server),
    # so the bot could not send synthesized audio back to the caller.
    connect = Connect()
    connect.stream(url="wss://api.yourapp.com/stream")
    response.append(connect)

    return Response(content=str(response), media_type="text/xml")
```
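To make it concrete what the webhook returns, the sketch below builds the same kind of TwiML document using only the Python standard library. It assumes the bidirectional `<Connect><Stream>` form, which Twilio documents for streams that send audio back to the caller. The function name `build_stream_twiml`, the greeting, and the URL are illustrative and not part of any Twilio SDK.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(ws_url: str, greeting: str, language: str) -> str:
    """Return a TwiML document that greets the caller, then opens a stream.

    Hypothetical helper for illustration only; in the webhook itself the
    Twilio helper library generates this XML.
    """
    response = ET.Element("Response")

    # <Say> comes first so the caller hears the greeting before streaming starts.
    say = ET.SubElement(response, "Say", {"voice": "alice", "language": language})
    say.text = greeting

    # <Connect><Stream> opens the WebSocket to the AI server.
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": ws_url})

    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


twiml = build_stream_twiml(
    "wss://api.yourapp.com/stream", "Hello! How can I help?", "en-US"
)
print(twiml)
```

Printing the document like this is a quick way to check the webhook response by eye before pointing a Twilio phone number at it.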

### WebSocket Media Streams Handler

```python
import base64
import json
import logging

from fastapi import WebSocket

logger = logging.getLogger(__name__)


@app.websocket("/stream")
async def handle_stream(websocket: WebSocket):
    await websocket.accept()
    call_sid = None
    stream_sid = None
    session = None
    audio_buffer = bytearray()

    try:
        async for message in websocket.iter_text():
            data = json.loads(message)
            event = data.get("event")

            if event == "start":
                call_sid = data["start"]["callSid"]
                stream_sid = data["start"]["streamSid"]
                # create_session is an application-defined helper (not shown)
                session = create_session(call_sid)

            elif event == "media":
                # Twilio sends base64-encoded mulaw audio at 8 kHz
                mulaw_audio = base64.b64decode(data["media"]["payload"])
                audio_buffer.extend(mulaw_audio)

                # Process once 2 seconds have accumulated
                # (16000 bytes at 8 kHz, 1 byte per sample)
                if len(audio_buffer) >= 16000:
                    # process_audio_chunk is application-defined (STT → LLM → TTS)
                    await process_audio_chunk(
                        bytes(audio_buffer), websocket, stream_sid, session
                    )
                    audio_buffer = bytearray()

            elif event == "stop":
                break

    except Exception as e:
        logger.error(f"Stream error: {e}")
```
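The buffering step in the handler can be isolated into a small class. That also makes it easy to flush audio that is still buffered when the `stop` event arrives, since the handler as written drops a final utterance shorter than 2 seconds. This is a sketch under the handler's own assumptions (8 kHz mulaw, one byte per sample); `ChunkBuffer` and its methods are hypothetical names.

```python
import base64


class ChunkBuffer:
    """Accumulates mulaw audio and releases it in fixed-duration chunks."""

    def __init__(self, seconds: float = 2.0, sample_rate: int = 8000):
        # mulaw stores one byte per sample, so bytes == samples.
        self.chunk_bytes = int(seconds * sample_rate)
        self._buffer = bytearray()

    def feed(self, payload_b64: str) -> list[bytes]:
        """Add one base64 media payload and return any complete chunks."""
        self._buffer.extend(base64.b64decode(payload_b64))
        chunks = []
        while len(self._buffer) >= self.chunk_bytes:
            chunks.append(bytes(self._buffer[: self.chunk_bytes]))
            del self._buffer[: self.chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left, for example on the "stop" event."""
        remainder = bytes(self._buffer)
        self._buffer.clear()
        return remainder


# Usage: 101 frames of 160 bytes (20 ms at 8 kHz) give one 2-second chunk
# and leave 160 bytes in the buffer.
buf = ChunkBuffer()
frame = base64.b64encode(b"\xff" * 160).decode()
chunks = [chunk for _ in range(101) for chunk in buf.feed(frame)]
leftover = buf.flush()
print(len(chunks), len(chunks[0]), len(leftover))
```

In the handler, `feed` would replace the manual `extend` and length check, and `flush` would be called in the `stop` branch before breaking out of the loop.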

Synthesized audio goes back to the caller over the same WebSocket as a `media` message:

```python
async def send_audio_to_caller(websocket: WebSocket, stream_sid: str, audio_bytes: bytes):
    """Send synthesized audio back into the call.

    audio_bytes must be raw mulaw 8 kHz audio without file headers.
    """
    encoded = base64.b64encode(audio_bytes).decode()
    await websocket.send_json({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": encoded},
    })
```

### Audio format conversion

Twilio uses μ-law (mulaw) at 8 kHz. Whisper expects 16 kHz PCM:

```python
import audioop  # deprecated since Python 3.11, removed from the stdlib in 3.13


def mulaw_to_pcm16k(mulaw_bytes: bytes) -> bytes:
    """μ-law 8 kHz → 16-bit PCM 8 kHz → upsample to 16 kHz."""
    pcm_8k = audioop.ulaw2lin(mulaw_bytes, 2)  # μ-law → 16-bit PCM
    pcm_16k, _ = audioop.ratecv(pcm_8k, 2, 1, 8000, 16000, None)  # 8 → 16 kHz
    return pcm_16k
```
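The return path needs the opposite conversion: TTS output has to reach Twilio as 8 kHz μ-law. As a sketch that does not depend on `audioop`, the functions below use a common pure-Python formulation of G.711 μ-law companding for 16-bit samples. The names `linear_to_mulaw`, `mulaw_to_linear`, and `pcm16_to_mulaw` are illustrative. The input is assumed to be 16-bit little-endian mono PCM already at 8 kHz (resampling from the TTS engine's native rate is not shown), and a per-sample Python loop is much slower than a C implementation.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before the segment search
CLIP = 32635  # largest magnitude that still fits in 15 bits after adding BIAS


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as a mu-law byte (0-255)."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS

    # Find the segment (exponent): the highest set bit, from 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1

    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def mulaw_to_linear(byte: int) -> int:
    """Decode one mu-law byte back to a signed 16-bit PCM sample."""
    value = ~byte & 0xFF
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if value & 0x80 else magnitude


def pcm16_to_mulaw(pcm_bytes: bytes) -> bytes:
    """Convert 16-bit little-endian mono PCM to mu-law, one byte per sample."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return bytes(linear_to_mulaw(s) for s in samples)


# Round trip of four samples: encoding is lossy, so decoded values are close
# to the inputs rather than identical.
pcm = struct.pack("<4h", 0, 1000, -1000, 32767)
encoded = pcm16_to_mulaw(pcm)
decoded = [mulaw_to_linear(b) for b in encoded]
print(encoded.hex(), decoded)
```

The output of `pcm16_to_mulaw` is the kind of payload `send_audio_to_caller` expects in `audio_bytes`, since that function only base64-encodes what it is given.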