Implementing a Voice AI Agent for Phone Calls

A voice AI agent is an autonomous agent that conducts complete telephone conversations: it keeps track of context, asks clarifying questions, makes decisions, calls tools (CRM, databases), and ends the call with a concrete outcome.

### Voice AI Agent Architecture

```
Telephony (Twilio/Voximplant)
↓
WebSocket Bridge
↓
STT (Deepgram/Whisper)
↓
Conversation Manager
├── State Machine
├── LLM (GPT-4o)
├── Tool Registry (CRM, DB, APIs)
└── Context Window
↓
TTS (ElevenLabs/OpenAI)
↓
Audio Back to Call
```

### Conversation Manager with tools

```python
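import json
from dataclasses import dataclass, field
from typing import Optional

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class AgentState:
    """Per-call state kept between conversation turns."""
    call_id: str
    history: list = field(default_factory=list)         # chat messages, without the system prompt
    collected_data: dict = field(default_factory=dict)  # slots gathered during the call
    current_intent: Optional[str] = None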
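

# --- Sketch (hypothetical, not part of the original code): a minimal call state machine ---
# The architecture puts a State Machine next to the LLM. One simple reading of that box:
# the LLM handles free-form dialogue, and explicit stages decide what the agent must
# achieve next. The stage names and the slot keys ("order_id", "action_done",
# "wants_human") are illustrative assumptions, e.g. values kept in AgentState.collected_data.
from enum import Enum


class CallStage(str, Enum):
    GREETING = "greeting"
    IDENTIFY = "identify"  # find the customer / order
    RESOLVE = "resolve"    # perform the requested action
    CONFIRM = "confirm"    # read the result back to the caller
    TRANSFER = "transfer"  # hand the call over to a human agent
    DONE = "done"


TERMINAL_STAGES = {CallStage.TRANSFER, CallStage.DONE}


def next_stage(stage: CallStage, data: dict) -> CallStage:
    """Pick the next stage from the slots collected so far."""
    if stage in TERMINAL_STAGES:
        return stage
    if data.get("wants_human"):
        return CallStage.TRANSFER
    if stage is CallStage.GREETING:
        return CallStage.IDENTIFY
    if stage is CallStage.IDENTIFY:
        # stay in IDENTIFY until an order has been found
        return CallStage.RESOLVE if data.get("order_id") else CallStage.IDENTIFY
    if stage is CallStage.RESOLVE:
        return CallStage.CONFIRM if data.get("action_done") else CallStage.RESOLVE
    return CallStage.DONE  # CONFIRM -> DONE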
class VoiceAgent:
    # _get_system_prompt() and _execute_tools() are helpers not shown in this section.

    def __init__(self):
        # Tool definitions in the OpenAI Chat Completions "tools" format (JSON Schema parameters)
        self.tools = [
            {
                "type": "function",
                "function": {
                    "name": "lookup_order",
                    "description": "Find a customer's order by phone number or order ID",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "phone": {"type": "string"},
                            "order_id": {"type": "string"},
                        },
                    },
                },
            },
            {
                "type": "function",
                "function": {
                    "name": "reschedule_delivery",
                    "description": "Reschedule a delivery to another date",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "order_id": {"type": "string"},
                            "new_date": {"type": "string", "description": "YYYY-MM-DD"},
                        },
                        "required": ["order_id", "new_date"],
                    },
                },
            },
        ]

    async def process_turn(self, state: AgentState, user_text: str) -> str:
        """Handle one transcribed user utterance and return the reply text for TTS."""
        state.history.append({"role": "user", "content": user_text})

        response = await client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": self._get_system_prompt()},
                *state.history,
            ],
            tools=self.tools,
            tool_choice="auto",
        )
        message = response.choices[0].message

        # Handle function calls
        if message.tool_calls:
            # Expected to return one {"role": "tool", "tool_call_id": ..., "content": ...}
            # message per tool call
            tool_results = await self._execute_tools(message.tool_calls)
            state.history.append(message)  # assistant message that carries tool_calls
            state.history.extend(tool_results)

            # Second call: the model phrases the final answer from the tool results
            final = await client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": self._get_system_prompt()},
                    *state.history,
                ],
            )
            reply = final.choices[0].message.content
        else:
            reply = message.content

        reply = reply or ""  # content can be None (e.g. a refusal); never pass None to TTS
        state.history.append({"role": "assistant", "content": reply})
        return reply
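

# --- Sketch (hypothetical, not part of the original code): executing tool calls ---
# _execute_tools is not shown above. A minimal sketch of what it could delegate to,
# assuming each tool name maps to an async handler that takes the parsed JSON arguments,
# e.g. {"lookup_order": lookup_order, "reschedule_delivery": reschedule_delivery}
# (both handler names are illustrative). The result holds one "tool" message per call,
# which is what the Chat Completions API expects after an assistant message with tool_calls.
async def execute_tool_calls(tool_calls, handlers: dict) -> list[dict]:
    results = []
    for call in tool_calls:
        name = call.function.name
        try:
            args = json.loads(call.function.arguments or "{}")
            output = await handlers[name](**args)
        except Exception as exc:
            # Report failures (bad JSON, unknown tool, backend error) to the model
            # so it can apologise or ask again instead of the call crashing.
            output = {"error": f"{name} failed: {exc}"}
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(output, ensure_ascii=False),
        })
    return results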
```

### Twilio integration

```python
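import os

from twilio.rest import Client
from twilio.twiml.voice_response import Start, VoiceResponse

# REST client for live-call actions, e.g. twilio_client.calls(call_sid).update(twiml=...)
twilio_client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])


def handle_incoming_call(call_sid: str, ws_url: str) -> str:
    """Return TwiML for an incoming call; ws_url is the bridge base, e.g. "wss://api.example.com"."""
    response = VoiceResponse()
    # <Start><Stream> forks the caller's audio to the bridge (one-way) while the TwiML below runs
    start = Start()
    start.stream(url=f"{ws_url}/stream/{call_sid}")
    response.append(start)
    # Spoken in Russian: "Welcome! How can I help you?"
    response.say("Добро пожаловать! Как я могу помочь?", voice="alice", language="ru-RU")
    # Keep the call open for 60 s; Twilio hangs up once the TwiML runs out
    response.pause(length=60)
    return str(response)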
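

# --- Sketch (hypothetical helpers, not part of the original integration) ---
# On the WebSocket bridge side, Twilio Media Streams sends JSON messages with an "event"
# field ("connected", "start", "media", "stop", ...). "media" payloads are base64-encoded
# 8 kHz mu-law audio. The parser below handles only the fields it needs.
# Sending TTS audio back as a "media" message requires a bidirectional <Connect><Stream>;
# the <Start><Stream> fork above is receive-only.
import base64
import json


def parse_twilio_message(raw: str) -> tuple[str, dict]:
    """Return (event, info); for "media" events info["audio"] holds raw mu-law bytes for STT."""
    msg = json.loads(raw)
    event = msg.get("event", "")
    if event == "start":
        start_info = msg["start"]
        return event, {"stream_sid": start_info["streamSid"], "call_sid": start_info["callSid"]}
    if event == "media":
        return event, {"audio": base64.b64decode(msg["media"]["payload"])}
    return event, {}


def build_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Wrap 8 kHz mu-law TTS audio in an outbound "media" message for a bidirectional stream."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
    })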
```

### Agent Quality Metrics

- **Task Completion Rate** (TCR): % of calls in which the caller's task was resolved
- **Containment Rate**: % of calls handled without a transfer to a human agent
- **Average Handle Time** (AHT): average duration of a call
- **False Transfer Rate**: % of transfers to a human agent that were incorrect

Target metrics: TCR above 70%, containment above 60%.

Timeframe: an MVP agent covering basic scenarios takes 3–4 weeks; a production system with monitoring takes 2–3 months.
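The definitions above can be computed from call logs with a few lines of code. This is a minimal sketch. It assumes each call is logged with its duration, whether the task was resolved, whether it was transferred and, for transferred calls, a QA verdict on whether the transfer was correct. `CallRecord`, `agent_metrics`, `meets_targets` and the field names are hypothetical. The false transfer rate divides by the number of transferred calls, which is one reading of "% of incorrect transfers".

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    duration_s: float              # total call duration in seconds
    task_resolved: bool            # did the call end with the caller's task done?
    transferred: bool              # was the call handed to a human agent?
    transfer_correct: bool = True  # QA verdict; only meaningful when transferred=True


def agent_metrics(calls: list[CallRecord]) -> dict:
    """Compute TCR, containment rate, AHT and false transfer rate for a batch of calls."""
    if not calls:
        raise ValueError("no calls to evaluate")
    n = len(calls)
    transfers = [c for c in calls if c.transferred]
    return {
        "tcr": sum(c.task_resolved for c in calls) / n,
        "containment": sum(not c.transferred for c in calls) / n,
        "aht_s": sum(c.duration_s for c in calls) / n,
        # Denominator is transferred calls only (assumption, see above)
        "false_transfer_rate": (
            sum(not c.transfer_correct for c in transfers) / len(transfers) if transfers else 0.0
        ),
    }


def meets_targets(m: dict) -> bool:
    """Targets from the text: TCR > 70%, containment > 60%."""
    return m["tcr"] > 0.70 and m["containment"] > 0.60


calls = [
    CallRecord(120, True, False),
    CallRecord(95, True, False),
    CallRecord(200, False, True, transfer_correct=False),
    CallRecord(150, True, True),
]
m = agent_metrics(calls)
print(m)                 # {'tcr': 0.75, 'containment': 0.5, 'aht_s': 141.25, 'false_transfer_rate': 0.5}
print(meets_targets(m))  # False: containment 0.5 is below the 0.6 target
```

In this sample, TCR clears the 70% target, but half of the calls were transferred, so the containment target is missed.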