Speech synthesis implementation with voice and timbre selection. Voice and timbre selection is the user interface over the TTS system. Different voices for different contexts: official for banking, friendly for retail, neutral for IVR. ### Voice catalog and mapping
from dataclasses import dataclass
from enum import Enum
class VoiceGender(Enum):
MALE = "male"
FEMALE = "female"
@dataclass
class VoiceProfile:
id: str
name: str
gender: VoiceGender
language: str
provider: str
style: str # formal | friendly | neutral | energetic
sample_url: str
VOICE_CATALOG = [
VoiceProfile("alena", "Алёна", VoiceGender.FEMALE, "ru", "yandex",
"friendly", "/samples/alena.mp3"),
VoiceProfile("filipp", "Филипп", VoiceGender.MALE, "ru", "yandex",
"neutral", "/samples/filipp.mp3"),
VoiceProfile("sv-svetlana", "Светлана", VoiceGender.FEMALE, "ru", "azure",
"formal", "/samples/svetlana.mp3"),
VoiceProfile("alloy", "Alloy", VoiceGender.MALE, "en", "openai",
"neutral", "/samples/alloy.mp3"),
]
def select_voice(gender: VoiceGender, language: str,
style: str = "neutral") -> VoiceProfile:
candidates = [v for v in VOICE_CATALOG
if v.gender == gender and v.language == language
and v.style == style]
return candidates[0] if candidates else VOICE_CATALOG[0]
```### Timbre parameters (prosody)```python
@dataclass
class VoiceSettings:
rate: float = 1.0 # скорость: 0.5–2.0
pitch: float = 0.0 # тональность: -20 до +20 полутонов
volume: float = 1.0 # громкость: 0.0–2.0
def apply_voice_settings(text: str, settings: VoiceSettings) -> str:
"""Оборачиваем текст в SSML с параметрами тембра"""
rate_map = {0.5: "x-slow", 0.75: "slow", 1.0: "medium",
1.25: "fast", 1.5: "x-fast"}
rate_str = f"{int(settings.rate * 100)}%"
pitch_str = f"{settings.pitch:+.0f}st"
return f"""<speak>
<prosody rate="{rate_str}" pitch="{pitch_str}">
{text}
</prosody>
</speak>"""
```### Voice A/B Testing To optimize your voice brand based on satisfaction metrics:```python
import random
def get_voice_for_user(user_id: str, test_name: str) -> str:
# Детерминированное распределение по user_id
hash_val = hash(f"{user_id}:{test_name}") % 100
if hash_val < 50:
return "alena" # control
else:
return "filipp" # variant
```Timeframe: Votes catalog with selection UI – 2–3 days. Full system with A/B and analytics – 1 week.