# Implementation of prosodic speech control (speed, pitch, volume)

Prosody covers the rhythm, tempo, intonation, and volume of speech. Precise prosody control allows context-specific synthesis: slower for numerical data, louder for warnings, higher pitch for questions.

### SSML prosody control

```xml
<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='ru-RU'>
  <!-- Rate: x-slow, slow, medium, fast, x-fast, or a percentage.
       Text: "Your order number: A-one-two-three-four." -->
  <prosody rate="slow">
    Номер вашего заказа: А-один-два-три-четыре.
  </prosody>
  <!-- Pitch: x-low, low, medium, high, x-high, or ± semitones (st).
       Text: "This is good news!" -->
  <prosody pitch="+2st">
    Это хорошая новость!
  </prosody>
  <!-- Volume: silent, x-soft, soft, medium, loud, x-loud, or dB.
       Text: "Attention!" -->
  <prosody volume="loud">
    Внимание!
  </prosody>
  <!-- Combined control. Text: "Please wait a moment." -->
  <prosody rate="90%" pitch="-1st" volume="-3dB">
    Подождите, пожалуйста, один момент.
  </prosody>
</speak>
```

### Dynamic management at runtime

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class ProsodyProfile:
    rate: str = "medium"    # x-slow | slow | medium | fast | x-fast | 80%
    pitch: str = "medium"   # x-low | low | medium | high | x-high | +2st
    volume: str = "medium"  # silent | x-soft | soft | medium | loud | x-loud


# Named presets for common utterance types
PROFILES = {
    "numbers": ProsodyProfile(rate="slow", pitch="medium"),
    "warning": ProsodyProfile(rate="medium", pitch="+2st", volume="loud"),
    "farewell": ProsodyProfile(rate="slow", pitch="-1st"),
    "question": ProsodyProfile(pitch="+1st"),
}


def wrap_with_prosody(text: str, profile: ProsodyProfile) -> str:
    """Wrap plain text in a <prosody> element built from the profile."""
    # Escape &, <, > so arbitrary plain text cannot break the SSML markup.
    return (
        f'<prosody rate="{profile.rate}" pitch="{profile.pitch}" '
        f'volume="{profile.volume}">{escape(text)}</prosody>'
    )
```

### Contextual management via NLP

```python
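def detect_prosody_context(text: str) -> ProsodyProfile:
    """Pick a prosody profile automatically from simple text cues."""
    lowered = text.lower()
    # Questions get a slightly raised pitch.
    if text.rstrip().endswith("?"):
        return PROFILES["question"]
    # Russian alert keywords: "attention", "important", "urgent".
    if any(w in lowered for w in ["внимание", "важно", "срочно"]):
        return PROFILES["warning"]
    # Any digit suggests numeric data, which is read more slowly.
    if any(char.isdigit() for char in text):
        return PROFILES["numbers"]
    return ProsodyProfile()  # default: all attributes "medium"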
```

### Provider Restrictions

- **Google TTS**: full `<prosody>` support via SSML
- **Azure**: `rate` in the range 0.5–2.0, `pitch` ±50%
- **OpenAI TTS**: only the `speed` parameter (0.25–4.0), no SSML
- **Yandex SpeechKit**: the `speed` parameter via the API, limited SSML support

Timeframe: basic prosody management takes 1–2 days; contextual automatic routing takes 3–4 days.
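
Because of the provider restrictions listed above, one profile cannot be sent to every backend unchanged. The following is a minimal sketch of one way to degrade gracefully, not a definitive implementation. The names `RATE_KEYWORDS`, `rate_to_speed`, and `render_for_provider`, the returned dictionary shape, and the keyword-to-multiplier values are all assumptions made for illustration; SSML leaves the exact speed of labels such as `slow` to the engine. The default clamp range of 0.25–4.0 is the `speed` range quoted above for OpenAI TTS.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class ProsodyProfile:
    # Mirrors the dataclass from the runtime section so this block runs alone.
    rate: str = "medium"
    pitch: str = "medium"
    volume: str = "medium"


# Hypothetical keyword-to-multiplier table (illustrative values only).
RATE_KEYWORDS = {
    "x-slow": 0.5,
    "slow": 0.75,
    "medium": 1.0,
    "fast": 1.25,
    "x-fast": 1.5,
}


def rate_to_speed(rate: str, lo: float = 0.25, hi: float = 4.0) -> float:
    """Convert an SSML-style rate ("slow", "90%", "+10%") to a clamped multiplier."""
    value = rate.strip().lower()
    if value in RATE_KEYWORDS:
        speed = RATE_KEYWORDS[value]
    elif value.endswith("%"):
        percent = float(value[:-1])
        # "+10%" or "-10%" is a relative change; a bare "90%" is a multiplier.
        speed = 1.0 + percent / 100.0 if value[0] in "+-" else percent / 100.0
    else:
        raise ValueError(f"unsupported rate value: {rate!r}")
    return max(lo, min(hi, speed))


def render_for_provider(
    text: str,
    profile: ProsodyProfile,
    ssml_supported: bool,
    lang: str = "ru-RU",
) -> dict:
    """Build a provider-neutral payload (hypothetical shape, not a real API)."""
    if ssml_supported:
        # SSML-capable backend: keep rate, pitch, and volume.
        ssml = (
            '<speak version="1.0" '
            'xmlns="http://www.w3.org/2001/10/synthesis" '
            f"xml:lang={quoteattr(lang)}>"
            f"<prosody rate={quoteattr(profile.rate)} "
            f"pitch={quoteattr(profile.pitch)} "
            f"volume={quoteattr(profile.volume)}>"
            f"{escape(text)}</prosody></speak>"
        )
        return {"ssml": ssml}
    # Speed-only backend: pitch and volume cannot be expressed and are dropped.
    return {"text": text, "speed": rate_to_speed(profile.rate)}
```

With `ssml_supported=False`, only the rate survives as a numeric `speed`, so profiles that differ only in pitch or volume produce identical output on such backends. Mapping the payload onto an actual provider request is left to the caller.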