# OpenAI TTS Integration for Speech Synthesis

The OpenAI TTS API offers 6 voices (alloy, echo, fable, onyx, nova, shimmer) with support for 50+ languages. English quality is the best among cloud solutions. Russian is good, with natural intonation, but sometimes with a noticeable accent.

### Available models

- tts-1: optimized for speed, latency ~300 ms
- tts-1-hd: high quality, latency ~500–800 ms

### Basic Integration

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable


def synthesize_speech(text: str, voice: str = "alloy") -> bytes:
    """Synthesize `text` with tts-1-hd and return the MP3 bytes."""
    response = client.audio.speech.create(
        model="tts-1-hd",
        voice=voice,  # alloy | echo | fable | onyx | nova | shimmer
        input=text,
        response_format="mp3",  # mp3 | opus | aac | flac | wav | pcm
        speed=1.0,  # 0.25–4.0
    )
    return response.content


# Streaming output (for real-time playback)
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="Привет! Как я могу вам помочь?",  # "Hello! How can I help you?"
) as response:
    response.stream_to_file("output.mp3")
```

### Caching responses

TTS requests for identical text and voice yield interchangeable audio, so we cache the result:

```python
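import hashlib

import redis

cache = redis.Redis()


def get_speech(text: str, voice: str = "alloy") -> bytes:
    """Return cached audio if available; otherwise synthesize and cache it."""
    # The model name is part of the key because synthesize_speech() uses tts-1-hd.
    cache_key = hashlib.md5(f"{text}:{voice}:tts-1-hd".encode()).hexdigest()
    cached = cache.get(cache_key)
    if cached is not None:
        return cached
    audio = synthesize_speech(text, voice)  # defined in the basic integration example
    cache.setex(cache_key, 86400 * 7, audio)  # TTL: 7 days
    return audio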
```

### Cost

- tts-1: $15 per 1M characters
- tts-1-hd: $30 per 1M characters

For a typical 100-character phrase, that is $0.0015 and $0.003 respectively.

Integration time: 1 day.
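
The per-phrase figures above are a simple proportion of the per-million price. The sketch below reproduces that arithmetic. The name `estimate_cost_usd` and the price table are illustrative, not part of the OpenAI SDK, and the code assumes billing is based on `len(text)`, which may not match the provider's exact character counting. The prices are the ones quoted in this section and may change.

```python
# Prices in USD per 1M characters, copied from the figures in this section.
# Assumption: check the current price list before relying on these values.
PRICE_PER_MILLION_CHARS_USD = {
    "tts-1": 15.0,
    "tts-1-hd": 30.0,
}


def estimate_cost_usd(text: str, model: str = "tts-1") -> float:
    """Estimate the synthesis cost of `text` for the given model.

    Assumption: one billed character per Python string character.
    """
    price = PRICE_PER_MILLION_CHARS_USD[model]
    return len(text) * price / 1_000_000


phrase = "x" * 100  # stand-in for a typical 100-character phrase
print(f"tts-1:    ${estimate_cost_usd(phrase, 'tts-1'):.4f}")
print(f"tts-1-hd: ${estimate_cost_usd(phrase, 'tts-1-hd'):.4f}")
```

Multiplying the result by the expected number of uncached phrases per day gives a rough daily budget.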

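
Returning to the caching step: the cache key there covers text, voice, and a fixed model name. If the model, output format, or speed ever become configurable, the key needs to include them too, or different audio will collide under one key. The following is a minimal sketch of such a key builder. `make_cache_key` and the `tts:` prefix are hypothetical names chosen for illustration, and SHA-256 is swapped in for MD5 purely as a design preference.

```python
import hashlib


def make_cache_key(
    text: str,
    voice: str = "alloy",
    model: str = "tts-1-hd",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> str:
    """Build a cache key from every parameter that affects the audio."""
    # A control character as separator avoids ambiguity when the text
    # itself contains ":" or other printable delimiters.
    raw = "\x1f".join([model, voice, response_format, f"{speed:.2f}", text])
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    # The prefix is an illustrative convention for namespacing keys in Redis.
    return "tts:" + digest


key_a = make_cache_key("Hello", voice="nova")
key_b = make_cache_key("Hello", voice="alloy")
print(key_a != key_b)
```

The prefix also makes it possible to find or expire all TTS entries by pattern without touching unrelated keys.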






