## Coqui TTS Integration for Speech Synthesis (Open Source)

Coqui TTS is a library with a set of pre-trained neural TTS models: VITS, YourTTS, XTTS. It is an open-source alternative to cloud services for tasks with data-privacy requirements, and it supports Russian.

### Installation and Available Models

```bash
pip install TTS
# List the available models
tts --list_models
```

### XTTS v2 - Multilingual Model with Voice Cloning

```python
from TTS.api import TTS
# Initialize XTTS v2
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")
# Synthesize Russian speech
tts.tts_to_file(
    text="Привет! Это пример синтеза речи на русском языке.",
    speaker_wav="reference_speaker.wav",  # reference voice (3–10 s)
    language="ru",
    file_path="output.wav"
)
# Streaming synthesis (chunk by chunk): the high-level TTS wrapper does
# not expose a streaming call, so the lower-level Xtts model is used here
xtts_model = tts.synthesizer.tts_model
gpt_cond_latent, speaker_embedding = xtts_model.get_conditioning_latents(
    audio_path=["reference.wav"]
)
for chunk in xtts_model.inference_stream(
    "Длинный текст для потокового синтеза",
    "ru",
    gpt_cond_latent,
    speaker_embedding,
):
    pass  # process the audio chunk (a torch tensor)
```

### VITS — a Fast Model for Russian

```python
tts = TTS("tts_models/ru/cv/vits")  # Russian VITS model
tts.tts_to_file(
    text="Привет мир",
    file_path="output.wav"
)
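# The performance table below reports speed as a multiple of real time
# (seconds of audio produced per second of wall clock). A small helper to
# measure this for any synthesis callable; the helper and its names are
# illustrative, not part of the Coqui API, and 22050 Hz is an assumed
# sample rate:
import time

def speed_vs_realtime(synthesize, text, sample_rate=22050):
    start = time.perf_counter()
    wav = synthesize(text)  # expected to return raw audio samples
    elapsed = time.perf_counter() - start
    return (len(wav) / sample_rate) / elapsed  # e.g. 15.0 means ~15x RT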
```

### Performance

| Model     | GPU      | Speed   | Quality   |
|-----------|----------|---------|-----------|
| XTTS v2   | RTX 3080 | ~2x RT  | Excellent |
| VITS (ru) | RTX 3080 | ~15x RT | Good      |
| YourTTS   | RTX 3080 | ~5x RT  | Good      |

### FastAPI wrapper for production

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from TTS.api import TTS
import io
import soundfile as sf

app = FastAPI()
tts_model = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

@app.post("/tts")
async def text_to_speech(text: str, language: str = "ru"):
    wav = tts_model.tts(
        text=text,
        language=language,
        speaker_wav="default_speaker.wav"
    )
    buf = io.BytesIO()
    sf.write(buf, wav, 24000, format="WAV")
    buf.seek(0)
    return StreamingResponse(buf, media_type="audio/wav")
```

Timeline: basic integration takes 2–3 days; a production API with voice management takes about 1 week.
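One caveat with the FastAPI wrapper above: `tts_model.tts(...)` is a blocking call inside an `async` handler, so it stalls the event loop for the duration of synthesis. A common fix is to push the blocking work to a worker thread, e.g. with `asyncio.to_thread`; the sketch below uses a stand-in function instead of the real model call.

```python
import asyncio

def blocking_synthesis(text: str) -> bytes:
    # stand-in for tts_model.tts(...): any CPU/GPU-bound call
    return text.encode("utf-8")

async def handler(text: str) -> bytes:
    # the event loop stays free while the worker thread runs
    return await asyncio.to_thread(blocking_synthesis, text)

result = asyncio.run(handler("Привет"))
```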

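XTTS handles sentence-sized inputs best, so long documents are usually split on sentence boundaries and synthesized chunk by chunk (as in the streaming example above). A plain-Python sketch of such a splitter; it uses no Coqui-specific API, and the 200-character limit is an arbitrary assumption:

```python
import re

def split_for_tts(text: str, max_chars: int = 200) -> list[str]:
    """Split on sentence boundaries, packing sentences up to max_chars."""
    chunks, current = [], ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `tts.tts_to_file(...)` and the resulting audio concatenated.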






