## Implementation of speech recognition in multiple languages (Multilingual STT)

Multilingual STT is needed wherever a single service processes speech in different languages: international call centers, global platforms, and systems operating in several countries.

### Multilingual recognition strategies

**1. One multilingual engine.** Whisper supports 99 languages in a single model:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda")

# Automatic language detection
segments, info = model.transcribe(audio, language=None)
detected_lang = info.language  # ISO 639-1 code
print(f"Detected: {detected_lang} ({info.language_probability:.2f})")
```

**2. Language-specific models.** Separate models per language give better quality in each:

```python
# One model instance per language (in practice these could be
# per-language fine-tuned checkpoints)
models = {
    "ru": WhisperModel("large-v3", device="cuda"),
    "en": WhisperModel("large-v3", device="cuda"),
    "de": WhisperModel("large-v3", device="cuda"),
}

def transcribe_multilingual(audio, lang: str = None):
    # Fall back to automatic detection when no language is given
    if lang is None:
        lang = detect_language(audio)
    # Route to the per-language model, defaulting to English
    return models.get(lang, models["en"]).transcribe(audio, language=lang)
```

**3. Hybrid.** Detect the language with a fast model (Whisper tiny or langid), then pass the audio to a specialized model.

### WER by language (Whisper large-v3)

| Language | WER (clean audio) |
|----------|-------------------|
| English  | 2.7%              |
| German   | 3.8%              |
| French   | 4.2%              |
| Spanish  | 3.1%              |
| Russian  | 7–10%             |
| Arabic   | 8–12%             |
| Chinese  | 5–7%              |
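The hybrid routing of strategy 3 reduces to simple dispatch logic. In the sketch below, `detect_language_fast` and the per-language transcriber callables are hypothetical stand-ins: in production they would wrap a Whisper tiny or langid detector and the specialized large models.

```python
from typing import Callable, Dict

def route_transcription(
    audio: bytes,
    detect_language_fast: Callable[[bytes], str],
    transcribers: Dict[str, Callable[[bytes], str]],
    fallback: str = "en",
) -> str:
    """Detect the language cheaply, then dispatch to a specialized model."""
    lang = detect_language_fast(audio)
    transcriber = transcribers.get(lang, transcribers[fallback])
    return transcriber(audio)

# Toy usage with stub transcribers:
stub = {
    "ru": lambda a: "ru-transcript",
    "en": lambda a: "en-transcript",
}
print(route_transcription(b"...", lambda a: "ru", stub))  # ru-transcript
```

The fallback language keeps the pipeline working when the detector returns a language with no dedicated model.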
### Code-switching

If a single recording contains mixed languages (e.g., Russian speech with English technical terms):
```python
# Whisper handles code-switching automatically.
# For explicit control:
segments, _ = model.transcribe(
    audio,
    language=None,               # auto-detect the language
    task="transcribe",           # not "translate"
    condition_on_previous_text=True,
)
```

**Timeframe.** Integration with auto-detection takes 1–2 days; a multilingual system with routing, about one week.
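For reference, the WER metric quoted in the table above is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. A minimal self-contained sketch (in practice a library such as jiwer would be used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table for word-level edit distance
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))  # 0.0
print(wer("the cat sat", "the dog sat"))  # one substitution over 3 words
```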