# Implementation of Automatic Language Detection

Automatic language detection (LID) is a mandatory component of multilingual systems: it allows audio to be routed to the correct STT model or operator without manually specifying the language.

### Language detection approaches

**Whisper-based** – the most accurate option; it uses the first 30 seconds of audio:

```python
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda")  # "small" is sufficient for LID
def detect_language(audio_path: str) -> tuple[str, float]:
    _, info = model.transcribe(audio_path, language=None, task="transcribe")
    return info.language, info.language_probability
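# A minimal routing sketch on top of detect_language (the 0.7 threshold is
# an assumption, tune it per deployment): below the threshold, return None
# to signal "ask the user / fall back to manual language selection".
def route_language(lang: str, prob: float, threshold: float = 0.7) -> str | None:
    return lang if prob >= threshold else None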
```

**langid / langdetect** — faster, but they work on text (a rough STT pass is required first).

**Lightweight audio-based classifiers**:

```python
# speechbrain — a dedicated LID (language identification) model
from speechbrain.pretrained import EncoderClassifier
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/lang-id-voxlingua107-ecapa",
    savedir="tmp_langid",
)
signal = classifier.load_audio("speech.wav")
prediction = classifier.classify_batch(signal)
lang_id = prediction[3][0]  # predicted text label, e.g. "en: English"
confidence = float(prediction[1].exp())  # log-likelihood -> probability
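# Helper sketch (assumption: classify_batch text labels look like
# "en: English", per the VoxLingua107 model card) to strip the label
# down to a bare language code:
def iso_code(label: str) -> str:
    return label.split(":")[0].strip()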
```

VoxLingua107 recognizes 107 languages with 93.3% accuracy on 1-second fragments.

### Practical thresholds

When confidence < 0.7, it is better to ask the user to select the language manually or to fall back to a heavier model. For systems with a limited set of supported languages, accuracy is noticeably higher.

### Timeframes

Integrating a ready-made classifier: about 1 day. A custom model for a specific set of languages: 1–2 weeks.
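The point about limited language sets can be made concrete: if the classifier exposes a full probability distribution, renormalizing it over only the languages the system actually supports both restricts the answer and raises the effective confidence. A hypothetical helper (the function and parameter names are assumptions, not part of any library):

```python
# Renormalize a LID probability distribution over a supported subset.
# (Sketch with assumed names; real classifiers expose their distributions
# differently, e.g. speechbrain's prediction[0].)
def restrict_to_supported(probs: dict[str, float],
                          supported: set[str]) -> tuple[str, float]:
    subset = {lang: p for lang, p in probs.items() if lang in supported}
    if not subset:
        raise ValueError("no supported language in the distribution")
    total = sum(subset.values())
    best = max(subset, key=subset.get)
    return best, subset[best] / total

# Example: "uk" is not deployed, so its probability mass is redistributed
# and the confidence in "en" rises from 0.40 to ~0.53.
lang, conf = restrict_to_supported(
    {"en": 0.40, "ru": 0.35, "uk": 0.25},
    {"en", "ru"},
)
```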