Vosk Integration (Offline STT) for Speech Recognition
Vosk is an open-source offline speech recognition toolkit based on Kaldi. It works without an internet connection, supports 20+ languages including Ukrainian, and takes 50–500 MB of disk space depending on the model. This makes it a good fit for privacy-sensitive and offline-first applications.
Vosk Capabilities
- Streaming recognition (real-time, does not wait for the end of a phrase)
- Speaker identification (who's speaking)
- Partial results for displaying text during speech
- Custom dictionary for specialized terminology
- Bindings: Python, Java (Android), JavaScript (Node.js/Browser), C#, Go
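Two of the capabilities above, partial results and a custom vocabulary, can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the helper names (build_grammar, parse_message, stream_results) are hypothetical, and the recognizer argument is assumed to be a vosk.KaldiRecognizer or any object exposing the same AcceptWaveform / PartialResult / Result / FinalResult methods. Passing a phrase list as the third constructor argument is, as far as we know, only honored by models built with a dynamic graph (typically the small ones).

```python
import json


def build_grammar(phrases):
    """Serialize a phrase list into the JSON string that KaldiRecognizer
    accepts as its optional third argument. "[unk]" absorbs any word
    that is not in the list."""
    return json.dumps(list(phrases) + ["[unk]"], ensure_ascii=False)


def parse_message(raw):
    """Return (text, is_final) for a JSON string produced by
    PartialResult(), Result() or FinalResult()."""
    msg = json.loads(raw)
    if "partial" in msg:
        return msg["partial"], False
    return msg.get("text", ""), True


def stream_results(recognizer, chunks):
    """Feed raw PCM chunks to the recognizer and yield (text, is_final)
    after every chunk, so a UI can show text while the user is speaking."""
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # The recognizer detected the end of an utterance.
            yield parse_message(recognizer.Result())
        else:
            # Still inside an utterance: report the current hypothesis.
            yield parse_message(recognizer.PartialResult())
    # Flush whatever is left in the decoder.
    yield parse_message(recognizer.FinalResult())


# Assumed usage with a real model (requires the vosk package and a model directory):
#   from vosk import Model, KaldiRecognizer
#   rec = KaldiRecognizer(Model("vosk-model-small-uk-v3"), 16000,
#                         build_grammar(["увімкни світло", "вимкни світло"]))
#   for text, is_final in stream_results(rec, audio_chunks):
#       print(("FINAL: " if is_final else "...    ") + text)
```

Keeping the JSON parsing separate from the audio source means the same loop can be driven by a microphone, a file, or a WebSocket.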
Models for Ukrainian Language
vosk-model-uk-v3 gives the best quality for Ukrainian: WER is about 10% on clean speech and about 18% in noise. vosk-model-small-uk-v3 (45 MB) is intended for embedded devices and has a WER of about 16%.
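For readers who want to check WER figures like these on their own recordings, word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The function below is a minimal sketch of that definition. It assumes whitespace tokenization and does no text normalization (case, punctuation), which real evaluations usually add.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance between the
    reference and the hypothesis, divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

One wrong word in a ten-word reference gives a WER of 0.1, that is, 10%.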
Integration
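import json
import pyaudio
from vosk import Model, KaldiRecognizer
model = Model("vosk-model-uk-v3")  # path to the unpacked model directory
recognizer = KaldiRecognizer(model, 16000)  # sample rate must match the audio
# Streaming recognition from the default microphone via PyAudio (16 kHz, mono, 16-bit PCM); a WebSocket source would feed chunks the same way
stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=4000)
while True:
    if recognizer.AcceptWaveform(stream.read(4000)):  # True once an utterance has ended
        print(json.loads(recognizer.Result())["text"])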
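The same recognizer can also be fed from a recorded file instead of a live source. The sketch below assumes a mono, 16-bit, uncompressed PCM WAV file. The function names (check_format, pcm_chunks, transcribe_wav) are illustrative, and vosk is imported lazily so the file-handling helpers can be used without the package or a model installed.

```python
import json
import wave


def check_format(wf):
    """Raise ValueError unless the WAV is mono, 16-bit, uncompressed PCM."""
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        raise ValueError("audio must be mono 16-bit PCM WAV")


def pcm_chunks(wf, frames_per_chunk=4000):
    """Yield raw PCM byte chunks from an open wave reader until it is exhausted."""
    while True:
        data = wf.readframes(frames_per_chunk)
        if not data:
            break
        yield data


def transcribe_wav(model_dir, wav_path):
    """Transcribe a WAV file and return the recognized text.
    Requires the vosk package and an unpacked model in model_dir."""
    from vosk import Model, KaldiRecognizer  # lazy import: only needed here

    with wave.open(wav_path, "rb") as wf:
        check_format(wf)
        # Use the file's own sample rate so it matches the audio being fed.
        recognizer = KaldiRecognizer(Model(model_dir), wf.getframerate())
        parts = []
        for chunk in pcm_chunks(wf):
            if recognizer.AcceptWaveform(chunk):
                parts.append(json.loads(recognizer.Result())["text"])
        parts.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(p for p in parts if p)


# Assumed usage: print(transcribe_wav("vosk-model-uk-v3", "recording.wav"))
```

Audio in other formats (stereo, MP3, other bit depths) would need to be converted first, for example with ffmpeg.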
When to Use Vosk vs Whisper
Vosk is the better choice for real-time streaming, embedded devices such as the Raspberry Pi, strict privacy requirements, and low-latency needs. Whisper is the better choice when the priority is the highest recognition quality, robustness to poor acoustics, or wide language coverage.