# Google Cloud Speech-to-Text API Integration

Google Cloud Speech-to-Text (STT) is a mature API with support for 125+ languages, an adaptive dictionary, and native integration with other GCP services. Word error rate (WER) is 4–6% for English and 8–12% for Russian on clean audio.

### Models and their application

| Model | Latency | Best suited for |
|-------|---------|-----------------|
| `latest_long` | high | Long recordings, podcasts |
| `latest_short` | low | Short commands, search |
| `telephony` | medium | Call centers, 8 kHz audio |
| `medical_dictation` | medium | Medical dictation |
| `chirp` | low | Universal, all domains |

### Basic integration
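```python
from google.cloud import speech

# Credentials are resolved from the environment (Application Default Credentials).
client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="ru-RU",
    model="latest_long",
    enable_automatic_punctuation=True,
    enable_word_time_offsets=True,  # per-word timestamps
    use_enhanced=True,
)

# "audio.wav" is a placeholder path; synchronous recognize() suits short clips only.
with open("audio.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)
```

### Key features

- Adaptive dictionary (up to 5,000 phrases) for improved terminology accuracy
- Speaker diarization out of the box (up to 6 speakers)
- Streaming recognition via gRPC with 200–400 ms latency
- Cloud Storage integration for batch processing

Cost: $0.004–0.006 per minute, depending on the model. The free tier covers 60 minutes per month.

### Integration time

Basic integration: 1–2 days. With adaptive dictionary and diarization: 3–4 days.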
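The adaptive dictionary and diarization items under Key features map onto extra fields of the recognition config. The sketch below is one possible way to assemble them as a plain dictionary, so it runs without the client library. The helper name `build_feature_fields` is hypothetical, and the limits it checks (5,000 phrases, 6 speakers) are the figures quoted above rather than values read from the API. The field names follow the v1 `RecognitionConfig` message as I understand it; verify them against the current reference before relying on them.

```python
MAX_PHRASES = 5000   # limit quoted in this article
MAX_SPEAKERS = 6     # limit quoted in this article


def build_feature_fields(phrases, max_speakers=2):
    """Return adaptation and diarization fields for a recognition config.

    Hypothetical helper: it only builds a dictionary and never calls the API.
    """
    phrases = list(phrases)
    if len(phrases) > MAX_PHRASES:
        raise ValueError(f"too many phrases: {len(phrases)} > {MAX_PHRASES}")
    if not 1 <= max_speakers <= MAX_SPEAKERS:
        raise ValueError(f"max_speakers must be between 1 and {MAX_SPEAKERS}")
    return {
        # Phrase hints bias recognition toward domain terminology.
        "speech_contexts": [{"phrases": phrases}],
        # Diarization labels each recognized word with a speaker tag.
        "diarization_config": {
            "enable_speaker_diarization": True,
            "min_speaker_count": 1,
            "max_speaker_count": max_speakers,
        },
    }


fields = build_feature_fields(["Kubernetes", "gRPC"], max_speakers=3)
print(fields["diarization_config"]["max_speaker_count"])
```

Assuming the field names hold, the result could be merged into the basic configuration with something like `speech.RecognitionConfig(language_code="ru-RU", **fields)`.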

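To turn the per-minute price range into a budget figure, a rough estimate can be computed as below. This is a simplified sketch: it assumes the 60 free minutes are subtracted once per month before billing and that a single flat rate applies to all remaining minutes. The function name `estimate_monthly_cost` is hypothetical, and actual billing rules may differ.

```python
def estimate_monthly_cost(minutes, rate_per_minute, free_minutes=60):
    """Estimate monthly cost in USD under a flat per-minute rate."""
    # Only minutes beyond the free allowance are billed (assumption).
    billable = max(0.0, minutes - free_minutes)
    return billable * rate_per_minute


# Price range quoted in this article: $0.004 to $0.006 per minute.
low = estimate_monthly_cost(10_000, 0.004)
high = estimate_monthly_cost(10_000, 0.006)
print(f"${low:.2f} to ${high:.2f}")
```

Under these assumptions, 10,000 minutes of audio per month comes to roughly $39.76 to $59.64.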






