## Azure Speech Services integration for speech recognition

Azure Cognitive Services Speech is Microsoft's enterprise solution with data centers in Russia (until 2022), Germany, and other regions. It supports 100+ languages, HIPAA compliance, and a 99.9% SLA.

### Key features

- Custom Speech: custom training on a corporate vocabulary without ML expertise.
- Diarization (up to 20 speakers in Azure Speech).
- Streaming recognition with 150–300 ms latency.
- Batch transcription via the REST API for large volumes.

### SDK integration

```python
import os

import azure.cognitiveservices.speech as speechsdk

# Key and region come from the Azure Portal / environment
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_SPEECH_KEY"],
    region="westeurope",
)
speech_config.speech_recognition_language = "ru-RU"
speech_config.enable_dictation()  # dictation mode (explicit punctuation)

audio_config = speechsdk.AudioConfig(filename="audio.wav")
recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=audio_config,
)

# Single-shot recognition: returns after the first utterance
result = recognizer.recognize_once_async().get()
print(result.text)
```

### Custom Speech

Domain data is uploaded via the Azure Portal: text data (for the language model) and audio with transcriptions (for the acoustic model). With 10 hours of data, WER improves by 20–35% on the target domain.

Cost: $1 per hour of audio for standard transcription; a Custom Speech endpoint costs $1.42 per hour of operation. Integration time: 1–2 days (SDK only), 3–5 days with Custom Speech.
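Once a custom model is trained and deployed, recognition requests are routed to it by setting `endpoint_id` on the speech config; everything else stays the same as in the base SDK example. A minimal sketch (the helper name, default arguments, and the `AZURE_SPEECH_KEY` environment variable are illustrative; the endpoint GUID is taken from the deployment page in Speech Studio):

```python
import os


def custom_speech_config(endpoint_id, region="westeurope", language="ru-RU"):
    """SpeechConfig routed to a deployed Custom Speech model.

    endpoint_id is the GUID shown on the Custom Speech deployment page.
    The helper name and defaults are illustrative, not part of the SDK.
    """
    import azure.cognitiveservices.speech as speechsdk

    cfg = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],
        region=region,
    )
    cfg.speech_recognition_language = language
    cfg.endpoint_id = endpoint_id  # route requests to the custom model
    return cfg
```

A recognizer built from this config behaves identically to the standard one; only the model behind the endpoint changes.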
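The streaming mode from the feature list maps to continuous recognition in the SDK: final phrases arrive through the `recognized` event instead of one blocking call. A sketch under the same assumptions as above (the `AZURE_SPEECH_KEY`/`AZURE_SPEECH_REGION` variables and the function names `transcribe_stream`/`join_phrases` are illustrative):

```python
import threading


def transcribe_stream(wav_path, language="ru-RU"):
    """Continuous ("streaming") recognition of a WAV file.

    Final phrases are collected as they arrive; the call returns when the
    session stops (end of file) or is canceled. Illustrative sketch, not
    an official sample.
    """
    import os
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],
        region=os.environ.get("AZURE_SPEECH_REGION", "westeurope"),
    )
    speech_config.speech_recognition_language = language
    audio_config = speechsdk.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    phrases = []
    done = threading.Event()

    # Final results fire on `recognized`; session end / errors unblock us.
    recognizer.recognized.connect(lambda evt: phrases.append(evt.result.text))
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return join_phrases(phrases)


def join_phrases(phrases):
    """Join final phrases into a single transcript string."""
    return " ".join(p.strip() for p in phrases if p.strip())
```

For live microphone input, the same pattern applies with `AudioConfig(use_default_microphone=True)` instead of a filename.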
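Batch transcription from the feature list goes through the Speech-to-text REST API rather than the SDK: a POST to the `transcriptions` endpoint with SAS URLs of audio files in Blob Storage. The sketch below only builds the request pieces and leaves the actual HTTP call to the caller; the v3.1 path and field names reflect the batch transcription API, while the function name and defaults are illustrative:

```python
def batch_transcription_request(region, key, audio_urls, locale="ru-RU",
                                diarization=False):
    """Build the URL, headers, and JSON body for a batch transcription job
    (Speech-to-text REST API v3.1).

    Returns (url, headers, body); sending the request is up to the caller,
    e.g. requests.post(url, headers=headers, json=body).
    """
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.1/transcriptions")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    body = {
        "displayName": "batch job",
        "locale": locale,
        "contentUrls": list(audio_urls),  # SAS URLs to audio in Blob Storage
        "properties": {"diarizationEnabled": diarization},
    }
    return url, headers, body
```

The response contains a job URL that is polled until the status becomes `Succeeded`, after which the result files are downloaded from the links it provides.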
