# Implementation of specialized vocabulary recognition (medical, legal, technical)

Standard STT models are trained on a general corpus. Domain-specific terms ("silicon dioxide," "appellate ruling," "STM32F407 microcontroller") are often recognized incorrectly, making the transcript unusable without post-editing.

### Adaptation Methods

**1. Custom Vocabulary / Boosting** – the fastest approach; it requires no retraining:

```python
# Google STT: adaptation phrases (speech adaptation / phrase boosting)
from google.cloud import speech

speech_context = speech.SpeechContext(
    phrases=[
        "мерцательная аритмия",         # atrial fibrillation
        "фибрилляция желудочков",       # ventricular fibrillation
        "атриовентрикулярная блокада",  # atrioventricular block
        "ЭКГ",                          # ECG
        "QRS-комплекс",                 # QRS complex
    ],
    boost=15.0,  # valid range: 1 to 20
)

config = speech.RecognitionConfig(
    speech_contexts=[speech_context],
    language_code="ru-RU",
)
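# Usage sketch (hypothetical GCS path; assumes credentials are configured):
# client = speech.SpeechClient()
# audio = speech.RecognitionAudio(uri="gs://my-bucket/dictation.wav")
# response = client.recognize(config=config, audio=audio)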
```

**2. Post-correction via dictionary** – find phonetically similar words and replace them:

```python
from fuzzywuzzy import fuzz  # the fuzzywuzzy package is now maintained as `thefuzz`

# Mapping: frequent misrecognition -> correct term
DOMAIN_TERMS = {
    "дексаметозон": "дексаметазон",          # dexamethasone (misspelled vowel)
    "миокарда инфаркт": "инфаркт миокарда",  # myocardial infarction (word order)
    "гипотериоз": "гипотиреоз",              # hypothyroidism (transposed letters)
}

def correct_medical_terms(text: str, threshold: int = 80) -> str:
    """Replace near-matches of known misrecognitions, including multi-word ones."""
    words = text.split()
    result = []
    i = 0
    while i < len(words):
        for wrong, correct in DOMAIN_TERMS.items():
            n = len(wrong.split())
            window = words[i:i + n]
            if len(window) == n and fuzz.ratio(" ".join(window).lower(), wrong) >= threshold:
                result.append(correct)
                i += n
                break
        else:
            result.append(words[i])
            i += 1
    return " ".join(result)
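# Example (hypothetical input):
#   correct_medical_terms("пациенту назначен дексаметозон")
#   -> "пациенту назначен дексаметазон" (exact dictionary key, ratio 100)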
```

**3. Fine-tuning Whisper** – for serious domain adaptation (see the separate Whisper fine-tuning service).

### Medical domain

Without adaptation, Whisper shows a WER of ~25% on medical dictations. Specialized solutions:

- **Amazon Transcribe Medical**: WER of ~12%, HIPAA-compliant
- **Nuance DAX**: best quality, but available only in the US
- **Fine-tuned Whisper** on medical data: WER of 8–15%

### Legal domain

Key tasks: accurate reproduction of names, dates, case numbers, and legal wording. Recommendation: a dictionary of ~2,000 typical terms plus a custom vocabulary in a cloud STT service.
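The legal-domain recommendation can be sketched as follows: split a long term list into chunks before wrapping each chunk in a boosted `SpeechContext`. The 500-phrase chunk size and the synthetic term list are assumptions; check the current Speech-to-Text quotas for the real per-request limits.

```python
from typing import List

def chunk_terms(terms: List[str], size: int = 500) -> List[List[str]]:
    """Split a flat term list into SpeechContext-sized chunks."""
    return [terms[i:i + size] for i in range(0, len(terms), size)]

# Synthetic stand-in for a real ~2,000-entry legal dictionary
terms = [f"статья {n} ГК РФ" for n in range(1, 2001)]
chunks = chunk_terms(terms)
print(len(chunks), len(chunks[0]))  # 4 500

# Each chunk would then become one speech.SpeechContext(phrases=chunk, boost=15.0)
```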
Timeframe: the vocabulary approach takes 2–3 days; fine-tuning takes 2–4 weeks, including data collection.
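The WER figures quoted above are computed as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal stdlib sketch:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("инфаркт миокарда в анамнезе", "инфаркт миокарда в анамнезе"))  # 0.0
print(wer("a b c d", "a x c"))  # 2 errors / 4 reference words = 0.5
```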