# Deepgram Integration for Speech Recognition

Deepgram is one of the fastest cloud STT providers: streaming recognition latency is 100-200 ms. The Nova-2 model achieves a WER of 5-8% in English. For Russian, where the model is still in beta, WER is around 12-18%.

### Deepgram Models

| Model | Languages | Speed (× real time) | Use case |
|-------|-----------|---------------------|----------|
| Nova-2 | 30+ | 30x | General purpose |
| Enhanced | 36+ | 50x | Call centers |
| Base | 36+ | 100x | Speed-critical workloads |
| Whisper | 99+ | 10x | Multilingual tasks |

### Integration via WebSocket (streaming)
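```python
import asyncio
import json
import os

from websockets.asyncio.client import connect  # websockets >= 13

DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]


async def transcribe_stream(path: str = "audio.wav") -> None:
    url = "wss://api.deepgram.com/v1/listen"
    params = "?model=nova-2&language=ru&punctuate=true&diarize=true"
    headers = {"Authorization": f"Token {DEEPGRAM_API_KEY}"}
    # Older (legacy) websockets clients take extra_headers instead of additional_headers.
    async with connect(url + params, additional_headers=headers) as ws:

        async def send_audio():
            # WAV is containerized audio, so encoding/sample_rate params are not needed.
            with open(path, "rb") as f:
                while chunk := f.read(4096):
                    await ws.send(chunk)
            # Ask Deepgram to flush the remaining results and close the stream.
            await ws.send(json.dumps({"type": "CloseStream"}))

        async def receive_results():
            async for message in ws:
                result = json.loads(message)
                # Only "Results" messages carry is_final; interim results are skipped.
                if result.get("is_final"):
                    alt = result["channel"]["alternatives"][0]
                    if alt["transcript"]:
                        # With diarize=true every word carries a speaker index.
                        speaker = alt["words"][0].get("speaker") if alt["words"] else None
                        print(f"[speaker {speaker}] {alt['transcript']}")

        await asyncio.gather(send_audio(), receive_results())


asyncio.run(transcribe_stream())
```

### Cost and integration effort

Nova-2 costs $0.0043/minute and Enhanced $0.0145/minute. New accounts receive $200 in free credits. Estimated integration effort: 1 day for REST, 2 days for WebSocket streaming.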
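To turn the per-minute prices into a budget estimate, a small helper is enough. This is a sketch that uses only the rates quoted in this section; `monthly_cost` and `minutes_covered_by_credit` are hypothetical names, and real billing (plan, volume discounts, enabled features) may differ.

```python
# Per-minute prices (USD) as quoted in this article; verify against current Deepgram pricing.
PRICE_PER_MINUTE = {"nova-2": 0.0043, "enhanced": 0.0145}
FREE_CREDIT_USD = 200.0


def monthly_cost(model: str, minutes: float) -> float:
    """Estimated spend in USD for a given number of audio minutes."""
    return round(PRICE_PER_MINUTE[model] * minutes, 2)


def minutes_covered_by_credit(model: str, credit: float = FREE_CREDIT_USD) -> int:
    """Whole audio minutes the sign-up credit pays for."""
    return int(credit // PRICE_PER_MINUTE[model])
```

For example, 10,000 minutes a month comes to about $43 on Nova-2 versus $145 on Enhanced, and the $200 credit covers roughly 46,500 minutes (about 775 hours) of Nova-2 audio.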








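For the REST (pre-recorded) integration mentioned above, the whole file is sent in a single HTTPS POST to the same `/v1/listen` endpoint, and the transcript comes back in one JSON response under `results.channels[].alternatives[]`. The following is a minimal standard-library sketch, assuming a WAV file and a `DEEPGRAM_API_KEY` environment variable; `build_request`, `extract_transcript` and `transcribe_file` are hypothetical helper names, not part of any Deepgram SDK.

```python
import json
import os
import urllib.parse
import urllib.request

DEEPGRAM_URL = "https://api.deepgram.com/v1/listen"


def build_request(audio: bytes, api_key: str, params: dict) -> urllib.request.Request:
    """Build a POST request for the pre-recorded /v1/listen endpoint."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"{DEEPGRAM_URL}?{query}",
        data=audio,  # raw file bytes are the request body
        headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
        method="POST",
    )


def extract_transcript(response: dict) -> str:
    """Return the top alternative's transcript for the first channel."""
    return response["results"]["channels"][0]["alternatives"][0]["transcript"]


def transcribe_file(path: str) -> str:
    """Send a whole file and return its transcript (performs a network call)."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_request(
        audio,
        os.environ["DEEPGRAM_API_KEY"],
        {"model": "nova-2", "language": "ru", "punctuate": "true"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return extract_transcript(json.load(resp))
```

Usage: `print(transcribe_file("audio.wav"))`. Splitting request construction and response parsing from the network call keeps both parts testable without an API key.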