## Implementation of automatic real-time subtitle generation (Live Captions)

Live Captions display subtitles while speech is being delivered, for online broadcasts, video conferences, and public events. The key parameter is the delay between the spoken words and the appearance of the corresponding subtitle; the target is under 2 seconds.

### Live Captions System Architecture
```
Microphone/Stream → WebSocket → STT → Formatter → WebSocket → Browser/Display
                    16kHz PCM   ↓ (partial + final)
                                100–500 ms latency
```

### Server side with WebSocket

```python
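import asyncio
import time

import numpy as np
from fastapi import FastAPI, WebSocket
from faster_whisper import WhisperModel

app = FastAPI()
model = WhisperModel("medium", device="cuda", compute_type="float16")

BYTES_PER_SECOND = 16_000 * 2  # 16 kHz, 16-bit mono PCM
WINDOW_SECONDS = 2

# Shared registry so every connected viewer receives the captions
clients: set[WebSocket] = set()


def transcribe_window(pcm: bytes) -> str:
    # Convert int16 PCM to float32 in [-1.0, 1.0], as faster-whisper expects for array input
    audio = np.frombuffer(pcm, dtype=np.int16).astype(np.float32) / 32768.0
    segments, _ = model.transcribe(audio, language="ru")
    # segments is a lazy generator: decoding happens while it is consumed
    return " ".join(seg.text.strip() for seg in segments)


async def broadcast(message: dict) -> None:
    for client in list(clients):
        try:
            await client.send_json(message)
        except Exception:
            clients.discard(client)  # drop sockets that have gone away


@app.websocket("/live-captions")
async def live_captions(websocket: WebSocket):
    await websocket.accept()
    clients.add(websocket)
    audio_buffer = bytearray()
    last_text = ""
    try:
        # iter_bytes() stops cleanly when the client disconnects
        async for chunk in websocket.iter_bytes():
            audio_buffer.extend(chunk)
            # Process every 2 seconds of audio (64,000 bytes)
            if len(audio_buffer) >= BYTES_PER_SECOND * WINDOW_SECONDS:
                pcm = bytes(audio_buffer)
                audio_buffer.clear()
                # Run the blocking model call in a worker thread, off the event loop
                text = await asyncio.to_thread(transcribe_window, pcm)
                if text and text != last_text:
                    last_text = text
                    # Each window is transcribed once and never revised, so its text is final
                    await broadcast({"type": "final", "text": text, "timestamp": time.time()})
    finally:
        clients.discard(websocket)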
```

### Client-side viewer (React)

```tsx
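import React, { useEffect, useState } from 'react';

// Shows the most recent finalized caption lines received over WebSocket
const LiveCaptions: React.FC = () => {
  const [captions, setCaptions] = useState<string[]>([]);

  useEffect(() => {
    const ws = new WebSocket('wss://api.example.com/live-captions');

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      // Only finalized text is displayed; other message types are ignored
      if (data.type === 'final') {
        // Keep the previous four lines plus the new one
        setCaptions(prev => [...prev.slice(-4), data.text]);
      }
    };

    // Close the socket when the component unmounts
    return () => ws.close();
  }, []);

  return (
    <div className="captions-overlay">
      {captions.map((caption, i) => (
        <p key={i} className={i === captions.length - 1 ? 'current' : 'previous'}>
          {caption}
        </p>
      ))}
    </div>
  );
};

export default LiveCaptions;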
```

### Integration with OBS and broadcasts

The OBS WebSocket plugin (obs-websocket) lets you send subtitles directly into the stream. Alternatively, you can use an NDI overlay, or a web player that plays the HLS stream and overlays subtitles received over WebSocket.

Timeframe: a basic live captions server takes 3–5 days; an integrated broadcast system takes 2 weeks.
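For the OBS route, here is a minimal sketch of the two pieces involved, under stated assumptions. The first is a small formatter that wraps recognized text into CEA-608 caption rows: 32 characters per row, with the 2-row limit chosen here as an assumption. The second builds an obs-websocket 5.x `SendStreamCaption` request (opcode 6, a Request), which asks OBS to send CEA-608 caption text over the stream output. The helper names `format_caption` and `build_send_caption_request` are hypothetical. Opening the WebSocket connection and the Hello/Identify handshake (with authentication, if enabled) are omitted.

```python
import json
import textwrap
import uuid
from typing import Optional

# Assumptions: CEA-608 rows hold at most 32 characters; we keep the newest 2 rows.
CEA608_ROW_WIDTH = 32
MAX_ROWS = 2


def format_caption(text: str, width: int = CEA608_ROW_WIDTH, max_rows: int = MAX_ROWS) -> str:
    """Hypothetical formatter: wrap text into caption rows, keeping only the newest rows."""
    rows = textwrap.wrap(text, width=width)
    return "\n".join(rows[-max_rows:])


def build_send_caption_request(caption_text: str, request_id: Optional[str] = None) -> str:
    """Build an obs-websocket 5.x 'SendStreamCaption' request message (op 6 = Request)."""
    message = {
        "op": 6,
        "d": {
            "requestType": "SendStreamCaption",
            "requestId": request_id or str(uuid.uuid4()),
            "requestData": {"captionText": caption_text},
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    caption = format_caption("The quick brown fox jumps over the lazy dog near the river bank today")
    print(caption)
    print(build_send_caption_request(caption, request_id="demo-1"))
```

To use it, connect to obs-websocket (port 4455 by default in 5.x), complete the handshake, and send the returned JSON string for each final caption. CEA-608 has a limited, mostly Latin character set, so check how non-Latin text, such as the Russian output of the server example, is rendered before relying on this path.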

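The under-2-second target from the introduction can be checked with simple arithmetic. In the server example, the first word of each window waits for the whole 2-second buffer to fill before transcription even starts, so the window length alone consumes the entire budget. The rough sketch below assumes illustrative stage timings: 0.5 s of STT time, taken from the upper end of the 100–500 ms figure in the diagram, and 0.1 s of network transit, which is a guess. `LatencyBudget` is a hypothetical helper.

```python
from dataclasses import dataclass

TARGET_S = 2.0  # delay target from the introduction


@dataclass
class LatencyBudget:
    window_s: float        # audio buffered before each transcription
    inference_s: float     # STT time per window (assumed)
    network_s: float       # WebSocket uplink + downlink transit (assumed)
    render_s: float = 0.0  # client-side display time (assumed negligible)

    def worst_case(self) -> float:
        # The first word in a window waits for the full window, then STT and transport
        return self.window_s + self.inference_s + self.network_s + self.render_s

    def best_case(self) -> float:
        # The last word in a window only waits for STT and transport
        return self.inference_s + self.network_s + self.render_s

    def meets_target(self, target_s: float = TARGET_S) -> bool:
        return self.worst_case() <= target_s


if __name__ == "__main__":
    for window in (2.0, 1.0):
        budget = LatencyBudget(window_s=window, inference_s=0.5, network_s=0.1)
        print(f"window={window}s worst={budget.worst_case():.1f}s "
              f"best={budget.best_case():.1f}s meets target: {budget.meets_target()}")
```

Shrinking the window lowers the worst case but gives Whisper less context per call, which tends to hurt accuracy. The partial + final split in the diagram is one way to trade these off.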






