Real-Time Live Captions Auto-Generation Implementation

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business settings.

Implementation of automatic real-time subtitle generation (Live Captions). Live Captions display subtitles as speech is delivered, for online broadcasts, video conferences, and public events. The key parameter is the delay between speech and the appearance of the subtitle (target: under 2 seconds).

### Live Captions System Architecture

```
Microphone/Stream → WebSocket → STT → Formatter → WebSocket → Browser/Display
       ↓                         ↓ (partial + final)
   16kHz PCM               100–500 ms latency
```
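The pipeline assumes 16 kHz, 16-bit mono PCM on the wire. As an illustration, here is a minimal sketch of the normalization step that Whisper-style models expect (the function name `pcm16_to_float32` is illustrative, not part of the system described here):

```python
import numpy as np

def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1.0, 1.0)."""
    samples = np.frombuffer(raw, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0

# Two seconds of 16 kHz mono audio is 16,000 samples/s * 2 bytes * 2 s = 64,000 bytes,
# which is the buffering threshold used by the server below.
chunk = np.array([0, 16384, -32768], dtype=np.int16).tobytes()
print(pcm16_to_float32(chunk))  # [ 0.   0.5 -1. ]
```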

### Server side with WebSocket

```python
from fastapi import FastAPI, WebSocket
from faster_whisper import WhisperModel
import asyncio
import numpy as np

app = FastAPI()
model = WhisperModel("medium", device="cuda", compute_type="float16")

@app.websocket("/live-captions")
async def live_captions(websocket: WebSocket):
    await websocket.accept()
    audio_buffer = bytearray()
    last_text = ""

    async for chunk in websocket.iter_bytes():
        audio_buffer.extend(chunk)

        # Process every 2 seconds of audio (16 kHz, 16-bit mono = 32,000 bytes/sec)
        if len(audio_buffer) >= 32000 * 2:  # 2 sec @ 16kHz
            audio_array = np.frombuffer(audio_buffer, dtype=np.int16).astype(np.float32) / 32768.0
            segments, _ = model.transcribe(audio_array, language="ru")

            text = " ".join(seg.text.strip() for seg in segments)
            if text and text != last_text:
                last_text = text
                # The buffer is cleared after each pass, so this text will not be revised
                await websocket.send_json({
                    "type": "final",
                    "text": text,
                    "timestamp": asyncio.get_running_loop().time()
                })

            audio_buffer = bytearray()
```

### Client-side viewer (React)

```tsx
import React, { useState, useEffect } from 'react';

const LiveCaptions: React.FC = () => {
  const [captions, setCaptions] = useState<string[]>([]);

  useEffect(() => {
    const ws = new WebSocket('wss://api.example.com/live-captions');

    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      if (data.type === 'final') {
        setCaptions(prev => [...prev.slice(-4), data.text]);
      }
    };

    return () => ws.close();
  }, []);

  return (
    <div className="captions-overlay">
      {captions.map((caption, i) => (
        <p key={i} className={i === captions.length - 1 ? 'current' : 'previous'}>
          {caption}
        </p>
      ))}
    </div>
  );
};
```

### Integration with OBS/broadcasts

The OBS WebSocket plugin allows you to send subtitles directly to the stream. Alternatively, use an NDI overlay, or a web player that renders WebSocket subtitles on top of an HLS stream.

Timeframe: a basic live captions server takes 3–5 days; a fully integrated broadcast system takes about 2 weeks.
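Before pushing text into an OBS text source or an NDI overlay, captions usually need to be wrapped to broadcast-friendly line lengths (CEA-608-style captions allow at most 32 characters per line, typically two lines on screen). A small sketch of such a formatter, assuming those limits (`wrap_caption` is a hypothetical helper, not part of the server above):

```python
import textwrap

def wrap_caption(text: str, max_chars: int = 32, max_lines: int = 2) -> str:
    """Wrap caption text to short lines and keep only the most recent lines."""
    lines = textwrap.wrap(text, width=max_chars)
    return "\n".join(lines[-max_lines:])

print(wrap_caption("Live captions should appear within two seconds of speech"))
```

The same function can sit in the Formatter stage of the pipeline, applied to each `"final"` message before it is displayed or forwarded to the broadcast overlay.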