# Deploying OpenAI Whisper on a Dedicated Server (Self-Hosted)

Self-hosted Whisper gives you complete control over your data, predictable pricing at high volumes, and the ability to fine-tune the model for a specific accent or domain. When transcribing 100+ hours of audio per month, a dedicated server pays for itself faster than a cloud API.

### Production Deployment Architecture
```
Audio Input → Nginx → FastAPI Workers → Whisper Workers (GPU) → PostgreSQL
                          ↓                     ↓
                     Redis Queue           S3 Storage
```

Main components:

- **FastAPI** — REST API for receiving jobs
- **Celery** — asynchronous processing queue
- **Redis** — task broker and cache
- **faster-whisper** — inference engine (CTranslate2)
- **PostgreSQL** — storage for transcripts and metadata

### Celery Worker Configuration

```python
from celery import Celery
from faster_whisper import WhisperModel

app = Celery("whisper_tasks", broker="redis://localhost:6379/0")

# Loaded once per worker process; int8_float16 roughly halves VRAM use
# compared to fp16 with minimal accuracy loss on large-v3.
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")

@app.task(bind=True, max_retries=3)
def transcribe_audio(self, file_path: str, language: str | None = None):
    try:
        segments, info = model.transcribe(
            file_path,
            language=language,      # None → automatic language detection
            vad_filter=True,        # skip silence via voice activity detection
            word_timestamps=True,
        )
        return {
            "language": info.language,
            "duration": info.duration,
            "segments": [
                {"start": s.start, "end": s.end, "text": s.text}
                for s in segments
            ],
        }
    except Exception as exc:
        # Retry up to 3 times, waiting 60 seconds between attempts
        raise self.retry(exc=exc, countdown=60)
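# ---------------------------------------------------------------------------
# The monitoring section below calls for a healthcheck endpoint with a GPU
# availability check. A minimal sketch of the GPU half (assumptions: a worker
# host with `nvidia-smi` on PATH; the 2048 MiB threshold is illustrative):
import subprocess

def parse_gpu_memory_free(csv_output: str) -> list[int]:
    """Parse `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits`
    output into a list of free-memory values in MiB, one entry per GPU."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def gpu_available(min_free_mib: int = 2048) -> bool:
    """True if at least one GPU has enough free memory to accept a job."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.free",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=5,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        return False  # nvidia-smi missing or failing → report GPU unavailable
    return any(free >= min_free_mib for free in parse_gpu_memory_free(out))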
```

### Hardware Requirements

| Load | GPU | RAM | Disk |
|------|-----|-----|------|
| up to 10 hours/day | RTX 3080 10 GB | 16 GB | 100 GB SSD |
| up to 100 hours/day | RTX 4090 | 32 GB | 500 GB SSD |
| more than 100 hours/day | 2× A10G | 64 GB | 2 TB NVMe |

### Monitoring and Reliability

- Celery Flower for task queue monitoring
- Prometheus + Grafana for GPU utilization and queue-depth metrics
- Automatic worker restart via systemd
- Healthcheck endpoint with GPU availability check

### Cost Estimate

The OpenAI Whisper API costs $0.006/minute. Self-hosted on an A10G (rental ~$1.50/hour) at 50% utilization, the cost is roughly $0.001/minute, so self-hosting pays off from about 3,000 minutes/month.

### Implementation Timeframe

- Basic deployment: 2–3 days
- With job queue and API: 5–7 days
- Full production system with monitoring: 2 weeks
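The payback arithmetic above can be sanity-checked with a short sketch. The real-time factor (~50× real time for faster-whisper large-v3 with int8 on an A10G) is an assumption, not a measured figure; benchmark your own hardware before relying on it.

```python
def self_hosted_cost_per_audio_minute(gpu_usd_per_hour: float,
                                      realtime_factor: float,
                                      utilization: float) -> float:
    """Cost per transcribed audio minute on a rented GPU.

    realtime_factor: minutes of audio transcribed per minute of GPU time
                     (assumed, e.g. ~50 for faster-whisper large-v3 int8).
    utilization: fraction of rented hours actually spent transcribing.
    """
    audio_minutes_per_rented_hour = 60 * realtime_factor * utilization
    return gpu_usd_per_hour / audio_minutes_per_rented_hour

# With the assumed numbers: A10G at $1.50/h, ~50x real time, 50% utilization
cost = self_hosted_cost_per_audio_minute(1.50, 50, 0.5)  # → 0.001 USD/minute
```

At these assumed inputs the result matches the ~$0.001/minute figure cited above; a slower GPU or lower utilization pushes the break-even volume up accordingly.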