# Implementing Automatic Podcast Transcription

Podcast transcription makes audio content indexable for search engines (SEO), accessible to deaf and hard-of-hearing audiences, and reusable as the basis for articles and summaries. Key requirements: good recognition quality for conversational speech with multiple speakers, and support for long recordings (1–3 hours).

### Optimal stack

- **Whisper large-v3 + Pyannote diarization**: the best open-source choice for podcasts.
- **AssemblyAI**: the best cloud-based option, with ready-made diarization and chapter detection.

### Quick solution via AssemblyAI
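```python
import os

import assemblyai as aai

# Read the API key from the environment (the variable name is an assumption)
ASSEMBLYAI_KEY = os.environ["ASSEMBLYAI_KEY"]
aai.settings.api_key = ASSEMBLYAI_KEY

# Some audio-intelligence features may be limited to certain languages;
# check AssemblyAI's language support before relying on them for "ru".
config = aai.TranscriptionConfig(
    language_code="ru",
    speaker_labels=True,    # diarization
    punctuate=True,
    format_text=True,
    auto_chapters=True,     # automatic chapters
    entity_detection=True,  # mentions of people/companies
    iab_categories=True,    # content categorization
)
transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("https://podcast.example.com/episode.mp3")

# Output with speaker attribution
for utterance in transcript.utterances:
    print(f"[Speaker {utterance.speaker}] {utterance.text}")

# Automatic chapters (chapter.start is in milliseconds)
for chapter in transcript.chapters:
    print(f"## {chapter.headline} ({chapter.start // 1000}s)")
    print(chapter.summary)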
```

### Self-hosted for long episodes

```python
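from faster_whisper import WhisperModel

# Load the model once; int8_float16 reduces GPU memory use
model = WhisperModel("large-v3", device="cuda", compute_type="int8_float16")


def transcribe_podcast(audio_path: str) -> str:
    """Transcribe an audio file and return one text line per segment."""
    segments, _ = model.transcribe(
        audio_path,
        language="ru",
        vad_filter=True,  # skip silence before decoding
        vad_parameters={"min_silence_duration_ms": 1000},
        word_timestamps=False,
        beam_size=5,
    )
    # segments is a lazy generator; decoding happens while it is consumed
    return "\n".join(seg.text.strip() for seg in segments)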
```

For a 1-hour podcast, faster-whisper large-v3 on an RTX 4090 takes 15–18 minutes to process.

### Export to various formats

- **Markdown** with chapters: for publishing on the website
- **SRT**: for adding subtitles to the video version
- **PDF**: for downloading

Timeframe: basic podcast transcription takes 1–2 days; a system with publishing and SEO optimization takes 1 week.
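
As an illustration of the SRT export listed above, here is a minimal sketch that turns timed segments into SRT text. It assumes each segment is available as a `(start_seconds, end_seconds, text)` tuple; the function names `format_srt_timestamp` and `segments_to_srt` are hypothetical helpers, not part of any library.

```python
from typing import Iterable, Tuple


def format_srt_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) tuples as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, then the subtitle text
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        (0.0, 2.5, " Welcome to the show."),
        (2.5, 6.0, " Thanks for having me."),
    ]
    print(segments_to_srt(demo))
```

With faster-whisper, the segments returned by `model.transcribe` expose `start`, `end`, and `text`, so they could be passed in as `segments_to_srt((s.start, s.end, s.text) for s in segments)`. Long segments may need to be split into shorter cues for comfortable reading, which this sketch does not attempt.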







