## AI-based podcast transcription and summarization

Automatic podcast transcription addresses SEO (text content for search engines), accessibility, shownote creation, and content sharing on social media. Whisper large-v3 delivers a WER of 4–8% on clean, studio-quality recordings.

### Basic Pipeline

```python
import whisper
from openai import AsyncOpenAI


async def transcribe_and_summarize_podcast(audio_path: str) -> dict:
    # Transcription
    model = whisper.load_model("large-v3")
    result = model.transcribe(
        audio_path,
        language="ru",
        task="transcribe",
        verbose=False,
        word_timestamps=True
    )
    transcript = result["text"]
    segments = result["segments"]  # [{start, end, text}, ...]

    # Generate shownotes with GPT-4o
    client = AsyncOpenAI()
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "system",
            "content": "Create podcast shownotes: a short episode description (3-5 sentences), key topics as a list, and timestamps for the main topics in MM:SS format."
        }, {
            "role": "user",
            "content": transcript[:6000]
        }]
    )

    # Timestamps for key topics
    chapters = extract_chapters(segments)
    return {
        "transcript": transcript,
        "shownotes": response.choices[0].message.content,
        "chapters": chapters,
        "duration_sec": segments[-1]["end"] if segments else 0
    }


def extract_chapters(segments: list) -> list[dict]:
    """Identify chapter boundaries from pauses between segments."""
    chapters = []
    # Treat pauses longer than 3 seconds as chapter boundaries
    for i in range(1, len(segments)):
        gap = segments[i]["start"] - segments[i-1]["end"]
        if gap > 3.0:
            chapters.append({
                "timestamp": int(segments[i]["start"]),
                "text": segments[i]["text"][:80]
            })
    return chapters
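
# Shownotes conventionally list chapter marks as MM:SS, while the chapters
# above carry integer-second timestamps. An illustrative formatter (an
# assumption, not part of the pipeline above) bridges the two:
def format_timestamp(seconds: int) -> str:
    """Convert a second offset into the MM:SS form used in shownotes."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"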
```

### RSS integration for automatic processing

```python
import feedparser
import httpx
from hashlib import md5


async def process_podcast_feed(rss_url: str) -> list[dict]:
    feed = feedparser.parse(rss_url)
    results = []
    # Podcast CDNs almost always redirect, so follow redirects explicitly
    async with httpx.AsyncClient(follow_redirects=True) as client:
        for entry in feed.entries[:5]:  # last 5 episodes
            audio_url = next(
                (enc.href for enc in entry.enclosures
                 if enc.get("type", "").startswith("audio")),
                None
            )
            if not audio_url:
                continue
            audio_data = await client.get(audio_url)
            # entry.id is often a URL, so hash it to get a safe filename
            local_path = f"/tmp/{md5(audio_url.encode()).hexdigest()}.mp3"
            with open(local_path, "wb") as f:
                f.write(audio_data.content)
            result = await transcribe_and_summarize_podcast(local_path)
            result["title"] = entry.title
            result["published"] = entry.published
            results.append(result)
    return results
```

Whisper large-v3 processes one hour of audio in approximately 3-4 minutes on a GPU (RTX 3090); on a CPU it takes about 30-40 minutes. For regular podcast processing, cloud inference via the OpenAI Whisper API ($0.006/min) is sufficient. Estimated development effort: a single podcast-processing script takes 1-2 days; a service with RSS monitoring and shownotes publishing takes 1-2 weeks.