AI-powered audience analytics system for publishers
Publishers accumulate rich data on reader behavior, yet most use only top-level metrics (page views, bounce rate). ML-based audience analytics turns this data into actionable insights for editorial and sales teams.
Audience segmentation
RFM + Behavioral Clustering:
Behavioral segmentation of readers goes beyond demographics:
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans


def segment_readers(reader_events_df, n_segments=6):
    """
    Segment readers by behavioral features.
    reader_events_df: events (articles, time, scroll_depth, shares)
    """
    # Aggregate events to one row per reader
    reader_features = reader_events_df.groupby('reader_id').agg({
        'article_id': 'count',        # frequency
        'session_duration': 'mean',   # engagement
        'scroll_depth_pct': 'mean',   # reading depth
        'days_active': 'nunique',     # active days
        # favorite category (None if the reader has no category data)
        'category': lambda x: x.mode().iat[0] if x.notna().any() else None,
        'shares': 'sum',              # virality
        'direct_visit': 'mean',       # loyalty (direct visits, not search traffic)
        # recency: days since the most recent visit
        'last_visit': lambda x: (pd.Timestamp.now() - pd.to_datetime(x).max()).days
    }).reset_index()
    reader_features.columns = ['reader_id', 'articles_read', 'avg_session',
                               'avg_scroll', 'active_days', 'top_category',
                               'shares', 'direct_ratio', 'recency_days']

    # Standardize numeric features so no single scale dominates the distance
    numeric_cols = ['articles_read', 'avg_session', 'avg_scroll',
                    'active_days', 'shares', 'direct_ratio', 'recency_days']
    scaler = StandardScaler()
    X = scaler.fit_transform(reader_features[numeric_cols].fillna(0))

    # K-Means clustering
    kmeans = KMeans(n_clusters=n_segments, random_state=42, n_init=10)
    reader_features['segment'] = kmeans.fit_predict(X)
    return reader_features
```
Typical segments:
- Loyalists: come directly, read every day, scroll deep
- Casual browsers: sometimes come from social networks, read only the headline
- Topic specialists: read a lot in one category (an audience for niche newsletters)
- Social sharers: read little themselves, but actively share
- Churning users: were active, then stopped
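K-Means returns only numeric cluster IDs, so names like the ones above have to be assigned by inspecting each cluster. One possible way, sketched here under assumptions, is to compare each segment's average features with the overall average. The helper name `profile_segments` and the sample values are illustrative only; the column names follow the output of `segment_readers`.

```python
import pandas as pd


def profile_segments(reader_features, feature_cols):
    """Per-segment feature means relative to the overall mean (1.0 = average reader)."""
    overall = reader_features[feature_cols].mean()
    per_segment = reader_features.groupby('segment')[feature_cols].mean()
    # Divides column-wise; assumes no feature has an overall mean of zero
    return per_segment / overall


# Illustrative data, not real measurements
sample = pd.DataFrame({
    'reader_id': [1, 2, 3, 4],
    'articles_read': [40, 36, 3, 5],
    'direct_ratio': [0.9, 0.8, 0.1, 0.2],
    'recency_days': [1, 2, 30, 20],
    'segment': [0, 0, 1, 1],
})
profile = profile_segments(sample, ['articles_read', 'direct_ratio', 'recency_days'])
print(profile.round(2))
```

In this toy data, segment 0 reads far more than average, arrives directly, and visited recently, which would match the "Loyalists" description; segment 1 looks closer to casual or churning readers.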
Content analytics
Content Performance Scoring:
Article evaluation is based not only on views, but also on quality metrics:
| Metric | Weight | What it measures |
|---|---|---|
| Read rate (scroll >70%) | 30% | Attention retention |
| Time on page / expected | 25% | Actual reading vs. bounce |
| Return rate | 20% | Reader came back after the article |
| Social amplification | 15% | Virality |
| Subscription assists | 10% | Contribution to conversion |
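The weights in the table can be combined into a single score as a weighted sum. The sketch below assumes each metric has already been normalized to the range 0 to 1; how that normalization is done is not specified here, and the metric keys and sample values are hypothetical.

```python
# Weights taken from the table; the dictionary keys are illustrative names
WEIGHTS = {
    'read_rate': 0.30,
    'time_ratio': 0.25,
    'return_rate': 0.20,
    'social_amplification': 0.15,
    'subscription_assists': 0.10,
}


def content_score(metrics):
    """Weighted sum of metrics, each clipped to [0, 1]; result is on a 0-100 scale."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(metrics.get(name, 0.0), 0.0), 1.0)
        total += weight * value
    return round(100 * total, 1)


# Illustrative article metrics, not real measurements
article = {
    'read_rate': 0.6,
    'time_ratio': 0.8,
    'return_rate': 0.25,
    'social_amplification': 0.1,
    'subscription_assists': 0.05,
}
print(content_score(article))
```

Missing metrics count as zero in this sketch, so an article with incomplete tracking is scored conservatively rather than skipped.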
Topic Resonance Analysis:
Which topics resonate most with different audience segments:
- NLP content clustering (BERTopic) → topic clusters
- Matrix "Topic × audience segment" → editorial insights
- The editors see: "Loyalist readers want more analytics, Casual readers want more listicles"
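Once every article has a topic label and every reader a segment, the "Topic × audience segment" matrix is a plain cross-tabulation. The sketch below assumes the topic labels already exist (for example, from a topic model) and uses made-up reading events; it does not run BERTopic itself.

```python
import pandas as pd

# Illustrative reading events: one row per article read
reads = pd.DataFrame({
    'segment': ['Loyalists', 'Loyalists', 'Loyalists', 'Casual', 'Casual', 'Casual'],
    'topic': ['analytics', 'analytics', 'listicles', 'listicles', 'listicles', 'analytics'],
})

# Rows: segments, columns: topics, values: share of each segment's reads
matrix = pd.crosstab(reads['segment'], reads['topic'], normalize='index')

# Topic with the largest share of reads in each segment
top_topic = matrix.idxmax(axis=1)
print(matrix.round(2))
print(top_topic)
```

Normalizing by row makes segments of different sizes comparable: each row sums to 1 and shows how a segment splits its attention across topics.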
Subscriber churn forecast
Subscriber Health Score:
Dynamic scoring of each subscriber:
- Decrease in reading frequency → yellow flag
- Unsubscribe from email newsletter → red flag
- Pattern: did not open emails for 3 weeks + did not visit the site → P(churn_30d) = 0.7
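The flags above can be expressed as simple rules before any model is trained. This is a minimal sketch, not a production scoring model: the 50% drop threshold, the 30-day comparison windows, the baseline probability of 0.1, and applying the 3-week window to site visits as well as emails are all assumptions. Only the 0.7 value comes from the pattern described above.

```python
from dataclasses import dataclass


@dataclass
class SubscriberActivity:
    reads_last_30d: int
    reads_prev_30d: int
    unsubscribed_from_newsletter: bool
    days_since_email_open: int
    days_since_site_visit: int


def health_flags(a):
    """Return the warning flags raised by a subscriber's recent activity."""
    flags = []
    # Assumed threshold: reading volume fell below half of the previous period
    if a.reads_last_30d < 0.5 * a.reads_prev_30d:
        flags.append('yellow: reading frequency dropped')
    if a.unsubscribed_from_newsletter:
        flags.append('red: unsubscribed from newsletter')
    return flags


def churn_probability_30d(a, baseline=0.1):
    """Rule-based estimate of P(churn_30d); baseline is a placeholder value."""
    if a.days_since_email_open >= 21 and a.days_since_site_visit >= 21:
        return 0.7
    return baseline


subscriber = SubscriberActivity(
    reads_last_30d=2, reads_prev_30d=10,
    unsubscribed_from_newsletter=False,
    days_since_email_open=25, days_since_site_visit=30,
)
print(health_flags(subscriber), churn_probability_30d(subscriber))
```

In practice the hard-coded probability would likely be replaced by a model fitted on historical churn, with rules like these kept as interpretable features.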
Win-back campaigns:
If P(churn) > 0.6:
- Personalized selection of the best articles for the period of absence
- Special offer upon subscription expiration (if LTV > cost of offer)
- Re-engagement email series with increasing incentives
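The decision logic above fits in a few lines. In this sketch the function name `plan_win_back` and the action labels are hypothetical; the 0.6 threshold and the LTV check are the ones stated above.

```python
def plan_win_back(p_churn, ltv, offer_cost, threshold=0.6):
    """Choose win-back actions for one subscriber; returns an empty list below the threshold."""
    if p_churn <= threshold:
        return []
    actions = [
        'personalized digest of best articles missed',
        're-engagement email series',
    ]
    # Only discount when the subscriber is worth more than the offer costs
    if ltv > offer_cost:
        actions.append('special offer at subscription expiration')
    return actions


print(plan_win_back(p_churn=0.7, ltv=120.0, offer_cost=30.0))
print(plan_win_back(p_churn=0.7, ltv=20.0, offer_cost=30.0))
```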
Attribution and monetization insights
Content-Subscriber Attribution:
Which articles actually lead to subscriptions?
- Multi-touch attribution: the reader read 8 articles before subscribing
- Markov Chain attribution: each article gets its fair share of credit for conversion
- Editorial takeaway: invest in creating content type X because it converts
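One common way to turn a Markov chain into credit shares is the removal effect: estimate how much the overall conversion probability falls when a content type is removed from the chain, then split conversions in proportion to those drops. The sketch below is a small pure-Python illustration under assumptions: paths are sequences of content types (not individual articles), the chain is first-order, at least one path converts, and the sample paths are made up.

```python
from collections import defaultdict

START, CONV, NULL = '(start)', '(conversion)', '(null)'


def transition_probs(paths):
    """First-order transition probabilities estimated from reader paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for touches, converted in paths:
        states = [START, *touches, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_probability(probs, removed=None, tol=1e-12, max_iter=10_000):
    """P(reaching CONV from START); a removed state counts as a dead end."""
    value = {s: 0.0 for s in probs if s != removed}
    for _ in range(max_iter):
        delta = 0.0
        for s in value:
            # NULL and the removed state are absent from `value`, so they contribute 0
            new = sum(p * (1.0 if nxt == CONV else value.get(nxt, 0.0))
                      for nxt, p in probs[s].items())
            delta = max(delta, abs(new - value[s]))
            value[s] = new
        if delta < tol:
            break
    return value[START]


def markov_attribution(paths):
    """Split observed conversions across content types by removal effect."""
    probs = transition_probs(paths)
    base = conversion_probability(probs)
    removal_effect = {
        s: 1.0 - conversion_probability(probs, removed=s) / base
        for s in probs if s != START
    }
    total_effect = sum(removal_effect.values())
    conversions = sum(1 for _, converted in paths if converted)
    return {s: conversions * e / total_effect for s, e in removal_effect.items()}


# Illustrative paths: (content types read in order, subscribed?)
paths = [
    (['news', 'analysis'], True),
    (['news'], False),
    (['analysis'], True),
    (['listicle', 'news'], False),
    (['listicle', 'analysis'], True),
]
credit = markov_attribution(paths)
print({k: round(v, 3) for k, v in credit.items()})
```

In this toy data every converting path passes through `analysis`, so removing it drops the conversion probability to zero and it receives the largest share of the three conversions.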
Development time: 2–4 months for an audience analytics platform with segmentation, churn prediction, and content scoring.







