AI-powered news feed personalization system
News feed personalization is about balancing relevance and diversity. Pure relevance optimization creates "filter bubbles" and reduces engagement after 2-3 weeks. Modern systems (Google News, Apple News) explicitly introduce a diversity component and ensure exposure to viewpoints beyond the echo chamber.
Multi-factor ranking
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
class NewsPersonalizationEngine:
    """Personalized news-feed ranking.

    Combines relevance (topic affinity + semantic similarity), freshness,
    quality, and an explicit diversity constraint so the feed does not
    collapse into a single-topic filter bubble.
    """

    def __init__(self):
        # Multilingual model so one profile embedding works across article languages.
        self.encoder = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

    def build_user_interest_profile(self,
                                    reading_history: list[dict],
                                    explicit_preferences: dict = None,
                                    recency_decay: float = 0.95) -> dict:
        """Build an interest profile from reading history.

        Args:
            reading_history: chronologically ordered events, e.g.
                [{'article_id': ..., 'topic': ..., 'time_spent_sec': ...,
                  'completed': ..., 'title': ...}]
            explicit_preferences: optional user-declared preferences,
                passed through unchanged.
            recency_decay: exponential decay applied per step of age; the
                most recent event has weight 1.0. (Implements the recency
                decay the previous version only promised in a comment.)

        Returns:
            dict with normalized topic weights, the top-5 interests,
            an optional profile embedding, and a cold-start flag.
        """
        if not reading_history:
            return {'topics': {}, 'is_cold_start': True}

        # Weight interests by reading time + completion, decayed by recency
        # so stale interests count less.
        topic_weights: dict[str, float] = {}
        n = len(reading_history)
        for idx, article in enumerate(reading_history):
            topic = article.get('topic', 'general')
            time_weight = min(article.get('time_spent_sec', 30) / 180, 1.0)  # cap at 3 min
            completion_bonus = 0.5 if article.get('completed') else 0.0
            recency = recency_decay ** (n - 1 - idx)
            topic_weights[topic] = (topic_weights.get(topic, 0.0)
                                    + (time_weight + completion_bonus) * recency)

        total = sum(topic_weights.values())
        if total <= 0:
            # Every event carried zero signal (0-second reads, nothing
            # completed): treat as cold start instead of dividing by zero.
            return {'topics': {}, 'is_cold_start': True}
        normalized = {t: w / total for t, w in topic_weights.items()}

        # Profile embedding from recently completed, non-empty titles
        # (empty strings would only dilute the mean embedding).
        recent_titles = [a.get('title', '') for a in reading_history[-20:]
                         if a.get('completed') and a.get('title')]
        profile_embedding = None
        if recent_titles:
            profile_embedding = np.mean(
                self.encoder.encode(recent_titles, normalize_embeddings=True),
                axis=0
            )

        return {
            'topics': normalized,
            'top_interests': sorted(normalized.items(), key=lambda x: -x[1])[:5],
            'profile_embedding': profile_embedding,
            'is_cold_start': False,
            'explicit_preferences': explicit_preferences or {}
        }

    def score_article(self, article: dict,
                      user_profile: dict,
                      seen_topics_last_hour: list[str]) -> dict:
        """Multi-factor score for a single article.

        Factors: relevance (topic affinity + semantic similarity to the
        profile embedding), freshness (exponential time decay), quality
        (engagement, source trust, length), a repeated-topic diversity
        penalty, and a breaking-news boost.
        """
        topic = article.get('topic', 'general')
        topics = user_profile.get('topics', {})

        # === Relevance ===
        topic_score = topics.get(topic, 0.05)  # small default interest for unseen topics

        semantic_score = 0.5  # neutral default when either embedding is missing (cold start)
        profile_emb = user_profile.get('profile_embedding')
        if profile_emb is not None and article.get('embedding') is not None:
            semantic_score = float(cosine_similarity(
                profile_emb.reshape(1, -1),
                np.array(article['embedding']).reshape(1, -1)
            )[0, 0])
        relevance = topic_score * 0.4 + semantic_score * 0.6

        # === Freshness ===
        # Exponential decay with a 12-hour e-folding time (~0.37 at 12h).
        # NOTE: this is not a half-life; a true 12h half-life would be 2**(-h/12).
        hours_old = article.get('hours_since_published', 24)
        freshness = np.exp(-hours_old / 12)

        # === Quality ===
        quality_score = (
            article.get('engagement_rate', 0.5) * 0.4 +
            article.get('source_trust_score', 0.7) * 0.3 +
            min(article.get('word_count', 500) / 800, 1.0) * 0.3
        )

        # === Diversity penalty ===
        # Each recent sighting of the same topic shaves 10% multiplicatively.
        topic_seen_count = seen_topics_last_hour.count(topic)
        diversity_penalty = 0.9 ** topic_seen_count  # 0 -> 1.0, 1 -> 0.9, 2 -> 0.81...

        # === Breaking news boost ===
        breaking_boost = 1.5 if article.get('is_breaking') else 1.0

        # === Final score ===
        # The 0.15 constant is a deterministic serendipity *floor* (not
        # noise): it keeps low-relevance items above zero so they can
        # still surface occasionally.
        final_score = (
            relevance * 0.40 +
            freshness * 0.25 +
            quality_score * 0.20 +
            0.15
        ) * diversity_penalty * breaking_boost

        return {
            'article_id': article.get('id'),
            'final_score': round(final_score, 4),
            'relevance': round(relevance, 3),
            'freshness': round(freshness, 3),
            'quality': round(quality_score, 3),
            'diversity_penalty': round(diversity_penalty, 3),
        }

    def rank_feed(self, articles: list[dict],
                  user_profile: dict,
                  max_items: int = 20,
                  diversity_floor: float = 0.15) -> list[dict]:
        """Rank the final feed with diversity constraints.

        Uses greedy selection: each slot goes to the highest-scoring
        remaining candidate, re-scored against the topics already placed.
        (The previous implementation scored everything against an empty
        topic history before selecting, so the diversity penalty was a
        no-op.)

        Args:
            articles: candidate article dicts.
            user_profile: output of build_user_interest_profile.
            max_items: feed length cap.
            diversity_floor: minimum share of items outside the two
                most-represented topics in the result.
        """
        known_topics = user_profile.get('topics') or {'general': 1}
        # Hard per-topic cap so one topic cannot dominate the feed.
        # (Guarded `or` also avoids ZeroDivisionError on an empty topics dict.)
        max_per_topic = max(2, max_items // len(known_topics))

        pool = list(articles)
        result: list[dict] = []
        seen_topics: list[str] = []
        topic_counts: dict[str, int] = {}

        while pool and len(result) < max_items:
            best_idx = None
            best_data = None
            for i, candidate in enumerate(pool):
                topic = candidate.get('topic', 'general')
                if topic_counts.get(topic, 0) >= max_per_topic:
                    continue
                data = self.score_article(candidate, user_profile, seen_topics)
                if best_data is None or data['final_score'] > best_data['final_score']:
                    best_idx, best_data = i, data
            if best_idx is None:
                break  # every remaining candidate has hit its topic cap
            chosen = pool.pop(best_idx)
            topic = chosen.get('topic', 'general')
            result.append({**chosen, **best_data})
            topic_counts[topic] = topic_counts.get(topic, 0) + 1
            seen_topics.append(topic)

        # Enforce the diversity floor: ensure enough items outside the two
        # most-represented topics. Top topics are taken by *count* (the old
        # code used dict insertion order) and only unselected candidates
        # are considered (the old slice could re-insert duplicates).
        if len(result) > 5:
            by_count = sorted(topic_counts.items(), key=lambda kv: -kv[1])
            top_topics = {t for t, _ in by_count[:2]}

            def off_topic_share(items: list[dict]) -> float:
                return sum(1 for it in items
                           if it.get('topic', 'general') not in top_topics) / len(items)

            if off_topic_share(result) < diversity_floor:
                extras = [a for a in pool
                          if a.get('topic', 'general') not in top_topics]
                extras.sort(key=lambda a: -self.score_article(
                    a, user_profile, seen_topics)['final_score'])
                for extra in extras:
                    data = self.score_article(extra, user_profile, seen_topics)
                    # Insert mid-feed so diversity items are actually seen.
                    result.insert(len(result) // 2, {**extra, **data})
                    if off_topic_share(result) >= diversity_floor:
                        break

        return result[:max_items]
class EngagementTracker:
    """Incrementally update a user's topic weights from session behavior."""

    # Per-action weight delta applied to the event's topic.
    # Unknown actions are ignored.
    _ACTION_DELTAS = {
        'completed_read': 0.3,
        'share': 0.5,
        'quick_skip': -0.1,
        'dislike': -0.3,
    }

    def update_profile_from_session(self, user_profile: dict,
                                    session_events: list[dict]) -> dict:
        """Apply one session's events to the profile's topic weights.

        Args:
            user_profile: profile dict holding a 'topics' weight mapping.
            session_events: [{'topic': ..., 'action': ...}]; actions
                outside _ACTION_DELTAS are no-ops.

        Returns:
            A new profile dict with re-normalized topic weights; the
            input profile's 'topics' mapping is not mutated.
        """
        profile = user_profile.copy()
        topics = dict(profile.get('topics', {}))

        for event in session_events:
            delta = self._ACTION_DELTAS.get(event.get('action'))
            if delta is None:
                continue  # unrecognized action
            topic = event.get('topic', 'general')
            updated = topics.get(topic, 0) + delta
            # Negative feedback floors the weight at zero, matching the
            # original per-branch max(0, ...) clamps.
            topics[topic] = max(0, updated) if delta < 0 else updated

        # Re-normalize to a probability-like distribution; if everything
        # decayed to zero, leave the previous topics untouched.
        total = sum(topics.values())
        if total > 0:
            profile['topics'] = {t: w / total for t, w in topics.items()}
        return profile
Properly configured personalization increases time-on-site by 25-40% and DAU/MAU by 8-15%. Without a diversity constraint, engagement grows in the short term but long-term churn follows from information fatigue. Google News publicly states that it includes diversity as an explicit objective in its ranking.







