Recommendation System Development

We design and deploy artificial intelligence systems, from prototype to production-ready solutions. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business settings.

Recommendation Systems: From Collaborative Filtering to Real-Time Serving

An e-commerce store with a 300k-SKU catalog. CTR on recommendations was 1.8%. After replacing the "popular in the last 7 days" rule with collaborative filtering: 3.1%. After adding content features and re-ranking: 4.4%. These are real numbers from a real project. The difference between "show what's popular" and "show what's personalized" is measurable and substantial.

Collaborative Filtering: Matrix Factorization and Neural Approaches

Matrix factorization. ALS (Alternating Least Squares) is the classic algorithm for implicit feedback (clicks, views, and purchases without explicit ratings). The implicit library implements ALS with GPU acceleration and processes user×item matrices with hundreds of millions of non-zero values in minutes. 64-256 latent factors and regularization λ = 0.01-0.1 are reasonable starting parameters.
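In production you would reach for the implicit library; as a dependency-free illustration of what it computes, here is a minimal implicit-feedback ALS with confidence weighting C = 1 + α·R on a toy matrix. The data, dimensions, and hyperparameters are made up for the sketch.

```python
import numpy as np

def implicit_als(R, factors=8, reg=0.1, alpha=40.0, iters=10, seed=0):
    """Minimal implicit-feedback ALS (confidence C = 1 + alpha * R)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(0, 0.01, (n_users, factors))  # user factors
    Y = rng.normal(0, 0.01, (n_items, factors))  # item factors
    P = (R > 0).astype(float)                    # binarized preference
    C = 1.0 + alpha * R                          # confidence weights
    I = reg * np.eye(factors)
    for _ in range(iters):
        # Solve (Y^T C_u Y + reg*I) x_u = Y^T C_u p_u for each user, then symmetrically for items
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

# Hypothetical interaction counts: 4 users x 5 items
R = np.array([[3, 0, 1, 0, 0],
              [2, 1, 0, 0, 0],
              [0, 0, 0, 4, 2],
              [0, 0, 1, 3, 0]], dtype=float)
X, Y = implicit_als(R)
scores = X @ Y.T  # predicted preference for every user-item pair
```

The per-row solves are exactly what implicit parallelizes and accelerates; at real scale you never materialize the dense confidence matrix.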

The cold start problem: a new user has no interaction history, and a new item has no interactions. Classic CF is helpless here; you need content features or a hybrid approach.

Neural Collaborative Filtering. NCF replaces the linear dot product in MF with a neural network. In practice the gains over well-tuned ALS are moderate, but NCF is easier to extend with additional features (user age, product category, time of day).

Sequence-aware models. When interaction order matters (the user watched A → B → C; what should we show next?), use SASRec or BERT4Rec. A transformer over the interaction sequence is state-of-the-art for session recommendations: it trains on sequences and predicts the next item.
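Before reaching for a transformer, it is worth having a transition-count baseline to compare against. This is not SASRec, just a first-order "what usually follows the last viewed item" model on made-up sessions:

```python
from collections import Counter, defaultdict

def fit_transitions(sessions):
    """Count item -> next-item transitions across all sessions."""
    trans = defaultdict(Counter)
    for seq in sessions:
        for prev, nxt in zip(seq, seq[1:]):
            trans[prev][nxt] += 1
    return trans

def next_item_candidates(trans, last_item, k=3):
    """Most frequent followers of the last viewed item."""
    return [item for item, _ in trans[last_item].most_common(k)]

# Hypothetical browsing sessions (item ids)
sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C"]]
model = fit_transitions(sessions)
print(next_item_candidates(model, "B"))  # ['C', 'D']
```

If SASRec cannot clearly beat this on offline metrics, the sequence signal in the data is probably weak.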

Content-Based Filtering: When Interaction History Is Small

Content-based filtering recommends based on item characteristics rather than user behavior. It solves cold start for items: a new product with a description and category can be recommended immediately.

Text embeddings. Product descriptions → embeddings via sentence-transformers (multilingual-e5-base or BGE-M3 for a multilingual catalog) → similarity search via cosine similarity. For 100k products, FAISS IndexFlatIP answers a query in under 5 ms.
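The core operation is an inner-product search over L2-normalized vectors, which is exactly what IndexFlatIP does. A numpy sketch with random stand-in embeddings (in production the matrix would come from the sentence-transformers model):

```python
import numpy as np

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(1000, 384)).astype("float32")     # stand-in for model output
item_emb /= np.linalg.norm(item_emb, axis=1, keepdims=True)   # normalize: dot == cosine

def top_k_similar(query_idx, k=5):
    """Brute-force cosine-similarity neighbors of one item (what IndexFlatIP computes)."""
    sims = item_emb @ item_emb[query_idx]
    order = np.argsort(-sims)
    return [i for i in order if i != query_idx][:k]

neighbors = top_k_similar(42)
```

Normalizing once at index-build time is the usual trick: after that, maximum inner product and maximum cosine similarity select the same neighbors.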

Structured features. Category, brand, price, and specs go through embedding layers in a neural network, or in as categorical features in gradient boosting. CatBoost handles categorical features well without manual encoding.

Item2Vec. Train Word2Vec on interaction sequences: an item_id takes the place of a word, a session takes the place of a sentence. Fast, interpretable, and works well for "similar products."
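The usual tool here is gensim's Word2Vec. As a dependency-free stand-in that captures the same "co-occurrence in sessions → item embeddings" idea, here is a PPMI + SVD sketch (a swapped-in technique, not Item2Vec itself; sessions are made up):

```python
import numpy as np

def item_embeddings(sessions, dim=4, window=2):
    """PPMI + SVD over session co-occurrences: a lightweight Item2Vec stand-in."""
    items = sorted({i for s in sessions for i in s})
    idx = {it: j for j, it in enumerate(items)}
    co = np.zeros((len(items), len(items)))
    for s in sessions:                       # count co-occurrences within the window
        for a in range(len(s)):
            for b in range(a + 1, min(a + 1 + window, len(s))):
                i, j = idx[s[a]], idx[s[b]]
                co[i, j] += 1
                co[j, i] += 1
    total = co.sum()
    pi = co.sum(axis=1) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((co / total) / np.outer(pi, pi))
    ppmi = np.nan_to_num(np.maximum(pmi, 0), neginf=0.0)  # clip negatives/-inf to 0
    U, S, _ = np.linalg.svd(ppmi)
    return U[:, :dim] * S[:dim], idx         # truncated embeddings

def similarity(emb, idx, a, b):
    va, vb = emb[idx[a]], emb[idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-9))

# Hypothetical sessions: phones co-occur with cases, laptops with sleeves
sessions = [["phone", "case"], ["phone", "case", "charger"],
            ["laptop", "sleeve"], ["laptop", "sleeve", "mouse"]]
emb, idx = item_embeddings(sessions)
```

Items that share sessions end up close; items that never co-occur end up near-orthogonal, which is exactly the property "similar products" widgets rely on.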

Hybrid Approaches: Two-Stage Retrieval + Ranking

Production recommendation systems are almost always two-level.

Stage 1: Retrieval (candidate generation). From 300k products, quickly select 100-500 candidates. Tools: ALS or a Two-Tower model (separate encoders for user and item, dot product for scoring), with vector search via FAISS or Qdrant. The requirement is speed: under 20 ms.

Stage 2: Ranking. Rank the 100-500 candidates into the final list (top 10-20) with a heavy model and rich features: gradient boosting (LightGBM, CatBoost) or a neural network with cross-features. This is where context comes in: device, time of day, previous session actions. The budget: 50-100 ms.
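The shape of the two stages can be sketched end to end. Everything here is illustrative: the catalog is scaled down, the embeddings are random stand-ins for a Two-Tower model, and a hand-weighted linear scorer stands in for the LightGBM/CatBoost ranker.

```python
import numpy as np

rng = np.random.default_rng(1)
N_ITEMS, DIM = 50_000, 64                    # scaled down from the 300k catalog
item_vecs = rng.normal(size=(N_ITEMS, DIM)).astype("float32")  # stand-in item tower
item_price = rng.uniform(5, 500, N_ITEMS)    # a stand-in ranking feature

def retrieve(user_vec, k=300):
    """Stage 1: dot-product retrieval (FAISS/Qdrant at real scale)."""
    scores = item_vecs @ user_vec
    cand = np.argpartition(-scores, k)[:k]   # top-k without a full sort
    return cand[np.argsort(-scores[cand])]

def rank(user_vec, candidates, session_budget, top=10):
    """Stage 2: re-rank candidates with richer, context-aware features.
    A linear scorer with made-up weights stands in for gradient boosting."""
    retrieval_score = item_vecs[candidates] @ user_vec
    price_fit = -np.abs(item_price[candidates] - session_budget) / session_budget
    final = 1.0 * retrieval_score + 0.5 * price_fit
    return candidates[np.argsort(-final)][:top]

user_vec = rng.normal(size=DIM).astype("float32")
recs = rank(user_vec, retrieve(user_vec), session_budget=100.0)
```

The split keeps the expensive model off the full catalog: the heavy features and context are only computed for a few hundred candidates per request.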

LightFM is a library implementing hybrid factorization models that support item and user features. A good starting point for mid-scale systems without heavy infrastructure.

Real-Time Serving: Architecture Under Load

A recommendation system on the homepage runs under a latency SLA of 50-100 ms at thousands of requests per second. The serving architecture matters.

Precomputation vs. real-time. For most users, recommendations are precomputed and cached. A batch job runs hourly or nightly → stores the top-100 recommendations in Redis keyed by user_id → the request path reads from the cache. Latency under 5 ms. The downside: it doesn't account for the last few hours of events.

Real-time context update. A hybrid: base recommendations from the cache plus real-time re-ranking with recent session actions. A Kafka event stream (clicks, cart adds) → feature computation → context feature update → fast re-ranking.
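A minimal sketch of this hybrid, with a plain dict standing in for Redis and a made-up category-boost rule standing in for the re-ranker; in production the session signals would arrive via the Kafka stream:

```python
import json, time

cache = {}  # stands in for Redis: key -> (json payload, expiry timestamp)

def cache_set(key, value, ttl=3600):
    cache[key] = (json.dumps(value), time.time() + ttl)

def cache_get(key):
    payload = cache.get(key)
    if payload is None or payload[1] < time.time():
        return None  # missing or expired, like a Redis TTL miss
    return json.loads(payload[0])

# The nightly batch job writes base top-N per user (hypothetical ids/scores)
cache_set("recs:42", [["sku_9", 0.91], ["sku_4", 0.87], ["sku_7", 0.80]])

def recommend(user_id, session_categories, item_category, boost=0.1):
    """Blend cached base scores with a boost for categories seen this session."""
    base = cache_get(f"recs:{user_id}") or []
    rescored = [(sku, score + boost * (item_category.get(sku) in session_categories))
                for sku, score in base]
    return [sku for sku, _ in sorted(rescored, key=lambda x: -x[1])]

item_category = {"sku_9": "phones", "sku_4": "laptops", "sku_7": "laptops"}
print(recommend(42, {"laptops"}, item_category))  # ['sku_4', 'sku_9', 'sku_7']
```

The cache read stays on the hot path (sub-millisecond), while the re-ranking touches only the few dozen cached candidates, so the latency budget holds.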

Feature serving. Redis holds user features with a TTL (view count over the last 24 hours, last clicked item). Read latency is under 1 ms. At 10k req/s, use Redis Cluster with replication.

A/B testing. Recommendation systems cannot be evaluated on offline metrics alone (NDCG, MAP). Offline metrics correlate with online CTR, but not always. The only reliable way is an A/B test with 5-10% of traffic on the new model, monitoring CTR, conversion, and revenue per session.
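To decide whether an observed CTR lift is real rather than noise, the standard first check is a two-proportion z-test. The click and view counts below are made up:

```python
import math

def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided two-proportion z-test for a CTR difference."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    p = (clicks_a + clicks_b) / (views_a + views_b)       # pooled click rate
    se = math.sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))            # two-sided, normal approx.
    return z, p_value

# Hypothetical experiment: control vs. new model
z, p = ctr_z_test(clicks_a=200, views_a=10_000, clicks_b=260, views_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # z ≈ 2.83, p < 0.05: the lift is significant
```

In practice you would also check conversion and revenue per session the same way, and run the test long enough to cover weekly seasonality.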

Metrics: Offline and Online

Offline metrics:

  • NDCG@k (Normalized Discounted Cumulative Gain) — accounts for position in the list
  • MAP@k (Mean Average Precision) — for binary relevance tasks
  • Recall@k — coverage: what share of relevant items lands in the top-k
  • Coverage — what share of the catalog is actually recommended (fights popularity bias)
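The ranking metrics above are a few lines each; a sketch with binary relevance on toy data:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of per-position relevance labels."""
    dcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(pos + 2) for pos, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(recommended, relevant, k):
    """Share of relevant items that made it into the top-k."""
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

# Toy example: relevant items sit at positions 1 and 3 of the ranked list
print(round(ndcg_at_k([1, 0, 1], k=3), 3))                 # 0.92
print(recall_at_k(["a", "b", "c"], {"a", "c", "d"}, k=3))  # 2/3
```

NDCG rewards putting relevant items early; Recall@k ignores order and only asks whether the relevant items showed up at all, which is why both are tracked.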

Online metrics:

  • CTR (Click-Through Rate) — basic engagement
  • Conversion Rate — from recommendation to purchase or target action
  • Revenue per user
  • Diversity — variety of recommendations (don't show 10 near-identical products)

Popularity bias is a chronic CF problem. Popular items get more interactions → the model recommends them more → they get even more. The long tail (80% of the catalog) is poorly recommended. Solutions: diversity-aware re-ranking, debiasing in the loss, popularity normalization in implicit feedback.
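One concrete form of diversity-aware re-ranking is MMR (Maximal Marginal Relevance): greedily trade relevance against similarity to items already selected. A numpy sketch with made-up scores and vectors:

```python
import numpy as np

def mmr_rerank(scores, item_vecs, k, lam=0.7):
    """Greedy MMR: pick items maximizing lam*relevance - (1-lam)*max_sim_to_selected."""
    vecs = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        best, best_val = None, -np.inf
        for i in remaining:
            # Penalty: cosine similarity to the closest already-selected item
            sim = max((float(vecs[i] @ vecs[j]) for j in selected), default=0.0)
            val = lam * scores[i] - (1 - lam) * sim
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy case: items 0 and 1 are near-duplicates; MMR keeps one and skips the other
scores = np.array([1.0, 0.98, 0.9])
vecs = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])
print(mmr_rerank(scores, vecs, k=2))  # [0, 2]
```

With λ near 1 this degenerates to pure relevance ranking; lowering λ trades a little CTR for a more varied list, which is exactly the knob to tune against the popularity-bias feedback loop.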

Project Stages

Data audit. Examine the interaction history: user×item matrix density (under 0.1% is typical), activity distribution (20% of users generate 80% of interactions), temporal patterns, cold start statistics.

Baseline. Popular items as recommendations are a simple baseline that is often hard to beat significantly. Record the baseline's offline metrics.

Iterative improvement. ALS → add content features → a two-stage system → sequence-aware models. Measure each step offline and verify it with an A/B test.

Serving infrastructure. Batch precomputation, Redis caching, real-time re-ranking, monitoring.

A prototype on existing data with offline validation takes 2-3 weeks. A production system with two-stage ranking, A/B testing, and monitoring takes 2-3 months.