Implementation of Customer Churn Prediction
Churn prediction is one of the most measurable ML tasks in business: every percentage point of churn reduction translates directly into LTV and ARR. For a SaaS company with $1M MRR, cutting monthly churn from 5% to 4% keeps ~$10K of MRR in the first month alone (~$120K ARR), and the gain compounds as retained customers keep paying in later months.
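The revenue arithmetic can be checked with a short calculation. This is a sketch assuming a closed cohort (no new customers, no expansion revenue) and a constant monthly churn rate. Under those assumptions the first-month saving is 1% of MRR, and after 12 months the gap between the two cohorts grows to roughly $72K of MRR (about $870K annualized).

```python
def mrr_after(months: int, start_mrr: float, monthly_churn: float) -> float:
    """MRR left after `months` of constant churn, with no new or expansion revenue."""
    return start_mrr * (1 - monthly_churn) ** months


START_MRR = 1_000_000  # $1M MRR, as in the example above

# Month 1: one percentage point less churn keeps 1% of MRR.
first_month_saved_mrr = START_MRR * (0.05 - 0.04)
print(f"Month 1: ${first_month_saved_mrr:,.0f} MRR kept "
      f"(${first_month_saved_mrr * 12:,.0f} annualized)")

# Month 12: the gap between the two cohorts has compounded.
gap_mrr = mrr_after(12, START_MRR, 0.04) - mrr_after(12, START_MRR, 0.05)
print(f"Month 12: ${gap_mrr:,.0f} MRR gap (${gap_mrr * 12:,.0f} annualized)")
```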
Defining Churn
Before building a model, define the target clearly:
- Contractual churn: customer did not renew subscription (B2B SaaS, telecom)
- Non-contractual churn: customer stopped purchasing (e-commerce, mobile games)
- Soft churn: reduced activity/consumption (risk of churn in next 60 days)
For non-contractual churn there is no explicit label, so you need to define an inactivity threshold: a customer who has not purchased for X days is considered churned. The choice of X affects class balance and model accuracy.
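A minimal sketch of inactivity-based labeling with pandas, assuming a transactions table with hypothetical columns customer_id and ts. It uses a forward-looking variant of the rule: the population is every customer who purchased on or before a cutoff date, and a customer with no purchase in the following X days (horizon_days) is labeled churned, which keeps the label window separate from the feature history. Running it for two values of X on the same toy data shows how the threshold shifts class balance.

```python
import pandas as pd


def label_churn(tx: pd.DataFrame, cutoff: str, horizon_days: int) -> pd.DataFrame:
    """Label customers active up to `cutoff` as churned (1) if they make no
    purchase in the next `horizon_days` days, else retained (0).

    tx: one row per purchase, with columns customer_id and ts (datetime64).
    """
    cutoff_ts = pd.Timestamp(cutoff)
    horizon_end = cutoff_ts + pd.Timedelta(days=horizon_days)
    # Population: customers with at least one purchase on or before the cutoff.
    population = tx.loc[tx["ts"] <= cutoff_ts, "customer_id"].unique()
    # Customers who purchased again inside the horizon are retained.
    returned = tx.loc[(tx["ts"] > cutoff_ts) & (tx["ts"] <= horizon_end), "customer_id"].unique()
    labels = pd.DataFrame({"customer_id": population})
    labels["churned"] = (~labels["customer_id"].isin(returned)).astype(int)
    return labels


# Toy data: the choice of X changes the churn rate of the same customers.
tx = pd.DataFrame({
    "customer_id": ["a", "a", "b", "c", "c"],
    "ts": pd.to_datetime(["2024-01-05", "2024-02-20", "2024-01-10",
                          "2024-01-15", "2024-03-25"]),
})
for x_days in (30, 60):
    rate = label_churn(tx, "2024-01-31", x_days)["churned"].mean()
    print(f"X = {x_days} days: churn rate {rate:.0%}")
```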
Feature Engineering
RFM Metrics (most important predictors):
- Recency: days since last action/transaction
- Frequency: number of sessions/purchases in 30/90/180 days
- Monetary: total spend over the period
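The RFM metrics can be computed with pandas roughly as follows. This is a sketch assuming a transactions table with hypothetical columns customer_id, ts (datetime) and amount, and a cutoff date that separates feature history from the label window; the 30/90/180-day windows match the list above.

```python
import pandas as pd


def rfm_features(tx: pd.DataFrame, cutoff: str, windows=(30, 90, 180)) -> pd.DataFrame:
    """Recency, frequency, and monetary features per customer as of `cutoff`.

    tx: one row per purchase, with columns customer_id, ts (datetime64), amount.
    """
    cutoff_ts = pd.Timestamp(cutoff)
    hist = tx[tx["ts"] <= cutoff_ts]  # never look past the cutoff
    feats = pd.DataFrame(
        {"recency_days": (cutoff_ts - hist.groupby("customer_id")["ts"].max()).dt.days}
    )
    for w in windows:
        recent = hist[hist["ts"] > cutoff_ts - pd.Timedelta(days=w)]
        g = recent.groupby("customer_id")
        feats[f"frequency_{w}d"] = g.size()          # number of purchases in window
        feats[f"monetary_{w}d"] = g["amount"].sum()  # total spend in window
    # Customers with no purchases in a window get 0 instead of NaN.
    return feats.fillna(0)


tx = pd.DataFrame({
    "customer_id": ["a", "a", "b"],
    "ts": pd.to_datetime(["2024-01-15", "2024-03-30", "2023-11-01"]),
    "amount": [20.0, 10.0, 50.0],
})
print(rfm_features(tx, "2024-03-31"))
```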
Behavioral Features:
- Trend features: activity growth/decline in last 30 days vs. previous 30
- Feature adoption rate: % of key product features the customer uses
- Support tickets: number of requests, type, NPS after resolution
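A sketch of the trend and feature-adoption features, assuming a hypothetical product-events table with columns customer_id, ts and feature (the product feature each event belongs to), plus a caller-supplied list of key features. The +1 in the trend denominator is an assumed smoothing choice so that customers with no activity in the previous window do not cause a division by zero.

```python
import pandas as pd


def behavioral_features(events: pd.DataFrame, cutoff: str, key_features: list) -> pd.DataFrame:
    """Activity trend (last 30 days vs previous 30) and key-feature adoption rate.

    events: one row per product event, with columns customer_id, ts (datetime64), feature.
    """
    cutoff_ts = pd.Timestamp(cutoff)
    day30 = cutoff_ts - pd.Timedelta(days=30)
    day60 = cutoff_ts - pd.Timedelta(days=60)
    events = events[events["ts"] <= cutoff_ts]
    customers = pd.Index(events["customer_id"].unique(), name="customer_id")

    last_30 = events[events["ts"] > day30]
    prev_30 = events[(events["ts"] > day60) & (events["ts"] <= day30)]
    cur = last_30.groupby("customer_id").size().reindex(customers, fill_value=0)
    prev = prev_30.groupby("customer_id").size().reindex(customers, fill_value=0)

    feats = pd.DataFrame({"events_last_30d": cur, "events_prev_30d": prev})
    # Relative change in activity; +1 smoothing avoids dividing by zero.
    feats["activity_trend"] = (cur - prev) / (prev + 1)
    # Share of key features used at least once in the last 30 days.
    used = (last_30[last_30["feature"].isin(key_features)]
            .groupby("customer_id")["feature"].nunique()
            .reindex(customers, fill_value=0))
    feats["feature_adoption_rate"] = used / len(key_features)
    return feats


events = pd.DataFrame({
    "customer_id": ["a"] * 6 + ["b"] * 2,
    "ts": pd.to_datetime(["2024-02-05", "2024-02-10", "2024-02-15", "2024-02-20",
                          "2024-03-10", "2024-03-20", "2024-03-15", "2024-03-16"]),
    "feature": ["login"] * 4 + ["reports", "login", "export", "reports"],
})
print(behavioral_features(events, "2024-03-31", ["reports", "export"]))
```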
Contractual/Demographic:
- Time since onboarding
- Plan type
- Segment (SMB / Enterprise)
- Acquisition channel
Algorithm Selection
| Algorithm | When to Use | Accuracy | Interpretability |
|---|---|---|---|
| Logistic Regression | Baseline, need interpretability | Medium | High |
| LightGBM / XGBoost | Tabular data, no time series | High | Medium (SHAP) |
| CatBoost | Many categorical features | High | Medium |
| LSTM / Transformer | Event sequences matter | Very High | Low |
Recommendation: start with LightGBM as the baseline, and add a sequence model if behavioral dynamics matter, that is, when the timing of a customer's activity decline is more informative than the final aggregates.
Handling Imbalanced Classes
Typical ratio: 2-10% of customers churn per period. Without correction, a model can simply predict "stays" for everyone, reaching 90%+ accuracy with 0% recall on churners.
Methods:
- Class weights: class_weight='balanced' in sklearn, the simplest fix
- SMOTE (Synthetic Minority Over-sampling Technique): generate synthetic minority-class examples
- Focal loss: for neural networks; down-weights easy, well-classified examples
- Threshold optimization: choose classification threshold via Precision-Recall curve, not 0.5
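Class weights and threshold optimization combine naturally: train with class_weight='balanced', then pick the cutoff from the precision-recall curve instead of using 0.5. This sketch maximizes F1 on a validation split of synthetic data; maximizing F1 is only one possible criterion (a minimum required precision or recall is another), and in practice the threshold should be chosen on data not used for the final evaluation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, n_informative=8,
                           weights=[0.95], random_state=1)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Class weights: errors on the rare churn class cost proportionally more.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]

# Threshold optimization: evaluate every candidate cutoff on the PR curve.
precision, recall, thresholds = precision_recall_curve(y_val, proba)
# precision/recall have one more entry than thresholds; drop the final (recall=0) point.
p, r = precision[:-1], recall[:-1]
f1 = np.divide(2 * p * r, p + r, out=np.zeros_like(p), where=(p + r) > 0)
best = int(np.argmax(f1))
best_threshold = thresholds[best]

print(f"best threshold = {best_threshold:.3f}, F1 = {f1[best]:.3f}")
print(f"F1 at 0.5      = {f1_score(y_val, (proba >= 0.5).astype(int)):.3f}")
```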
Evaluation Metrics:
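- F1-score on the churn class: primary (weighted F1 is dominated by the majority "stays" class and can look good even when churners are missed)
- AUC-ROC: ranking ability
- Precision@K: share of actual churners among the top-K customers by risk (most important for marketing)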
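Precision@K is simple to compute directly. The helper below is a hypothetical function, not part of scikit-learn, shown next to scikit-learn's f1_score (binary F1 on the churn class), its weighted variant, and roc_auc_score on a toy set of ten customers with three churners.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score


def precision_at_k(y_true, scores, k: int) -> float:
    """Share of actual churners among the k customers with the highest risk scores."""
    order = np.argsort(-np.asarray(scores), kind="stable")  # highest risk first
    return float(np.asarray(y_true)[order[:k]].mean())


# Toy example: 10 customers, 3 actual churners.
y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.65, 0.3, 0.05, 0.4, 0.15])
y_pred = (scores >= 0.5).astype(int)

print("F1 (churn class):", round(f1_score(y_true, y_pred), 3))
print("F1 (weighted):   ", round(f1_score(y_true, y_pred, average="weighted"), 3))
print("AUC-ROC:         ", round(roc_auc_score(y_true, scores), 3))
print("Precision@3:     ", round(precision_at_k(y_true, scores, 3), 3))
```

On this toy set the weighted F1 comes out higher than the churn-class F1, because the well-classified majority class carries 70% of the weight.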
Deployment and Usage
Batch Scoring:
- Weekly model run across entire customer base
- Result: table with churn probability for each customer
- Segmentation: high risk (> 0.7), medium risk (0.4-0.7), low risk (< 0.4)
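A sketch of the weekly job's scoring and segmentation step, assuming a fitted classifier with a scikit-learn-style predict_proba and a feature table indexed by customer ID. The tier boundaries follow the list above; the text leaves the edge values open, so 0.4 and 0.7 are assumed here to fall in the medium tier.

```python
import numpy as np
import pandas as pd


def risk_tier(proba) -> np.ndarray:
    """Map churn probabilities to tiers: > 0.7 high, 0.4-0.7 medium, < 0.4 low."""
    proba = np.asarray(proba, dtype=float)
    # np.select takes the first matching condition, so the order matters.
    return np.select([proba > 0.7, proba >= 0.4], ["high", "medium"], default="low")


def score_customer_base(model, features: pd.DataFrame) -> pd.DataFrame:
    """Weekly batch scoring: churn probability and risk tier for every customer.

    model: any fitted classifier with predict_proba (e.g. a scikit-learn estimator).
    features: one row per customer, indexed by customer ID.
    """
    proba = model.predict_proba(features)[:, 1]
    out = pd.DataFrame({"churn_proba": proba}, index=features.index)
    out["risk_tier"] = risk_tier(proba)
    return out.sort_values("churn_proba", ascending=False)
```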
Real-time Scoring:
- For key events: app login, support contact, consumption decrease
- API endpoint: POST /score, < 100 ms response
- CRM score update in real-time
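The real-time endpoint can be sketched with only the Python standard library. The weights, bias and feature names below are hypothetical placeholders for a trained model that a real service would load once at startup; a production deployment would more likely use a web framework and would need load testing to confirm the < 100 ms target.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# HYPOTHETICAL placeholder coefficients standing in for a trained model.
WEIGHTS = {"recency_days": 0.03, "frequency_30d": -0.2, "activity_trend": -1.0}
BIAS = -2.0


def score_payload(payload: dict) -> dict:
    """Score one customer's features with a logistic model; missing features default to 0."""
    z = BIAS + sum(w * float(payload.get(name, 0.0)) for name, w in WEIGHTS.items())
    z = max(min(z, 50.0), -50.0)  # keep math.exp in range
    proba = 1.0 / (1.0 + math.exp(-z))
    return {"customer_id": payload.get("customer_id"), "churn_proba": round(proba, 4)}


class ScoreHandler(BaseHTTPRequestHandler):
    """Handles POST /score with a JSON object of features; returns JSON."""

    def do_POST(self):
        if self.path != "/score":
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.dumps(score_payload(json.loads(self.rfile.read(length)))).encode()
        except (ValueError, TypeError, AttributeError):
            self.send_error(400, "expected a JSON object of numeric features")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve locally (blocks the current thread):
# HTTPServer(("127.0.0.1", 8000), ScoreHandler).serve_forever()
```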
Retention by Segment:
- High risk: personal call from Customer Success or discount
- Medium risk: automated email campaign with value reminder
- Low risk: no action (save resources)
Assessing Business Impact
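Uplift, the incremental effect of retention actions, is the correct way to measure the system's real value. Standard A/B test: 50% of high-risk customers receive the retention action (treatment), 50% receive nothing (control); the difference in churn rate between the two groups is the uplift.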
If the value of the customers saved by that uplift exceeds the cost of the retention campaign (positive ROI), the system has a positive business effect.
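The A/B comparison and the break-even check can be written out as follows. The sketch computes uplift as the absolute reduction in churn rate, tests it with a two-proportion z-test, and compares the value of saved customers with the campaign cost. value_per_saved_customer and cost_per_treated_customer are hypothetical inputs (for example, remaining LTV and the cost of a call or discount), and the example numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def uplift_report(n_control: int, churned_control: int,
                  n_treated: int, churned_treated: int,
                  value_per_saved_customer: float,
                  cost_per_treated_customer: float) -> dict:
    """Uplift, significance, and ROI of a retention campaign from an A/B test."""
    rate_control = churned_control / n_control
    rate_treated = churned_treated / n_treated
    uplift = rate_control - rate_treated  # absolute churn-rate reduction

    # Two-proportion z-test (pooled variance) for H0: no difference in churn rate.
    pooled = (churned_control + churned_treated) / (n_control + n_treated)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treated))
    z = uplift / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Economics: customers saved by the campaign vs. what the campaign cost.
    saved_customers = uplift * n_treated
    value = saved_customers * value_per_saved_customer
    cost = cost_per_treated_customer * n_treated
    return {"uplift": uplift, "z": z, "p_value": p_value,
            "saved_customers": saved_customers, "roi": (value - cost) / cost}


# Illustrative numbers only: 1,000 high-risk customers per arm.
report = uplift_report(n_control=1000, churned_control=300,
                       n_treated=1000, churned_treated=240,
                       value_per_saved_customer=500.0,
                       cost_per_treated_customer=20.0)
print({k: round(v, 4) for k, v in report.items()})
```

A positive roi together with a small p_value corresponds to the "positive business effect" case; a positive roi with a large p_value means the test cannot yet distinguish the effect from noise.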
Timeline: first working model with basic RFM features — 2-3 weeks. Full system with feature store, drift monitoring and CRM integration — 8-10 weeks.







