Implementation of Time Series Forecasting
Time series are observations indexed by time stamps: sales, energy consumption, stock quotes, IoT sensor readings, traffic. Choosing the right model and methodology is critical: mishandling temporal dependencies leads to data leakage and overly optimistic backtest results.
Time Series Classification
Before choosing a method, analyze the properties of the series:
- Stationarity: ADF test (Augmented Dickey-Fuller). Non-stationary series require differencing or methods that handle trend natively.
- Seasonality: ACF/PACF analysis. Single (e.g., weekly) or multiple (weekly + annual) seasonality affects model choice.
- Intermittency: ADI (Average Demand Interval) > 1.32 indicates intermittent demand and calls for special methods (Croston, IMAPA).
- Nonlinearity: Terasvirta / BDS tests. Linear models (ARIMA) are inadequate under strong nonlinearity.
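The intermittency check can be sketched in plain NumPy using one common definition of ADI (total periods divided by periods with non-zero demand); the 1.32 threshold comes from the Syntetos-Boylan classification, and the function name here is illustrative:

```python
import numpy as np

def average_demand_interval(demand):
    """ADI = number of periods / number of periods with non-zero demand."""
    demand = np.asarray(demand)
    nonzero = np.count_nonzero(demand)
    if nonzero == 0:
        return float('inf')  # no demand observed at all
    return len(demand) / nonzero

# 9 daily periods, demand occurs in only 3 of them
demand = [0, 0, 3, 0, 0, 0, 2, 0, 1]
adi = average_demand_interval(demand)
print(adi)         # 3.0
print(adi > 1.32)  # True -> intermittent: consider Croston / IMAPA
```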
Method Hierarchy
| Method | Application | Pros | Limitations |
|---|---|---|---|
| Naive / Seasonal Naive | Baseline, intermittent | Fast, interpretable | Low accuracy |
| ETS (Exponential Smoothing) | Single seasonality | Automatic, works well | Multiple seasonality |
| SARIMA | Statistics, single seasonality | Theory, confidence intervals | Slow, single seasonality |
| Prophet | Business data with holidays | Simplicity, interpretability | Not best for complex patterns |
| LightGBM with lags | Many external factors | High accuracy, features | Requires feature engineering |
| N-BEATS / N-HiTS | Pure TS without external features | SOTA on M4/M5 | Black box |
| TFT | Many series + known covariates | SOTA for ensembles | Complexity, GPU |
| TimeGPT / TimesFM | Foundation models, zero-shot | Fast start | Cost, less control |
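The baseline row in the table is worth making concrete: Seasonal Naive simply repeats the values observed one season earlier. A minimal sketch (function name is illustrative):

```python
import numpy as np

def seasonal_naive(history, season_length, horizon):
    """Forecast by tiling the last full season of observations over the horizon."""
    history = np.asarray(history, dtype=float)
    last_season = history[-season_length:]
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

# Daily data with weekly seasonality (season_length=7)
history = [10, 12, 14, 13, 15, 20, 22,   # week 1
           11, 13, 15, 14, 16, 21, 23]   # week 2
print(seasonal_naive(history, season_length=7, horizon=3))  # [11. 13. 15.]
```

Any candidate model that cannot beat this baseline under proper backtesting is not worth deploying.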
Proper Backtesting
Problem: a standard random train/test split cannot be used for time series, because it breaks temporal ordering.
Walk-Forward Validation:

```
|---Train---| Test |
|----Train----| Test |
|-----Train-----| Test |
```

Average metrics across all windows.
Test window size = forecast horizon. Step shift = horizon / 2 or = horizon (no overlap).
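The expanding-window scheme above can be sketched as a split generator over index positions (names and defaults are illustrative, not from a specific library):

```python
def walk_forward_splits(n_samples, horizon, min_train, step=None):
    """Yield (train_indices, test_indices) pairs with an expanding train window."""
    step = step or horizon  # step = horizon -> non-overlapping test windows
    start = min_train
    while start + horizon <= n_samples:
        train = list(range(0, start))            # everything before the test window
        test = list(range(start, start + horizon))
        yield train, test
        start += step

for train, test in walk_forward_splits(n_samples=20, horizon=4, min_train=8):
    print(len(train), test[0], test[-1])
```

Every test index is strictly after every train index, so the temporal ordering is preserved in each fold.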
Data Leakage Sources:
- Fitting the scaler on the entire dataset instead of the training window, so scaling uses future data
- Target encoding computed with future values
- External features that carry future information (confusing past covariates with known future covariates)
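The scaling leak is avoided by computing statistics on the training window only. A minimal standardization sketch in NumPy (the same fit-on-train, transform-on-both pattern applies to sklearn's StandardScaler):

```python
import numpy as np

series = np.arange(10, dtype=float)      # toy series with a trend
train, test = series[:7], series[7:]

# Correct: statistics from the training window only
mu, sigma = train.mean(), train.std()
train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma        # reuse train statistics on the test window

# Leaky: fitting on the full series lets future test values shift mu and sigma
mu_leak = series.mean()
print(mu, mu_leak)  # 3.0 4.5 -- the leaky mean is pulled toward the future
```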
Feature Engineering for ML Approach
Temporal Features:
```python
import numpy as np
import pandas as pd

# df is assumed to have a DatetimeIndex
df['hour'] = df.index.hour
df['day_of_week'] = df.index.dayofweek
df['week_of_year'] = df.index.isocalendar().week
df['month'] = df.index.month
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)

# Cyclical encoding: map hour onto a circle so 23:00 and 00:00 end up close
df['sin_hour'] = np.sin(2 * np.pi * df['hour'] / 24)
df['cos_hour'] = np.cos(2 * np.pi * df['hour'] / 24)
```
Lag Features: t-1, t-7, t-14, t-28 for daily data; t-1, t-24, t-168 for hourly.
Rolling Statistics: mean, std, min, max for 7/28/90 days. Differences: (t-1) - (t-7) to capture trend.
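The lag and rolling features above can be built with `shift` and `rolling` in pandas; shifting the rolling window by one step keeps it strictly in the past (column names are illustrative):

```python
import pandas as pd

idx = pd.date_range('2024-01-01', periods=60, freq='D')
df = pd.DataFrame({'y': range(60)}, index=idx)

# Lag features: values from 1, 7, 14, 28 days ago
for lag in (1, 7, 14, 28):
    df[f'lag_{lag}'] = df['y'].shift(lag)

# Rolling statistics over the previous 7 days (shift(1) excludes today)
df['roll_mean_7'] = df['y'].shift(1).rolling(7).mean()
df['roll_std_7'] = df['y'].shift(1).rolling(7).std()

# Difference capturing trend: (t-1) - (t-7)
df['diff_1_7'] = df['lag_1'] - df['lag_7']
```

The first rows are NaN until each lag or window has enough history; drop them (or let LightGBM handle NaN natively) before training.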
Probabilistic Forecasting
A point forecast without uncertainty estimates is insufficient for business decisions. Approaches to quantile forecasting:
- Quantile Regression: LightGBM with objective='quantile' and alpha=0.1/0.5/0.9 (one model per quantile)
- Conformal Prediction: theoretically grounded intervals without distributional assumptions
- Monte Carlo Dropout: in neural networks, an implicit ensemble via dropout at inference time
- N-HiTS with Quantiles: native support in the neuralforecast library
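Quantile forecasts are evaluated with the pinball (quantile) loss, the same loss LightGBM minimizes under objective='quantile'; a NumPy sketch:

```python
import numpy as np

def pinball_loss(y_true, y_pred, alpha):
    """Pinball loss: under-forecasts cost alpha per unit, over-forecasts 1 - alpha."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    diff = y_true - y_pred
    return float(np.mean(np.maximum(alpha * diff, (alpha - 1) * diff)))

y_true = [10, 12, 14]
# Over-forecasting by 1 is cheap for an upper quantile, expensive for a lower one
print(pinball_loss(y_true, [11, 13, 15], alpha=0.9))  # 0.1
print(pinball_loss(y_true, [11, 13, 15], alpha=0.1))  # 0.9
```

The asymmetry is what pushes the alpha=0.9 model above most actuals and the alpha=0.1 model below them.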
Production Pipeline
```python
# Example with Nixtla's statsforecast: fit several classical models in parallel
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA, AutoETS, AutoTheta

models = [AutoARIMA(season_length=7), AutoETS(season_length=7), AutoTheta()]
sf = StatsForecast(models=models, freq='D', n_jobs=-1)

# train_df must have columns: unique_id, ds (timestamp), y (target)
sf.fit(train_df)
forecasts = sf.predict(h=28, level=[80, 95])  # 28-day horizon, 80%/95% intervals
```
MLflow Tracking: log the data version, hyperparameters, metrics, and model artifact for each experiment.
Scheduling: an Airflow DAG for daily retraining and publishing forecasts to the Data Warehouse.
Monitoring: Evidently for tracking data drift in input features and prediction drift in model output.
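A common metric behind such drift checks is the Population Stability Index (PSI). A minimal sketch, not an Evidently API; the 0.1/0.25 thresholds are a widespread rule of thumb:

```python
import numpy as np

def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one feature."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Clip to avoid log(0) and division by zero in empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 5000)       # training-time feature distribution
same = rng.normal(0, 1, 5000)      # fresh data, no drift
shifted = rng.normal(1, 1, 5000)   # mean shift -> drift
print(psi(ref, same) < 0.1, psi(ref, shifted) > 0.25)  # True True
```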
Timeline: statistical baseline models (AutoARIMA, Prophet) take 2-3 weeks; a full ML system with walk-forward validation, quantile forecasts, and a production pipeline takes 8-12 weeks.