AI-Powered Cryptocurrency Price Prediction in Mobile Applications
A fair disclaimer up front: cryptocurrency price prediction is a high-noise problem. Academic studies report directional accuracy of roughly 54–60% for LSTM models on BTC, only marginally better than a coin flip. The value of such a system lies not in prediction accuracy but in processing more signals, faster, than a human analyst could manually.
Data Foundation
OHLCV via CCXT
ccxt is a Python library providing a unified API across 100+ cryptocurrency exchanges. The standard approach for historical data retrieval:
```python
import ccxt
import pandas as pd

exchange = ccxt.binance()
ohlcv = exchange.fetch_ohlcv(
    symbol="BTC/USDT",
    timeframe="1h",
    since=exchange.parse8601("2023-01-01T00:00:00Z"),
    limit=1000,
)
df = pd.DataFrame(ohlcv, columns=["timestamp", "open", "high", "low", "close", "volume"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
```
Binance returns up to 1000 candles per request. For complete historical data, implement pagination using the since parameter.
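A pagination loop might look like the following sketch; `fetch_all_ohlcv` is a hypothetical helper that repeatedly advances `since` one millisecond past the last candle received:

```python
import pandas as pd

def fetch_all_ohlcv(exchange, symbol, timeframe, since_ms, limit=1000):
    """Page through fetch_ohlcv until the exchange returns no new candles."""
    all_rows = []
    while True:
        batch = exchange.fetch_ohlcv(symbol, timeframe, since=since_ms, limit=limit)
        if not batch:
            break
        all_rows.extend(batch)
        # Advance the cursor past the timestamp of the last candle received.
        since_ms = batch[-1][0] + 1
        if len(batch) < limit:
            break
    df = pd.DataFrame(all_rows, columns=["timestamp", "open", "high", "low", "close", "volume"])
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    return df
```

The same loop works with any ccxt exchange instance, since `fetch_ohlcv` is part of the unified API.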
On-Chain Data
For BTC and ETH, on-chain metrics add signals beyond OHLCV data:
- Glassnode API: SOPR (Spent Output Profit Ratio), NVT, NUPL, Exchange Net Flow. Paid service with free tier offering daily data
- Etherscan API: transaction volume, gas fees, active addresses
- CoinGecko / CoinMarketCap: market cap, dominance, total market volume
```python
import pandas as pd
import requests

class GlassnodeCollector:
    BASE_URL = "https://api.glassnode.com/v1/metrics"

    def get_sopr(self, api_key: str, since: int, until: int) -> pd.DataFrame:
        response = requests.get(
            f"{self.BASE_URL}/indicators/sopr",
            params={
                "a": "BTC",
                "i": "24h",
                "s": since,
                "u": until,
                "api_key": api_key,
            },
        )
        response.raise_for_status()
        data = response.json()
        return pd.DataFrame(data).rename(columns={"t": "timestamp", "v": "sopr"})
```
SOPR above 1 during an uptrend means holders are selling at a profit; SOPR below 1 during a decline signals capitulation. This gives ML models context that price alone lacks.
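As a sketch, that interpretation can be turned into a categorical feature. The 0.99/1.01 band and the 7-day smoothing window below are illustrative choices, not Glassnode recommendations:

```python
import pandas as pd

def label_sopr_regime(sopr: pd.Series, window: int = 7):
    """Smooth SOPR over a trailing window and tag each day as
    capitulation (<0.99), neutral, or profit_taking (>1.01)."""
    smoothed = sopr.rolling(window).mean()
    return pd.cut(
        smoothed,
        bins=[-float("inf"), 0.99, 1.01, float("inf")],
        labels=["capitulation", "neutral", "profit_taking"],
    )
```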
Technical Indicators as Features
Transform raw OHLCV into technical indicators using pandas-ta or ta-lib:
```python
import pandas_ta as ta

df.ta.rsi(length=14, append=True)     # RSI_14
df.ta.macd(append=True)               # MACD_12_26_9, MACDh, MACDs
df.ta.bbands(length=20, append=True)  # BBL, BBM, BBU, BBB, BBP
df.ta.atr(length=14, append=True)     # ATRr_14
df.ta.obv(append=True)                # OBV
df.ta.vwap(append=True)               # VWAP_D (requires a DatetimeIndex)
```
Critical: normalize all indicators. RSI is already [0, 100]. MACD should be normalized using Z-score or min-max scaling over a rolling window. Feed returns (percentage changes) and normalized features to the model, not raw prices.
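A rolling z-score avoids the lookahead bias that scaling over the whole series would introduce; the 96-candle window below is an arbitrary example:

```python
import pandas as pd

def rolling_zscore(series: pd.Series, window: int = 96) -> pd.Series:
    """Z-score each value against a trailing window only, so no future
    information leaks into the scaling of past samples."""
    mean = series.rolling(window).mean()
    std = series.rolling(window).std()
    return (series - mean) / std

# Feed returns, not raw prices, e.g.:
# df["return_1h"] = df["close"].pct_change()
# df["macd_z"] = rolling_zscore(df["MACD_12_26_9"])
```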
Model Approaches
LSTM for Time Series
The standard choice. Takes a sequence of N candles and predicts the next:
```python
import tensorflow as tf

def build_lstm_model(sequence_length: int, n_features: int) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(sequence_length, n_features))
    x = tf.keras.layers.LSTM(128, return_sequences=True, dropout=0.2)(inputs)
    x = tf.keras.layers.LSTM(64, dropout=0.2)(x)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    # Predict direction (classification) or return (regression)
    outputs = tf.keras.layers.Dense(3, activation="softmax")(x)  # up/down/sideways
    return tf.keras.Model(inputs, outputs)
```
Directional classification (up/down/sideways) is more reliable than regression on exact price. Evaluate using accuracy and F1 on out-of-sample data.
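One way to build the training windows and labels, with an illustrative ±0.5% band defining "sideways" (the band width, class ordering, and horizon are choices, not prescriptions):

```python
import numpy as np

def make_sequences(features: np.ndarray, close: np.ndarray,
                   seq_len: int = 168, horizon: int = 24,
                   flat_band: float = 0.005):
    """Slice (seq_len, n_features) windows and label the move `horizon`
    steps ahead: 0 = down, 1 = sideways (within ±flat_band), 2 = up."""
    X, y = [], []
    for i in range(seq_len, len(close) - horizon):
        X.append(features[i - seq_len:i])
        ret = close[i + horizon] / close[i] - 1.0
        if ret > flat_band:
            y.append(2)
        elif ret < -flat_band:
            y.append(0)
        else:
            y.append(1)
    return np.array(X), np.array(y)
```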
Temporal Fusion Transformer (TFT)
Google's Temporal Fusion Transformer (TFT) is a strong modern architecture for financial time series. It supports multiple forecast horizons, static and dynamic covariates, and interpretability through attention weights, and is implemented in pytorch-forecasting. It is more computationally intensive than an LSTM but can achieve better accuracy on properly prepared data.
XGBoost as Baseline
Don't underestimate gradient boosting with well-engineered features. XGBoost without temporal context often competes with LSTMs. It is fast to train and lightweight to serve. An excellent baseline for comparison.
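A minimal baseline sketch, shown here with scikit-learn's GradientBoostingClassifier for brevity (`xgboost.XGBClassifier` drops in with the same fit/score interface); the lag count and synthetic data are illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def lagged_matrix(returns: np.ndarray, n_lags: int = 24):
    """Stack the previous n_lags returns as one flat feature row per sample;
    the target is the sign of the next return."""
    X = np.column_stack([returns[i:len(returns) - n_lags + i] for i in range(n_lags)])
    y = (returns[n_lags:] > 0).astype(int)
    return X, y

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 1000)        # synthetic hourly returns
X, y = lagged_matrix(returns)
split = int(len(X) * 0.8)                    # chronological split; never shuffle a time series
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X[:split], y[:split])
acc = model.score(X[split:], y[split:])
```

On pure noise like this, out-of-sample accuracy hovers near 0.5, which is exactly the sanity check a baseline should provide.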
| Model | Advantages | Disadvantages |
|---|---|---|
| LSTM | Captures temporal context | Slow training, data-intensive |
| TFT | Interpretability, accuracy | Complex configuration |
| XGBoost | Speed, simplicity | No temporal memory |
| Ensemble | Mitigates weaknesses | Complex deployment |
Mobile App Deployment
Inference runs on the server. The model consumes 168 hourly candles (7 days) and returns directional probabilities for 4h/8h/24h horizons. REST endpoint with caching refreshes predictions hourly.
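The hourly refresh can be a simple TTL cache in front of the model; the web-framework wiring is omitted, and `HourlyPredictionCache` is a hypothetical helper, not a library class:

```python
import time

class HourlyPredictionCache:
    """Serve the same prediction payload per symbol until its TTL expires."""
    def __init__(self, predict_fn, ttl_seconds: int = 3600):
        self.predict_fn = predict_fn
        self.ttl = ttl_seconds
        self._cache = {}   # symbol -> (expires_at, payload)

    def get(self, symbol: str):
        now = time.time()
        expires_at, payload = self._cache.get(symbol, (0.0, None))
        if now >= expires_at:
            payload = self.predict_fn(symbol)   # run model inference
            self._cache[symbol] = (now + self.ttl, payload)
        return payload
```

A REST handler then just calls `cache.get(symbol)`, so model inference runs at most once per symbol per hour regardless of client traffic.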
On mobile, only display results:
```swift
struct PricePrediction: Codable {
    let symbol: String
    let horizon4h: PredictionOutcome
    let horizon8h: PredictionOutcome
    let horizon24h: PredictionOutcome
    let updatedAt: Date
}

struct PredictionOutcome: Codable {
    let direction: String                          // "up", "down", "sideways"
    let probability: Double                        // 0.0 - 1.0
    let confidenceInterval: ClosedRange<Double>    // price range
}
```
Confidence intervals (quantile regression) show a range rather than point predictions. "BTC in 24h: $55,000–$61,000 with 70% confidence" is more honest than "$57,432."
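Quantile heads are typically trained with the pinball (quantile) loss; here is a numpy sketch, where the choice of the 0.15/0.85 pair to bracket a ~70% interval is an example, not a requirement:

```python
import numpy as np

def pinball_loss(y_true: np.ndarray, y_pred: np.ndarray, quantile: float) -> float:
    """Quantile (pinball) loss: asymmetric penalty that makes the model's
    predictions converge to the chosen quantile of the target distribution."""
    err = y_true - y_pred
    return float(np.mean(np.maximum(quantile * err, (quantile - 1) * err)))

# Train one output head at quantile 0.15 and one at 0.85;
# the pair brackets roughly 70% of outcomes.
```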
Model Degradation Monitoring
Cryptocurrency markets evolve: bull/bear regimes shift, new assets emerge, regulatory events occur. A model trained during the 2021 bull market performs poorly in 2022.
Monitoring metrics:
- Rolling accuracy over the last 30 days
- Distribution shift in features (KL-divergence between training and recent data)
- Sharpe ratio if used for trading
When degradation occurs (for example, rolling accuracy falls five or more percentage points below its baseline), automatically retrain on fresh data.
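Both checks can be sketched in a few lines; the bin count, smoothing constant, and thresholds below are illustrative defaults, not tuned values:

```python
import numpy as np

def kl_divergence(train_values: np.ndarray, recent_values: np.ndarray,
                  bins: int = 20) -> float:
    """Discrete KL divergence between a training-era feature distribution
    and a recent window; rising values signal distribution shift."""
    lo = min(train_values.min(), recent_values.min())
    hi = max(train_values.max(), recent_values.max())
    p, _ = np.histogram(train_values, bins=bins, range=(lo, hi))
    q, _ = np.histogram(recent_values, bins=bins, range=(lo, hi))
    p = (p + 1e-9) / (p + 1e-9).sum()   # smooth to avoid log(0)
    q = (q + 1e-9) / (q + 1e-9).sum()
    return float(np.sum(p * np.log(p / q)))

def should_retrain(rolling_accuracy: float, baseline_accuracy: float,
                   drift: float, acc_drop: float = 0.05,
                   drift_limit: float = 0.5) -> bool:
    """Trigger retraining on either accuracy decay or feature drift."""
    return (baseline_accuracy - rolling_accuracy) >= acc_drop or drift >= drift_limit
```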
Disclaimer
Include in the app: "Predictions are informational only. Past accuracy does not guarantee future results. Not investment advice."
Process Overview
1. Collect and clean data (OHLCV + on-chain).
2. Engineer and normalize features.
3. Train and validate multiple models.
4. Select the best model, convert, and deploy the API.
5. Build the mobile UI: predictions displayed with confidence intervals.
6. Configure monitoring and auto-retraining.
Timeline Estimates
Basic LSTM with standard features and mobile dashboard: 2–4 weeks. Ensemble with on-chain data, multi-horizon forecasting, and monitoring: 5–10 weeks.