ML/AI Trading Strategy Bot Development

ML in trading is not a magic profit button. It's a statistical tool for finding patterns in data that have predictive power. Most attempts to apply ML to trading fail due to overfitting, lookahead bias, or ignoring transaction costs. Here's how to do it right.

Why ML in Trading is Harder Than It Seems

Fundamental Problems

Non-stationarity: markets change. A pattern that worked in 2020 may not work in 2024. The model trains on the past but is applied to a future that may come from a different distribution.

Low signal-to-noise ratio: financial data carries extremely little signal relative to noise. Most patterns a model finds are noise that happened to look "significant" in the training sample by chance.

Lookahead bias: if feature engineering accidentally uses future data, the model learns from information that will not be available in production. The backtest looks fantastic; live trading loses money.

Overfitting: a model with 100 parameters fitted on a history of 500 trades is almost certainly overfitted.

Right Approach

  1. A clear hypothesis about what the model predicts and why it should work
  2. A correct train/validation/test split without lookahead
  3. Simple models as a baseline before complex ones
  4. Transaction costs included in the backtest
  5. Walk-forward validation
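Step 2 can be sketched as a chronological split with an embargo gap, so labels built from future prices near the boundary cannot leak into training. This is a minimal illustration; the function name and the `embargo` parameter are our own.

```python
import pandas as pd

def chrono_split(df: pd.DataFrame, train_frac: float = 0.7, embargo: int = 4):
    """Split a time-indexed frame chronologically, dropping `embargo` rows
    between train and test so that targets computed from future prices
    near the boundary do not leak across it."""
    cut = int(len(df) * train_frac)
    train = df.iloc[:cut]
    test = df.iloc[cut + embargo:]
    return train, test

df = pd.DataFrame({'x': range(100)})
train, test = chrono_split(df)
print(len(train), len(test))  # 70 26
```

The embargo should be at least as long as the prediction horizon used to build the target.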

Feature Engineering

Types of Features for Crypto Trading

import pandas as pd
import numpy as np
from ta import trend, momentum, volatility

class FeatureEngineer:
    def generate_features(self, df: pd.DataFrame) -> pd.DataFrame:
        """df contains: open, high, low, close, volume"""

        features = pd.DataFrame(index=df.index)

        # === Technical indicators ===
        # Trend
        features['ema_9'] = trend.EMAIndicator(df.close, 9).ema_indicator()
        features['ema_21'] = trend.EMAIndicator(df.close, 21).ema_indicator()
        features['macd'] = trend.MACD(df.close).macd()
        features['macd_signal'] = trend.MACD(df.close).macd_signal()
        features['adx'] = trend.ADXIndicator(df.high, df.low, df.close).adx()

        # Momentum
        features['rsi_14'] = momentum.RSIIndicator(df.close, 14).rsi()
        features['stoch_k'] = momentum.StochasticOscillator(df.high, df.low, df.close).stoch()
        features['cci'] = momentum.CCIIndicator(df.high, df.low, df.close).cci()

        # Volatility
        features['atr'] = volatility.AverageTrueRange(df.high, df.low, df.close).average_true_range()
        features['bb_width'] = (
            volatility.BollingerBands(df.close).bollinger_hband() -
            volatility.BollingerBands(df.close).bollinger_lband()
        ) / df.close

        # === Price-derived features ===
        # Returns on different horizons
        for period in [1, 3, 6, 12, 24]:
            features[f'return_{period}h'] = df.close.pct_change(period)

        # Distance from moving averages (normalized)
        for period in [20, 50, 200]:
            ma = df.close.rolling(period).mean()
            features[f'dist_ma_{period}'] = (df.close - ma) / ma

        # === Volume features ===
        features['volume_ratio'] = df.volume / df.volume.rolling(20).mean()
        features['obv'] = (np.sign(df.close.diff()) * df.volume).cumsum()
        features['obv_ratio'] = features['obv'] / features['obv'].rolling(20).mean()

        # === Market microstructure ===
        features['high_low_range'] = (df.high - df.low) / df.close
        features['close_position'] = (df.close - df.low) / (df.high - df.low + 1e-10)

        return features.dropna()

Critically important: a signal for candle t must be computed only from data that is already final at decision time. In practice this means shifting features back by one step:

# Wrong: use current candle close to generate signal for same candle
signal = rsi > 70

# Right: signal uses previous candle data
signal = rsi.shift(1) > 70
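The difference is easy to see on synthetic data: the causal signal fires one bar later, and that later bar is the one you can actually trade. A minimal illustration (prices and the momentum "indicator" are made up):

```python
import pandas as pd

# Synthetic close prices (hypothetical data for illustration)
close = pd.Series([100, 101, 103, 106, 104, 102], dtype=float)

# A toy indicator: 3-bar momentum
momentum = close.pct_change(3)

# Wrong: signal on bar t uses bar t's own close (not known until the bar finishes)
signal_lookahead = momentum > 0.04

# Right: decision on bar t uses only data through bar t-1
signal_causal = momentum.shift(1) > 0.04

print(signal_lookahead.tolist())  # [False, False, False, True, False, False]
print(signal_causal.tolist())     # [False, False, False, False, True, False]
```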

Model Selection

Gradient Boosting (XGBoost / LightGBM)

The best baseline for tabular data: it trains fast, is interpretable via feature importance, and is robust to outliers.

import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit

class DirectionPredictor:
    def __init__(self, horizon: int = 4):
        self.horizon = horizon  # predict direction in N candles
        self.model = None
        self.feature_cols = None

    def prepare_target(self, df: pd.DataFrame) -> pd.Series:
        """Target: 1 if price grows X% over horizon periods, else 0"""
        future_return = df.close.shift(-self.horizon) / df.close - 1
        threshold = 0.005  # 0.5%
        return (future_return > threshold).astype(int)

    def train(self, features: pd.DataFrame, prices: pd.DataFrame):
        y = self.prepare_target(prices)

        # Align indices
        common_idx = features.index.intersection(y.dropna().index)
        X = features.loc[common_idx]
        y = y.loc[common_idx]

        # Chronological holdout: train on the first 70%, validate on the last 30% (no shuffling)
        split = int(len(X) * 0.7)
        X_train, X_test = X.iloc[:split], X.iloc[split:]
        y_train, y_test = y.iloc[:split], y.iloc[split:]

        params = {
            'objective': 'binary',
            'metric': 'auc',
            'learning_rate': 0.05,
            'num_leaves': 31,
            'min_data_in_leaf': 50,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'verbose': -1
        }

        train_data = lgb.Dataset(X_train, label=y_train)
        val_data = lgb.Dataset(X_test, label=y_test)

        self.model = lgb.train(
            params,
            train_data,
            valid_sets=[val_data],
            num_boost_round=500,
            callbacks=[lgb.early_stopping(50), lgb.log_evaluation(100)]
        )
        self.feature_cols = X.columns.tolist()

    def predict_proba(self, features: pd.DataFrame) -> float:
        X = features[self.feature_cols].iloc[-1:]
        return float(self.model.predict(X)[0])
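The labeling step in prepare_target is worth sanity-checking in isolation. Here is a minimal reproduction of its logic on a toy price series, using the same 0.5% threshold but a horizon of 2 bars for brevity:

```python
import pandas as pd

close = pd.Series([100.0, 100.2, 101.0, 100.9, 100.3, 101.5])
horizon, threshold = 2, 0.005

# future_return[t] compares close[t + horizon] against close[t]
future_return = close.shift(-horizon) / close - 1
target = (future_return > threshold).astype(int)

# The last `horizon` labels are undefined (no future data exists for them);
# NaN comparisons yield 0 here, so drop the tail before training
print(target.tolist())  # [1, 1, 0, 1, 0, 0]
```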

LSTM for Sequence Modeling

If the hypothesis is that the sequence itself matters (not just an indicator's current value but its trajectory over the last N periods), an LSTM can be useful:

import torch
import torch.nn as nn

class PriceLSTM(nn.Module):
    def __init__(self, input_size: int, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
            dropout=0.2
        )
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, 32),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence_len, features)
        lstm_out, _ = self.lstm(x)
        last_output = lstm_out[:, -1, :]  # take last timestep
        return self.classifier(last_output)

In practice, an LSTM rarely outperforms LightGBM on hourly or daily bars. On tick data or order-flow sequences it can be more effective.
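To feed a model like the one above, the flat (T, F) feature matrix must be windowed into (samples, sequence_len, features) arrays. A minimal NumPy sketch (the helper name and shapes are illustrative); each window is paired with the target at its own last timestep, so no lookahead is introduced:

```python
import numpy as np

def make_sequences(features: np.ndarray, targets: np.ndarray, seq_len: int):
    """Slice a (T, F) feature matrix into overlapping windows of length seq_len.

    The window ending at t is paired with targets[t], so each sample only
    sees data up to and including its own timestamp.
    """
    X, y = [], []
    for t in range(seq_len - 1, len(features)):
        X.append(features[t - seq_len + 1 : t + 1])
        y.append(targets[t])
    return np.stack(X), np.array(y)

# 10 timesteps, 4 features, binary targets (synthetic)
feats = np.random.rand(10, 4)
targs = np.random.randint(0, 2, size=10)
X, y = make_sequences(feats, targs, seq_len=5)
print(X.shape, y.shape)  # (6, 5, 4) (6,)
```

The resulting arrays can be wrapped in torch.Tensor and fed to PriceLSTM in (batch, sequence_len, features) order.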

Walk-forward Validation

A standard random train/test split is invalid for time series: the model would train on data that chronologically follows the test set, which is lookahead bias at the data level.

def walk_forward_backtest(
    model_class,
    features: pd.DataFrame,
    prices: pd.DataFrame,
    train_window: int = 365,  # days of training
    test_window: int = 30,    # days of testing
    step: int = 30            # window slide step
) -> pd.DataFrame:
    results = []
    n = len(features)

    for start in range(0, n - train_window - test_window, step):
        train_end = start + train_window
        test_end = train_end + test_window

        X_train = features.iloc[start:train_end]
        X_test = features.iloc[train_end:test_end]
        p_train = prices.iloc[start:train_end]
        p_test = prices.iloc[train_end:test_end]

        # Train model on fresh data
        model = model_class()
        model.train(X_train, p_train)

        # Test on next period
        predictions = [model.predict_proba(X_test.iloc[:i+1]) for i in range(len(X_test))]
        period_results = simulate_trading(predictions, p_test)
        results.append(period_results)

    return pd.concat(results)

Walk-forward validation gives a realistic performance estimate: the model never sees test data earlier than it would have seen it in live operation.
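The simulate_trading helper referenced in the loop above is not shown. A minimal sketch of what it needs to do, crucially including a fee on every position change (the signature, the long-only logic, and the 0.1% fee are our assumptions):

```python
import numpy as np
import pandas as pd

def simulate_trading(probs, prices: pd.DataFrame,
                     threshold: float = 0.65, fee: float = 0.001) -> pd.DataFrame:
    """Long-only simulation: hold a position while P(up) > threshold.

    Charges `fee` on every position change (entry and exit), which is
    what separates a realistic backtest from a fantasy one.
    """
    probs = np.asarray(probs, dtype=float)
    position = (probs > threshold).astype(float)          # 1 = long, 0 = flat
    bar_return = prices.close.pct_change().fillna(0).to_numpy()
    # A position decided on bar t earns the return realized on bar t+1
    strat_return = np.roll(position, 1) * bar_return
    strat_return[0] = 0.0
    trades = np.abs(np.diff(position, prepend=0.0))       # position changes
    strat_return -= trades * fee
    return pd.DataFrame({'return': strat_return}, index=prices.index)
```

Aggregate metrics (Sharpe, max drawdown, hit rate) are then computed from the per-bar return column across all walk-forward windows.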

Integration into Trading Bot

class MLTradingBot:
    def __init__(self, model: DirectionPredictor, feature_eng: FeatureEngineer,
                 threshold: float = 0.65):
        self.model = model
        self.feature_eng = feature_eng  # incremental feature computation
        self.threshold = threshold      # minimum probability for entry

    async def on_candle(self, candle: Candle):
        features = self.feature_eng.update(candle)
        prob_up = self.model.predict_proba(features)

        if not self.has_position():
            if prob_up > self.threshold:
                await self.open_long()
            elif prob_up < (1 - self.threshold):
                await self.open_short()
        else:
            # Exit if the model is no longer confident in the position's direction
            side = self.position.side
            if side == 'long' and prob_up < 0.5:
                await self.close_position("model_signal_weak")
            elif side == 'short' and prob_up > 0.5:
                await self.close_position("model_signal_weak")
Important: a threshold of 0.65 means "enter only when the model is at least 65% confident of a rise". This reduces the number of trades but improves their quality. The optimal threshold is determined on validation data, never on the test set.
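Picking the threshold on validation data can be as simple as sweeping candidates and keeping the one with the best net expectancy. A hypothetical sketch (the function name, candidate grid, and min_trades guard are our own; it assumes arrays of validation probabilities and realized forward returns):

```python
import numpy as np

def pick_threshold(probs: np.ndarray, fwd_returns: np.ndarray,
                   candidates=(0.55, 0.60, 0.65, 0.70, 0.75),
                   fee: float = 0.001, min_trades: int = 30):
    """Return the threshold maximizing mean net return per trade on validation data."""
    best_thr, best_exp = None, -np.inf
    for thr in candidates:
        mask = probs > thr
        if mask.sum() < min_trades:        # too few trades -> unreliable estimate
            continue
        expectancy = fwd_returns[mask].mean() - fee
        if expectancy > best_exp:
            best_thr, best_exp = thr, expectancy
    return best_thr, best_exp
```

The min_trades guard matters: a very high threshold may look great on a handful of trades purely by chance.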

Key Mistakes

| Mistake | Why dangerous | Solution |
|---|---|---|
| Lookahead bias in features | Unrealistically good backtest | Always shift by 1 period |
| No transaction costs | Losses in live trading | Include 0.1-0.2% per trade |
| Random train/test split | Lookahead at the data level | Walk-forward only |
| Too many features | Overfitting guaranteed | Feature selection, L1 regularization |
| No model retraining | Degradation over time | Retrain every 30-90 days |

An ML bot is not "set and forget". Markets drift and models degrade, so live monitoring of model metrics and periodic retraining are required.
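Live monitoring can be as simple as a rolling hit-rate check that flags when the model's recent accuracy falls below its validation baseline. A hypothetical sketch (class name, window, and tolerance are our own):

```python
from collections import deque

class DriftMonitor:
    """Track the rolling hit rate of live predictions and flag degradation."""

    def __init__(self, baseline: float = 0.55, window: int = 200, tolerance: float = 0.05):
        self.baseline = baseline          # accuracy achieved on validation data
        self.tolerance = tolerance        # allowed degradation before alerting
        self.outcomes = deque(maxlen=window)

    def record(self, predicted_up: bool, actual_up: bool) -> None:
        self.outcomes.append(predicted_up == actual_up)

    def needs_retrain(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return False                  # not enough live observations yet
        hit_rate = sum(self.outcomes) / len(self.outcomes)
        return hit_rate < self.baseline - self.tolerance
```

A triggered flag does not have to retrain automatically; it can simply cut position size and alert the operator while a new model is validated.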