Automatic Trading Bot Restart Setup
A trading bot has to run 24/7, yet processes crash: network errors, OOM kills, unhandled exceptions, system reboots. Without auto-restart, the bot stays dead until someone notices. A proper setup is more than "restart always": it also needs health checks, graceful shutdown, and alerts.
systemd: Production standard on Linux
systemd is the right tool for managing long-lived services on Linux. It supports auto-restart, service dependencies, resource limits, and log rotation (through journald).
# /etc/systemd/system/trading-bot.service
[Unit]
Description=Trading Bot
After=network-online.target
Wants=network-online.target
# If bot crashes 5+ times in 60 sec, stop (something is seriously wrong)
# Start-limit settings belong in [Unit] on current systemd versions
StartLimitIntervalSec=60
StartLimitBurst=5
[Service]
Type=simple
User=botuser
WorkingDirectory=/opt/trading-bot
ExecStart=/opt/trading-bot/venv/bin/python -u bot.py
Restart=always
# Wait 10 seconds before restart (don't spam on quick crashes)
RestartSec=10
# Environment variables
EnvironmentFile=/opt/trading-bot/.env
# Resource limits (MemoryMax= replaces the deprecated MemoryLimit=)
MemoryMax=2G
CPUQuota=80%
# Logging via journald
StandardOutput=journal
StandardError=journal
SyslogIdentifier=trading-bot
[Install]
WantedBy=multi-user.target
Activation and management:
systemctl daemon-reload
systemctl enable trading-bot # auto-start on system boot
systemctl start trading-bot
systemctl status trading-bot
journalctl -u trading-bot -f # live logs
StartLimitBurst=5 together with StartLimitIntervalSec=60 is critical protection. Without it, a bot stuck in a crash loop restarts forever and piles up damage along the way (orphaned positions, duplicate orders). After 5 starts within 60 seconds, systemd refuses further restarts, leaves the unit in the failed state, and triggers OnFailure= if one is configured.
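To see how RestartSec interacts with the start limit, here is a small Python model. It assumes a fixed-window counter (the window opens at the first start and resets once the interval has elapsed), which is a simplification and not systemd's actual code; the function name and parameters are illustrative only.

```python
def allowed_starts(start_times, interval=60.0, burst=5):
    """Return the start attempts that a fixed-window rate limit lets through.

    Simplified model (an assumption, not systemd's implementation): the
    window opens at the first attempt, at most `burst` attempts are accepted
    inside it, and the first refused attempt stops the unit for good.
    """
    accepted = []
    window_start = None
    count = 0
    for t in start_times:
        if window_start is None or t - window_start > interval:
            window_start, count = t, 0  # open a fresh window
        if count >= burst:
            break  # limit hit: the unit stays down until someone intervenes
        count += 1
        accepted.append(t)
    return accepted


# Bot crashes immediately, RestartSec=10: attempts at 0, 10, 20, ...
instant_crash = [i * 10 for i in range(10)]
# Bot crashes 5 s after each start, RestartSec=10: attempts at 0, 15, 30, ...
slow_crash = [i * 15 for i in range(10)]

print(len(allowed_starts(instant_crash)))  # 5 starts, then the unit is stopped
print(len(allowed_starts(slow_crash)))     # all 10: the limit never triggers
```

In this model a bot that survives a few seconds before crashing never trips the limit. For the protection to fire, StartLimitBurst multiplied by (RestartSec plus the time the bot runs before crashing) has to fit inside StartLimitIntervalSec.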
Graceful shutdown: Proper termination
Never kill the bot with SIGKILL: it leaves open orders, unresolved positions, and unsent alerts behind. Handle SIGTERM instead:
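import asyncio
import logging
import signal
logger = logging.getLogger("trading-bot")
class TradingBot:
    def __init__(self, exchange):
        self.exchange = exchange  # exchange client with an async cancel_order()
        self.running = True
        self.open_orders: list = []
        self._shutdown_task = None
    async def shutdown(self):
        self.running = False
        # Cancel all open limit orders
        for order_id in list(self.open_orders):
            try:
                await self.exchange.cancel_order(order_id)
            except Exception as e:
                logger.error(f"Failed to cancel order {order_id}: {e}")
        logger.info("Graceful shutdown complete")
    def _on_sigterm(self):
        # Keep a reference so the task is not garbage-collected mid-flight
        self._shutdown_task = asyncio.create_task(self.shutdown())
    async def run(self):
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(signal.SIGTERM, self._on_sigterm)
        while self.running:
            try:
                await self.main_loop()  # one trading cycle, defined elsewhere
            except Exception as e:
                logger.exception(f"Error in main loop: {e}")
                await asyncio.sleep(5)  # backoff before next iteration
        # Let the cleanup finish before the event loop closes
        if self._shutdown_task is not None:
            await self._shutdown_task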
systemd sends SIGTERM on systemctl stop and, if the process is still alive after TimeoutStopSec (default 90 s), follows up with SIGKILL. For a bot with open positions, 90 seconds is usually enough.
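Because SIGKILL follows once TimeoutStopSec runs out, the cleanup itself should work against a deadline. The following is a minimal sketch, assuming each cancellation is an awaitable call like the cancel_order used above; cancel_with_deadline, its parameters, and the fake exchange call are hypothetical names introduced for this example.

```python
import asyncio


async def cancel_with_deadline(cancel_order, order_ids, deadline_s=60.0):
    """Cancel orders concurrently and give up after `deadline_s` seconds.

    `cancel_order` is any async callable taking an order id (a stand-in for
    the exchange client). Returns the ids that were NOT confirmed as
    cancelled, so they can be logged or alerted on.
    """
    async def _cancel(order_id):
        await cancel_order(order_id)
        return order_id

    tasks = [asyncio.create_task(_cancel(oid)) for oid in order_ids]
    if not tasks:
        return []
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:
        task.cancel()  # out of time: stop waiting on the exchange
    cancelled_ok = {t.result() for t in done if t.exception() is None}
    return [oid for oid in order_ids if oid not in cancelled_ok]


async def _demo():
    async def fake_cancel(order_id):  # simulated exchange call
        if order_id == "slow":
            await asyncio.sleep(10)
        elif order_id == "bad":
            raise RuntimeError("rejected")

    return await cancel_with_deadline(fake_cancel, ["a", "slow", "bad"], 0.2)


leftover = asyncio.run(_demo())
print(leftover)  # ['slow', 'bad']
```

Choosing a deadline below TimeoutStopSec leaves time to log or report the leftover order ids before the process is killed.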
Docker: Alternative for containerized bots
If the bot runs in Docker:
# docker-compose.yml
services:
  trading-bot:
    image: trading-bot:latest
    restart: unless-stopped  # restart always, except explicit stop
    env_file: .env
    volumes:
      - ./data:/app/data  # persistent state storage
      - ./logs:/app/logs
    mem_limit: 2g
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
    healthcheck:
      # raise_for_status() makes a 503 response fail the check
      test: ["CMD", "python", "-c", "import requests; requests.get('http://localhost:8080/health', timeout=5).raise_for_status()"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
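restart: unless-stopped restarts the container after a crash or a reboot, but not after a manual stop.
The health check is critical. A restart policy reacts only when the process exits; if the bot hangs without an error, Docker still sees a running container. On its own, a failing health check only marks the container as unhealthy (plain Docker and Compose do not restart it), so something has to act on that status, such as an external watcher or the bot exiting by itself when it detects a stall. Either way, the check must verify real state, not just "process alive":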
import time
from aiohttp import web
# `bot` is the running TradingBot instance; it must set last_loop_time
# (time.time() at the end of each cycle) and exchange_connected itself.
async def health_check(request):
    # Check last cycle wasn't > 5 minutes ago
    last_loop_age = time.time() - bot.last_loop_time
    if last_loop_age > 300:  # 5 minutes
        return web.Response(status=503, text=f"Bot stuck: last loop {last_loop_age:.0f}s ago")
    # Check exchange connection
    if not bot.exchange_connected:
        return web.Response(status=503, text="Exchange disconnected")
    return web.Response(status=200, text="OK")
app = web.Application()
app.router.add_get('/health', health_check)
# Serve `app` on port 8080 to match the healthcheck URL in docker-compose.yml
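The handler above relies on the bot keeping last_loop_time fresh. One way to make that bookkeeping explicit and testable is a small heartbeat object. The sketch below is an assumption about how it could be structured (the Heartbeat class and its method names are not from the original code); it uses a monotonic clock so that wall-clock adjustments do not cause false alarms.

```python
import time


class Heartbeat:
    """Tracks when the main loop last completed an iteration."""

    def __init__(self, max_age_s=300.0, clock=time.monotonic):
        self.max_age_s = max_age_s
        self._clock = clock  # injectable so tests can fake time
        self._last_beat = clock()

    def beat(self):
        """Call at the end of every main-loop iteration."""
        self._last_beat = self._clock()

    def age(self):
        """Seconds since the last recorded iteration."""
        return self._clock() - self._last_beat

    def status(self):
        """Return (http_status, message) for a health endpoint."""
        age = self.age()
        if age > self.max_age_s:
            return 503, f"Bot stuck: last loop {age:.0f}s ago"
        return 200, "OK"


# Simulated clock to show both outcomes without waiting
now = [1000.0]
hb = Heartbeat(max_age_s=300, clock=lambda: now[0])
now[0] += 120
print(hb.status())  # (200, 'OK')
now[0] += 400
print(hb.status())  # (503, 'Bot stuck: last loop 520s ago')
hb.beat()
print(hb.status())  # (200, 'OK')
```

A health endpoint could then return the pair from status() directly, and the main loop only needs one beat() call per cycle.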
Alerts on crash
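A restart should trigger a notification even if the bot recovered on its own. For a message on every start, send it from the bot's startup code or from an ExecStartPost= command; OnFailure= covers the case where systemd gives up.
systemd → Telegram:
# /etc/systemd/system/trading-bot-notify.service
[Unit]
Description=Notify on trading bot failure
[Service]
Type=oneshot
# %H expands to the hostname; systemd does not run shell substitutions like $(hostname)
ExecStart=/opt/scripts/notify-telegram.sh "Trading bot failed on %H"
# In the [Unit] section of trading-bot.service add the line below.
# With Restart= set, it fires only once the start limit is hit and the unit enters the failed state.
OnFailure=trading-bot-notify.service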
Prometheus Alertmanager for complex scenarios:
# alert rule
- alert: TradingBotRestarted
  expr: changes(process_start_time_seconds{job="trading-bot"}[5m]) > 0
  annotations:
    summary: "Trading bot restarted {{ $value }} times in last 5 minutes"
A reasonable minimum is a Telegram notification on each (re)start, with hostname and timestamp. That is enough for 95% of cases.
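As a sketch of that minimum, the snippet below builds and sends a message through the Telegram Bot API sendMessage method using only the standard library. The environment variable names TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID and all function names are assumptions made for this example.

```python
import os
import socket
import urllib.parse
import urllib.request
from datetime import datetime, timezone


def build_message(event, hostname=None, now=None):
    """Format a notification with hostname and UTC timestamp."""
    hostname = hostname or socket.gethostname()
    now = now or datetime.now(timezone.utc)
    return f"Trading bot {event} on {hostname} at {now:%Y-%m-%d %H:%M:%S} UTC"


def build_request(token, chat_id, text):
    """Prepare a POST to the Bot API sendMessage method."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def notify(event):
    """Send the notification; returns True on HTTP 200."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]  # assumed variable names
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    req = build_request(token, chat_id, build_message(event))
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status == 200
```

Calling notify("started") early in the bot's startup path produces one message per (re)start. Wrapping the call in a try/except keeps a Telegram outage from blocking the bot itself.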







