Setting up automatic website restore from backup
Website restore from backup must be documented and tested before incident. Panic during emergency + unfamiliar restore process = hours of downtime. Goal: RTO (Recovery Time Objective) no more than 1 hour.
PostgreSQL restore script
#!/bin/bash
BACKUP_SOURCE="${1:-latest}"
TARGET_DB="${2:-myapp_restore}"
S3_BUCKET="s3://myapp-backups/postgresql"
if [ "$BACKUP_SOURCE" = "latest" ]; then
BACKUP_FILE=$(aws s3 ls "${S3_BUCKET}/" | sort | tail -1 | awk '{print $4}')
aws s3 cp "${S3_BUCKET}/${BACKUP_FILE}" "/tmp/${BACKUP_FILE}"
LOCAL_FILE="/tmp/${BACKUP_FILE}"
fi
psql -U postgres -c "CREATE DATABASE ${TARGET_DB};" 2>/dev/null || true
gunzip -c "$LOCAL_FILE" | psql -U postgres -d "$TARGET_DB" -v ON_ERROR_STOP=1
TABLES=$(psql -U postgres -d "$TARGET_DB}" -t -c "SELECT COUNT(*) FROM information_schema.tables;")
echo "Restore completed. Tables: $TABLES"
rm -f "$LOCAL_FILE"
Full site restore
#!/bin/bash
DOMAIN="example.com"
APP_DIR="/var/www/myapp"
GIT_REPO="[email protected]:company/myapp.git"
GIT_TAG="${1:-main}"
# 1. Maintenance page
cat > /var/www/maintenance/index.html << 'EOF'
<h1>Technical maintenance</h1>
<p>Site unavailable. Recovery up to 60 minutes.</p>
EOF
# 2. Restore code from git
git clone --branch "$GIT_TAG" "$GIT_REPO" "$APP_DIR"
cd "$APP_DIR"
composer install --no-dev --optimize-autoloader
# 3. Restore database
/usr/local/bin/restore-db.sh latest myapp
# 4. Restore files
aws s3 sync s3://myapp-backups/files/uploads/ "${APP_DIR}/storage/app/uploads/"
# 5. Permissions and cache
chown -R www-data:www-data "$APP_DIR/storage"
php artisan config:cache
php artisan route:cache
php artisan migrate --force
# 6. Remove maintenance
nginx -s reload
# Verify
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "https://${DOMAIN}/health")
if [ "$HTTP_CODE" = "200" ]; then
echo "Recovery SUCCESSFUL"
fi
Runbook for on-call engineer
# Recovery Runbook: restore myapp after incident
## Step 1: Diagnostics (5 min)
- ssh web01.example.com
- systemctl status nginx php8.3-fpm postgresql
- tail -100 /var/log/nginx/error.log
## Step 2: Notification
- Update status page
- Slack #incidents message
## Step 3: Recovery
- Full: sudo /usr/local/bin/restore-site.sh v1.2.3
- DB only: sudo /usr/local/bin/restore-db.sh latest
- Files only: aws s3 sync s3://myapp-backups/files/ /var/www/myapp/storage/
## Step 4: Verification
- https://example.com/ → HTTP 200
- Login → success
- Critical functions from /docs/smoke-tests.md
## Step 5: Post-mortem
- Fill incident template in Confluence
- Add prevention to backlog
Monthly recovery drill
0 8 1 * * /usr/local/bin/restore-site.sh latest >> /var/log/dr-drill.log 2>&1 && \
curl -fsS https://hc-ping.com/dr-drill-uuid > /dev/null
Implementation Timeline
Restore scripts with runbook: 2–3 days. Automated restore testing with monthly drill: 3–4 days.







