1C-Bitrix Disaster Recovery Monitoring and Testing

A written recovery plan that is never verified cannot be relied on: the team does not know the real RTO, backups may be corrupted, and the production configuration may have changed since the last drill. DR monitoring is more than observing the current state; it means regularly confirming that the recovery plan can be executed within the declared timeframe.

What to Monitor in the Context of DR

Backup State

Monitor not just whether a backup was created, but its integrity:

#!/bin/bash
# Check the latest DB dump
BACKUP_FILE="/backups/db/bitrix_$(date +%Y%m%d).sql.gz"
MIN_SIZE=104857600  # 100 MB — minimum expected size

if [ ! -f "$BACKUP_FILE" ]; then
    echo "CRITICAL: Backup file not found: $BACKUP_FILE"
    exit 2
fi

FILE_SIZE=$(stat -c%s "$BACKUP_FILE")
if [ "$FILE_SIZE" -lt "$MIN_SIZE" ]; then
    echo "CRITICAL: Backup too small: ${FILE_SIZE} bytes"
    exit 2
fi

# Check gzip integrity
if ! gzip -t "$BACKUP_FILE" 2>/dev/null; then
    echo "CRITICAL: Backup file is corrupted"
    exit 2
fi

echo "OK: Backup size ${FILE_SIZE} bytes, integrity OK"

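Run this script as an external check in Nagios or Zabbix. Prometheus does not execute scripts itself, so there the result has to be exported, for example through the node_exporter textfile collector. An alert fires if the backup is missing, too small, or corrupted.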
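
The gzip -t check only proves that the archive is intact; a dump that was cut off before compression still passes it. A minimal sketch of one additional check, assuming the dump is produced by mysqldump with its default comments enabled (mysqldump then ends the file with a "-- Dump completed" line). The function name dump_is_complete is illustrative and not part of any tool:

```shell
#!/bin/bash
# dump_is_complete FILE
# Succeeds if the gzip-compressed dump ends with mysqldump's completion marker.
# Assumption: the dump was made by mysqldump without --skip-comments.
dump_is_complete() {
    local file="$1"
    # Decompress to stdout and inspect only the last few lines.
    gzip -dc "$file" 2>/dev/null | tail -n 5 | grep -q '^-- Dump completed'
}
```

It could be called right after the gzip -t test, for example: dump_is_complete "$BACKUP_FILE" || { echo "CRITICAL: dump is truncated"; exit 2; }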

DB Replication

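-- Alert if Seconds_Behind_Master > 300, or if it is NULL (replication is not running)
-- On MySQL 8.0.22 and later, SHOW SLAVE STATUS is deprecated in favor of
-- SHOW REPLICA STATUS (field Seconds_Behind_Source)
SHOW SLAVE STATUS\G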

In Zabbix, collect the lag through the agent, either with a MySQL template or with a custom UserParameter (which can be tested with zabbix_get):

# zabbix_agentd.conf
# Note: a password passed with -p is visible in the process list; a client option file is safer
UserParameter=mysql.slave.lag,mysql -u monitor -pXXX -e "SHOW SLAVE STATUS\G" 2>/dev/null | awk '/Seconds_Behind_Master/ {print $2}'
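
For Nagios-style checks, the same output can be turned into a status line and an exit code. The sketch below is one possible approach, not a standard plugin: the function name replication_status is hypothetical, and it treats a NULL or missing lag value as critical because NULL means the replication threads are not running.

```shell
#!/bin/bash
# replication_status THRESHOLD_SECONDS
# Reads the output of SHOW SLAVE STATUS\G on stdin and prints a one-line verdict.
# Returns Nagios-style codes: 0 = OK, 2 = CRITICAL.
replication_status() {
    local threshold="$1" lag
    lag=$(awk '$1 == "Seconds_Behind_Master:" {print $2}')
    case "$lag" in
        ''|NULL)
            # Empty: the server is not a replica. NULL: replication is stopped.
            echo "CRITICAL: replication is not running"
            return 2 ;;
        *[!0-9]*)
            echo "CRITICAL: unexpected lag value: $lag"
            return 2 ;;
    esac
    if [ "$lag" -gt "$threshold" ]; then
        echo "CRITICAL: replica is ${lag}s behind"
        return 2
    fi
    echo "OK: replica is ${lag}s behind"
}
```

Usage, assuming credentials come from a client option file: mysql -e 'SHOW SLAVE STATUS\G' | replication_status 300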

Free Space on the Backup Server

# Warn when less than 20% free space remains (-P keeps each filesystem on one line)
df -P /backups | awk 'NR==2 {used = $5 + 0; if (used > 80) print "WARNING: disk " used "% used"}'

Recovery Endpoint Availability

A simple healthcheck on the backup server, monitored from both the primary DC and an external monitoring service:

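<?php
// /health.php on the backup server
// (nothing may be printed before the opening tag, otherwise header() fails)
header('Content-Type: application/json');

$checks = [];

// Check DB availability (PDO throws an exception if the connection fails)
try {
    $pdo = new PDO('mysql:host=127.0.0.1;dbname=bitrix_db', 'bitrix_ro', '***');
    $pdo->query("SELECT 1");
    $checks['db'] = 'ok';
} catch (Exception $e) {
    $checks['db'] = 'fail';
}

// Check Redis (depending on the phpredis version, a failed connect()
// may return false or throw an exception, so handle both)
try {
    $redis = new Redis();
    $checks['redis'] = $redis->connect('127.0.0.1', 6379) ? 'ok' : 'fail';
} catch (Exception $e) {
    $checks['redis'] = 'fail';
}

// Check Bitrix filesystem
$checks['files'] = file_exists('/var/www/bitrix/bitrix/php_interface/dbconn.php') ? 'ok' : 'fail';

// Any failed check turns the response into 503 so that monitoring sees it
$status = in_array('fail', $checks, true) ? 503 : 200;
http_response_code($status);
echo json_encode(['status' => $status === 200 ? 'ok' : 'degraded', 'checks' => $checks]);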
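
On the monitoring side, it helps to report which check failed rather than only the 503 status. A rough sketch, assuming the compact JSON that json_encode produces by default (no spaces around the colon); it is a pattern match, not a JSON parser, and the function name failed_checks is made up for this example:

```shell
#!/bin/bash
# failed_checks
# Reads the JSON body of /health.php on stdin and prints the names of
# the checks reported as "fail", one per line (nothing if all are ok).
failed_checks() {
    grep -o '"[A-Za-z_]*":"fail"' | cut -d'"' -f2
}
```

Usage with a placeholder host name: curl -s --max-time 10 https://backup.example.com/health.php | failed_checks. The -f flag is left out on purpose so that curl still prints the body of a 503 response.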

Regular DR Drills: Methodology

Quarterly drill (full restore to an isolated test environment):

  1. Take the latest DB and file backup
  2. Deploy to a clean server
  3. Time each stage
  4. After the restore, run an automated smoke test:

#!/bin/bash
# dr_smoke_test.sh: runs after restore, exits non-zero if any check fails
BASE_URL="https://test-recovery.example.com"
FAILED=0

check() {
    local name="$1"
    local url="$2"
    local expected="$3"
    local response

    # -f makes curl return no body on HTTP errors, so the match below fails
    response=$(curl -sf --max-time 30 "$url")
    # -F matches the expected text literally, not as a regular expression
    if echo "$response" | grep -qF -- "$expected"; then
        echo "PASS: $name"
    else
        echo "FAIL: $name: expected '$expected' not found"
        FAILED=1
    fi
}

check "Homepage" "$BASE_URL/" "1C-Bitrix"
check "Catalog" "$BASE_URL/catalog/" "Catalog"
check "Cart API" "$BASE_URL/bitrix/components/bitrix/sale.basket.basket/" "basket"
check "Health endpoint" "$BASE_URL/health.php" '"status":"ok"'

if [ "$FAILED" -eq 0 ]; then
    echo "All checks passed"
else
    echo "Some checks FAILED"
    exit 1
fi
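
Step 3 of the drill, timing each stage, is easy to skip when done by hand. A minimal sketch of a wrapper that records the duration of every stage; the function name run_stage, the DRILL_LOG variable, and the CSV layout are assumptions made for this example:

```shell
#!/bin/bash
# run_stage NAME COMMAND [ARGS...]
# Runs one recovery stage and appends "name,seconds,exit_code" to $DRILL_LOG.
DRILL_LOG="${DRILL_LOG:-./dr_drill_timings.csv}"

run_stage() {
    local name="$1"; shift
    local start end rc
    start=$(date +%s)
    "$@"                      # the actual restore command
    rc=$?
    end=$(date +%s)
    echo "${name},$((end - start)),${rc}" >> "$DRILL_LOG"
    return "$rc"
}
```

A stage would then be started as, for example, run_stage "restore_db" sh -c 'gunzip -c dump.sql.gz | mysql bitrix_db' (file and database names are placeholders). The sum of the recorded durations gives the actual RTO for the drill report.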

Monthly drill: DB-only restore. Verify that the dump is current: restore it to a test server and run queries against b_sale_order, b_iblock_element, and b_catalog_price to confirm that the latest records are not older than the RPO.

-- Check data freshness after restore
SELECT MAX(DATE_INSERT) as latest_order FROM b_sale_order;
-- Should not be older than RPO (e.g., not older than 4 hours)

SELECT COUNT(*) FROM b_iblock_element WHERE ACTIVE = 'Y';
-- Compare with the expected number of active products
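
The comparison with the RPO can be automated instead of read by eye. The sketch below assumes GNU date and that both timestamps are in the same time zone; within_rpo is a hypothetical helper name:

```shell
#!/bin/bash
# within_rpo LATEST REFERENCE RPO_SECONDS
# LATEST is the newest record in the restored data (e.g. MAX(DATE_INSERT)),
# REFERENCE is the moment to compare against (e.g. the start of the drill).
# Both use the "YYYY-MM-DD HH:MM:SS" format.
# Succeeds if LATEST is at most RPO_SECONDS older than REFERENCE.
within_rpo() {
    local latest reference
    latest=$(date -d "$1" +%s) || return 2      # GNU date parses the timestamp
    reference=$(date -d "$2" +%s) || return 2
    [ $((reference - latest)) -le "$3" ]
}
```

For a 4-hour RPO the third argument is 14400. The first argument can be fetched with mysql -N -e "SELECT MAX(DATE_INSERT) FROM b_sale_order" against the restored database.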

DR Metrics and SLA

Metric                                Target value                      How it is measured
DB backup: age of last valid backup   < RPO (e.g. 4 h)                  Monitoring + file timestamp
Replication: Seconds_Behind_Master    < 60 s under normal conditions    Zabbix/Prometheus
Drill duration (full restore)         Compared against RTO              Timed at each drill
Successful drills per quarter         ≥ 1                               Testing log
File backup age                       < 24 h                            rsync monitoring
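
The two backup-age metrics can be measured from the file timestamp. A minimal sketch, assuming GNU stat and that the modification time of the file reflects when the backup finished; backup_age_ok is an illustrative name:

```shell
#!/bin/bash
# backup_age_ok FILE MAX_AGE_SECONDS
# Succeeds if FILE was modified no more than MAX_AGE_SECONDS ago.
# Returns 2 if the file does not exist or cannot be read.
backup_age_ok() {
    local file="$1" max_age="$2" mtime now
    # GNU stat: %Y is the modification time as epoch seconds
    mtime=$(stat -c %Y "$file" 2>/dev/null) || return 2
    now=$(date +%s)
    [ $((now - mtime)) -le "$max_age" ]
}
```

With the targets from the table, a DB dump would be checked with a limit of 14400 seconds (4 h) and a file backup with 86400 seconds (24 h).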

DR Reporting

After each drill, record:

  • Date and time of drill
  • Plan version (revision number)
  • Time for each recovery stage
  • Actual RTO vs planned RTO
  • Issues discovered during the drill
  • Plan updates following the drill

This log is not a formality. It reveals trends: whether RTO is degrading over time (the site grows, backups become larger, the procedure is not updated).
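
If the log is kept in a machine-readable form, the trend can be extracted automatically. The sketch below assumes a hypothetical CSV layout of "date,planned_rto_min,actual_rto_min", one line per drill in chronological order; neither the layout nor the function name rto_report comes from any tool:

```shell
#!/bin/bash
# rto_report
# Reads drill records "date,planned_rto_min,actual_rto_min" on stdin and
# prints one line per drill. OVER_RTO marks drills that exceeded the planned
# RTO; SLOWER marks drills that took longer than the previous one.
rto_report() {
    awk -F, '
        {
            planned = $2 + 0
            actual  = $3 + 0
            flag = ""
            if (actual > planned)          flag = flag " OVER_RTO"
            if (NR > 1 && actual > prev)   flag = flag " SLOWER"
            printf "%s actual=%dmin planned=%dmin%s\n", $1, actual, planned, flag
            prev = actual
        }'
}
```

Usage: rto_report < drills.csv, where drills.csv is a placeholder file name. Several consecutive SLOWER lines are the signal that the procedure needs revising before the planned RTO is exceeded.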

DR Monitoring Setup Timeline

Setting up backup monitoring, replication checks, and healthcheck endpoints with integration into Zabbix/Prometheus, plus the first drill with an automated smoke test, takes 3–5 business days.