Configuration of Disaster Recovery for 1C-Bitrix
The server died at 2:30 AM. The database was last backed up at 11:00 PM. The site is down, and nobody knows when it will be back, because nobody ever checked whether the backup actually restores. This is a typical situation on projects where DR exists "on paper" but was never tested.
Components that need recovery
A Bitrix project consists of several independent layers, each requiring its own backup strategy:
- Application code — /var/www/bitrix/ (core) and /local/ (customizations). Code in git should be the standard, not the exception.
- Database — PostgreSQL or MySQL. For Bitrix under load, a primary/replica scheme with snapshots taken from the replica.
- Uploaded files — /upload/, /bitrix/backup/. The volume grows continuously and is often ignored when setting up backups.
- Configuration files — /bitrix/.settings.php, /bitrix/php_interface/dbconn.php, nginx/php-fpm configs.
Built-in backup mechanism
Bitrix has a built-in backup tool (/bitrix/admin/backup.php). It creates archives in /bitrix/backup/ via the CBackupAgent agent. Parameters are stored in the b_option table, module main:
- backup_auto — enable automatic backup
- backup_period — period in hours
- backup_keep_count — number of copies to keep
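These values can be inspected directly in the database; a quick query (column names follow the standard Bitrix schema, verify against your installation):

```sql
-- Built-in backup settings stored by the main module
SELECT NAME, VALUE
FROM b_option
WHERE MODULE_ID = 'main'
  AND NAME IN ('backup_auto', 'backup_period', 'backup_keep_count');
```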
The built-in backup works, but has limitations: on large projects (database > 5 GB, /upload/ > 20 GB) it hits timeouts, consumes space on the same server, and provides no external replication out of the box.
Strategy: protection levels
Level 1 — DB in real time. PostgreSQL streaming replication or MySQL GTID replication. The replica receives WAL/binlog and lags by seconds. On primary failure — manual or automatic failover to the replica. Configuration in postgresql.conf:
wal_level = replica
max_wal_senders = 3
wal_keep_size = 1GB
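On the replica side, a standby can be cloned from the primary in one step; a sketch for PostgreSQL 12+, where host and user names are placeholders:

```shell
# Clone the primary; -R writes primary_conninfo and standby.signal automatically
pg_basebackup -h primary.example.com -U replicator \
    -D /var/lib/postgresql/data -R -P --wal-method=stream
# Start PostgreSQL on the replica; it comes up in streaming standby mode
```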
Level 2 — hourly DB snapshots. pg_dump or xtrabackup via cron, with the result shipped to external storage (S3, rsync to an offsite server). For PostgreSQL, pg_basebackup (physical backup) is preferable: recovery is faster.
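Level 2 can be a single cron entry; the database name and bucket below are illustrative (/etc/cron.d format, note the escaped % signs required in crontab):

```shell
# Hourly logical dump in custom format, 24 rotating hourly slots, shipped to S3
0 * * * * postgres pg_dump -Fc sitedb -f /backup/site-$(date +\%H).dump && aws s3 cp /backup/site-$(date +\%H).dump s3://example-bucket/db/
```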
Level 3 — file backups. /upload/ grows linearly, so a daily full backup is impractical. Incremental rsync or Restic:
restic -r s3:s3.amazonaws.com/bucket/upload \
backup /var/www/site/upload \
--exclude /var/www/site/upload/resize_cache
resize_cache is excluded: it is rebuilt automatically when images are accessed.
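Restic also handles retention; a sketch of a forget policy (the numbers are examples, tune them to your RPO and storage budget):

```shell
restic -r s3:s3.amazonaws.com/bucket/upload forget \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```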
RTO/RPO for typical project
- RPO (acceptable data loss): with streaming replication — seconds; with hourly snapshots — up to 1 hour.
- RTO (recovery time): depends on database size. PostgreSQL with WAL PITR restores a 10 GB database in 15–30 minutes; application deployment from git plus config recovery takes another 5–10 minutes.
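The WAL PITR mentioned above boils down to a restore_command plus a recovery target; a PostgreSQL 12+ sketch, where paths and the timestamp are placeholders:

```conf
# postgresql.conf on a data directory restored from pg_basebackup
restore_command = 'cp /backup/wal/%f %p'
recovery_target_time = '2024-05-01 02:25:00'
recovery_target_action = 'promote'
# then create $PGDATA/recovery.signal and start the server
```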
Testing DR — mandatory step
DR without regular testing is false confidence. Quarterly: take the latest backup, restore it on an isolated staging server, and check:
# Verify database dump integrity
pg_restore --list /backup/site.dump | tail -20
# Bring the site up from the backup on the isolated stand
# Smoke test: place an order, log in to the admin panel
Record the actual recovery time. If it exceeds the stated RTO, optimize the procedure.
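Measuring the drill can be as simple as wrapping it in timestamps; a sketch where the restore commands are placeholders:

```shell
# Measure actual RTO during a DR drill
start=$(date +%s)

# The real restore steps would go here, e.g.:
#   pg_restore -d sitedb /backup/site.dump
#   git clone --depth 1 repo /var/www/site
sleep 1  # stands in for the restore work in this sketch

end=$(date +%s)
echo "Recovery took $(( end - start )) s against the stated RTO"
```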
What to configure
- PostgreSQL/MySQL streaming replication with replica lag monitoring
- Hourly pg_dump or pg_basebackup to external storage
- Incremental /upload/ backup via Restic or rsync, excluding resize_cache
- Recovery script with a documented order of actions
- Quarterly testing schedule with real RTO measurement
- Alerts on backup failure (no fresh backup file for the last X hours)
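The last point can be a cron-driven shell check; a self-contained sketch where the demo-setup lines exist only so it runs standalone and the paths are placeholders:

```shell
#!/bin/sh
# Alert if no backup file has appeared in the last MAX_AGE_HOURS hours
BACKUP_DIR="${BACKUP_DIR:-/tmp/demo-backups}"
MAX_AGE_HOURS="${MAX_AGE_HOURS:-25}"

# Demo setup so the sketch is runnable as-is; drop these two lines in production
mkdir -p "$BACKUP_DIR"
touch "$BACKUP_DIR/site.dump"

recent=$(find "$BACKUP_DIR" -type f -mmin "-$((MAX_AGE_HOURS * 60))" | head -n 1)
if [ -z "$recent" ]; then
    echo "ALERT: no fresh backup in $BACKUP_DIR for the last $MAX_AGE_HOURS h"
    # hook your alerting here: mail, curl to a webhook, etc.
    exit 1
fi
echo "OK: fresh backup found: $recent"
```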