Developing a Disaster Recovery Plan for 1C-Bitrix
The site goes down on a Friday evening. Nobody knows what to do: there's a backup, but it's unclear where to restore it, in what order, or who is responsible for what. After two hours of panic, something resembling a working state is recovered — but six hours of data are lost, and three more days are spent dealing with the fallout. A Disaster Recovery Plan (DRP) is not a bureaucratic document; it is a step-by-step runbook with specific commands, verified against real-world failures.
What a DRP for Bitrix Must Contain
A good plan does not describe theory — it describes specific actions by specific people. Minimum contents:
- Roles matrix: who does what during an incident (DevOps, developer, manager, support team)
- RTO (Recovery Time Objective) and RPO (Recovery Point Objective) — agreed with the business: "restore within 2 hours, data loss no more than 1 hour"
- Contacts: hosting provider, 1C partner, primary developer, backup developer
- Backup scheme with storage locations and access methods
- Step-by-step recovery scenarios for each failure type
Failure Types and Scenarios
Scenario 1: Web server crash (nginx/apache)
# Diagnostics
systemctl status nginx
journalctl -xe -u nginx --since "10 minutes ago"
nginx -t # Config check
# Quick config rollback
cp /etc/nginx/nginx.conf.bak /etc/nginx/nginx.conf
systemctl restart nginx
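The restart step is safer when wrapped in a config check, so a broken config is never loaded into a running nginx. A minimal sketch (the service name `nginx` is assumed; this is an illustration, not part of the standard runbook above):

```shell
#!/usr/bin/env bash
# safe_reload: reload nginx only if the config passes validation.
# On a failed check the running config is left untouched and the
# function returns non-zero so the runbook step is visibly failed.
safe_reload() {
    if nginx -t 2>/dev/null; then
        systemctl reload nginx
        echo "reloaded"
    else
        echo "config invalid, reload skipped" >&2
        return 1
    fi
}
```

Using `reload` instead of `restart` keeps existing connections alive; fall back to `restart` only when a reload does not pick up the fix.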
Scenario 2: Bitrix filesystem corruption
# Stop php-fpm to prevent overwriting files being restored
systemctl stop php8.1-fpm
# Restore from backup (rsync from backup server)
rsync -az --delete backup-server:/backups/bitrix/latest/ /var/www/bitrix/
# Restore permissions
chown -R www-data:www-data /var/www/bitrix/
find /var/www/bitrix/ -type d -exec chmod 755 {} \;
find /var/www/bitrix/ -type f -exec chmod 644 {} \;
systemctl start php8.1-fpm
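Before bringing php-fpm back, it is worth confirming that the restored tree actually matches the backup. A minimal sketch comparing relative file lists of the two trees (bash-specific; paths are illustrative):

```shell
#!/usr/bin/env bash
# verify_restore SRC DST: compare the relative file lists of two trees.
# Prints "OK" when they match; otherwise prints the differing paths
# (diff output) and returns non-zero.
verify_restore() {
    local src="$1" dst="$2"
    if diff <(cd "$src" && find . -type f | sort) \
            <(cd "$dst" && find . -type f | sort); then
        echo "OK"
    else
        return 1
    fi
}
# Usage: verify_restore /mnt/backup/bitrix/latest /var/www/bitrix
```

This only checks presence of files; for content verification, a second pass with `rsync -azc --dry-run` against the backup server catches checksum mismatches.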
Scenario 3: Database corruption or loss
The most critical scenario for an online store. The tables `b_sale_order`, `b_sale_basket`, and `b_catalog_price` hold data that cannot be lost.
# Restore from mysqldump (substitute the date of the last good dump)
mysql -u root -p bitrix_db < /backups/db/bitrix_$(date +%Y%m%d).sql
# If the dump is partial — restore individual tables
mysql -u root -p bitrix_db < /backups/db/b_sale_order_$(date +%Y%m%d).sql
mysql -u root -p bitrix_db < /backups/db/b_catalog_price_$(date +%Y%m%d).sql
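The per-table restore above assumes per-table dumps exist. A sketch of how they could be produced for the critical tables (the database name, output path, and table list are taken from this article; credentials are assumed to live in `~/.my.cnf` so no password appears on the command line):

```shell
#!/usr/bin/env bash
# dump_tables DB OUTDIR TABLE...: write one date-stamped dump per table,
# matching the filenames the restore commands above expect.
dump_tables() {
    local db="$1" outdir="$2"; shift 2
    local stamp t
    stamp=$(date +%Y%m%d)
    for t in "$@"; do
        mysqldump "$db" "$t" > "${outdir}/${t}_${stamp}.sql" || return 1
    done
}
# Example (matches the restore commands above):
# dump_tables bitrix_db /backups/db b_sale_order b_sale_basket b_catalog_price
```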
If MySQL replication is configured, switch to the replica:
# On the replica (MySQL 8.0.22+ uses STOP REPLICA / RESET REPLICA ALL)
STOP SLAVE;
RESET SLAVE ALL;
# Update dbconn.php to the replica's IP
# The replica becomes the master
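Editing dbconn.php by hand under pressure is error-prone; a small helper can swap the DB host in one step. A sketch, assuming the classic `$DBHost = "...";` line in dbconn.php and GNU sed (the file path and replica IP in the usage line are examples):

```shell
#!/usr/bin/env bash
# switch_db_host FILE NEW_HOST: point $DBHost in dbconn.php at the replica.
# Writes to a temp file first so a failed sed never truncates the config.
switch_db_host() {
    local file="$1" new_host="$2" tmp
    tmp=$(mktemp) || return 1
    sed -E "s/(\\\$DBHost\s*=\s*)\"[^\"]*\"/\1\"${new_host}\"/" "$file" > "$tmp" \
        && mv "$tmp" "$file"
}
# Usage: switch_db_host /var/www/bitrix/bitrix/php_interface/dbconn.php 10.0.0.12
```

Sites on the D7 kernel also keep the connection host in `/bitrix/.settings.php`; that file needs the same switch.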
Scenario 4: Breach and infection
Restore from a backup taken before the breach — but first you need to determine when it occurred. Bitrix writes logs to /bitrix/php_interface/error.log and to the b_event_log table. Analyze the nginx access log:
# Find the first signs of anomaly
grep -E "(POST|eval|base64_decode|system\()" /var/log/nginx/access.log | \
awk '{print $1}' | sort | uniq -c | sort -rn | head -20
# After restoration — change all passwords
# /bitrix/.settings.php — DB password
# /bitrix/php_interface/dbconn.php
# Passwords of all administrators via b_user
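To pick a backup that predates the breach, it helps to extract the timestamp of the first suspicious request rather than eyeballing the log. A sketch for the default combined log format, where field 4 is `[date:time` (the grep patterns mirror the ones above and are not exhaustive):

```shell
#!/usr/bin/env bash
# first_anomaly LOGFILE: print the timestamp of the earliest request
# matching common webshell patterns. access.log is written in
# chronological order, so the first match is the earliest.
first_anomaly() {
    grep -E "eval|base64_decode|system\(" "$1" \
        | head -1 \
        | awk '{gsub(/\[/, "", $4); print $4}'
}
# Usage: first_anomaly /var/log/nginx/access.log
```

Any backup taken after this timestamp should be treated as potentially infected.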
Backup Structure for DRP
The standard scheme we build into the plan:
| Object | Frequency | Retention | Method |
|---|---|---|---|
| DB (full dump) | Every 4 hours | 7 days | mysqldump + S3/Backblaze |
| DB (binlog) | Continuous | 48 hours | MySQL binlog → remote |
| Files /upload | Once daily | 14 days | rsync → backup server |
| Files /bitrix | Once weekly | 4 weeks | tar.gz → S3 |
| Configs (/etc) | On change | 90 days | Git + backup |
A full dump every 4 hours cannot meet an RPO of 1 hour on its own. In this case, add continuous binlog shipping: mysqlbinlog can then replay changes to restore the state at any point in time.
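The retention periods from the table have to be enforced somewhere, otherwise the backup volume fills up silently. A minimal pruning sketch (the directory layout is an assumption; relies on GNU find on Linux):

```shell
#!/usr/bin/env bash
# prune_backups DIR DAYS: delete dump files older than DAYS days.
# Schedule from cron after each successful backup, never before it,
# so a failed backup run cannot leave you with zero copies.
prune_backups() {
    local dir="$1" days="$2"
    find "$dir" -maxdepth 1 -type f -name '*.sql*' -mtime "+$days" -delete
}
# Example, matching the table: prune_backups /backups/db 7
```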
1C-Bitrix Specifics: What Is Lost in a Standard Backup
The standard "Backup" tool in the admin panel (Settings → Tools → Backup) creates a site archive. But it does not include:
- Bitrix cache (`/bitrix/cache/`, `/bitrix/managed_cache/`) — not needed during restoration, must be rebuilt
- Temporary session files — must be cleared after restoration: `\Bitrix\Main\Application::getInstance()->getSession()->destroy()`
- Data from external services (1C, CRM) — a separate resynchronization procedure is required
- SSL certificates — stored separately from the site files
Mandatory checklist after restoring from backup:
- Clear the cache: `BXClearCache(true)` or via `bitrix/admin/cache.php`
- Rebuild the catalog faceted index
- Check 1C-Bitrix agents (`/bitrix/admin/agent_list.php`)
- Check cron jobs
- Run a test order in the store
RTO by Failure Type
| Failure type | Realistic RTO | What needs to be prepared |
|---|---|---|
| nginx/php-fpm restart | 5 minutes | Monitoring + Runbook with commands |
| File rollback after breach | 30–60 minutes | File backup + cache reset checklist |
| DB restore from dump | 1–3 hours | Dump + tested procedure |
| Switch to DB replica | 15–30 minutes | Replica + dbconn switch script |
| Full restore to new server | 4–8 hours | Ansible Playbook + server image |
An RTO of "4 hours" without testing is an optimistic estimate. The real RTO is established after the first drill: running a restore in a test environment with timing measurements.
Plan Testing
A DRP that has never been tested does not work. Conduct a drill at least once per quarter:
- Restore the latest backup to a test environment
- Time each step
- Verify data correctness (orders, products, users)
- Record discrepancies from the plan and update the document
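Timing each step is easier to keep honest when the runbook commands are run through a small wrapper that logs durations. A sketch (the log filename and step names are examples):

```shell
#!/usr/bin/env bash
# time_step LOGFILE NAME CMD...: run CMD, append "NAME <seconds>s rc=<code>"
# to LOGFILE. Preserves the command's exit code so failed steps are
# visible in the drill report.
time_step() {
    local log="$1" name="$2"; shift 2
    local start end rc
    start=$(date +%s)
    "$@"; rc=$?
    end=$(date +%s)
    echo "$name $((end - start))s rc=$rc" >> "$log"
    return "$rc"
}
# Usage: time_step drill.log restore_db mysql -u root bitrix_db < dump.sql
```

After the drill, the log file becomes the evidence base for updating the RTO table above with measured, rather than estimated, numbers.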
Development Timeline
| Phase | Content | Duration |
|---|---|---|
| Infrastructure audit | Backup scheme, failure points, roles | 1–2 days |
| Scenario and Runbook development | 4–6 scenarios with commands | 2–3 days |
| Backup infrastructure setup | S3, replication, monitoring | 2–5 days |
| Test drill + adjustments | Restore on test stand | 1–2 days |
| Documentation and team handover | Final DRP + training | 1 day |