Setting up RTO/RPO for a 1C-Bitrix project


The business says: "the site must not be down for more than an hour". The engineer nods and goes off to configure replication. Six months later it turns out that recovery from the latest backup takes 4 hours, and the business never knew. RTO and RPO are not technical characteristics; they are agreements with the business that must be documented and backed by the infrastructure.

What are RTO and RPO in the context of Bitrix

RPO (Recovery Point Objective): the maximum permissible data loss, expressed as a time window. If RPO = 1 hour, then after a disaster no more than one hour of transactions may be lost: orders, registrations, inventory changes.

RTO (Recovery Time Objective): the maximum permissible downtime. If RTO = 30 minutes, the site must be operational again within 30 minutes of the incident.

Typical values for an online store on Bitrix: RPO = 1 hour, RTO = 2 hours. For high-load projects: RPO = 5 minutes, RTO = 15 minutes. The stricter the requirements, the more expensive the infrastructure.

Technical solutions for different RPO levels

RPO = several hours. An hourly pg_dump to external storage is sufficient: the worst-case loss is about one hour plus the time the dump takes. Simple and cheap, but slow to restore for large databases.
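
A minimal sketch of such an hourly dump job. The database name bitrix, the dumps/ key prefix and the script path in the cron line are assumptions for illustration; the bucket name follows the archive_command example in this article. It assumes pg_dump and the aws CLI are on PATH with credentials configured.

```shell
# Build the object key for a dump.
# $1 = database name, $2 = UTC timestamp
backup_key() {
    printf 'dumps/%s/%s-%s.dump\n' "$1" "$1" "$2"
}

# Dump one database in custom format and upload it to S3.
# $1 = database name
run_backup() {
    db=$1
    ts=$(date -u +%Y%m%dT%H%M%SZ)
    tmp=$(mktemp) || return 1
    # -Fc: compressed custom format, restorable with pg_restore
    if pg_dump -Fc -f "$tmp" "$db" &&
       aws s3 cp "$tmp" "s3://backup-bucket/$(backup_key "$db" "$ts")"; then
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Hypothetical cron entry, once per hour:
#   0 * * * * /usr/local/bin/bitrix-backup.sh
# where that script sources these functions and calls: run_backup bitrix
```

A non-zero return from run_backup should raise an alert: a silently failing dump job stretches the real RPO without anyone noticing.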

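RPO = minutes. PostgreSQL streaming replication. In the default asynchronous mode the replica usually lags the primary by seconds, so a failure loses only the transactions that had not yet been shipped. To lose no committed transactions, switch to synchronous mode: list the replica in synchronous_standby_names on the primary (synchronous_commit = on has no effect on replication while that list is empty). Each commit is then confirmed only after the replica has written it to disk. Cost: roughly +5–15 ms per transaction, depending on network latency between the servers.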
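
As an illustration, the statements that enable synchronous mode can be generated and piped into psql on the primary. This is a sketch: the standby name replica1 is hypothetical and must match the application_name in the standby's primary_conninfo.

```shell
# Print the SQL that turns on synchronous replication on the primary.
# $1 = standby name as it appears in the standby's application_name
sync_replication_sql() {
    cat <<EOF
ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 ($1)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();
EOF
}

# Usage on the primary (both settings take effect on reload, no restart):
#   sync_replication_sql replica1 | psql -U postgres -v ON_ERROR_STOP=1
```

Keep in mind that with a single synchronous standby, commits on the primary block while that standby is unreachable, so this trades availability for durability.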

RPO = seconds. Patroni with synchronous replication plus continuous WAL archiving via archive_command to S3. Combined with a periodic base backup, the WAL archive lets you restore the database to any point in time (PITR, Point-in-Time Recovery).

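# postgresql.conf for PITR
archive_mode = on
archive_command = 'aws s3 cp %p s3://backup-bucket/wal/%f'
# Force a WAL segment switch at least once a minute, so a quiet site
# does not hold unarchived changes for long (the value is an example)
archive_timeout = 60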
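
The restore side of PITR can be sketched as follows, assuming PostgreSQL 12 or later and a base backup already unpacked into the data directory. The function name and its arguments are illustrative; the bucket path mirrors the archive_command shown in this article.

```shell
# Prepare a restored base backup for point-in-time recovery.
# $1 = data directory containing the restored base backup
# $2 = config file to append to (for example "$1/postgresql.conf")
# $3 = target time, for example '2024-05-01 12:30:00+00'
write_pitr_settings() {
    cat >>"$2" <<EOF
restore_command = 'aws s3 cp s3://backup-bucket/wal/%f %p'
recovery_target_time = '$3'
recovery_target_action = 'promote'
EOF
    # recovery.signal makes the server start in targeted recovery mode
    : >"$1/recovery.signal"
}

# Usage (paths are assumptions), then start PostgreSQL:
#   write_pitr_settings /var/lib/postgresql/data \
#       /var/lib/postgresql/data/postgresql.conf '2024-05-01 12:30:00+00'
```

On startup the server replays archived WAL up to the target time and then promotes itself. Remove the appended recovery settings afterwards so they do not affect a later restart.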

Technical solutions for different RTO levels

RTO = several hours. Recovery from a pg_dump backup plus code deployment from git. Restore time grows roughly linearly with database size: a 10 GB database takes approximately 45–90 minutes.

RTO = 30–60 minutes. A standby server with a hot replica. During an incident, failover is manual: promote the replica, then change DNS or the application config. Not automatic, but fast.

RTO = under 10 minutes. Automatic failover via Patroni + HAProxy, without human intervention. Requires advance setup and regular testing.
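
HAProxy typically decides where to send writes by polling each node's Patroni REST API. The same check can be run by hand, for example in a failover test. A sketch, assuming Patroni's default REST port 8008 and hypothetical host names:

```shell
# Succeeds if the Patroni node reports itself as the current leader.
# $1 = host, $2 = REST API port (defaults to 8008)
is_primary() {
    code=$(curl -s -o /dev/null -m 3 -w '%{http_code}' "http://$1:${2:-8008}/primary")
    [ "$code" = "200" ]
}

# Print the first host that answers as leader; fail if none does.
find_primary() {
    for host in "$@"; do
        if is_primary "$host"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Usage (host names are placeholders):
#   find_primary db1.internal db2.internal db3.internal
```

Running find_primary before and after killing the leader in a test environment gives a direct measurement of how long the cluster takes to elect a new one.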

Solution matrix for Bitrix

Project size           RPO       RTO       Infrastructure
Up to 5k orders/day    1 hour    4 hours   pg_dump to S3, deploy from git
5–50k orders/day       15 min    1 hour    Streaming replica + manual failover
Over 50k orders/day    1 min     10 min    Patroni + HAProxy + WAL archiving

Calculating real RTO: what is included in recovery time

Recovery time is the sum of all steps, not just "restore database":

  1. Incident detection: 0 to 15 minutes (depends on monitoring)
  2. Failover decision: 5–10 minutes
  3. DB recovery or promotion: depends on the chosen recovery solution
  4. Application configuration change: 2–5 minutes
  5. Cache warming: the first requests after recovery are slow because Redis/memcached are empty
  6. Health verification: 5–10 minutes

The "cache warming" step is often left out of RTO calculations. After recovery the database takes the full load with no help from caches: the Bitrix cache is empty and OPcache is cold. The first 5–10 minutes of operation are the peak of database load. Without rate limiting, this can overwhelm the newly restored server.
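
The steps above can be added up to check a target before an incident does it for you. A small sketch using the upper bounds from the list; the 5 minutes for replica promotion is an assumed figure, not a measurement.

```shell
# Sum per-step durations (whole minutes) into a worst-case RTO estimate.
estimate_rto() {
    total=0
    for minutes in "$@"; do
        total=$((total + minutes))
    done
    echo "$total"
}

# detection=15 decision=10 promotion=5 config=5 cache_warming=10 verification=10
estimate_rto 15 10 5 5 10 10   # prints 55
```

Even with a replica that promotes in minutes, the worst case lands near the top of the 30–60 minute range, and most of it comes from detection, decision and verification rather than from the database.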

Documentation and testing

RTO/RPO without a documented runbook is worthless. The runbook should contain the exact sequence of commands for each failure scenario: primary DB failure, web server failure, /upload/ loss, server compromise.

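# Example runbook section: PostgreSQL failover (manual)
# 1. Verify that primary is unavailable (non-zero exit status = not accepting connections)
pg_isready -h primary.db -p 5432

# 2. Promote replica
ssh replica.db 'pg_ctl promote -D /var/lib/postgresql/data'

# 3. Update Bitrix config (.settings.php lives in the bitrix/ directory under the site root;
#    dots are escaped so they match literally, and a .bak copy of the file is kept)
sed -i.bak 's/primary\.db/replica.db/g' /var/www/bitrix/bitrix/.settings.php

# 4. Clear cache (check that this script path exists in your installation)
php /var/www/bitrix/bitrix/modules/main/cli/cache_clear.php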

What we configure

  • Determining target RPO and RTO together with the business
  • Selecting and configuring an infrastructure solution for the given parameters
  • WAL archiving for PostgreSQL PITR when RPO < 15 minutes
  • Runbook with recovery commands for each failure scenario
  • Testing schedule: quarterly recovery from backup, with the actual RTO measured
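
Measuring the actual RTO during a drill can be as simple as timing the recovery command against the agreed target. A sketch; the restore script name in the usage line is hypothetical.

```shell
# Time a recovery command and compare the result with the target RTO.
# $1 = target RTO in minutes; remaining arguments = the command to run
run_drill() {
    target_min=$1
    shift
    start=$(date +%s)
    if "$@"; then status=0; else status=$?; fi
    elapsed=$(( $(date +%s) - start ))
    echo "exit status: $status, elapsed: ${elapsed}s, target: $((target_min * 60))s"
    # Pass only if the command succeeded and finished within the target
    [ "$status" -eq 0 ] && [ "$elapsed" -le $((target_min * 60)) ]
}

# Usage:
#   run_drill 120 ./restore_from_backup.sh
```

This times only the technical recovery step. Detection, decision and verification time still have to be added to arrive at the figure that is compared with the agreed RTO.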