Setting up automatic failover for 1C-Bitrix


The primary database server goes down at 3 AM. The on-call engineer is unreachable. Without automatic failover, the site stays down until morning. With properly configured failover, traffic switches to a replica within 30–60 seconds and users notice nothing.

Components of automatic failover

Automatic failover for Bitrix consists of three independent layers:

1. Database failover — switching from the primary to a replica when the primary fails.
2. Web server failover — the load balancer removes an unavailable node from rotation.
3. Bitrix config update — pointing the connection string at the new master.

All three layers must work in concert: database failover without a matching config update, for example, leaves Bitrix trying to connect to the old master and throwing connection errors.

PostgreSQL failover via Patroni

Patroni is the de facto standard for PostgreSQL automatic failover. The architecture: a Patroni agent on each node, etcd or Consul as the DCS (distributed configuration store), and HAProxy or pgBouncer in front of the cluster.

Patroni monitors node status and, when the primary fails, holds a leader election via the DCS. The replica with the smallest replication lag (lowest LSN delta) becomes the new primary. The whole process takes 10–30 seconds.

Critical for Bitrix: the application must connect to the database not by a server IP directly, but through HAProxy or a virtual IP (VIP) managed by Patroni:

// /bitrix/.settings.php — connection via HAProxy
'dsn' => 'pgsql:host=haproxy.internal;port=5432;dbname=bitrix',

HAProxy polls the Patroni REST API (http://patroni-node:8008/master) and routes traffic only to the current primary.
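A minimal HAProxy fragment for this setup might look as follows (node names and addresses are illustrative; the health check hits Patroni's REST API on port 8008, which returns HTTP 200 on /master only on the leader):

```
# haproxy.cfg — route writes only to the current Patroni leader
listen postgres_write
    bind *:5432
    option httpchk GET /master
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node1 10.0.0.11:5432 check port 8008
    server pg-node2 10.0.0.12:5432 check port 8008
    server pg-node3 10.0.0.13:5432 check port 8008
```

With `on-marked-down shutdown-sessions`, existing connections to a demoted node are dropped immediately instead of lingering until the client times out.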

MySQL failover via Orchestrator

For MySQL-based Bitrix installations, Orchestrator is the Patroni equivalent: it tracks the replication topology, detects master failure, and automatically promotes the most up-to-date replica.

After promotion, Orchestrator runs a hook script that updates DNS or notifies HAProxy of the new master.
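In Orchestrator this is wired up through recovery hooks in its JSON config; a sketch, assuming the notify script path is your own (Orchestrator substitutes the `{...}` placeholders at runtime):

```json
{
  "RecoverMasterClusterFilters": ["*"],
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "PostMasterFailoverProcesses": [
    "/usr/local/bin/notify-failover.sh {failedHost} {successorHost}"
  ]
}
```

The hypothetical `notify-failover.sh` is where DNS updates, HAProxy notification, and Bitrix cache invalidation would live.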

Bitrix configuration for replica operation

After failover, the new primary is a former read replica. Before the failover, Bitrix may have been configured for read/write splitting:

// /bitrix/.settings.php
'connections' => [
    'default' => [
        'host' => 'primary.db',
        'port' => '5432',
        // ... write-connection
    ],
    'replica' => [
        'host' => 'replica.db',
        'port' => '5432',
        'readonly' => true,
        // ... read-connection
    ],
],

After failover the former replica has become the primary, so the 'replica' connection must no longer be treated as read-only (it now accepts writes). HAProxy with Patroni API health checks solves this automatically: the write port (5432) and the read port (5433) are checked separately.
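The read side can be sketched as a second HAProxy listener that checks Patroni's /replica endpoint (which returns 200 only on nodes currently in the replica role), so a promoted node drops out of the read pool on its own; addresses are again illustrative:

```
# haproxy.cfg — read traffic goes only to nodes reporting the replica role
listen postgres_read
    bind *:5433
    option httpchk GET /replica
    http-check expect status 200
    server pg-node1 10.0.0.11:5432 check port 8008
    server pg-node2 10.0.0.12:5432 check port 8008
    server pg-node3 10.0.0.13:5432 check port 8008
```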

Checking after failover

After the switch, Bitrix may serve stale data if the cache was tied to the old primary. memcached and Redis caches are unaffected; the file cache should be invalidated via BXClearCache(true) or from the admin panel.
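A post-failover hook can trigger that invalidation from the CLI; a sketch assuming a standard Bitrix document root (the script name and path are hypothetical):

```php
<?php
// clear_bitrix_cache.php — hypothetical post-failover hook, run as: php clear_bitrix_cache.php
$_SERVER['DOCUMENT_ROOT'] = '/var/www/bitrix';   // adjust to the real site root
define('NO_KEEP_STATISTIC', true);
define('NOT_CHECK_PERMISSIONS', true);
require $_SERVER['DOCUMENT_ROOT'] . '/bitrix/modules/main/include/prolog_before.php';

// BXClearCache(true) drops the whole file cache, same as the admin-panel action
BXClearCache(true);
echo "Bitrix file cache cleared\n";
```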

Another problem is transactions uncommitted at the moment the primary crashed. WAL replication guarantees that everything written and shipped to the replica is applied there, but transactions still only in the primary's memory at crash time are lost. With asynchronous replication this is expected behavior: up to a few seconds of recent writes can be lost; synchronous replication avoids this at the cost of write latency.

State monitoring

# Patroni — current leader
curl http://patroni-node1:8008/cluster | jq '.members[] | {name, role, lag}'

# Replication lag (PostgreSQL)
SELECT
    client_addr,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;

Alert: if lag_bytes exceeds 50 MB, replication is not keeping up and the risk of data loss on failover grows.

What to configure

  • Patroni (PostgreSQL) or Orchestrator (MySQL) on the database cluster
  • HAProxy with health checks via the Patroni REST API
  • Bitrix connections through HAProxy, never directly to a DB node's IP
  • A post-failover hook script for cache invalidation and notifications
  • Replication lag monitoring with an alert on threshold breach
  • Regular failover drills on a staging environment under load