1C-Bitrix Clustering and Scaling Services

Our company develops, supports, and maintains Bitrix and Bitrix24 solutions of any complexity, from simple one-page sites to complex online stores and CRM systems with 1C and telephony integration. Our developers' experience is confirmed by vendor certificates.

1C-Bitrix Clustering

Imagine a flash sale: 10,000 users are on the site at once, the server goes down with a 502 error, carts disappear, and managers call support. We have seen this dozens of times. The solution is clustering: load balancing between servers, database replication, and automatic failover. Order an audit of your current infrastructure, and within 2 days we will determine whether you need a cluster and, if so, what kind. Our experience: 40+ high-load projects on Bitrix.

Why is 1C-Bitrix clustering critical for fault tolerance?

80-90% of requests in a typical project are SELECT. Catalog, product pages, filters — all reads. Master-slave replication routes SELECTs to slave servers, leaving the master for writes only. The 'Web Cluster' module (Business edition and higher) routes requests automatically.

Common stumbling blocks. On the master, set binlog_format = ROW: STATEMENT-based replication of queries that call nondeterministic functions such as UUID() or SYSDATE() can cause inconsistencies that take a week to debug. Every server needs a unique server-id, and the binary log must be enabled on the master. On the slave, set read_only = ON and configure relay-log. Initialize the slave with xtrabackup rather than mysqldump, which can lock tables for half an hour on a 20 GB database.
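
The checklist above can be turned into a small automated check. The sketch below is illustrative only: the variable names (server_id, log_bin, binlog_format, read_only) are real MySQL system variables, but the function check_replication_settings and its input format (a plain dictionary, for example built from SHOW VARIABLES output) are assumptions made for this example.

```python
def check_replication_settings(role, variables, other_server_ids=()):
    """Return a list of problems found in one MySQL node's settings.

    role: "master" or "slave".
    variables: dict of MySQL system variables for this node (hypothetical input).
    other_server_ids: server_id values already used by other nodes.
    """
    problems = []

    server_id = str(variables.get("server_id", "0"))
    if server_id == "0":
        problems.append("server_id is not set")
    elif server_id in {str(i) for i in other_server_ids}:
        problems.append(f"server_id {server_id} is not unique")

    if role == "master":
        if str(variables.get("log_bin", "OFF")).upper() not in ("ON", "1"):
            problems.append("binary log is disabled")
        if str(variables.get("binlog_format", "")).upper() != "ROW":
            problems.append("binlog_format should be ROW")
    elif role == "slave":
        if str(variables.get("read_only", "OFF")).upper() not in ("ON", "1"):
            problems.append("read_only should be ON")
    else:
        raise ValueError(f"unknown role: {role!r}")

    return problems


# Example: a master left on STATEMENT format and a slave that reuses server_id 1.
print(check_replication_settings(
    "master", {"server_id": 1, "log_bin": "ON", "binlog_format": "STATEMENT"}))
print(check_replication_settings(
    "slave", {"server_id": 1, "read_only": "OFF"}, other_server_ids=[1]))
```

An empty list means the node passed every check in this sketch; it does not replace verifying that replication is actually running.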

Metric #1 is Seconds_Behind_Master. If a slave lags by 5+ seconds, a customer can place an order, return to their personal account, and find the order missing, because the SELECT went to a lagging slave. The module allows critical queries to be excluded manually from slave routing.
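
The routing rule described above can be sketched in a few lines. This is not the Web Cluster module's code: the function names, the 5-second threshold constant, and the critical flag are assumptions chosen to mirror the text. Writes and critical reads go to the master, and other reads go to the least-lagging slave that is under the threshold.

```python
import re

MAX_LAG_SECONDS = 5  # threshold taken from the text; tune per project


def is_read_query(sql):
    """Very rough classification: anything starting with SELECT is a read."""
    return re.match(r"\s*SELECT\b", sql, re.IGNORECASE) is not None


def choose_server(sql, slaves_lag, critical=False):
    """Return "master" or the name of a slave.

    slaves_lag maps a slave name to its Seconds_Behind_Master value;
    None stands for NULL, which MySQL reports when replication is not running.
    """
    if critical or not is_read_query(sql):
        return "master"
    healthy = {name: lag for name, lag in slaves_lag.items()
               if lag is not None and lag < MAX_LAG_SECONDS}
    if not healthy:
        return "master"  # every slave is lagging or broken
    return min(healthy, key=healthy.get)


lag = {"slave1": 0, "slave2": 7}
print(choose_server("SELECT * FROM catalog", lag))                # a catalog read
print(choose_server("SELECT * FROM orders", lag, critical=True))  # read right after an order
print(choose_server("UPDATE orders SET paid = 1", lag))           # a write
```

Marking the "my orders" query as critical is what keeps a customer from seeing a stale order list right after checkout.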

Failover: Orchestrator, together with ProxySQL, switches traffic to a promoted slave in 15-30 seconds. The module supports up to 9 slave connections with configurable weights. To verify that the slaves match the master, use pt-table-checksum from Percona Toolkit. Replacing inefficient infrastructure can save up to 40% of the infrastructure budget, a significant annual amount for projects with 50,000+ unique visitors. For more information on replication, refer to MySQL Replication Documentation and Wikipedia: Database Replication.
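
One way to picture configurable slave weights is weighted random selection: a slave with weight 3 receives about three times as many reads as a slave with weight 1. The sketch below illustrates that idea only. The function pick_slave and the server names are hypothetical, and the module's actual selection algorithm may differ.

```python
import random


def pick_slave(weights, rng=random):
    """Pick one slave name; weights maps name -> non-negative weight."""
    names = [name for name, weight in weights.items() if weight > 0]
    if not names:
        raise ValueError("no slave with a positive weight")
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Simulate 10,000 reads with a 3:1 weighting (fixed seed for repeatability).
rng = random.Random(42)
counts = {"slave1": 0, "slave2": 0}
for _ in range(10_000):
    counts[pick_slave({"slave1": 3, "slave2": 1}, rng)] += 1
print(counts)  # slave1 should receive roughly three times as many reads
```

Setting a weight to 0 takes a slave out of rotation without removing it from the configuration.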

When is clustering necessary?

Not every project needs it. Specific markers:

  • 50,000-100,000 unique visitors per day: a single server starts returning 502 errors during peak hours
  • Traffic spikes of 5-10x (sales, promotions): load grows within minutes, and vertical scaling cannot keep up
  • SLA of 99.9% (roughly 8.76 hours of downtime per year at most): unattainable with a single server
  • Geographic distribution of users
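
The downtime figure behind an SLA is simple arithmetic: the allowed unavailable fraction multiplied by the hours in a year. The short sketch below shows the calculation for a few common SLA levels. It assumes a 365-day year, and the function name is chosen for this example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(sla_percent):
    """Maximum downtime per year, in hours, allowed by an availability SLA."""
    return HOURS_PER_YEAR * (1 - sla_percent / 100)


for sla in (99.0, 99.9, 99.99):
    print(f"{sla}% -> {downtime_budget_hours(sla):.2f} h/year")
```

Each additional "nine" cuts the budget tenfold, which is why 99.9% and above generally require redundancy rather than a faster single server.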

Sometimes composite caching, SQL optimization, and vertical scaling are sufficient. We will tell you honestly if a cluster is not yet needed. Under peak loads, investments in clustering typically pay off within 3-6 months. The project budget is determined individually.

What does the cluster architecture consist of?

Load balancer. HAProxy, nginx upstream, or cloud LB. Round-robin for even distribution, ip-hash for session stickiness, least connections for adaptive balancing. Health checks remove dead servers from the pool. SSL termination on the balancer offloads web nodes.
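
The three balancing strategies differ only in how the next server is chosen. The sketch below models them in plain Python to show the difference. It is a simplified illustration, not how HAProxy or nginx implement them (nginx's ip_hash, for instance, hashes only part of an IPv4 address), and the Balancer class and server names are assumptions for this example.

```python
import itertools
import zlib


class Balancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)            # updated by health checks
        self.active = {s: 0 for s in self.servers}  # open connections per server
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        """A failed health check removes the server from the pool."""
        self.healthy.discard(server)

    def _pool(self):
        pool = [s for s in self.servers if s in self.healthy]
        if not pool:
            raise RuntimeError("no healthy servers")
        return pool

    def round_robin(self):
        """Even distribution: take servers in turn, skipping dead ones."""
        self._pool()  # raises if nothing is healthy
        while True:
            server = next(self._cycle)
            if server in self.healthy:
                return server

    def ip_hash(self, client_ip):
        """Session stickiness: the same client IP maps to the same server."""
        pool = self._pool()
        return pool[zlib.crc32(client_ip.encode()) % len(pool)]

    def least_connections(self):
        """Adaptive balancing: pick the server with the fewest open connections."""
        return min(self._pool(), key=lambda s: self.active[s])


lb = Balancer(["web1", "web2", "web3"])
print([lb.round_robin() for _ in range(4)])
print(lb.ip_hash("203.0.113.7") == lb.ip_hash("203.0.113.7"))
lb.active.update({"web1": 5, "web2": 1, "web3": 3})
print(lb.least_connections())
```

Note that in this sketch ip_hash remaps some clients when a server leaves the pool, which is one more reason to keep sessions in shared storage rather than rely on stickiness alone.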

Web servers. Identical nginx + php-fpm nodes, each with a full copy of the code. Sessions are stored in Redis or Memcached, not on disk (otherwise users lose their cart when switching servers). In the cloud, auto-scaling adds servers as load increases and removes them as it decreases.

Cache. Redis Cluster with data sharding across nodes. Redis Sentinel for small clusters. Memcached is fast but lacks persistence. Configuration in .settings.php — servers, weights, sharding strategy.
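
To make "sharding across nodes" concrete: Redis Cluster assigns every key to one of 16384 hash slots using a CRC16 of the key, and each node owns a range of slots. The sketch below computes the slot for a key, including the hash-tag rule that hashes only the part inside {...}. It assumes that Python's binascii.crc_hqx with an initial value of 0 matches the CRC16 variant Redis uses, so treat it as an illustration rather than a reference implementation. The key names are hypothetical.

```python
import binascii

SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the hash slot for a key, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Use the tag only if there is at least one byte between the braces.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % SLOTS


# Keys that share a hash tag land in the same slot, and so on the same node.
print(key_slot(b"{user42}.cart") == key_slot(b"{user42}.session"))
print(key_slot(b"catalog:section:15"))
```

Hash tags matter when several keys must be read or written together, since multi-key operations in Redis Cluster require all keys to be in the same slot.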

File storage. Uploads and images must be accessible from every node. NFS works for 2-3 servers but is a single point of failure. GlusterFS is a distributed file system without a single point of failure. S3-compatible object storage (MinIO, AWS, Yandex Object Storage) takes static files off the web nodes, and the Bitrix module works out of the box.

How to ensure failover at each cluster level?

| Level | Mechanism | RTO |
|---|---|---|
| Load balancer | Keepalived + VRRP | < 5 sec |
| Web servers | Health check | < 10 sec |
| MySQL master | Orchestrator / ProxySQL | < 30 sec |
| MySQL slave | Removal from pool | < 5 sec |
| Redis | Sentinel / Cluster failover | < 15 sec |
| Files | GlusterFS replication | Automatic |

The cluster is 5 times more reliable than a single server — if any node fails, the service continues to operate.

What are common clustering setup mistakes?

  • Sessions stored in files: when a server goes down, users lose their cart and authentication.
  • Unmonitored Seconds_Behind_Master: customers see stale data, sales suffer, and the SLA is missed.
  • Single point of failure at the file storage level (NFS without replication).
  • No replication monitoring: data inconsistencies go undetected.

We include checks for all these points in our audit and testing.

What is the clustering process?

  1. Load audit — load profile, bottlenecks, load testing. We find the ceiling of a single server.
  2. Design — components tailored to requirements and budget. Not everyone needs GlusterFS — sometimes NFS and backups suffice.
  3. Infrastructure — servers, network, firewalls. Ansible for automation — any node can be recreated in minutes.
  4. Migration — transfer with minimal downtime. Components are connected sequentially, each step verified.
  5. Testing — simulation of peak conditions. We crash the master, disconnect a web server, kill Redis — see how the system behaves.
  6. Documentation — architecture diagram, runbook, disaster recovery plans.

What does clustering work include?

| Deliverable | Description |
|---|---|
| Current load audit | Request profile, bottlenecks, load testing |
| Project documentation | Architecture diagram, runbook, disaster recovery plan |
| Infrastructure | Server, network, firewall setup (Ansible) |
| Migration | Transfer with minimal downtime, phased component connection |
| Testing | Simulation of peak conditions: crash master, disconnect web server, kill Redis |
| Team training | Documentation, 2 weeks of post-implementation consultations |
| Warranty | 6 months of correct cluster operation; if something goes wrong, we fix it within 24 hours |

What are the typical timelines?

| Task | Timeline |
|---|---|
| Audit and design | 1-2 weeks |
| Basic cluster (2 web + master-slave MySQL) | 2-3 weeks |
| Full cluster with failover at all levels | 4-6 weeks |
| Monitoring + load testing | 2-4 weeks |

Contact us for an engineer consultation and a preliminary project estimate within 2 days. Order an audit to find out the exact architecture and budget for your specific needs.