Setting Up Automatic Database Failover

Automatic failover is a mechanism that detects that the database master has become unavailable and automatically promotes a replica to master without manual intervention. The goal is to reduce RTO (Recovery Time Objective) from tens of minutes to seconds.
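
As a rough illustration of what that difference means over a year (the numbers below are hypothetical, not measurements):

```python
# Toy RTO comparison with made-up numbers: minutes of write downtime per year
# if every failover is handled manually vs. automatically.
def yearly_downtime_minutes(failovers_per_year: int, rto_minutes: float) -> float:
    return failovers_per_year * rto_minutes

print(yearly_downtime_minutes(4, 25))   # manual: paging a DBA, ~25 min each
print(yearly_downtime_minutes(4, 0.5))  # automatic: ~30 s each
```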

Tools for PostgreSQL

Patroni: the de facto standard

Patroni is a Python daemon that runs on each PostgreSQL node. It uses a DCS (Distributed Configuration Store: etcd, Consul, or ZooKeeper) for coordination and leader election.
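
The leader-election idea can be sketched as a TTL'd lease in the DCS: the leader must keep refreshing a key, and if it crashes, the key expires and another node may claim it. A toy model of that logic (not Patroni's actual code):

```python
# Toy model of DCS leader election via an expiring lease. The real mechanism
# is an atomic compare-and-set on a TTL'd key in etcd/Consul/ZooKeeper.
class LeaderLease:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.holder = None
        self.expires = 0.0

    def acquire(self, node: str, now: float) -> bool:
        # The key is free if it was never set or its TTL ran out (leader died).
        if self.holder is None or now >= self.expires:
            self.holder, self.expires = node, now + self.ttl
        return self.holder == node

lease = LeaderLease(ttl=30)
print(lease.acquire("node1", now=0))   # True: node1 becomes leader
print(lease.acquire("node2", now=10))  # False: lease still held by node1
print(lease.acquire("node2", now=40))  # True: lease expired, node2 takes over
```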

# /etc/patroni/patroni.yml (on each node)
scope: production-cluster
namespace: /service/
name: node1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 192.168.1.10:8008

etcd3:
  hosts: 192.168.1.20:2379,192.168.1.21:2379,192.168.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10  # must satisfy ttl >= loop_wait + 2 * retry_timeout
    maximum_lag_on_failover: 1048576  # 1MB max lag on failover
    synchronous_mode: false

# Node-local settings: the postgresql section lives at the top level, not under bootstrap
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.1.10:5432
  data_dir: /var/lib/postgresql/14/main
  authentication:
    superuser:  # Patroni needs superuser access to manage the node
      username: postgres
      password: postgres_password
    replication:
      username: replication
      password: replication_password
  parameters:
    max_connections: 200
    shared_buffers: 256MB
    wal_level: replica
    hot_standby: "on"
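
One constraint worth checking before starting the cluster: Patroni expects ttl to be at least loop_wait + 2 * retry_timeout, otherwise the leader key can expire during a normal retry cycle and a healthy leader gets demoted. A quick sanity check:

```python
# Sanity-check Patroni DCS timing values: the leader key TTL must outlive
# one scheduler cycle plus two retry windows.
def timings_valid(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    return ttl >= loop_wait + 2 * retry_timeout

print(timings_valid(30, 10, 10))  # True
print(timings_valid(30, 10, 30))  # False: retry_timeout too large for this ttl
```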

Start Patroni on each node:

systemctl enable --now patroni

Check cluster status:

patronictl -c /etc/patroni/patroni.yml list
# Shows leader and replicas with lag
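
patronictl can also emit machine-readable output (`-f json`), which is handy for alerting scripts. A sketch that flags lagging replicas; the field names below are assumed from the usual `patronictl list` table columns:

```python
import json

# Sample `patronictl list -f json` payload (field names assumed from the
# usual table columns: Member, Role, State, Lag in MB).
raw = """[
  {"Member": "node1", "Role": "Leader",  "State": "running"},
  {"Member": "node2", "Role": "Replica", "State": "streaming", "Lag in MB": 0},
  {"Member": "node3", "Role": "Replica", "State": "streaming", "Lag in MB": 12}
]"""

def lagging_replicas(members: list, max_lag_mb: int = 1) -> list:
    """Names of replicas whose replication lag exceeds max_lag_mb."""
    return [m["Member"] for m in members
            if m.get("Role") == "Replica" and m.get("Lag in MB", 0) > max_lag_mb]

print(lagging_replicas(json.loads(raw)))  # ['node3']
```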

Manual switchover (planned):

patronictl -c /etc/patroni/patroni.yml switchover production-cluster

etcd for coordination

# etcd cluster on three servers
etcd --name etcd1 \
  --data-dir /var/lib/etcd \
  --listen-peer-urls http://192.168.1.20:2380 \
  --listen-client-urls http://192.168.1.20:2379 \
  --initial-advertise-peer-urls http://192.168.1.20:2380 \
  --advertise-client-urls http://192.168.1.20:2379 \
  --initial-cluster "etcd1=http://192.168.1.20:2380,etcd2=http://192.168.1.21:2380,etcd3=http://192.168.1.22:2380"

HAProxy for transparent switching

Patroni exposes health-check endpoints on its REST API:

  • GET /master → 200 if the node is the leader
  • GET /replica → 200 if the node is a running replica

# haproxy.cfg
frontend postgres_write
  bind *:5000
  mode tcp
  default_backend postgres_master

backend postgres_master
  mode tcp
  option httpchk GET /master
  http-check expect status 200
  default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
  server node1 192.168.1.10:5432 check port 8008
  server node2 192.168.1.11:5432 check port 8008
  server node3 192.168.1.12:5432 check port 8008
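
The routing decision reduces to one HTTP GET per node: only the node answering 200 on /master receives write traffic. The same check, sketched in Python (the default port matches Patroni's REST API above):

```python
from http.client import HTTPConnection

# Replicates HAProxy's health check: GET /master against Patroni's REST API;
# only a 200 response marks the node as the writable leader.
def is_leader(host: str, port: int = 8008, timeout: float = 2.0) -> bool:
    try:
        conn = HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", "/master")
        return conn.getresponse().status == 200
    except OSError:
        return False  # unreachable nodes are treated as unhealthy
```

Checking /replica the same way would build a read-only pool on a second HAProxy port.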

MySQL: InnoDB Cluster

MySQL Router + Group Replication is Oracle's official solution:

# Initialize cluster (MySQL Shell)
mysqlsh [email protected]:3306
JS> dba.createCluster('myCluster')
JS> cluster = dba.getCluster()
JS> cluster.addInstance('[email protected]:3306')
JS> cluster.status()

Setup MySQL Router:

mysqlrouter --bootstrap [email protected]:3306 --directory /etc/mysqlrouter

Router automatically directs writes to the current primary and updates its routing on failover.
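
In practice this means the application never needs to know which node is primary; it picks the Router port matching its intent. 6446/6447 are Router's default classic-protocol ports after bootstrap; the host and database names below are placeholders:

```python
# Sketch: choose a connection target by intent; MySQL Router keeps the
# read-write port pointed at the current primary even after a failover.
ROUTER_PORTS = {"read_write": 6446, "read_only": 6447}  # Router bootstrap defaults

def dsn(intent: str, host: str = "router.internal", db: str = "app") -> str:
    return f"mysql://{host}:{ROUTER_PORTS[intent]}/{db}"

print(dsn("read_write"))  # mysql://router.internal:6446/app
print(dsn("read_only"))   # mysql://router.internal:6447/app
```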

Testing failover

# Trigger a failover to a specific replica
patronictl -c /etc/patroni/patroni.yml failover production-cluster --candidate node2 --force

# Or simulate a real crash: stop Patroni on the current leader
systemctl stop patroni

# Monitor the switchover
watch -n 1 patronictl -c /etc/patroni/patroni.yml list

Typical Patroni + etcd failover time: 10–30 seconds.
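
That range follows from the DCS timings: after a crash the leader key survives at most ttl seconds, and replicas react on their next loop_wait cycle, so a rough upper bound on detection time (a simplified model; promotion itself adds a few more seconds) is:

```python
# Rough worst-case time for the cluster to notice a dead leader, using the
# ttl/loop_wait values from the Patroni config. Typical observed times are
# lower, since the leader key usually expires well before the full ttl.
def worst_case_detection_seconds(ttl: int, loop_wait: int) -> int:
    return ttl + loop_wait

print(worst_case_detection_seconds(30, 10))  # 40
```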

Timeline

Setting up a Patroni cluster on 3 nodes with HAProxy and etcd takes 3–4 business days.