Introduction

Currently Cloud Stack has only one active master DB setup. Going forward it should support multiple Master/Slave cluster setup to enable High Availability of the Database.

Scope

To Achieve Active/Active cluster set up (Multiple masters and slaves).

Options Available

  1. MariaDB
  2. Percona Xtra DB (uses Galera cluster)
  3. MHA provided by SkySql
  4. Mysql Cluster
  5. Mysql Active/Active setup with connector specific configuration to read from slaves in case of master goes down

Results

MariaDB

  • It does not support any DB HA out of the box, we can use Galera with it but the same can be used with Mysql directly

Percona Xtra DB

MHA

  • It is very hard to install this and also need to write our own scripts to switch between master and slave either using VIP concept or some other way of LB
  • The switch over time is good enough to make Clolud Stack Server self fencing and needs to restart the management server

Mysql Cluster

It has the following limitations

  • The DB Engine itself changed to NDB Engine instead of InnoDB Engine
  • The up gradation from Mysql to Mysql cluster is not gauranteeing that without changes to schema and Sql queries those will work directly
  • Distributed locking is not supported,which can cause split brain issues of we use cluster with multiple mysql nodes.

Mysql Active/Active Setup

We can use Mysql's 2-way replication (Master-Master replication setup) along with connector's configuration to read/write data from one of the slave if master goes down.

  • It worked well with 2 nodes(master - master) set up for both fail over and fall back
  • Need to verify the same for more than 2 nodes set up using chain replication with first 2 nodes as cyclic pointers for master/slaves and remaining are chain replication strategy starting from 2nd node.
  • Also need to solve Communication link failure exception that is coming when master is going down.
  • Admin guide steps to do the chain way of replication for multi node setup
  • The connection will be reset if auto commit is true other wise not

Conclusion

  • We are good to go with this approach.

Changes in Code Base (wrt mysql's active/active setup)

  • Need to add extra properties related to connector parameters in db.properties
  • Code changes are needed in com.cloud.utils.db.Transaction class to add the new properties to the url construction of cloud databases.

Proposed Solution

  • Added the following properties to db.properties

#High Availability And Cluster Properties
db.ha.enabled=false (change it to true to enable db ha)
db.ha.loadBalanceStrategy=com.cloud.utils.db.StaticStrategy

#cloud stack Database
db.cloud.slaves=localhost,localhost (Comma separated list of slave hosts)
db.cloud.autoReconnect=true
db.cloud.failOverReadOnly=false
db.cloud.reconnectAtTxEnd=true
db.cloud.autoReconnectForPools=true
db.cloud.secondsBeforeRetryMaster=3600
db.cloud.queriesBeforeRetryMaster=5000
db.cloud.initialTimeout=3600

#usage Database
db.usage.slaves=localhost,localhost (Comma separated list of slave hosts)
db.usage.autoReconnect=true
db.usage.failOverReadOnly=false
db.usage.reconnectAtTxEnd=true
db.usage.autoReconnectForPools=true
db.usage.secondsBeforeRetryMaster=3600
db.usage.queriesBeforeRetryMaster=5000
db.usage.initialTimeout=3600

  • We will rely on Mysql's Connector properties to achieve Master-Master DB HA
  • We need to have at least 2 nodes which are connected through 2-way replication and should be consecutive in connector's url
  • It is recommended to use semi-replicaiton over asynchronous replication to avoid data inconsistency issues during master server crash(Asynchronous replication will not guarantee the date sync between master and slave before master crashes where as semi-synchronous will guarantee 99% of data sync between master and slave).
  • The following configuration need to be done in my.cnf of each mysql server to clean up bin log files automatically.

    expire_logs_days=10 (Number of days to keep the bin log files)

    max_binlog_size=100M (The max size of each bin log file)

  • To change the bin log files location please use the following property in my.cnf of each mysql server
    log-bin=/var/lib/mysql/binlog/bin-log  (Remember here “bin-log” is the file prefix)
  • To Over come Split Brain situation in case of multiple Management servers are exists, we need to modify the following 2 properties in db.properties.

    auto_increment_increment = 10
    Tells to the mysql node that auto increment values to be incremented by 10 instead of default value 1

    For Example, In a table X the last auto incremented value is 10 and the same will be replicated to Slave so offset on both the mysql nodes is 10 and now split brain occurred and MS1 is talking to DB1 and MS2 is talking to DB2 and both get the request to add entry to the table X. Now in DB1 the auto increment value will return 11 and in DB2 the auto increment value will return 20 and both will get synced with each other and new offset will set to 20 on both DB1 and DB2
    .
    auto_increment_offset = 2
    Tells to the mysql node that the starting point of the auto increment column value to be start with.
    The second property is relevant only when split brain occurs on fresh setup.

my.cnf Configuration Details (Asynchronous)

The mysql configuration to support DB HA in mysql server is goes into /etc/my.cnf file and varies a little bit between master and slave.

  1. Master :
    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    # Settings user and group are ignored when systemd is used.
    # If you need to run mysqld under a different user or group,
    # customize your systemd unit file for mysqld according to the
    # instructions in http://fedoraproject.org/wiki/Systemd
    server-id=1
    default-storage-engine=InnoDB
    character-set-server=utf8
    transaction-isolation=READ-COMMITTED
    log-bin=mysql-bin
    innodb_flush_log_at_trx_commit=1
    sync_binlog=1
    binlog-format=ROW
    #Bin logs cleanup configuration
    expire_logs_days=10

    max_binlog_size=100M


    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
  2. Slave
    [mysqld]
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    # Settings user and group are ignored when systemd is used.
    # If you need to run mysqld under a different user or group,
    # customize your systemd unit file for mysqld according to the
    # instructions in http://fedoraproject.org/wiki/Systemd
    server-id=2
    default-storage-engine = InnoDB
    character-set-server = utf8
    transaction-isolation = READ-COMMITTED
    log-bin=mysql-bin
    innodb_flush_log_at_trx_commit=1
    sync_binlog=1
    binlog-format=ROW
    #Parameters to solve split brain problem
    auto_increment_increment=10
    auto_increment_offset=2

    #Bin logs cleanup configuration
    expire_logs_days=10

    max_binlog_size=100M
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid

Limitations/Things DBHA is not supported

  • Currently Monitoring of Slave by admin is not possible
  • Monitoring events are not integrated with MS

FAQs

Q: How fast the fail over will happen if master goes down?

A: It is actually the piing time out which controls this factor. The mysql connector property that controls this is "loadBalancePingTimeout".
     Currently this property is not exposed through db.properties so it uses the default pint time out value which is 0(Mean no ping time out)
     (http://dev.mysql.com/doc/connector-j/en/connector-j-reference-configuration-properties.html

Q:  What Mysql DB Versions are supported when DB HA is enabled?

A: The RHEL/CentOS versions supported by the Current ACS versions will get mysql versions of 5.1.X. So by defualt DB HA is also supported for these mysql versions only.

Q: Asynchronous replication V/S Semi-Synchronous Replication which one to use?

A: As Semi-Synchronous replication is supported by 5.5+ on wards so currently we support only Asynchronous replication only.

Q: How much latency is expected in syncing up when DB-HA is enabled?

A: The Synchronization between 2 master nodes is immediate and latency is depending on the actual network latency between 2 master nodes. There is no parameter to control this. The only thing we can get from master server nodes is how many seconds it is behind the other master server.
The following parameter gives us the above value when we so "Show slave status" on any master mysql prompt : "Seconds_Behind_Master"
 (http://dev.mysql.com/doc/refman/5.7/en/show-slave-status.html)

Q: What Network topologies(Same Zone, Multiple Zones) are suggested?

A: .As mentioned in previous question, as long as we make sure the network latency between 2 master nodes is as much less as possible we can use any topology. There are no recommended practices on this in mysql documentation

Appendix

  • What if I want to do a Manual Switch over from Master-1 to Master-2?
    • Put a 2 way replication as above and do not enable the db ha in db.properties. I.e put the value false against the following property in db.properties
      db.ha.enabled=false
  • We can achieve multiple nodes set up(2 master max and remaining would be slaves) with chain way of replication configuration


another way is circular chaining where all can act as master

  • This is tested against mysql version 5.5.x
  • Also tested against 5.1.x
  • No labels