First of all, let me say that I've set up quite a few clusters over the last decade, and I've never seen one where there was a dependency like you've described. Usually one would set it up so that the services provided don't depend on which host is active and which is standby and you don't care which host has the resource, as long as it's up on one of them.
The only way I can come up with to implement what you want is to implement the slave node as a resource that is initiated by the master node, for example by SSHing over to the slave node to run the IPaddr2 and other resources you need. Likely using SSH public key authentication with an identity file and authorized_keys entry so that the master can run the commands on the slave without requiring a password.
So this would require creating a "slaveIPaddr2" resource script, that would just wrap a command like:
HOST=`hostname`
exec ssh -i /path/to/ssh-identity dbslave${HOST#db} /path/to/IPaddr2 "$@"
Then change the ip_dbslave resource to "slaveIPaddr2" instead of "IPaddr2" as the resource to run.
As for scripts to run before and after migration, these mostly sound like the normal multiple resource scripts that make up a resource group, with precedence enforced using the "group" and "order" configuration items. For example, create "master_pre" (the "before" script you want to run on the master), "slave_pre", "master_post", etc. resources, then use "order" to specify that they run in the appropriate sequence (master_pre, slave_pre, ip_dbmaster, ip_dbslave, master_post, slave_post). Here you'll also likely need to wrap the slave items with the SSH wrapper, to effectively treat them as a single host, as mentioned above.
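A rough crm shell sketch of such an order constraint (the constraint ID is my own invention; the resources are the ones named above, and this is untested config, not a drop-in):

```
order pre_post_order Mandatory: master_pre slave_pre ip_dbmaster ip_dbslave master_post slave_post
```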
It sounds like you want the "pre" script to run before a migration is even attempted, rather than as part of starting the resource. Pacemaker isn't going to migrate a service unless you tell it to, or the node currently running the service is failing. In the case of a failing node, your service is down anyway, so there's no reason to run a check to avoid the migration. If you're concerned with preventing the migration when you tell it to migrate, the best answer may be a "migrate" script that runs your pre-service checks and only proceeds with the migration request if the tests succeed.
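A minimal sketch of such a wrapper, assuming the resource is ip_dbmaster and the target node is passed as the first argument; the checks themselves are placeholders you'd replace with your own:

```shell
#!/bin/sh
# Hypothetical "migrate" wrapper: run local pre-checks, and only ask
# Pacemaker to move the resource if they pass. Resource name, target
# argument and the example check are assumptions; adapt to your setup.

pre_checks() {
    # Example check: at least ~10 MB free under /var (logs, state files).
    avail_kb=$(df -Pk /var | awk 'NR==2 {print $4}')
    [ "${avail_kb:-0}" -ge 10240 ] || return 1
    # Add service-specific checks here (replication lag, load, ...).
    return 0
}

# Only attempt the move when invoked with a target node.
if [ -n "${1:-}" ]; then
    if pre_checks; then
        crm resource migrate ip_dbmaster "$1"
    else
        echo "pre-migration checks failed; not migrating" >&2
        exit 1
    fi
fi
```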
I don't know of a way in pacemaker to test the other hosts in the cluster before doing a migration, if that is what you are trying to achieve with #4, so it'll likely have to be an external check that enforces that.
Running other resources than just the IPaddr2 is easily done via the "group" and "order" directives.
Eureka!
Both of us forgot a very, very important log file: /var/log/mysqld.log:
socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server (GPL) by Atomicorp
[Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000082' at position 58569, relay log './mysqld-relay-bin.000002' position: 58715
[Note] Slave I/O thread: connected to master 'repl@192.168.6.38:3306',replication started in log 'mysql-bin.000082' at position 58569
[Warning] Aborted connection 10 to db: 'unconnected' user: 'test_user' host: 'localhost' (init_connect command failed)
[Warning] The MySQL server is running with the --read-only option so it cannot execute this statement
[Note] /usr/libexec/mysqld: Normal shutdown
As you can guess, I tracked the user activity by combining the binlog and init_connect:
init_connect = "INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"
but serving-6192 is set read-only when starting as a slave, and then when Pacemaker performs the monitor operation with test_user:
# Check for test table
ocf_run -q $MYSQL $MYSQL_OPTIONS_TEST \
-e "SELECT COUNT(*) FROM $OCF_RESKEY_test_table"
the init_connect command fails with the above error: "The MySQL server is running with the --read-only option so it cannot execute this statement".
The solution: set the init_connect option to the empty string before the monitor action runs (and don't forget to set it back when promoting a node to master).
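A sketch of such a helper, modelled on the set_event_scheduler function below. The function name is my own; $MYSQL, $MYSQL_OPTIONS_REPL and ocf_run are assumed to come from the resource agent environment, as in the other snippets:

```shell
# Hypothetical helper: clear init_connect so the monitor probe works on
# a read-only slave, and restore it on promote. The audit INSERT is the
# one shown above; everything else is assumed from the RA environment.
set_init_connect() {
    local ic_val
    if [ "$1" = "on" ]; then
        ic_val="INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"
    else
        ic_val=""
    fi
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
        -e "SET GLOBAL init_connect='${ic_val}'"
}
```

You'd call set_init_connect off before the monitor probe and set_init_connect on from mysql_promote, alongside set_read_only/set_event_scheduler.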
To anyone who is using the event scheduler: also note that you must turn it on when promoting a slave to master:
set_event_scheduler() {
local es_val
if ocf_is_true $1; then
es_val="on"
else
es_val="off"
fi
ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
-e "SET GLOBAL event_scheduler=${es_val}"
}
get_event_scheduler() {
# Check if event-scheduler is set
local event_scheduler_state
event_scheduler_state=`$MYSQL $MYSQL_OPTIONS_REPL \
-e "SHOW VARIABLES LIKE 'event_scheduler'" | grep event_scheduler | awk '{print $2}'`
if [ "$event_scheduler_state" = "ON" ]; then
return 0
else
return 1
fi
}
mysql_promote() {
local master_info
if ( ! mysql_status err ); then
return $OCF_NOT_RUNNING
fi
ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
-e "STOP SLAVE"
# Set Master Info in CIB, cluster level attribute
update_data_master_status
master_info="$(get_local_ip)|$(get_master_status File)|$(get_master_status Position)"
${CRM_ATTR_REPL_INFO} -v "$master_info"
rm -f $tmpfile
set_read_only off || return $OCF_ERR_GENERIC
set_event_scheduler on || return $OCF_ERR_GENERIC
Also don't forget to turn it off when demoting:
'pre-demote')
# Is the notification for our set
notify_resource=`echo $OCF_RESKEY_CRM_meta_notify_demote_resource|cut -d: -f1`
my_resource=`echo $OCF_RESOURCE_INSTANCE|cut -d: -f1`
if [ $notify_resource != ${my_resource} ]; then
ocf_log debug "Notification is not for us"
return $OCF_SUCCESS
fi
demote_host=`echo $OCF_RESKEY_CRM_meta_notify_demote_uname|tr -d " "`
if [ $demote_host = ${HOSTNAME} ]; then
ocf_log info "pre-demote notification for $demote_host"
set_read_only on
set_event_scheduler off
Cheers,
Best Answer
Knowing what the problem actually is would, as Kamil says, be awfully useful information. However, I've got a few problems for you straight off the bat:
- 192.168.250.xxx.xxx isn't a valid IPv4 address (too many octets).
- op minitor interval="60s" timeout="30s" -- "minitor" should be "monitor".
- You're using the lsb:mysql class, rather than ocf:heartbeat:mysql; if you're having problems with that part of things, it might be worth giving it a go.
- You can't have virtual_master_ip and virtual_slave_ip in the same group if you've got a -INF colocation constraint. That'll give pacemaker an arrhythmia.

I'm not even sure that you want to be doing what you think you're doing, though. I'd be more inclined to set up two resources, mysql_as_master and mysql_as_slave, each of which makes sure that MySQL is running in the appropriate mode on the machine (I'd be starting MySQL with the read-only option set on the command line, rather than jiggering with the config file, and using the mysql client to query the running server to ensure that it's running read-only or read-write, as required), and then group each with its associated IP address.
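A rough crm shell sketch of that layout; the operation timings and constraint IDs are guesses, and the parameters that make the slave instance start read-only are omitted, so treat this as a shape, not a working config:

```
primitive mysql_as_master ocf:heartbeat:mysql \
    op monitor interval="30s" timeout="30s"
primitive mysql_as_slave ocf:heartbeat:mysql \
    op monitor interval="30s" timeout="30s"
group master_group virtual_master_ip mysql_as_master
group slave_group virtual_slave_ip mysql_as_slave
colocation keep_apart -inf: master_group slave_group
```

Note that each IP lives in its own group, so the -INF colocation keeps the two roles on different nodes without the group/colocation conflict described above.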