Ubuntu – pacemaker not able to start MySQL on second node

Tags: corosync, MySQL, pacemaker, Ubuntu

I have a pacemaker/corosync/drbd setup on two physically identical Ubuntu Server 16.04 LTS machines and I am trying to achieve high availability for MySQL 5.7 and Apache 2.4.

Both servers were set up in exactly the same way and have the exact same packages installed. The only differences are the hostnames, IP addresses, and the master/slave configuration in pacemaker/corosync/drbd.

My problem is that pacemaker is able to start the MySQL server and every other service on node 1, but when I simulate a crash of node 1, it is not able to start the MySQL service on node 2.

This is the output of crm_mon (both nodes online):

Last updated: Wed Jan 10 18:57:02 2018          Last change: Wed Jan 10 18:00:19
2018 by root via crm_attribute on Server1
Stack: corosync
Current DC: Server1 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 7 resources configured

Online: [ Server1 Server2 ]

 Master/Slave Set: ms_r0 [r0]
 Masters: [ Server1 ]
 Slaves: [ Server2 ]
Resource Group: WebServer
 ClusterIP  (ocf::heartbeat:IPaddr2):       Started Server1
 WebFS      (ocf::heartbeat:Filesystem):    Started Server1
 Links      (ocf::heartbeat:drbdlinks):     Started Server1
 DBase      (ocf::heartbeat:mysql): Started Server1
 WebSite    (ocf::heartbeat:apache):        Started Server1

But when I simulate the crash of node 1, I get:

Last updated: Wed Jan 10 19:05:25 2018          Last change: Wed Jan 10 19:05:17
2018 by root via crm_attribute on Server1
Stack: corosync
Current DC: Server1 (version 1.1.14-70404b0) - partition with quorum
2 nodes and 7 resources configured

Node Server1: standby
Online: [ Server2 ]

Master/Slave Set: ms_r0 [r0]
 Masters: [ Server2 ]
Resource Group: WebServer
 ClusterIP  (ocf::heartbeat:IPaddr2):       Started Server2
 WebFS      (ocf::heartbeat:Filesystem):    Started Server2
 Links      (ocf::heartbeat:drbdlinks):     Started Server2
 DBase      (ocf::heartbeat:mysql): Stopped
 WebSite    (ocf::heartbeat:apache):        Stopped

Failed Actions:
* DBase_start_0 on Server2 'unknown error' (1): call=45, status=complete,
  exitreason='MySQL server failed to start (pid=3346) (rc=1), please check your installation',
  last-rc-change='Wed Jan 10 17:58:15 2018', queued=0ms, exec=2202ms

This was my initial Pacemaker configuration: https://pastebin.com/kEYjjgKw

After I noticed that there is a problem with the start of MySQL on node 2, I did some research and read that one should pass some additional parameters to MySQL in the Pacemaker configuration.
That's why I changed the Pacemaker configuration to this: https://pastebin.com/J7Zk1kBA
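
For context, a mysql primitive with such additional parameters typically looks roughly like this in crm syntax (illustrative values with typical Ubuntu 16.04 paths; the actual configuration is in the pastebin):

# sketch of an ocf:heartbeat:mysql primitive with explicit parameters
primitive DBase ocf:heartbeat:mysql \
    params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" \
        datadir="/var/lib/mysql" user="mysql" group="mysql" \
        pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" \
    op start timeout="120s" interval="0" \
    op stop timeout="120s" interval="0" \
    op monitor interval="20s" timeout="30s"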

Unfortunately this did not solve the problem.

From my understanding, Pacemaker uses the same command on both machines to start the MySQL daemon. That's why I find it rather absurd that it is not able to start MySQL on node 2, which was configured in exactly the same way.
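
(As a side note for anyone reproducing this: the start call Pacemaker makes can be replayed by hand with crm_resource, which should behave identically on both nodes; a sketch, using the DBase resource name from the crm_mon output above:)

# run on the node that is supposed to start the resource, with drbd0 mounted and the symlinks in place
crm_resource --resource DBase --force-start --verbose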

drbd0 is mounted by pacemaker, and drbdlinks creates the symbolic links for /var/www and /var/lib/mysql.

I tested this functionality and it seems to work: when node 1 is offline, drbd0 is mounted on node 2 and the symbolic links are created. /var/lib/mysql points to drbd0 and all the files are in the directory.
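
Concretely, these are the kinds of checks that look fine on node 2 while node 1 is offline (a rough sketch; the actual mount point is the one from my pastebin configuration):

cat /proc/drbd                    # node 2 shows itself as Primary
mount | grep drbd0                # /dev/drbd0 is mounted at its configured mount point
ls -ld /var/lib/mysql /var/www    # both are symlinks created by drbdlinks
ls /var/lib/mysql/                # all the MySQL files are in the directory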

If you have any ideas or advice on how to narrow down the cause of this problem, I would be really thankful if you could post them here.

If more information is needed, I am happy to provide it.

Thanks in advance!

Regards,
PAlbrecht

Best Answer

When I have had to work with pacemaker in the past, I have used a few different procedures for troubleshooting this sort of thing. The general idea is to verify each dependency "layer" of the pacemaker config, where the dependency graph is:

mysql -> mounting of filesystem -> DRBD master

Clusters from Scratch also has a good walkthrough of a very similar configuration.
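
In crm syntax that layering usually shows up as a colocation plus an ordering constraint, roughly like this (a sketch using the resource names from the crm_mon output, not necessarily your exact pastebin config):

# the WebServer group must run where DRBD is master, and only after it has been promoted
colocation col_webserver_with_drbd inf: WebServer ms_r0:Master
order ord_drbd_before_webserver inf: ms_r0:promote WebServer:start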

First thing is to make sure that DRBD is configured and synced up. On either node, run:

cat /proc/drbd

The output should show something like the following if DRBD is fully synced and ready for a failover (see p. 45 of CfS):

[root@pcmk-1 ~]# cat /proc/drbd
version: 8.4.6 (api:1/proto:86-101)
GIT-hash: 833d830e0152d1e457fa7856e71e11248ccf3f70 build by phil@Build64R7, 2015-04-10
 05:13:52
 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1048508 nr:0 dw:0 dr:1049420 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

If cat /proc/drbd instead outputs something like this (also on p. 45 of CfS):

[root@ovz-node1 ~]# cat /proc/drbd
version: 0.7.17 (api:77/proto:74)
SVN Revision: 2093 build by phil@mescal, 2006-03-06 15:04:12
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:627252 nr:0 dw:0 dr:629812 al:0 bm:38 lo:640 pe:0 ua:640 ap:0
        [=>..................] sync'ed:  6.6% (8805/9418)M
        finish: 0:04:51 speed: 30,888 (27,268) K/sec

then the system isn't in a state where it can successfully fail over. Wait for the sync to complete and then retry your failover test.
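
The sync progress can simply be watched until the sync'ed line disappears and both sides report UpToDate/UpToDate, e.g.:

# re-check /proc/drbd every couple of seconds
watch -n2 cat /proc/drbd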

Assuming DRBD was synced before the simulated failure of node1, the next thing to try, when the DB does not come up on node2 after the failover, is to log in to node2 and check the following:

  • Does cat /proc/drbd show node2 as primary?
  • Does mount show /dev/drbd0 mounted at its configured mount point (from pastebin, this should be '/sync')?
  • Are all your expected symlinks set up?
  • Do you see the same files in /sync on node2 as were present on node1 prior to the failover?

and most importantly, if all these questions are answered in the affirmative:

  • Will MySQL start successfully when started manually on node2 (perhaps using /etc/init.d/mysql start or systemctl equivalent)?
  • If MySQL starts, does the mysql client show that the running server is actually serving up the DB data stored under /sync? Can databases and tables known to be working on node1 be accessed using the mysql client on node2? (A command sketch covering all of these checks follows below.)
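
Put together, the checks above boil down to something like this on node2 (a sketch; paths and resource names taken from the question and the pastebin, adjust to your setup):

cat /proc/drbd                          # the ro: field should start with Primary/
mount | grep /sync                      # /dev/drbd0 mounted at /sync (per the pastebin config)
ls -ld /var/lib/mysql /var/www          # both should be symlinks into the DRBD mount
ls /var/lib/mysql/                      # the MySQL data files from node1 should be here
sudo systemctl start mysql              # or: sudo /etc/init.d/mysql start
sudo systemctl status mysql             # did it come up cleanly?
mysql -u root -p -e 'SHOW DATABASES;'   # are node1's databases visible?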

If MySQL starts up manually, then there's likely something wrong in its pacemaker config.
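
In that case, one way to narrow it down is to exercise the resource agent the same way pacemaker does and read its error output directly (a sketch; ocf-tester ships with the resource-agents package, and the -o parameters should mirror whatever is in your primitive definition):

# exercise the mysql agent directly with the parameters from the primitive definition
ocf-tester -n DBase \
    -o binary="/usr/sbin/mysqld" -o config="/etc/mysql/my.cnf" \
    -o datadir="/var/lib/mysql" \
    /usr/lib/ocf/resource.d/heartbeat/mysql

# and look for the agent's error messages in the cluster log
grep -i 'mysql\|DBase' /var/log/syslog | tail -n 50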

In full disclosure: I haven't personally used the ocf::heartbeat:mysql resource; instead I've used the 'lsb' resource 'lsb:mysql'.
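
For completeness, the lsb variant simply wraps the distribution's init script, so the primitive carries no MySQL-specific parameters at all; a rough sketch (it assumes /etc/init.d/mysql exists and that the data directory already points at the DRBD mount):

# lsb resource: pacemaker calls /etc/init.d/mysql start|stop|status
primitive DBase lsb:mysql \
    op monitor interval="30s" timeout="30s" \
    op start timeout="120s" interval="0" \
    op stop timeout="120s" interval="0"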