Linux – Pacemaker failure-timeout doesn't reset failcount

corosync, failover, high-availability, linux, pacemaker

I'm using Pacemaker 1.1.13 and Corosync 2.3.4 on CentOS 7.

I have a problem with a Master/Slave resource. These are the meta attributes for my resource:

migration-threshold=1

failure-timeout=10s

but when the resource goes down, there is only one attempt to start it. The documentation says that failure-timeout=10s should reset the failcount after 10 seconds, but that does not happen, so the resource never starts.

Do you know anything about this problem? Maybe I'm doing something wrong? My 'pcs status' output is below:

Cluster Name: webcluster
Corosync Nodes:
 10.121.100.101 10.121.100.102
Pacemaker Nodes:
 pm-node1 pm-node2

Resources:
 Master: Services-master
  Meta Attrs: failure-timeout=10s
  Group: Services
   Meta Attrs: migration-threshold=1
   Resource: Test (class=ocf provider=scooty type=test)
    Operations: start interval=0s timeout=20 (Test-start-interval-0s)
                stop interval=0s timeout=20 (Test-stop-interval-0s)
                monitor interval=10 role=Master timeout=20 (Test-monitor-interval-10)
                monitor interval=11 role=Slave timeout=20 (Test-monitor-interval-11)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Resources Defaults:
 migration-threshold: 1
 failure-timeout: 10
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: webcluster
 dc-version: 1.1.13-10.el7_2.4-44eb2dd
 have-watchdog: false
 last-lrm-refresh: 1475145002
 no-quorum-policy: ignore
 start-failure-is-fatal: false
 stonith-enabled: false

Best Answer

Depending on the type of failure, failure-timeout alone might not be enough to clear it. Failures of start and stop operations are considered "fatal" and are not automatically cleaned up by failure-timeout.

If a start operation is failing, you can set the cluster property start-failure-is-fatal=false. For a stop failure, fencing (STONITH) is the only way to recover.
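As a rough sketch of the above, the commands below set the property and then inspect and clear the fail count by hand. The run wrapper, the DRY_RUN variable and the recover_failed_start function are hypothetical names used only for illustration; the resource name Test is taken from the question. By default nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch only: run() and recover_failed_start() are illustrative helpers,
# not part of pcs or Pacemaker.

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = resource whose fail count should be inspected and cleared
recover_failed_start() {
    resource="$1"
    # Let a failed start count against migration-threshold instead of
    # being treated as fatal on that node.
    run pcs property set start-failure-is-fatal=false
    # Show the current fail count for the resource.
    run pcs resource failcount show "$resource"
    # Clear the failure history manually so the cluster may try again.
    run pcs resource cleanup "$resource"
}
```

Calling recover_failed_start Test prints the three pcs commands; setting DRY_RUN=0 first would actually run them on a cluster node.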

Hope that helps.

Related Solutions

MySQL: Pacemaker cannot start the failed master as a new slave

Eureka!

Both of us forgot a very important log file, /var/log/mysqld.log:

socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL) by Atomicorp
[Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000082' at position 58569, relay log './mysqld-relay-bin.000002' position: 58715
[Note] Slave I/O thread: connected to master 'repl@192.168.6.38:3306',replication started in log 'mysql-bin.000082' at position 58569
[Warning] Aborted connection 10 to db: 'unconnected' user: 'test_user' host: 'localhost' (init_connect command failed)
[Warning] The MySQL server is running with the --read-only option so it cannot execute this statement
[Note] /usr/libexec/mysqld: Normal shutdown

As you can guess, I track user activity by combining the binlog and init_connect:

init_connect = "INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"

but serving-6192 is set read-only when it starts as a slave, and then Pacemaker performs the monitor operation as test_user:

    # Check for test table
    ocf_run -q $MYSQL $MYSQL_OPTIONS_TEST \
        -e "SELECT COUNT(*) FROM $OCF_RESKEY_test_table"

so the init_connect command fails with the error shown in the log:

The MySQL server is running with the --read-only option so it cannot execute this statement

The solution is to set the init_connect option to an empty string before the monitor action runs (and don't forget to restore it when promoting a node to master).

For anyone using the event scheduler: note that you must also turn it on when promoting a slave to master:
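A minimal sketch of that toggle, assuming it is added to the mysql resource agent, where ocf_run, $MYSQL and $MYSQL_OPTIONS_REPL are already defined. The names init_connect_sql, set_init_connect and AUDIT_INIT_CONNECT are hypothetical, and the sketch assumes the replication user is allowed to run SET GLOBAL.

```shell
# Audit statement copied from the init_connect setting quoted above.
AUDIT_INIT_CONNECT="INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"

# Hypothetical helper: print the SQL that switches init_connect on or off.
init_connect_sql() {
    case "$1" in
        on)  printf "SET GLOBAL init_connect='%s'\n" "$AUDIT_INIT_CONNECT" ;;
        off) printf "SET GLOBAL init_connect=''\n" ;;
        *)   return 1 ;;
    esac
}

# Hypothetical helper: apply the setting through the agent's own tooling.
# Relies on ocf_run, $MYSQL and $MYSQL_OPTIONS_REPL from the resource agent.
set_init_connect() {
    local sql
    sql=$(init_connect_sql "$1") || return 1
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL -e "$sql"
}
```

One way to use it would be to call set_init_connect off wherever the agent makes the node read-only, and set_init_connect on in the promote path.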

set_event_scheduler() {
    local es_val
    if ocf_is_true $1; then
        es_val="on"
    else
        es_val="off"
    fi
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
        -e "SET GLOBAL event_scheduler=${es_val}"
}

get_event_scheduler() {
    # Check if event-scheduler is set
    local event_scheduler_state

    event_scheduler_state=`$MYSQL $MYSQL_OPTIONS_REPL \
        -e "SHOW VARIABLES" | grep event_scheduler | awk '{print $2}'`

    if [ "$event_scheduler_state" = "ON" ]; then
        return 0
    else
        return 1
    fi
}

mysql_promote() {
    local master_info

    if ( ! mysql_status err ); then
        return $OCF_NOT_RUNNING
    fi
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
        -e "STOP SLAVE"

    # Set Master Info in CIB, cluster level attribute
    update_data_master_status
    master_info="$(get_local_ip)|$(get_master_status File)|$(get_master_status Position)"
    ${CRM_ATTR_REPL_INFO} -v "$master_info"
    rm -f $tmpfile

    set_read_only off || return $OCF_ERR_GENERIC
    set_event_scheduler on || return $OCF_ERR_GENERIC

Also don't forget to turn it off when demoting:

    'pre-demote')
        # Is the notification for our set
        notify_resource=`echo $OCF_RESKEY_CRM_meta_notify_demote_resource|cut -d: -f1`
        my_resource=`echo $OCF_RESOURCE_INSTANCE|cut -d: -f1`
        if [ "$notify_resource" != "${my_resource}" ]; then
            ocf_log debug "Notification is not for us"
            return $OCF_SUCCESS
        fi

        demote_host=`echo $OCF_RESKEY_CRM_meta_notify_demote_uname|tr -d " "`
        if [ "$demote_host" = "${HOSTNAME}" ]; then
            ocf_log info "pre-demote notification for $demote_host"
            set_read_only on
            set_event_scheduler off

Cheers,

How to make active/passive jboss resource in pacemaker

I have done a similar configuration to make sure the virtual IP lives on the same node as the MySQL master server. For your case, I think the steps should be:

Add a primitive for the two JBoss instances (as you did for your shared IP or MySQL servers).
Add a colocation constraint so the JBoss primitive runs alongside the MySQL master, for example: colocation mysql_co_jboss inf: jboss ms_MySQL:Master
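The two steps could look roughly like the crmsh fragment below. The function name jboss_crm_config, the jboss_home path and the monitor interval and timeout are placeholders I made up for illustration; only the colocation line comes from the answer, and ms_MySQL is assumed to be the name of your existing master/slave resource.

```shell
# Print a crmsh configuration fragment: one JBoss primitive plus the
# colocation rule from the answer. The path and timings are placeholders.
jboss_crm_config() {
    cat <<'EOF'
primitive jboss ocf:heartbeat:jboss \
    params jboss_home="/opt/jboss" \
    op monitor interval="30s" timeout="60s"
colocation mysql_co_jboss inf: jboss ms_MySQL:Master
EOF
}
```

To try it, you could write the output to a file with jboss_crm_config > jboss.crm, review it, and then load it with crm configure load update jboss.crm.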
