CentOS – Apache Failed to Start in Pacemaker

apache-2.4 centos high-availability pacemaker

I am using Pacemaker with Corosync to set up a basic Apache HA cluster with 3 nodes running CentOS. For some reason, I cannot get the apache resource to start in pcs.

Cluster IP: 192.168.200.40

# pcs resource show ClusterIP
     Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
      Attributes: cidr_netmask=24 ip=192.168.200.40
      Operations: monitor interval=20s (ClusterIP-monitor-interval-20s)
                  start interval=0s timeout=20s (ClusterIP-start-interval-0s)
                  stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)



# pcs resource show WebServer
 Resource: WebServer (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost/server-status
  Operations: monitor interval=1min (WebServer-monitor-interval-1min)
              start interval=0s timeout=40s (WebServer-start-interval-0s)
              stop interval=0s timeout=60s (WebServer-stop-interval-0s)



# pcs status
Cluster name: 
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Stack: corosync
Current DC: server3.example.com (version 1.1.18-11.el7_5.2-2b07d5c5a9) - partition with quorum
Last updated: Thu Jun  7 21:59:09 2018
Last change: Thu Jun  7 21:45:23 2018 by root via cibadmin on server1.example.com

3 nodes configured
2 resources configured

Online: [ server1.example.com server2.example.com server3.example.com ]

Full list of resources:

 ClusterIP  (ocf::heartbeat:IPaddr2):   Started server2.example.com
 WebServer  (ocf::heartbeat:apache):    Stopped

Failed Actions:
* WebServer_start_0 on server3.example.com 'unknown error' (1): call=49, status=Timed Out, exitreason='',
    last-rc-change='Thu Jun  7 21:46:03 2018', queued=0ms, exec=40002ms
* WebServer_start_0 on server1.example.com 'unknown error' (1): call=53, status=Timed Out, exitreason='',
    last-rc-change='Thu Jun  7 21:45:23 2018', queued=0ms, exec=40003ms
* WebServer_start_0 on server2.example.com 'unknown error' (1): call=47, status=Timed Out, exitreason='',
    last-rc-change='Thu Jun  7 21:46:43 2018', queued=1ms, exec=40002ms


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

The httpd instance is enabled and running on all three nodes. The web page is reachable through both the cluster IP and the individual node IPs. The ClusterIP resource also fails over correctly. What could be going wrong with the apache resource in this case?

Thank you very much!

Update:

Here is more information from the debug output. It seems Apache is unable to bind to the port, but there is no error in the Apache log, and systemctl status httpd shows all green on all nodes. I can open web pages via the cluster IP and the node IPs. The ClusterIP resource failover works fine, too. Any idea why the Apache resource doesn't work with Pacemaker?

# pcs resource debug-start WebServer --full
Operation start for WebServer (ocf:heartbeat:apache) failed: 'Timed Out' (2)
 >  stderr: ERROR: (98)Address already in use: AH00072: make_sock: could not bind to address [::]:80 (98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80 no listening sockets available, shutting down AH00015: Unable to open logs
 >  stderr: INFO: apache not running
 >  stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 >  stderr: INFO: apache not running
 >  stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 >  stderr: INFO: apache not running
 >  stderr: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up
 >  stderr: INFO: apache not running

Best Answer

On CentOS 8, suppose you create the resource like this:

pcs resource create httpd_monitor ocf:heartbeat:apache \
configfile="/etc/httpd/conf/httpd.conf" \
statusurl="http://127.0.0.1/server-status" --group apache

The file /etc/httpd/conf/httpd.conf is checked for the PidFile directive. It is not defined there, even though Apache defaults to /var/run/httpd/httpd.pid.

[root@hanode1 ~]# pcs resource
  * Resource Group: apache:
    * httpd_fs  (ocf::heartbeat:Filesystem):     Started hanode1.lab.local
    * httpd_vip (ocf::heartbeat:IPaddr2):        Started hanode1.lab.local
    * apache_service    (service:httpd):         Started hanode1.lab.local
    * httpd_monitor     (ocf::heartbeat:apache):         Stopped

You get these messages:

Feb 02 17:39:21 INFO: apache not running
Feb 02 17:39:21 INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up

The fix is to define it explicitly in /etc/httpd/conf/httpd.conf:

# this is the default but is required by pcs to be defined
PidFile /var/run/httpd/httpd.pid

With PidFile defined, the resource starts fine:

[root@hanode1 ~]# pcs resource debug-start httpd_monitor
Operation start for httpd_monitor (ocf:heartbeat:apache) returned: 'ok' (0)
Feb 02 17:39:57 INFO: apache already running (pid 88022)

Then you can clear the failed state with pcs resource cleanup httpd_monitor:

# pcs resource
  * Resource Group: apache:
    * httpd_fs  (ocf::heartbeat:Filesystem):     Started hanode1.lab.local
    * httpd_vip (ocf::heartbeat:IPaddr2):        Started hanode1.lab.local
    * apache_service    (service:httpd):         Started hanode1.lab.local
    * httpd_monitor     (ocf::heartbeat:apache):         Started hanode1.lab.local
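To check up front whether a given httpd.conf sets PidFile explicitly, a small helper along these lines may be useful. This is only a sketch: the function name pidfile_directive is made up for illustration, it inspects just the one file you pass in (not anything pulled in via Include), and it prints the value as written, including quotes if the directive uses them.

```shell
#!/bin/sh
# pidfile_directive (hypothetical helper): print the value of the first
# uncommented PidFile directive in the given httpd config file.
# Exit status is 0 if a directive was found, 1 otherwise.
# Directive names are case-insensitive in Apache, hence tolower().
pidfile_directive() {
    awk 'tolower($1) == "pidfile" { print $2; found = 1; exit }
         END { exit !found }' "$1"
}

# Example use (not run here):
#   pidfile_directive /etc/httpd/conf/httpd.conf \
#       || echo "PidFile is not set explicitly"
```

A commented-out line such as "# PidFile ..." does not count, because the first field is then "#" rather than the directive name.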

kudos to @cleverpig

Related Solutions

MySQL: Pacemaker cannot start the failed master as a new slave

Eureka!

We both overlooked a very important log file, /var/log/mysqld.log:

socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server (GPL) by Atomicorp
[Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000082' at position 58569, relay log './mysqld-relay-bin.000002' position: 58715
[Note] Slave I/O thread: connected to master 'repl@192.168.6.38:3306',replication started in log 'mysql-bin.000082' at position 58569
[Warning] Aborted connection 10 to db: 'unconnected' user: 'test_user' host: 'localhost' (init_connect command failed)
[Warning] The MySQL server is running with the --read-only option so it cannot execute this statement
[Note] /usr/libexec/mysqld: Normal shutdown

As you can guess, I track user activity by combining the binlog with init_connect:

init_connect = "INSERT INTO audit.accesslog (connect_time, user_host, connection_id) VALUES (NOW(), CURRENT_USER(), CONNECTION_ID());"

But serving-6192 is set read-only when it starts as a slave, so when Pacemaker performs the monitor operation as test_user:

    # Check for test table
    ocf_run -q $MYSQL $MYSQL_OPTIONS_TEST \
        -e "SELECT COUNT(*) FROM $OCF_RESKEY_test_table"

the init_connect command fails with the error shown above:

The MySQL server is running with the --read-only option so it cannot execute this statement

The solution is to set the init_connect option to an empty string before the monitor action runs (and don't forget to restore it when promoting a node to master).
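As a rough sketch of how that could be wired in: the helper below only builds the SQL statement, so it can be tried without a running server. The name init_connect_sql is an assumption for illustration and is not part of the resource agent; it only escapes single quotes, which is enough for the init_connect value shown above but not for values containing backslashes.

```shell
#!/bin/sh
# init_connect_sql (hypothetical helper): print the SQL statement that sets
# the global init_connect variable to the given value.
# An empty argument clears init_connect.
# Single quotes in the value are doubled so the SQL string literal stays valid.
init_connect_sql() {
    escaped=$(printf '%s' "$1" | sed "s/'/''/g")
    printf "SET GLOBAL init_connect='%s'\n" "$escaped"
}

# Inside the agent this could be used in the same way as the other calls,
# for example (not run here):
#   ocf_run $MYSQL $MYSQL_OPTIONS_REPL -e "$(init_connect_sql '')"
```

Calling it with the original audit INSERT as the argument gives the statement to run again on promotion.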

For anyone using the event scheduler: note that you must also turn it on when promoting a slave to master:

set_event_scheduler() {
    local es_val
    if ocf_is_true $1; then
        es_val="on"
    else
        es_val="off"
    fi
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
        -e "SET GLOBAL event_scheduler=${es_val}"
}

get_event_scheduler() {
    # Check if event-scheduler is set
    local event_scheduler_state

    event_scheduler_state=`$MYSQL $MYSQL_OPTIONS_REPL \
        -e "SHOW VARIABLES" | grep event_scheduler | awk '{print $2}'`

    if [ "$event_scheduler_state" = "ON" ]; then
        return 0
    else
        return 1
    fi
}

mysql_promote() {
    local master_info

    if ( ! mysql_status err ); then
        return $OCF_NOT_RUNNING
    fi
    ocf_run $MYSQL $MYSQL_OPTIONS_REPL \
        -e "STOP SLAVE"

    # Set Master Info in CIB, cluster level attribute
    update_data_master_status
    master_info="$(get_local_ip)|$(get_master_status File)|$(get_master_status Position)"
    ${CRM_ATTR_REPL_INFO} -v "$master_info"
    rm -f $tmpfile

    set_read_only off || return $OCF_ERR_GENERIC
    set_event_scheduler on || return $OCF_ERR_GENERIC

Also don't forget to turn it off when demoting:

    'pre-demote')
        # Is the notification for our set
        notify_resource=`echo $OCF_RESKEY_CRM_meta_notify_demote_resource|cut -d: -f1`
        my_resource=`echo $OCF_RESOURCE_INSTANCE|cut -d: -f1`
        if [ $notify_resource != ${my_resource} ]; then
            ocf_log debug "Notification is not for us"
            return $OCF_SUCCESS
        fi

        demote_host=`echo $OCF_RESKEY_CRM_meta_notify_demote_uname|tr -d " "`
        if [ $demote_host = ${HOSTNAME} ]; then
            ocf_log info "post-demote notification for $demote_host"
            set_read_only on
            set_event_scheduler off

Cheers,

Pacemaker config resource for nginx

1: Make sure the resource agent is actually there:

/usr/lib/ocf/resource.d/heartbeat/nginx

2: I don't see nginx in your previous output.

3: I'm using SUSE 11 SP2, and the nginx agent is installed without any extra package:

node01:~ # rpm -qf /usr/lib/ocf/resource.d/heartbeat/nginx
resource-agents-3.9.2-0.25.5

I know Red Hat has removed many resource agents. For more information, you can search the ClusterLabs mailing list archive.
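The check from step 1 can be scripted, for example like this. It is a minimal sketch: agent_installed is a made-up name, and it assumes the heartbeat provider directory under OCF_ROOT, falling back to /usr/lib/ocf as in the path above.

```shell
#!/bin/sh
# agent_installed (hypothetical helper): succeed if the named agent exists
# and is executable under the heartbeat provider directory.
# OCF_ROOT defaults to /usr/lib/ocf when it is not set in the environment.
agent_installed() {
    [ -x "${OCF_ROOT:-/usr/lib/ocf}/resource.d/heartbeat/$1" ]
}

# Example use (not run here):
#   agent_installed nginx || echo "nginx resource agent is missing"
```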
