A cluster I've been working on just started acting up out of nowhere. It looks like I'm having an issue with the exportfs resource.
Any ideas on how to troubleshoot this? I can't find anything on a "-2" return code.
============
Last updated: Mon Jan 7 09:18:18 2013
Last change: Fri Jan 4 16:02:13 2013 via crmd on emserver1
Stack: openais
Current DC: emserver1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
9 Resources configured.
============
Online: [ emserver1 emserver2 ]
Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
Masters: [ emserver1 ]
Slaves: [ emserver2 ]
Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
Started: [ emserver1 emserver2 ]
Resource Group: g_nfs
p_fs_nfs (ocf::heartbeat:Filesystem): Started emserver1
p_exportfs_nfs (ocf::heartbeat:exportfs): Started emserver1 (unmanaged) FAILED
p_ip_nfs (ocf::heartbeat:IPaddr2): Stopped
Clone Set: cl_exportfs_root [p_exportfs_root]
Started: [ emserver1 ]
Stopped: [ p_exportfs_root:1 ]
Failed actions:
p_drbd_nfs:1_promote_0 (node=emserver2, call=22, rc=-2, status=Timed Out): unknown exec error
p_exportfs_root:1_start_0 (node=emserver2, call=10, rc=-2, status=Timed Out): unknown exec error
p_exportfs_nfs_stop_0 (node=emserver1, call=32, rc=-2, status=Timed Out): unknown exec error
p_drbd_nfs:0_demote_0 (node=emserver1, call=19, rc=1, status=complete): unknown error
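One pattern is visible in the output itself. Every rc=-2 line is paired with status=Timed Out, while the one failure with status=complete has an ordinary rc=1. That suggests the -2 marks an operation that hit its timeout before the agent returned, rather than an exit code the agent itself produced. Below is a minimal Python sketch that parses the "Failed actions" lines and separates timeouts from genuine agent errors. The regex is written against the lines pasted above only, so treat it as an assumption about crm_mon's format in general, not a definitive parser.

```python
import re
from typing import List, NamedTuple

# Matches one "Failed actions" line in the format pasted above (assumed format):
#   <resource>_<op>_<interval> (node=..., call=..., rc=..., status=...): <reason>
FAILED_ACTION = re.compile(
    r"^\s*(?P<resource>\S+?)_(?P<op>[a-z]+)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>-?\d+), "
    r"status=(?P<status>[^)]+)\):\s*(?P<reason>.*)$"
)


class FailedAction(NamedTuple):
    resource: str
    op: str
    node: str
    rc: int
    status: str


def parse_failed_actions(text: str) -> List[FailedAction]:
    """Extract every line that looks like a crm_mon failed action."""
    actions = []
    for line in text.splitlines():
        m = FAILED_ACTION.match(line)
        if m:
            actions.append(
                FailedAction(m["resource"], m["op"], m["node"], int(m["rc"]), m["status"])
            )
    return actions


SAMPLE = """\
p_drbd_nfs:1_promote_0 (node=emserver2, call=22, rc=-2, status=Timed Out): unknown exec error
p_exportfs_root:1_start_0 (node=emserver2, call=10, rc=-2, status=Timed Out): unknown exec error
p_exportfs_nfs_stop_0 (node=emserver1, call=32, rc=-2, status=Timed Out): unknown exec error
p_drbd_nfs:0_demote_0 (node=emserver1, call=19, rc=1, status=complete): unknown error
"""

if __name__ == "__main__":
    for a in parse_failed_actions(SAMPLE):
        # A timed-out op never produced a real exit code; anything else did.
        kind = "timed out" if a.status == "Timed Out" else f"agent returned rc={a.rc}"
        print(f"{a.resource} {a.op} on {a.node}: {kind}")
```

Grouping by status rather than by rc points the investigation at why the agent is slow (three timeouts here) instead of hunting for the meaning of -2.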
Best Answer
The Ubuntu Server package shipped outdated resource agents. A bug in the exportfs resource agent caused the NFS rmtab to grow to an immense size, which is why the timeouts were occurring.
I upgraded the resource agents from GitHub and removed the 2 GB rmtab. Everything was fine after that.
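To confirm this failure mode before deleting anything, a read-only check of the rmtab's size and duplication can help. The following is a minimal Python sketch. It assumes the usual nfs-utils location `/var/lib/nfs/rmtab`, the usual `client:path:0xcount` line layout, and that runaway growth shows up as the same client:path pair repeated many times. All three are assumptions, not details from the post. It streams the file line by line, so it stays usable on a multi-gigabyte rmtab.

```python
import os
from collections import Counter

# Assumption: typical nfs-utils location of rmtab on Linux.
RMTAB_PATH = "/var/lib/nfs/rmtab"


def summarize_rmtab(lines):
    """Return (total entries, distinct client:path pairs, most repeated (pair, count)).

    Assumes each line looks like "client:path:0xcount"; the trailing
    counter field is dropped so repeated mounts of the same pair group together.
    """
    total = 0
    pairs = Counter()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        total += 1
        pairs[entry.rsplit(":", 1)[0]] += 1
    top = pairs.most_common(1)
    return total, len(pairs), (top[0] if top else None)


def check_rmtab(path=RMTAB_PATH):
    """Print size and duplication stats for an rmtab file (read-only)."""
    size = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as fh:
        total, distinct, top = summarize_rmtab(fh)
    print(f"{path}: {size} bytes, {total} entries, {distinct} distinct client:path pairs")
    if top:
        print(f"most repeated: {top[0]} x{top[1]}")


if __name__ == "__main__":
    if os.path.exists(RMTAB_PATH):
        check_rmtab()
    else:
        print(f"{RMTAB_PATH} not found")
```

A total entry count far larger than the number of distinct pairs is consistent with the duplicate-growth bug described above.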