I have a process (a Perl script) that does:
    while true
        check a POP account on a server on the LAN
        process any email found
        write logs - messages found, actions taken, errors
        sleep for 15 seconds
It's running on a Red Hat 7.3 server (I inherited it, and I'm not happy about the age of that box). It's started from /etc/inittab like this:
spop:2345:respawn:/usr/local/gw/bin/popdmn
If it dies, init restarts it.
For the last couple of days, the process has not worked unless it's being straced. When it's just running on its own, it never logs into the POP server. As soon as it's straced (via "strace -Ff -p $(cat /usr/local/gw/var/popdmn.pid)"), it works flawlessly.
As a workaround, I'm running screen on the server with an strace running. Obviously this is less than ideal.
Why would a process do this? I haven't seen this happen before.
Best Answer
I think I've been bitten by an ancient strace bug:
https://bugzilla.redhat.com/show_bug.cgi?id=64303
https://bugzilla.redhat.com/show_bug.cgi?id=75709
This box has strace-4.4-4 on it, so it's plausible that this is the bug. It looks self-inflicted: we were stracing while trying to debug, and that made things worse.
Running kill -CONT against the daemon's PID resumes the process. Definitely time to upgrade this box.
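For anyone hitting the same thing, here is a minimal sketch of that recovery step. It assumes the daemon was left in the stopped (T) state and that the kernel exposes /proc/PID/status with a "State:" line. The function names proc_state and resume_if_stopped are made up for illustration; the pidfile path is the one from the question.

```shell
#!/bin/sh
# Sketch only: resume a daemon that was left stopped (state "T").
# Assumes Linux-style /proc/PID/status; helper names are hypothetical.

# Print the one-letter process state, e.g. "S" (sleeping) or "T" (stopped).
# Prints nothing if the process does not exist.
proc_state() {
    sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status" 2>/dev/null
}

# Send SIGCONT only when the process is actually stopped.
resume_if_stopped() {
    pid=$1
    case "$(proc_state "$pid")" in
        T*) kill -CONT "$pid" && echo "resumed $pid" ;;
        "") echo "no such process: $pid" >&2; return 1 ;;
        *)  echo "$pid is not stopped" ;;
    esac
}

# Pidfile path taken from the question; override PIDFILE for other daemons.
PIDFILE=${PIDFILE:-/usr/local/gw/var/popdmn.pid}
if [ -r "$PIDFILE" ]; then
    resume_if_stopped "$(cat "$PIDFILE")"
fi
```

Checking the state first means the script does nothing to a daemon that is already running, so it should be safe to run by hand whenever the POP polling goes quiet.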