How to Monitor Processes in Python on Linux

linux · python · systems

I have some previous posts talking about how to use Python to "do something" when a record is inserted or deleted in a Postgres database. I finally decided on going with a message queue (beanstalkd) to handle the "jobs". I have everything set up and running with another Python process that watches the queue and "does stuff". I am not really a "systems" guy, so I am not sure what is a good way to go about monitoring the process to make sure that if it fails or dies it restarts and sends a notification. Google gave some good ideas, but I thought by asking here I could get suggestions from people who I am sure have had to do something similar.

The process is critical to the system and it just needs to always work; if it's not working then it needs to be addressed and other parts of the system "paused" until the problem is fixed.

My thoughts were to just have a cron script run every minute or two that checks to see if the process is running and, if not, restarts it. Another script (or maybe just another function of the first) would monitor the jobs and, if the jobs waiting to be processed hit a specific threshold, also flag that there is a major problem.
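Roughly what I had in mind for that watchdog, as a sketch only (the pidfile path, restart command, backlog threshold and notify() are placeholders for whatever we actually end up using, and the stats check assumes the beanstalkc client):

```python
#!/usr/bin/env python
"""Cron-run watchdog: restart the worker if it died and alert on backlog."""
import os
import subprocess

PIDFILE = "/var/run/order-worker.pid"                        # assumed location
RESTART_CMD = ["/usr/local/bin/order-worker", "--daemon"]    # assumed command
BACKLOG_LIMIT = 500                                          # assumed threshold


def notify(message):
    # Placeholder: send mail, hit a pager, write to syslog, etc.
    print(message)


def worker_is_running():
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)          # signal 0 only checks that the pid exists
        return True
    except (IOError, ValueError, OSError):
        return False


def backlog_size():
    # Using the beanstalkc client; any client that exposes the server
    # stats ("current-jobs-ready") works the same way.
    import beanstalkc
    conn = beanstalkc.Connection(host="localhost", port=11300)
    try:
        return int(conn.stats()["current-jobs-ready"])
    finally:
        conn.close()


if __name__ == "__main__":
    if not worker_is_running():
        notify("order worker was down -- restarting")
        subprocess.Popen(RESTART_CMD)
    if backlog_size() > BACKLOG_LIMIT:
        notify("beanstalkd backlog over %d jobs -- investigate" % BACKLOG_LIMIT)
```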

Specifics about process…
The process updates the orders in a legacy system with the qtys of items that are shipped or back ordered from our warehouse. So if these things are not done, then when the order is invoiced it will have incorrect qtys, and the people involved wouldn't have a good way to spot this unless they are checking each line. I thought I might also have a flag on the order that says "yes, I have been touched", and if it hasn't been, just notify the invoicing agent.

This same method is going to be used for updating orders with shipping information based on when orders are shipped from UPS Worldship.

I don't know, I think I have a handle on this, but it just feels "kludgy".

Best Answer

You can wait() for beanstalkd's pid; if it exits (cleanly or otherwise), wait() will return you the exit code, and you will be able to restart the process immediately.
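In Python terms that can be as simple as a parent that spawns beanstalkd with subprocess and loops on wait(). A minimal sketch; the listen address, port and binlog path are just examples, and notify() is a placeholder:

```python
"""Tiny supervisor loop around the wait() idea above."""
import subprocess
import time

CMD = ["beanstalkd", "-l", "127.0.0.1", "-p", "11300", "-b", "/var/lib/beanstalkd"]


def notify(message):
    print(message)              # replace with mail/pager/syslog


while True:
    proc = subprocess.Popen(CMD)
    code = proc.wait()          # blocks until the child exits, returns its exit code
    notify("beanstalkd exited with code %d, restarting" % code)
    time.sleep(1)               # small pause so a crash loop can't spin the CPU
```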

Beanstalkd persists its queue (if you specify -b), so the beanstalkd process crashing from time to time (if it ever does) probably is not an issue. But your Postgres trigger will not be able to push the data to the beanstalkd queue at that moment. For this reason, I'd use a separate queue table in Postgres. Transactions append records to this table. A periodic (say, once a second) process checks this table, pushes the data to beanstalkd, and only removes it from the queue table once beanstalkd has reliably accepted the data.
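A rough sketch of that relay loop, assuming an "outbox" table filled by your trigger and the psycopg2 and beanstalkc clients (the table, column names and connection details are illustrative only):

```python
"""Relay loop: move rows from a Postgres queue table into beanstalkd."""
import json
import time

import beanstalkc
import psycopg2


def relay_once(pg, queue):
    cur = pg.cursor()
    cur.execute("SELECT id, payload FROM outbox ORDER BY id LIMIT 100")
    for row_id, payload in cur.fetchall():
        # put() raises on failure, so the row is only deleted once
        # beanstalkd has actually accepted the job.
        queue.put(json.dumps({"id": row_id, "payload": payload}))
        cur.execute("DELETE FROM outbox WHERE id = %s", (row_id,))
        pg.commit()


if __name__ == "__main__":
    pg = psycopg2.connect("dbname=orders")              # assumed DSN
    queue = beanstalkc.Connection("localhost", 11300)
    while True:
        relay_once(pg, queue)
        time.sleep(1)           # the "once a second" poll from above
```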

With this setup, the worst case you'll have is data not reaching the system that beanstalkd supplies it to quite as promptly. Other parts of the system will not need to actually pause, because once everything is in place again, the backlog of messages will be cleared eventually.
