How to Monitor Processes in Python on Linux

linux · python · systems

I have some previous posts talking about how to use Python to "do something" when a record is inserted or deleted in a Postgres database. I finally decided on going with a message queue (beanstalkd) to handle the "jobs". I have everything set up and running with another Python process that watches the queue and "does stuff". I am not really a "systems" guy, so I am not sure what is a good way to go about monitoring the process to make sure that if it fails or dies it restarts and sends a notification. Google gave some good ideas, but I thought by asking here I could get suggestions from people who I am sure have had to do something similar.

The process is critical to the system and it just needs to always work; if it's not working then it needs to be addressed and other parts of the system "paused" until the problem is fixed.

My thoughts were to just have a cron script run every minute or two that checks to see if the process is running and, if not, restarts it. Another script (or maybe just another function of the first) would monitor the jobs and, if the jobs waiting to be processed hit a specific threshold, also flag that there is a major problem.
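Roughly what I had in mind for that watchdog, as a sketch only (the pidfile path, restart command, backlog threshold and notify() are placeholders for whatever we actually end up using, and the stats check assumes the beanstalkc client):

```python
#!/usr/bin/env python
"""Cron-run watchdog: restart the worker if it died and alert on backlog."""
import os
import subprocess

PIDFILE = "/var/run/order-worker.pid"                        # assumed location
RESTART_CMD = ["/usr/local/bin/order-worker", "--daemon"]    # assumed command
BACKLOG_LIMIT = 500                                          # assumed threshold


def notify(message):
    # Placeholder: send mail, hit a pager, write to syslog, etc.
    print(message)


def worker_is_running():
    try:
        with open(PIDFILE) as f:
            pid = int(f.read().strip())
        os.kill(pid, 0)          # signal 0 only checks that the pid exists
        return True
    except (IOError, ValueError, OSError):
        return False


def backlog_size():
    # Using the beanstalkc client; any client that exposes the server
    # stats ("current-jobs-ready") works the same way.
    import beanstalkc
    conn = beanstalkc.Connection(host="localhost", port=11300)
    try:
        return int(conn.stats()["current-jobs-ready"])
    finally:
        conn.close()


if __name__ == "__main__":
    if not worker_is_running():
        notify("order worker was down -- restarting")
        subprocess.Popen(RESTART_CMD)
    if backlog_size() > BACKLOG_LIMIT:
        notify("beanstalkd backlog over %d jobs -- investigate" % BACKLOG_LIMIT)
```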

Specifics about process…
The process updates the orders in a legacy system with the qtys of items that are shipped or back ordered from our warehouse. So if these things are not done, then when the order is invoiced it will have incorrect qtys, and the people involved wouldn't have a good way to spot this unless they are checking each line. I thought I might also have a flag on the order that says "yes, I have been touched", and if it hasn't been, just notify the invoicing agent.

This same method is going to be used for updating orders with shipping information based on when orders are shipped from UPS Worldship.

I don't know, I think I have a handle on this, but it just feels "kludgy".

Best Answer

You can wait() for beanstalkd's pid; if it exits (cleanly or otherwise), wait() will return you the exit code, and you will be able to restart the process immediately.
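In Python terms that can be as simple as a parent that spawns beanstalkd with subprocess and loops on wait(). A minimal sketch; the listen address, port and binlog path are just examples, and notify() is a placeholder:

```python
"""Tiny supervisor loop around the wait() idea above."""
import subprocess
import time

CMD = ["beanstalkd", "-l", "127.0.0.1", "-p", "11300", "-b", "/var/lib/beanstalkd"]


def notify(message):
    print(message)              # replace with mail/pager/syslog


while True:
    proc = subprocess.Popen(CMD)
    code = proc.wait()          # blocks until the child exits, returns its exit code
    notify("beanstalkd exited with code %d, restarting" % code)
    time.sleep(1)               # small pause so a crash loop can't spin the CPU
```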

Beanstalkd persists its queue (if you specify -b), so the beanstalkd process crashing from time to time (if it ever does) probably is not an issue. But your Postgres trigger will not be able to push the data to the beanstalkd queue at that moment. For this reason, I'd use a separate queue table in Postgres. Transactions append records to this table. A periodic (say, once a second) process checks this table, pushes the data to beanstalkd, and only removes it from the queue table once beanstalkd has reliably accepted the data.
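A rough sketch of that relay loop, assuming an "outbox" table filled by your trigger and the psycopg2 and beanstalkc clients (the table, column names and connection details are illustrative only):

```python
"""Relay loop: move rows from a Postgres queue table into beanstalkd."""
import json
import time

import beanstalkc
import psycopg2


def relay_once(pg, queue):
    cur = pg.cursor()
    cur.execute("SELECT id, payload FROM outbox ORDER BY id LIMIT 100")
    for row_id, payload in cur.fetchall():
        # put() raises on failure, so the row is only deleted once
        # beanstalkd has actually accepted the job.
        queue.put(json.dumps({"id": row_id, "payload": payload}))
        cur.execute("DELETE FROM outbox WHERE id = %s", (row_id,))
        pg.commit()


if __name__ == "__main__":
    pg = psycopg2.connect("dbname=orders")              # assumed DSN
    queue = beanstalkc.Connection("localhost", 11300)
    while True:
        relay_once(pg, queue)
        time.sleep(1)           # the "once a second" poll from above
```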

With this setup, the worst case you'll have is data not reaching the system that beanstalkd supplies it to quite as promptly. Other parts of the system will not need to actually pause, because once everything is in place again, the backlog of messages will be cleared eventually.
