How to work around “entered FATAL state, too many start retries too quickly” in supervisor

supervisord

I'm just testing my supervisor with the simple program configuration:

[program:test]
command=python -c "print 'hello'"
autostart=true
autorestart=true
exitcodes=1
user=ratdon
stdout_logfile=/opt/log/test.log
stderr_logfile=/opt/log/test.log

I start supervisord with "sudo supervisord -n -c /opt/supervisord.conf &", but after a few spawns it stops restarting the program:

2016-02-01 11:17:58,973 CRIT Supervisor running as root (no user in config file)
2016-02-01 11:17:58,973 WARN Included extra file "/opt/test.ini" during parsing
2016-02-01 11:17:58,994 INFO RPC interface 'supervisor' initialized
2016-02-01 11:17:58,994 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2016-02-01 11:17:58,995 INFO supervisord started with pid 19644
2016-02-01 11:17:59,998 INFO spawned: 'test' with pid 19648
2016-02-01 11:18:00,026 INFO exited: test (exit status 0; not expected)
2016-02-01 11:18:01,030 INFO spawned: 'test' with pid 19650
2016-02-01 11:18:01,064 INFO exited: test (exit status 0; not expected)
2016-02-01 11:18:03,072 INFO spawned: 'test' with pid 19653
2016-02-01 11:18:03,104 INFO exited: test (exit status 0; not expected)
2016-02-01 11:18:06,108 INFO spawned: 'test' with pid 19657
2016-02-01 11:18:06,138 INFO exited: test (exit status 0; not expected)
2016-02-01 11:18:07,139 INFO gave up: test entered FATAL state, too many start retries too quickly

I want supervisor to keep on restarting the program until I stop supervisord.

Is this possible? If so, how?

Is there an option to make supervisor add a timestamp to each line of the logged stdout, or do we need to put the timestamp in the program's output ourselves?

Best Answer

I encountered the same use case while working on a Docker microservices environment. In my case, Nginx could start before its dynamically generated configuration was in place.

At present there is no way to make Supervisord restart a service indefinitely until the process has started successfully.

However, there is a feasible workaround: the startretries option. With startretries set, Supervisord retries starting the process up to the given number of times, stopping as soon as the process has started successfully.

In my particular use case the window for the race condition was less than a second, so setting startretries=2 was sufficient. However, you can set it to a much higher value if needed.

[program:test]
; number of failed start attempts allowed before the process enters FATAL (default is 3)
startretries=10
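To see why the log in the question ends in FATAL, it can help to model the retry cycle. The sketch below is a simplified model, not supervisor's actual code; the names simulate_start and StartResult are hypothetical. It assumes supervisor's documented defaults: a spawn only counts as a successful start if the process is still running after startsecs seconds (default 1), and after startretries failed attempts (default 3) the process is marked FATAL. It also assumes the backoff wait grows by one second per failure, which roughly matches the 1 s, 2 s, 3 s gaps between spawns in the log.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class StartResult:
    state: str   # final state: "RUNNING" or "FATAL"
    spawns: int  # how many times the process was spawned
    delays: List[int] = field(default_factory=list)  # backoff waits (seconds) before each retry


def simulate_start(run_times: Sequence[float], startsecs: float = 1,
                   startretries: int = 3) -> StartResult:
    """Simplified (hypothetical) model of supervisord's STARTING/BACKOFF/FATAL cycle.

    run_times[i] is how long, in seconds, the i-th spawned process stays up
    before exiting. A spawn counts as a successful start only if the process
    is still running after `startsecs` seconds.
    """
    delays: List[int] = []
    backoff = 0
    for spawns, run_time in enumerate(run_times, start=1):
        if run_time >= startsecs:
            # Stayed up long enough: the start succeeded.
            return StartResult("RUNNING", spawns, delays)
        # Exited too quickly: count one more failed start attempt.
        backoff += 1
        if backoff > startretries:
            # Retries exhausted: "gave up ... entered FATAL state".
            return StartResult("FATAL", spawns, delays)
        delays.append(backoff)  # wait 1 s, then 2 s, then 3 s, ...
    raise ValueError("run_times ran out before the process started or gave up")


# The question's program exits after about 30 ms every time.
print(simulate_start([0.03] * 10))
# A race like the Nginx one: two quick failures, then a real start.
print(simulate_start([0.03, 0.03, 5.0], startretries=2))
```

Under this model, the first call gives 4 spawns with waits of 1, 2 and 3 seconds before FATAL, as in the log. It also suggests that raising startretries only postpones FATAL for a program like python -c "print 'hello'", which never stays up for startsecs; the workaround helps when the process eventually starts and keeps running, as in the Nginx case.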