Nginx: thousands of server config files make reload (nginx -s reload) very slow

nginx performance

I have one master nginx.conf that pulls in the rest of my servers (server blocks) with the include directive:

include myservers/*.conf;

My problem is that whenever I add a new configuration file to myservers/, I need to reload nginx with nginx -s reload.

The reload takes a long time: currently about 1 minute, and this is going to grow, because I will have more and more upstream servers.

Do you see any technique to improve this?

The only solution I have found so far is the paid version of nginx, NGINX Plus, whose API (https://docs.nginx.com/nginx/admin-guide/load-balancer/dynamic-configuration-api/) lets you add new upstream servers dynamically over REST without any reload.

I was also thinking about a kind of sharding technique: one master that maps hash keys to slave servers (like Elasticsearch, with a consensus algorithm such as Raft to keep the state consistent), so that an update only requires reloading one slave server that holds fewer upstream servers.

Best Answer

How many files, and what sort of configuration, do you have that nginx -s reload is taking a whole one minute?!

Identify the Source.

I think you have to figure out why it's taking so long in the first place before you can come up with a solution to address it.

Filesystem issues?

  • Is it a ridiculous number of individual files that slow down the process?

    E.g., does running cat myservers/*.conf | md5 (md5sum on Linux) take a whole minute by itself?

    If so, you might want to look into using a ramdisk for your configuration; or into keeping the individual configurations in a database, and having a single nginx.conf for reload purposes.
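
To make the filesystem check above a bit more concrete, here is a minimal sketch. It assumes the configuration lives in a directory like myservers (the CONF_DIR default is only a placeholder), and the function names are made up for illustration. It uses find instead of a shell glob so that a very large number of files doesn't run into the argument-length limit.

```shell
#!/usr/bin/env bash
# Rough check of whether reading the files alone explains a slow reload.
# CONF_DIR is an assumption; point it at the directory behind your include.
CONF_DIR="${CONF_DIR:-myservers}"

# Print the number of *.conf entries directly inside a directory.
count_conf_files() {
    find "$1" -maxdepth 1 ! -type d -name '*.conf' | wc -l | tr -d ' '
}

# Read every *.conf file once and print the elapsed whole seconds.
time_conf_read() {
    local start end
    start=$(date +%s)
    find "$1" -maxdepth 1 ! -type d -name '*.conf' -exec cat {} + > /dev/null
    end=$(date +%s)
    echo $(( end - start ))
}

# Only report when the directory actually exists.
if [ -d "$CONF_DIR" ]; then
    echo "files:        $(count_conf_files "$CONF_DIR")"
    echo "read seconds: $(time_conf_read "$CONF_DIR")"
fi
```

If the read time is close to zero while the reload still takes a minute, the filesystem is probably not the culprit. Note that a second run will likely be served from the page cache, so the first run is the more telling one.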

Configuration directive issues?

  • Is it the contents of the configuration files that take a really long time to reload?

    There could be more than one way this could be a problem.

    For example, maybe one of your configurations is using a domain name that takes a long time to resolve (through a timeout), slowing down the whole reload. This is potentially a security vulnerability in your setup, as a single user might be able to slow down your whole reload sequence given the "right" input.

    It could also be a different issue with the configuration, for example too many individual log files that have to be closed and reopened on every reload. You can look into this with tools like lsof and/or fstat, to see the number of open files that your nginx processes hold.
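
Both suspicions above can be checked roughly from the configuration files themselves. The sketch below is only a heuristic: it assumes upstream hosts appear as "server host[:port] ...;" lines, that log files are declared with access_log / error_log, and that getent is available (glibc-based Linux). CONF_DIR and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Heuristic inspection of the included files; it does not parse nginx syntax.
# CONF_DIR is an assumption; adjust it to your include directory.
CONF_DIR="${CONF_DIR:-myservers}"

# Concatenate all *.conf files without hitting the argument-length limit.
cat_confs() {
    find "$1" -maxdepth 1 ! -type d -name '*.conf' -exec cat {} +
}

# List unique host names from "server host[:port] ...;" lines.
# Plain IP addresses, unix: sockets and "server {" blocks are skipped.
list_upstream_hosts() {
    cat_confs "$1" | awk '
        $1 == "server" && $2 != "{" {
            host = $2
            sub(/;$/, "", host)        # drop a trailing semicolon
            sub(/:[0-9]+$/, "", host)  # drop a :port suffix
            if (host ~ /[A-Za-z]/ && host !~ /^unix:/ && host !~ /^\[/) print host
        }' | sort -u
}

# Read host names on stdin and print "<seconds>s <host>" for each lookup.
time_host_lookups() {
    local host start end
    while IFS= read -r host; do
        start=$(date +%s)
        getent hosts "$host" > /dev/null 2>&1
        end=$(date +%s)
        echo "$(( end - start ))s $host"
    done
}

# Count distinct log targets declared with access_log / error_log.
count_log_files() {
    cat_confs "$1" | awk '
        ($1 == "access_log" || $1 == "error_log") && $2 != "off;" {
            sub(/;$/, "", $2)
            print $2
        }' | sort -u | wc -l | tr -d ' '
}

if [ -d "$CONF_DIR" ]; then
    echo "distinct log files: $(count_log_files "$CONF_DIR")"
    # Slowest lookups first.
    list_upstream_hosts "$CONF_DIR" | time_host_lookups | sort -rn | head -n 10
fi
```

A host that takes several seconds here is a good candidate for the timeout scenario described above. The lookup goes through the system resolver, so treat the numbers as a hint rather than an exact replay of what nginx does at reload time.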

Is this even a real problem?

  • As others pointed out, even as-is, this is already not a huge problem, because nginx -s reload is a graceful reload of the configuration, where nginx should still remain fully functional, even when you're reloading its configuration.

    I would say it is totally reasonable to batch the changes and perform a reload once every 5 to 15 minutes. If you're dealing with new domain names, you probably already have to wait until the configuration starts working at the DNS level. A delay of up to 1 minute is not at all unreasonable, and is very often implemented in production services of various cloud providers to this day.

    In fact, top-level-domain zones are often updated in batch mode as well, often on a schedule much less frequent than once every 15 minutes, especially given the sheer volume of data involved. For example, .ru gets refreshed only 4 times per day: it has 5 million records and is mirrored by several separate providers for redundancy, and each update takes up to 30 minutes, so updates have to be spaced apart to keep the mirrors reasonably consistent and to ensure that separate updates don't run into each other.

    If you need changes to take effect immediately, then perhaps a different architecture is required: maybe one where a separate staging area is provided for testing the configuration, or a multilayer approach, or a commercial version of nginx, and/or third-party plugins.
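
The batching idea above could look something like the following sketch. It reloads only when something under the configuration directory changed since the last successful reload, and validates with nginx -t first. The paths, the stamp-file approach and the names CONF_DIR, STAMP, NGINX_BIN and reload_if_changed are all assumptions for illustration, not anything nginx prescribes.

```shell
#!/usr/bin/env bash
# Batched reload: meant to be run from cron or a timer every few minutes.
# All three defaults are placeholders; override them via the environment.
CONF_DIR="${CONF_DIR:-/etc/nginx/myservers}"
STAMP="${STAMP:-/var/run/nginx-last-reload.stamp}"
NGINX_BIN="${NGINX_BIN:-nginx}"

# Succeeds if no stamp exists yet, or if the directory or any entry in it
# is newer than the stamp (deleting a file updates the directory mtime).
config_changed() {
    [ ! -e "$STAMP" ] && return 0
    [ -n "$(find "$CONF_DIR" -newer "$STAMP" -print | head -n 1)" ]
}

reload_if_changed() {
    # Take the timestamp before checking, so that edits made while the
    # reload is in progress are picked up by the next run.
    touch "$STAMP.new"
    if ! config_changed; then
        rm -f "$STAMP.new"
        echo "unchanged"
        return 0
    fi
    # Validate first so a broken file never reaches the running master.
    if ! "$NGINX_BIN" -t; then
        rm -f "$STAMP.new"
        echo "invalid"
        return 1
    fi
    "$NGINX_BIN" -s reload || { rm -f "$STAMP.new"; return 1; }
    mv "$STAMP.new" "$STAMP"
    echo "reloaded"
}

# Run only when explicitly asked, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    reload_if_changed
fi
```

Saved under a name of your choosing and invoked with --run from a cron entry every 5 minutes, this collapses any number of config changes in that window into a single reload, and skips the reload entirely when nothing changed.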


Come up with a Solution.

Depending on the source of the issue, the solution would be to re-architect the way you're doing the configuration.

Without knowing the source of the issue, the question is just too broad to offer any specific advice.