Optimizing Ansible playbooks to run against many hosts

ansible

I'm running Ansible 2.0 on SLES 11 SP4 against about 430 machines, and it is very slow. I can't tell why, but it goes much faster if I limit the number of machines in the inventory. It took about 7 hours to run a 3-task playbook (including gathering facts), and the 3rd task was a local action. When I run against the full inventory of 430, gathering facts for 2 machines takes about as long as it does to fully process 6 machines.

And it uses 99.9% of the CPU right off the bat:

root     11646 99.8  0.4 220188 61016 pts/1    Rl+  07:24   6:41                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11651  0.1  0.4 187396 58828 pts/1    Sl+  07:24   0:00                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11652  0.1  0.4 187812 59216 pts/1    Sl+  07:24   0:00                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11653  0.1  0.4 188052 59428 pts/1    Sl+  07:24   0:00                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11654  0.1  0.4 186148 57496 pts/1    Sl+  07:24   0:00                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11655  0.1  0.4 186552 57924 pts/1    Sl+  07:24   0:00                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...
root     11656  0.4  0.2 154948 25828 pts/1    Sl+  07:24   0:01                          \_ /usr/bin/python /usr/bin/ansible-playbook /etc/ansible/playbooks/checkhostnames.yml ...

That is worrying, since I was hoping Ansible would speed up our serialized ssh processes; instead it looks like it's just going to use up all the resources.

When I strace the main PID, it appears to be running stat on the inventory file over and over again.

I'm keeping all my host vars in one inventory file that I generate from a database. I tried using a dynamic inventory, but that took too long to even initialize (I'm guessing it's hitting the SQL query over and over again).

So, is there a trick to running it against lots of machines?

I have already tried all the tricks in https://www.ansible.com/blog/ansible-performance-tuning

I've also tried breaking it up by putting host_vars for each host in their own file – I figured strace was telling me that it was parsing my 500k inventory file constantly. But that didn't help too much.

I switched my playbook to just echo hello, with no fact gathering.
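
For reference, a playbook of that shape might look like the sketch below. This is an assumed reconstruction, not the original file; the task name and the use of the command module are guesses.

```yaml
# Hypothetical minimal test playbook: no fact gathering, one trivial task.
- hosts: all
  gather_facts: no
  tasks:
    - name: say hello
      command: echo hello
```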

When I run it with an inventory file containing only 3 hosts, I get:

real    0m1.996s
user    0m0.400s
sys     0m0.112s

When I run it with an inventory file containing all 430 hosts and limit it to just the first 3, I get (note: these are different hosts, but the same make of machine):

real    0m11.989s
user    0m13.693s
sys     0m0.552s

And when I run it with an inventory file containing all 430 hosts and no limit (pressing ctrl-c after the 3rd one), I get:

real    2m50.961s
user    2m56.495s
sys     0m0.764s

So, it makes me think that not a lot is really going on behind the scenes and something is intensely blocking.

Best Answer

First of all, consider caching the facts.

The documentation explains how:

http://docs.ansible.com/ansible/playbooks_variables.html#fact-caching

You should see a large improvement in fact gathering, even when caching to a plain file.

Then consider raising the level of parallelism with -f to something bigger than the default of 5. From the man page:

man ansible-playbook

   -f NUM, --forks=NUM
       Level of parallelism.  NUM is specified as an integer, the default is 5.
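
Both suggestions can be made persistent in ansible.cfg. The fragment below is a minimal sketch, assuming JSON-file caching; the cache directory, the timeout, and the fork count are illustrative values, not recommendations from the answer.

```ini
[defaults]
# Only gather facts when no fresh cached copy exists for a host.
gathering = smart
# Store facts as JSON files on the control machine.
fact_caching = jsonfile
# Assumed path; any directory writable by the user running Ansible works.
fact_caching_connection = /tmp/ansible_fact_cache
# Cache lifetime in seconds (here: 24 hours).
fact_caching_timeout = 86400
# Same effect as passing -f 50 on the command line.
forks = 50
```

A higher fork count uses more CPU and memory on the control machine, so it is worth raising it in steps while watching the load.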

Related Solutions

Security – How to implement ansible with per-host passwords, securely

You've certainly done your research...

In my experience with Ansible, what you're looking to accomplish isn't supported. As you mentioned, Ansible states that it does not require passwordless sudo, and you are correct, it does not. But I have yet to see any method of using multiple sudo passwords within Ansible, short of running multiple configs.

So, I can't offer the exact solution you are looking for, but you did ask...

"So... how are people using Ansible in situations like these? Setting NOPASSWD in /etc/sudoers, reusing password across hosts or enabling root SSH login all seem rather drastic reductions in security."

I can give you one view on that. My use case is 1k nodes in multiple data centers supporting a global SaaS firm, in which I have to design and implement some insanely tight security controls due to the nature of our business. Security is always a balancing act: more usability means less security. That holds whether you are running 10 servers, 1,000, or 100,000.

You are absolutely correct not to use root logins either via password or ssh keys. In fact, root login should be disabled entirely if the servers have a network cable plugged into them.

Let's talk about password reuse. In a large enterprise, is it reasonable to ask sysadmins to have different passwords on each node? For a couple of nodes, perhaps, but my admins/engineers would mutiny if they had to have different passwords on 1000 nodes. Implementing that would be near impossible as well: each user would have to store their own passwords somewhere, hopefully in a KeePass database, not a spreadsheet. And every time you put a password in a location where it can be pulled out in plain text, you have greatly decreased your security. I would much rather they know, by heart, one or two really strong passwords than have to consult a KeePass file every time they need to log into or invoke sudo on a machine.

So password reuse and standardization is completely acceptable and standard, even in a secure environment. Otherwise LDAP, Keystone, and other directory services wouldn't need to exist.

When we move to automated users, ssh keys work great to get you in, but you still need to get through sudo. Your choices are a standardized password for the automated user (which is acceptable in many cases) or to enable NOPASSWD as you've pointed out. Most automated users only execute a few commands, so it's quite possible and certainly desirable to enable NOPASSWD, but only for pre-approved commands. I'd suggest using your configuration management (ansible in this case) to manage your sudoers file so that you can easily update the password-less commands list.
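
As a rough sketch of that approach, a play along the following lines could manage a sudoers drop-in file. The user name automation, the file name, and the two whitelisted commands are hypothetical, and the sketch assumes /etc/sudoers on the target includes /etc/sudoers.d and that visudo is on the PATH.

```yaml
# Hypothetical play: grant one automated user passwordless sudo
# for a short, explicit list of commands only.
- hosts: all
  become: yes
  tasks:
    - name: Install NOPASSWD rules for pre-approved commands
      copy:
        dest: /etc/sudoers.d/automation
        # Placeholder commands; replace with the ones your automation needs.
        content: |
          automation ALL=(root) NOPASSWD: /sbin/service nginx restart, /usr/bin/zypper refresh
        owner: root
        group: root
        mode: "0440"
        # Syntax-check the file before it is moved into place.
        validate: "visudo -cf %s"
```

Keeping the rules in a separate file means the list of password-less commands can be changed in one place and pushed out again, and the validate step guards against installing a sudoers file with a syntax error.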

Now, there are some steps you can take once you start scaling to further isolate risk. While we have 1000 or so nodes, not all of them are 'production' servers; some are test environments, etc. Not all admins can access production servers; those that can, though, use the same SSO user/pass|key as they would elsewhere. Automated users are a bit more restricted: for instance, an automated tool that non-production admins can access has a user and credentials that cannot be used in production. If you want to launch Ansible on all nodes, you have to do it in two batches, once for non-production and once for production.

We also use puppet though, since it's an enforcing configuration management tool, so most changes to all environments would get pushed out through it.

Obviously, if that feature request you cited gets reopened/completed, what you're looking to do would be entirely supported. Even then though, security is a process of risk assessment and compromise. If you only have a few nodes that you can remember the passwords for without resorting to a post-it note, separate passwords would be slightly more secure. But for most of us, it's not a feasible option.

Linux – Pass hosts from master playbook to included playbook

Here's the problem:

- include: setup-common.yml
  vars:
    server: "{{ hostvars['inventory_hostname'] }}"

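You do not need to specify this variable at all, nor should you be attempting to use it in the other playbook. Note also that hostvars['inventory_hostname'], with the name in quotes, looks up a host literally called inventory_hostname rather than the current host. By default, a playbook runs against all the hosts named in its own hosts: line. So don't do this, and just write a normal playbook.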
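
A minimal sketch of what "a normal playbook" could look like here, shown as two YAML documents for the two files. The master file name site.yml, the group name webservers, and the debug task are assumptions for illustration; only setup-common.yml comes from the question.

```yaml
# site.yml (hypothetical master playbook): include without passing hosts
- include: setup-common.yml

---
# setup-common.yml: the included playbook declares its own hosts
- hosts: webservers   # assumed inventory group
  tasks:
    - name: Show which host this task is running on
      debug:
        msg: "Running on {{ inventory_hostname }}"
```

Inside the included playbook, inventory_hostname is already set per host, so nothing needs to be handed down from the master. On newer Ansible releases (2.4 and later), playbook-level include is replaced by import_playbook.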
