Configuration management: push versus pull based topology

ansiblechefconfiguration-managementpuppetsaltstack

The more established configuration management (CM) systems like Puppet and Chef use a pull-based approach: clients poll a centralized master periodically for updates. Some of them offer a masterless approach as well (so, push-based), but state that it is 'not for production' (Saltstack) or 'less scalable' (Puppet). The only system that I know of that is push-based from the start is runner-up Ansible.

What is the specific scalability advantage of a pull based system? Why is it supposedly easier to add more pull-masters than push-agents?

For example, agiletesting.blogspot.nl writes:

in a 'pull' system, clients contact the server independently of each other, so the system as a whole is more scalable than a 'push' system

On the other hand, Rackspace demonstrates that they can handle 15K systems with a push-based model.

infastructures.org writes:

We swear by a pull methodology for maintaining infrastructures, using a tool like SUP, CVSup, an rsync server, or cfengine. Rather than push changes out to clients, each individual client machine needs to be responsible for polling the gold server at boot, and periodically afterwards, to maintain its own rev level.
Before adopting this viewpoint, we developed extensive push-based scripts based on ssh, rsh, rcp, and rdist.
The problem we found with the r-commands (or ssh) was this: When you run an r-command based script to push a change out to your target machines, odds are that if you have more than 30 target hosts one of them will be down at any given time. Maintaining the list of commissioned machines becomes a nightmare.
In the course of writing code to correct for this, you will end up with elaborate wrapper code to deal with: timeouts from dead hosts; logging and retrying dead hosts; forking and running parallel jobs to try to hit many hosts in a reasonable amount of time; and finally detecting and preventing the case of using up all available TCP sockets on the source machine with all of the outbound rsh sessions.
Then you still have the problem of getting whatever you just did into the install images for all new hosts to be installed in the future, as well as repeating it for any hosts that die and have to be rebuilt tomorrow.
After the trouble we went through to implement r-command based replication, we found it's just not worth it. We don't plan on managing an infrastructure with r-commands again, or with any other push mechanism for that matter. They don't scale as well as pull-based methods.

Isn't that an implementation problem instead of an architectural one? Why is it harder to write a threaded push client than a threaded pull server?

Best Answer

The problem with push based systems is that you have to have a complete model of the entire architecture on the central push node. You can't push to a machine that you don't know about.

It can obviously work, but it takes a lot of work to keep it in sync.

Using things like Mcollective, you can convert Puppet and other CM's into a push based system. Generally, it's trivial to convert a pull system to a push based one, but not always simple to go the other way.

There is also the question of organizational politics. A push based system puts all the control hands of the central admins. It can be very hard to manage complexity that way. I think the scaling issue is a red herring, either approach scales if you just look at the number of clients. In many ways push is easier to scale. However, dynamic configuration does more or less imply that you have at least a pull version of client registration.

Ultimately, it's about which system matches the workflow and ownership in your organization. As a general rule, pull systems are more flexible.

Related Solutions

Security – Puppet Security and Network Topologies

Because i sometimes store passwords in variables in my modules, to be able to deploy applications without having to finish configuration manually, it means that i can not decently put my puppet repo on a public server. Doing so would mean that attacking the puppetmaster would permit to gain some app or db passwords of all our different applications on all our servers.

So my puppetmaster is in our office private network, and i do not run puppetd daemon on the servers. When i need to deploy, i use ssh from private net to servers, creating a tunnel and remotely calling puppetd.
The trick is not to set the remote tunnel and puppet client to connect to the puppetmaster, but to a proxy that accept http connect and can reach the puppetmaster server on private network. Otherwise puppet will refuse to pull because of hostname conflict with certificates

# From a machine inside privatenet.net :
ssh -R 3128:httpconnectproxy.privatenet.net:3128 \
    -t remoteclient.publicnetwork.net \
    sudo /usr/sbin/puppetd --server puppetmaster.privatenet.net \
    --http_proxy_host localhost --http_proxy_port 3128 \
    --waitforcert 60 --test –-verbose

It works for me, hopes it helps you

Puppet – What Should Not Be Managed by Puppet?

General rule: If you're using configuration management, manage every aspect of the configuration that you can. The more you centralize the easier it will be to scale your environment out.

Specific examples (cribbed from the question, all "This is why you want to manage it" narratives):

IP Network configuration

OK, sure, you configured an address/gateway/NS on the machine before you dropped it in the rack. I mean if you didn't how would you run puppet to do the rest of the config?

But say now you add another nameserver to your environment and you need to update all your machines -- Don't you want your configuration management system to do that for you?

Or say your company gets acquired, and your new parent company demands that you change from your 192.168.0.0/24 addressing to 10.11.12.0/24 to fit into their numbering system.

Or you suddenly get a massive government contract -- Only catch is you have to turn up IPv6 RIGHT FREAKIN' NOW or the deal is blown....

Looks like network configuration is something we'd like to manage...

IPMI Configuration

Just like with IP addresses, I'm sure you set this up before you put the machine in the rack -- It's just good common sense to enable IPMI, remote console, etc. on any machine that has the capability, and those configurations don't change much...

... Until that hypothetical acquisition I mentioned in IP Configuration above -- The reason you were forced to vacate those 192.168-net addresses is because that's IPMI-land according to your new corporate overlords, and you need to go update all your IPMI cards NOW because they're gonna be trampling on someone's reserved IP space.

OK, it's a bit of a stretch here, but like you said - all of it can be managed with ipmitool, so why not have Puppet run the tool and confirm the configuration while it's doing all of its other stuff? I mean it's not going to hurt anything, so we may as well include IPMI too...

Updates

Software updates are more of a gray area -- In my organization we evaluated puppet for this and found it "sorely lacking", so we use radmind for this purpose. There's no reason Puppet can't call radmind though -- In fact if/when we migrate to Puppet for configuration management that's exactly what's going to happen!

The important thing here is to have all your updates installed in a standard way (either standard across the organization, or standard within platforms) -- There's no reason Puppet shouldn't be launching your update process, as long as you've thoroughly tested everything to ensure that Puppet won't mess up anything.
There's also no reason why Puppet can't call out to a tool that's better suited for this task if you've determined that Puppet can't do a good job on its own...