Configuration management: push versus pull based topology

Tags: ansible, chef, configuration-management, puppet, saltstack

The more established configuration management (CM) systems like Puppet and Chef use a pull-based approach: clients periodically poll a centralized master for updates. Some of them offer a masterless approach as well (so, push-based), but state that it is 'not for production' (SaltStack) or 'less scalable' (Puppet). The only system I know of that is push-based from the start is runner-up Ansible.

What is the specific scalability advantage of a pull based system? Why is it supposedly easier to add more pull-masters than push-agents?

For example, agiletesting.blogspot.nl writes:

in a 'pull' system, clients contact the server independently of each other, so the system as a whole is more scalable than a 'push' system

On the other hand, Rackspace demonstrates that they can handle 15K systems with a push-based model.

infrastructures.org writes:

We swear by a pull methodology for maintaining infrastructures, using a tool like SUP, CVSup, an rsync server, or cfengine. Rather than push changes out to clients, each individual client machine needs to be responsible for polling the gold server at boot, and periodically afterwards, to maintain its own rev level.
Before adopting this viewpoint, we developed extensive push-based scripts based on ssh, rsh, rcp, and rdist.
The problem we found with the r-commands (or ssh) was this: When you run an r-command based script to push a change out to your target machines, odds are that if you have more than 30 target hosts one of them will be down at any given time. Maintaining the list of commissioned machines becomes a nightmare.
In the course of writing code to correct for this, you will end up with elaborate wrapper code to deal with: timeouts from dead hosts; logging and retrying dead hosts; forking and running parallel jobs to try to hit many hosts in a reasonable amount of time; and finally detecting and preventing the case of using up all available TCP sockets on the source machine with all of the outbound rsh sessions.
Then you still have the problem of getting whatever you just did into the install images for all new hosts to be installed in the future, as well as repeating it for any hosts that die and have to be rebuilt tomorrow.
After the trouble we went through to implement r-command based replication, we found it's just not worth it. We don't plan on managing an infrastructure with r-commands again, or with any other push mechanism for that matter. They don't scale as well as pull-based methods.

Isn't that an implementation problem instead of an architectural one? Why is it harder to write a threaded push client than a threaded pull server?
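To make the question concrete, the wrapper logic the quote describes (parallel jobs, retrying dead hosts, capping outbound sessions) can be sketched in a few lines. This is only an illustration under stated assumptions: `push_to_all` and `push_one` are hypothetical names, and `push_one(host)` is assumed to enforce its own timeout and raise an exception when a host is down.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def push_to_all(hosts, push_one, max_workers=8, retries=2):
    """Push to every host in parallel; return (succeeded, failed) host lists.

    push_one(host) is a hypothetical callable that applies the change to one
    host and raises an exception (for example TimeoutError) on failure.
    """
    succeeded, pending = [], list(hosts)
    for _attempt in range(retries + 1):
        if not pending:
            break
        failed = []
        # max_workers caps the number of simultaneous outbound sessions,
        # so the source machine cannot run out of sockets.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(push_one, h): h for h in pending}
            for fut in as_completed(futures):
                host = futures[fut]
                try:
                    fut.result()
                    succeeded.append(host)
                except Exception:
                    failed.append(host)  # dead or slow host: retry next round
        pending = failed
    return sorted(succeeded), sorted(pending)
```

Hosts still listed as failed after the last retry are exactly the ones that miss the change, which is the gap a periodic pull closes without extra code.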

Best Answer

The problem with push based systems is that you have to have a complete model of the entire architecture on the central push node. You can't push to a machine that you don't know about.

It can obviously work, but it takes a lot of work to keep it in sync.

Using tools like MCollective, you can turn Puppet and other CM systems into push-based ones. Generally, it's trivial to convert a pull-based system to a push-based one, but not always simple to go the other way.

There is also the question of organizational politics. A push-based system puts all the control in the hands of the central admins, and it can be very hard to manage complexity that way. I think the scaling issue is a red herring: either approach scales if you just look at the number of clients. In many ways push is easier to scale. However, dynamic configuration does more or less imply that you have at least a pull version of client registration.
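A minimal sketch of what "a pull version of client registration" might look like: clients check in on their own, and the central node derives its list of push targets from those check-ins instead of maintaining it by hand. The `Inventory` class, its method names, and the TTL value are all hypothetical, chosen only for illustration.

```python
import time


class Inventory:
    """Hosts register themselves (pull); the push target list is derived."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.last_seen = {}  # host name -> timestamp of last check-in

    def check_in(self, host, now=None):
        # Called by the client itself, e.g. at boot and then periodically.
        self.last_seen[host] = time.time() if now is None else now

    def live_hosts(self, now=None):
        # Hosts that have not checked in within the TTL are treated as gone.
        now = time.time() if now is None else now
        return sorted(h for h, t in self.last_seen.items()
                      if now - t <= self.ttl)
```

With something like this in place, a push run would target `live_hosts()`, so new machines appear and dead ones drop out without anyone editing a host list.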

Ultimately, it's about which system matches the workflow and ownership in your organization. As a general rule, pull systems are more flexible.
