Linux – Puppet sometimes can’t find standard facts like osfamily

debian facter linux puppet puppetmaster

Quick brief: for testing purposes, I installed the puppet agent on 5 nodes (Debian Squeeze + puppet 2.7.20-1puppetlabs1) and the puppet master on 1 server (same version).

On the puppetmaster side, every manifest checks whether $::osfamily == 'Debian'. Sometimes I also use $::fqdn and check that it's not empty.

The problem is that every day, at random hours, I get mail from the puppetmaster saying that it can't compile the catalog for one of the nodes. For example:

Fri Jan 18 19:18:24 +0100 2013 Puppet (err): Could not retrieve catalog from remote server: Error 400 on SERVER: Not supported osfamily at /etc/puppet/modules/system/manifests/skel.pp:20 on node mynodeX
Fri Jan 18 19:18:24 +0100 2013 Puppet (notice): Using cached catalog
Fri Jan 18 19:18:24 +0100 2013 Puppet (err): Could not retrieve catalog; skipping run

Another example, from puppetmaster logs:

Jan 15 18:58:49 monitor puppet-master[14218]: No fqdn at /etc/puppet/modules/system/manifests/motd.pp:29 on node nodeY

Of course, on the next puppet agent run everything is fine. I have no idea how to find the cause of this issue. The problem affects all 5 nodes.

I'm 100% sure that it's not related to cron.

Best Answer

I've seen this issue on RedHat/CentOS. The puppet agent on the client machine would run out of file descriptors because some Ruby/Puppet bug left them unclosed. After hitting the 1024 fd limit, it could no longer run facter, so the facts would be missing.

If subsequent puppet runs from the same process don't fail, it is probably a different problem, but it might be worth checking. In my case the puppet agent would log that it could not start facter, and /proc/PIDOFPUPPETD/fd would contain 1024 file descriptors.
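
To check this on a node, something along these lines could compare the number of open descriptors of the agent process with its soft limit. It is only a rough sketch, assuming a Linux /proc filesystem; the function names are made up for illustration, and reading another user's /proc/PID/fd normally requires root.

```python
import os


def count_open_fds(pid):
    """Count the entries in /proc/<pid>/fd, i.e. the descriptors the process holds."""
    return len(os.listdir("/proc/%d/fd" % pid))


def parse_soft_fd_limit(limits_text):
    """Extract the soft 'Max open files' limit from the text of /proc/<pid>/limits.

    Returns an int, or None if the limit is 'unlimited' or the line is missing.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(label):].split()
            if not fields or fields[0] == "unlimited":
                return None
            return int(fields[0])
    return None


def fd_report(pid):
    """Return (open descriptor count, soft limit) for the given process."""
    with open("/proc/%d/limits" % pid) as limits_file:
        limit = parse_soft_fd_limit(limits_file.read())
    return count_open_fds(pid), limit
```

Calling fd_report() with the PID of the puppet agent shortly after a failed run shows whether the count is sitting at the limit. If it stays well below the limit, the descriptor leak is probably not the cause.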