Salt minions keep losing connection to master

Tags: deployment, saltstack, virtualbox, virtualization

I'm testing Salt. I have a simple test setup of 3 VirtualBox VMs, with salt-master running on one of the machines and salt-minions running on the other 2 VMs.

I can start either of the salt-minion VMs on its own and it will connect to the master and receive commands. If I start both minion VMs, they both connect for a short period of time, and then one drops and shows as not connected on the master.

Actually, I don't even need more than one minion VM active. With just the master VM and a single salt-minion VM, the minion still disconnects.

I can restart the salt-minion and it will reconnect to the master and receive commands again… for a couple of minutes at least. Eventually, it shows as disconnected on the master. Running the salt-minion in debug mode doesn't show anything that explains why the master reports it as disconnected.

What could be causing this?

Edit:

The OS I'm using is Ubuntu 14.04. The master and minion environments are the same except for the salt-master package. Running --versions-report on master and minion gives the following versions:

              Salt: 2015.5.3
            Python: 2.7.6 (default, Mar 22 2014, 22:59:56)
            Jinja2: 2.7.2
          M2Crypto: 0.21.1
    msgpack-python: 0.3.0
      msgpack-pure: Not Installed
          pycrypto: 2.6.1
           libnacl: Not Installed
            PyYAML: 3.10
             ioflo: Not Installed
             PyZMQ: 14.0.1
              RAET: Not Installed
               ZMQ: 4.0.4
              Mako: Not Installed
           Tornado: Not Installed
Debian source package: 2015.5.3+ds-1trusty1

Best Answer

Connectivity issues are usually caused by an old ZMQ library (older than 4.x) and/or the Salt version. Please run salt --versions-report on the master and salt-call --versions-report on the minion to see which versions you are using. You should be running:

Salt: 2015.5.3
...
ZMQ: 4.0.5

You should also try to reproduce the issue with a simple vagrant-salt demo. Note that you will need to change the Salt version in the Vagrantfile to "2015.5.3".

You haven't specified which OS or Salt version you are using, but there is an ongoing issue with the zmq package used by Salt that causes slow connections and drops. Upgrading the zmq package is highly recommended (this SLS file is for Red Hat-based systems):

{% if grains['os'] in ('RedHat', 'CentOS', 'Fedora') %}
  {% if grains['os'] == 'Fedora' %}
    {% set repotype = 'fedora' %}
  {% else %}
    {% set repotype = 'epel' %}
  {% endif %}
saltstack-zeromq4:
  pkgrepo.managed:
    - humanname: Copr repo for zeromq4 owned by saltstack
    - baseurl: http://copr-be.cloud.fedoraproject.org/results/saltstack/zeromq4/{{ repotype }}-$releasever-$basearch/
    - gpgcheck: 0
    - skip_if_unavailable: True
    - enabled: 1
{% endif %}

{% if grains['os'] in ('RedHat', 'CentOS', 'Fedora') %}
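update_zmq:
  # Upgrade ZeroMQ and its Python bindings to the latest available version.
  pkg.latest:
    - pkgs:
      - zeromq
      - python-zmq
    - order: last
  # Schedule a minion restart one minute later, only if the packages changed.
  cmd.wait:
    - name: echo service salt-minion restart | at now + 1 minute
    - watch:
      - pkg: update_zmq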
{% endif %}

Another "hack" is to ping the minions every minute or so. Apply this state to the minion running on the salt-master machine:

"salt '*' test.ping > /dev/null":
  cron.present:
    - user: root
    - minute: '*/1'

You can also ping the master from the minion by setting the master_alive_interval option in the minion config file.
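
As a minimal sketch of that setting, the minion config might contain something like the following. The path /etc/salt/minion is the usual default location, and the 30-second interval is an illustrative assumption, not a recommended value:

```yaml
# /etc/salt/minion (usual default path; adjust if your install differs)

# How often, in seconds, the minion checks that its master is still reachable.
# The value 30 is an assumption for illustration; tune it for your network.
master_alive_interval: 30
```

Restart the salt-minion service after editing the file so the option takes effect.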