Linux – How to test that a reboot has completed


I'm currently building an infrastructure management tool that provisions bare metal, VMs, etc. We have a worker VM that runs commands (via Ansible) on the remote nodes over SSH.

One of the steps requires rebooting the nodes to apply some configuration. The worker process has to run more commands on the nodes after the reboot is complete (this must be done synchronously).

My question is, how can I check to see whether the reboot has completed?

I could add a sleep timer (to wait until the reboot completes), but that feels like a bad solution for a number of reasons.

Another option is to try to SSH to the remote node from my worker process every 5 seconds or so; if the connection fails, keep retrying until it succeeds.

Is there another way of doing this?

Best Answer

Since you mentioned that you are running commands via Ansible, here is what I use for reboots in a playbook (I'm managing Ubuntu 14.04/16.04 machines):

---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    # here comes the juicy part
    - name: "reboot hosts"
      shell: "sleep 2 && shutdown -r now 'Reboot triggered by Ansible'" # sleep 2 is needed, else this task might fail
      async: "1" # run asynchronously
      poll: "0" # don't ask for the status of the command, just fire and forget
      ignore_errors: yes # this command will get cut off by the reboot, so ignore errors
    - name: "wait for hosts to come up again"
      wait_for:
        host: "{{ inventory_hostname }}"
        port: "22" # wait for ssh as this is what is needed for ansible
        state: "started"
        delay: "120" # start checking after this amount of time
        timeout: "360" # give up after this amount of time
      delegate_to: "localhost" # check from the machine executing the playbook
...
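
If you are on Ansible 2.3 or newer, an alternative to wait_for with delegate_to: localhost is the wait_for_connection module, which waits from the control machine until the managed host is usable through the play's normal connection plugin. A minimal sketch, with the timing values chosen only as examples:

    - name: "wait for hosts to come up again"
      wait_for_connection:
        delay: 120 # start checking only after this many seconds
        sleep: 5 # retry every 5 seconds
        timeout: 360 # give up after this many seconds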

Update

Ansible 2.7 now has a reboot module, so you don't need to craft the commands yourself. The playbook above translates into this:

---
# execute like:
# ansible-playbook reboot.yaml --inventory hosts --extra-vars "hosts=all user=admin"
# or
# ansible-playbook reboot.yaml -i hosts -e "hosts=all user=admin"
- hosts: "{{ hosts }}"
  remote_user: "{{ user }}"
  become: yes
  tasks:
    # add this to guard you from yourself ;)
    #- name: "ask for verification"
    #  pause:
    #    prompt: "Are you sure you want to restart all specified hosts?"

    - name: "reboot hosts"
      reboot:
        msg: "Reboot triggered by Ansible"
        reboot_timeout: 360
...
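
The reboot module blocks until the host responds again (and fails if reboot_timeout is exceeded), so the post-reboot commands from the question can simply be the next tasks in the same play. A rough sketch, where the post-reboot task and its script are made up placeholders:

    - name: "reboot hosts"
      reboot:
        msg: "Reboot triggered by Ansible"
        reboot_timeout: 360

    # anything below only runs after the reboot has completed
    - name: "apply post-reboot configuration (placeholder task)"
      command: /usr/local/bin/apply-config.sh # hypothetical script, replace with your own steps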