1. Overview
Ansible is a simple configuration management tool that automates application deployment, intra-service orchestration, and cloud provisioning, all in one. In this tutorial, we’ll look at how to pause a playbook and restart a server in Ansible.
Afterward, we’ll see a practical example that shows how Ansible can handle this kind of real-world task and save us a lot of time.
2. Server Restart Options in Ansible
Ansible uses playbooks to describe automation jobs in YAML (a recursive acronym for “YAML Ain’t Markup Language”), a simple language that is easy to understand, read, and write. We’ll use Ansible to reboot nodes or servers, temporarily pausing the playbook for a given amount of time before continuing with its execution.
2.1. Server Reboot vs. Restart
Ansible can be used to control our systems and their resources. Among other basic functions, we can use it to reboot a system. In Ansible 2.7 and later, we can use the built-in reboot module, which reboots the host and waits for it to come back:
- name: Wait for server to restart
  reboot:
    reboot_timeout: 3600
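The reboot module accepts a few more parameters that control how it waits. The following is a minimal sketch with illustrative values (the numbers are assumptions, not recommended settings):

```yaml
- name: Reboot and wait for the host to respond again
  reboot:
    msg: "Reboot triggered by Ansible"   # message shown to logged-in users
    post_reboot_delay: 30                # seconds to wait after the reboot before testing the host
    reboot_timeout: 600                  # maximum seconds to wait for the host to come back
    test_command: whoami                 # command that must succeed on the rebooted host
  become: true
```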
Without the reboot module (for example, on versions older than 2.7), we need to trigger the restart ourselves with a shell command and then wait until the host comes back:
- name: restart server
  shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
  async: 1
  poll: 0
  become: true
2.2. Server Restart as a Task
An Ansible playbook executes part of its overall goal by running one or more tasks as an ordered list. The task here is to call an Ansible module to restart a server:
tasks:
  - name: restart server
    shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
    async: 1
    poll: 0
    ignore_errors: true
    become: true
This runs the shell command as an asynchronous task, so Ansible will not wait for the end of the command. The sleep before and after shutdown is there to prevent breaking the SSH connection during restart while Ansible is still connected to the remote host.
If we want to run multiple tasks in a playbook concurrently, we can use async with a poll set to zero. When we set poll: 0, Ansible starts the task and immediately moves on to the next task without waiting for a result. Each async task runs until it either completes, fails, or times out by running longer than its async value.
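To illustrate the fire-and-forget pattern described above, here is a sketch that starts a job with poll: 0 and checks on it later with the async_status module. The script path and the timing values are hypothetical:

```yaml
- name: Start a long-running job without waiting for it
  command: /usr/local/bin/long_job.sh   # hypothetical script
  async: 600                            # let the job run for up to 600 seconds
  poll: 0                               # do not wait; move on to the next task
  register: long_job

- name: Check on the job later in the play
  async_status:
    jid: "{{ long_job.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 60                           # check up to 60 times
  delay: 10                             # seconds between checks
```

For the reboot task itself, there is nothing to check afterwards, because the host goes down before the command can report back.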
2.3. Wait for Server Restart as a Task
Using Ansible’s wait_for module, we can temporarily stop running the playbook while we wait for the server to finish rebooting or for a service to start and bind to a port:
tasks:
  - name: Wait for server to restart
    local_action:
      module: wait_for
      host: "{{ inventory_hostname }}"
      port: 22
      delay: 10
    become: false
This runs the wait_for task on the machine running Ansible (the control node). The task waits for port 22 to open on the remote host, starting after a ten-second delay. The same module can wait for any other port, which is useful when a service isn’t immediately available after its init script finishes.
We may prefer to use the {{ ansible_ssh_host }} variable as the hostname and/or {{ ansible_ssh_port }} as the SSH port (named ansible_host and ansible_port since Ansible 2.0) if the inventory (the Ansible hosts file) contains entries like:
ansible_ssh_host: some.other.name.com
ansible_ssh_port: 2222
Here’s a basic inventory file in YAML format:
all:
  hosts:
    mail.example.com:
  children:
    webservers:
      hosts:
        foo.example.com:
        bar.example.com:
    dbservers:
      hosts:
        one.example.com:
        two.example.com:
        three.example.com:
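As a sketch of how per-host connection variables fit into a YAML inventory, the entry below attaches them to a single host. The alias web01, the address, and the port are made-up values, and ansible_host and ansible_port are the current names for the older ansible_ssh_host and ansible_ssh_port variables:

```yaml
all:
  hosts:
    web01:                                 # hypothetical inventory alias
      ansible_host: some.other.name.com    # address Ansible actually connects to
      ansible_port: 2222                   # non-default SSH port
```

A wait task could then use these variables and fall back to the defaults for hosts that don’t set them:

```yaml
- name: Wait for server to restart
  local_action:
    module: wait_for
    host: "{{ ansible_host | default(inventory_hostname) }}"
    port: "{{ ansible_port | default(22) }}"
    delay: 10
  become: false
```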
2.4. Server Restart with Wait Using Handlers
Sometimes, we may want a task to run only when a change is made on a machine. Ansible uses handlers to address this use case. In short, handlers are tasks that run only when notified. Handlers are optional, but it’s a good idea to define follow-up actions such as restarts as handlers. There are two main reasons to do this:
- Code reuse: We can use one handler for many tasks. For example, we can trigger a server restart after changing the timezone and after changing the kernel.
- Trigger only once: If several tasks notify the same handler and more than one of them makes a change, the handler still runs only once. For example, if an httpd restart handler is attached to both an httpd config change and an SSL certificate update, httpd is restarted only once even when both change.
Now, we’ll run both “Restart server” and “Wait for server to restart” as handlers rather than tasks. Let’s take a look at the YAML snippet for restarting and waiting for the restart using handlers:
handlers:
  - name: Restart server
    # shell (not command) is required because the line chains commands with &&
    shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
    async: 1
    poll: 0
    ignore_errors: true
    become: true
  - name: Wait for server to restart
    local_action:
      module: wait_for
      host: "{{ inventory_hostname }}"
      port: 22
      delay: 10
    become: false
Then we notify both handlers, in sequence, from the task that makes the change:
tasks:
  - name: Set hostname
    hostname: name=somename
    notify:
      - Restart server
      - Wait for server to restart
*It’s noteworthy that handlers are run in the order they are defined, not the order they are listed in notify!*
3. Problem Conceptualization
We’ll now discuss rebooting servers and then waiting up to a given amount of time for a given service on a given port to start. Then, we’ll propose a playbook with a generic structure using the Ansible concepts we’ve discussed so far. Along the way, we’ll discuss the functionality, the implementation, and any related caveats. To make this easier to follow, we’ll break our problem into three parts.
3.1. Pre-Reboot
The pre-reboot stage runs our pre-reboot tasks, such as performing major upgrades or making configuration changes that only take effect at boot time. For example, we might upgrade all packages using the yum module:
- name: upgrade all packages
  yum: name=* state=latest
3.2. Reboot
In this stage, we’ll use the command module to reboot the remote machine or server by running the reboot command. There’s nothing fancy here, and we could also use shutdown --reboot:
- name: reboot server
  command: /sbin/reboot
3.3. Pause and Resume the Playbook
Next, we’ll use the wait_for module to wait up to 300 seconds for port 22 to become available before resuming the playbook. We’re using port 22 because most servers run OpenSSH-server on port 22, and if we were to telnet to that port, we’d probably see something like: SSH-2.0-OpenSSH_6.6.1. So, we can use a regex to match the output against “OpenSSH”.
We’re using a timeout value of 300 seconds because many physical servers take three to five minutes to finish rebooting due to hardware checks. But we can use whatever value suits us. For example, we can tell it to wait up to 300 seconds for port 22 to become available and contain OpenSSH:
- name: wait for the server to finish rebooting
  local_action:
    module: wait_for
    host: "web01"
    search_regex: OpenSSH
    port: 22
    timeout: 300
After we’ve got a response from port 22, we can resume running the playbook. This step is optional.
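As an alternative sketch, and not part of the approach above, the wait_for_connection module (available since Ansible 2.3) waits until Ansible can actually connect to the host and run modules again, rather than only checking that the port answers. The delay and timeout values below are assumptions:

```yaml
- name: Wait until Ansible can connect to the rebooted host again
  wait_for_connection:
    delay: 10      # seconds to wait before the first connection attempt
    timeout: 300   # give up after 300 seconds
```

This task runs against the remote host itself, so it doesn’t need local_action.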
4. Putting It All Together
We can merge all the above sections into one playbook:
- hosts: all
  become: true
  tasks:
    - name: Upgrade all packages in RedHat-based machines
      # ansible_os_family is case-sensitive: the value is "RedHat"
      when: ansible_os_family == "RedHat"
      yum: name=* state=latest
    - name: Upgrade all packages in Debian-based machines
      when: ansible_os_family == "Debian"
      apt: upgrade=dist update_cache=yes
    - name: Reboot server
      command: /sbin/reboot
    - name: Wait for the server to finish rebooting
      become: false
      local_action:
        module: wait_for
        host: "{{ inventory_hostname }}"
        search_regex: OpenSSH
        port: 22
        delay: 1
        timeout: 300
The variable inventory_hostname is the name of the remote server as stated in the Ansible hosts file. The local_action directive runs the wait_for step on the local machine. Because the yum module only works on RedHat-based OSes such as Fedora, CentOS, and RHEL, we use the apt module for Debian-based OSes like Ubuntu and Debian.
Ever wondered why we didn’t use handlers here? Well, notified handlers only run at the end of the play by default, regardless of where the notifying task sits in the playbook. In this use case, we’re only interested in rebooting the server and waiting a given amount of time for it to finish rebooting, so plain tasks are enough.
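If we’d still rather use a handler, the meta: flush_handlers task makes any notified handlers run at that point instead of at the end of the play. Here’s a minimal sketch, assuming Ansible 2.7 or later for the reboot module; the hostname value and the final ping task are placeholders:

```yaml
- hosts: all
  become: true
  tasks:
    - name: Set hostname
      hostname:
        name: somename            # placeholder value
      notify: Restart server

    - name: Run notified handlers now instead of at the end of the play
      meta: flush_handlers

    - name: Continue with tasks that need the rebooted host
      ping:

  handlers:
    - name: Restart server
      reboot:
        reboot_timeout: 300
```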
5. Conclusion
In this tutorial, we learned how to restart a server with Ansible and pause the playbook until the server is reachable again. With thoughtfully designed tasks, we can make platform-specific tweaks so that the playbook behaves as we want. This is especially useful for changes that require a reboot, such as SELinux changes.