I am looking to write a playbook that updates and reboots each system one at a time. It needs to verify that the node comes back healthy before starting on the next one. If a node fails the check, the whole run should stop.
I think I can just wait for the reboot to finish and then check a health condition before moving on to the next host. Has anyone done this before? I think it is doable, but I am curious what others think.
For context, the goal is to avoid taking down my Kubernetes cluster.
FWIW, we patch our hosts one at a time with a combination of Bash scripts and Ansible.
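
For anyone who wants to try this in a single playbook, here is a minimal sketch of the "one host at a time, stop on failure" idea. It is only an illustration under several assumptions: the inventory group `k8s_nodes` is hypothetical, the hosts are assumed to be Debian/Ubuntu (hence `apt`), `kubectl` is assumed to be configured on the control machine, and each `inventory_hostname` is assumed to match its Kubernetes node name. The timeouts and retry counts are placeholders.

```yaml
# Sketch only: group name, package manager, and health check are assumptions.
- name: Patch and reboot nodes one at a time
  hosts: k8s_nodes              # hypothetical inventory group
  become: true
  serial: 1                     # process one host per batch
  any_errors_fatal: true        # any failed host aborts the whole play
  tasks:
    - name: Upgrade all packages (assumes Debian/Ubuntu hosts)
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist

    - name: Reboot and wait for the host to come back
      ansible.builtin.reboot:
        reboot_timeout: 600     # seconds; placeholder value

    - name: Wait until the node reports Ready
      ansible.builtin.command:
        cmd: >-
          kubectl get node {{ inventory_hostname }}
          -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
      delegate_to: localhost    # assumes kubectl works on the control machine
      become: false
      register: node_ready
      until: node_ready.stdout == "True"
      retries: 30               # placeholder: 30 tries, 10 seconds apart
      delay: 10
      changed_when: false       # a read-only check never reports "changed"
```

With `serial: 1`, the next host is only started after every task has passed on the current one. If the reboot times out or the readiness check runs out of retries, the task fails and the play stops there. In a real cluster you would probably also want to cordon and drain the node before the upgrade (for example with `kubectl drain`) and run `kubectl uncordon` after the health check passes.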