Greetings
I wonder how the Fabric Controller decides that an upgrade has finished and that it can move on to the next upgrade domain when the instances are ordinary Linux virtual machines. For deployed packages (I guess that means the whole .NET stuff), one can apparently define checks that influence the upgrade domain walking.
Some background:
In a six-node CoreOS cluster with a simple-majority quorum (4 of 6), I need to make sure that at least 4 nodes are available at all times. I need to reserve one node for failure situations, so only one other node can be shut down, rebooted, upgraded, etc. at any given time. I can put all nodes into different upgrade domains, but I also need to wait for the rebooted node to successfully rejoin the cluster before the upgrade domain walking can continue.
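To spell out the arithmetic behind that constraint, here is a small sketch. The function names and the `reserve` parameter are my own, purely for illustration; they are not part of any Azure or CoreOS API.

```python
def quorum(n: int) -> int:
    """Smallest simple majority of n voting members."""
    return n // 2 + 1


def max_planned_down(n: int, reserve: int = 1) -> int:
    """Nodes that may be taken down on purpose while keeping quorum.

    `reserve` is the number of unplanned failures that should still be
    survivable during the maintenance (an assumption of this sketch).
    """
    tolerated = n - quorum(n)         # failures the cluster survives in total
    return max(tolerated - reserve, 0)


print(quorum(6))             # majority needed in a 6-node cluster
print(max_planned_down(6))   # nodes I can reboot at once with 1 in reserve
```

For 6 nodes this gives a quorum of 4, two tolerated failures in total, and therefore exactly one node that may be down for planned maintenance at any time.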
I can imagine withholding ICMP ping responses until the node has rejoined the cluster, if that is the criterion the Fabric Controller checks. That would be pragmatic rather than elegant. A better way would be the ability to configure, e.g., an HTTP GET health check that expects a 200 response.
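To make the idea concrete, this is roughly the kind of endpoint I would run on each node. It is only a sketch using Python's standard library: the `/health` path, port 8080 and `has_rejoined_cluster()` are placeholders I made up, and whether the Fabric Controller can be pointed at such an endpoint for plain VMs is exactly what I am asking.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(is_healthy):
    """Build a handler that answers 200 on /health only when is_healthy() is true."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                status, body = 404, b"not found\n"
            elif is_healthy():
                status, body = 200, b"ok\n"
            else:
                # Node is up but not yet a cluster member again.
                status, body = 503, b"unavailable\n"
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet

    return HealthHandler


def has_rejoined_cluster() -> bool:
    # Hypothetical placeholder: a real check would ask the local
    # cluster member whether it is back in the cluster.
    return False


def serve(port: int = 8080) -> None:
    """Serve the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), make_handler(has_rejoined_cluster)).serve_forever()
```

Calling `serve()` at boot would make the node answer 503 until `has_rejoined_cluster()` returns true, and 200 afterwards, so a prober could hold back the next upgrade domain until then.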
Matt