Merge "Add documentation for safer redeploy_server"

2018-09-11 17:10:45 +00:00
commit 37429bbbc3
parent 9f453dd22b e0320c0197
3 changed files with 105 additions and 0 deletions
--- a/docs/source/_static/shipyard.policy.yaml.sample
+++ b/docs/source/_static/shipyard.policy.yaml.sample
@@ -1,6 +1,9 @@
 # Actions requiring admin authority
 #"admin_required": "role:admin"
 # Rule to deny all access. Used for default denial
 #"deny_all": "!"
 # List workflow actions invoked by users
 # GET  /api/v1.0/actions
 #"workflow_orchestrator:list_actions": "rule:admin_required"
--- a/docs/source/action-commands.rst
+++ b/docs/source/action-commands.rst
@@ -123,6 +123,9 @@ Like other `target actions` that will use a baremetal or Kubernetes node as
 a target, the `target_nodes` parameter will be used to list the names of the
 nodes that will be acted upon.
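As a rough illustration of how `target_nodes` might be supplied, the sketch below builds a JSON body for the `POST /api/v1.0/actions` call. The body shape (`name` plus a `parameters` mapping) and the helper name `redeploy_request_body` are assumptions made for this example. Check them against the Shipyard API documentation before relying on them.

```python
import json


def redeploy_request_body(target_nodes):
    """Build a JSON request body naming the nodes to redeploy.

    Assumption: the action is described by a "name" and a "parameters"
    mapping; this shape is illustrative and not confirmed by this document.
    """
    nodes = [str(n) for n in target_nodes]
    if not nodes:
        raise ValueError("target_nodes must list at least one node name")
    body = {
        "name": "redeploy_server",
        "parameters": {"target_nodes": nodes},
    }
    return json.dumps(body, sort_keys=True)
```

For example, `redeploy_request_body(["node01", "node02"])` yields a body whose `parameters.target_nodes` holds both node names.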
 Using redeploy_server
 `````````````````````
 .. danger::
   At this time, there are no safeguards with regard to the running workload
@@ -133,6 +136,101 @@ nodes that will be acted upon.
   associated with RBAC rules. A deployment of Shipyard can restrict access
   to this action to help prevent unexpected disaster.
 Redeploying a server can have consequences for the running workload as noted
 above. There are actions that can be taken by a deployment engineer or system
 administrator before performing a redeploy_server to mitigate the risks and
 impact.
 There are three broad categories of nodes that can be considered in regard to
 redeploy_server. It is possible that a node is both a Worker and a Control
 node depending on the deployment of Airship:
 #. Broken Node:
   A non-functional node, e.g. a host that has been corrupted to the point of
   being unable to participate in the Kubernetes cluster.
 #. Worker Node:
   A node that is participating in the Kubernetes cluster and is not running
   control plane software, but is providing capacity for workloads running in
   the environment.
 #. Control Node:
   A node that is participating in the Kubernetes cluster and is hosting
   control plane software, e.g. Airship or other components that serve as
   controllers for the rest of the cluster in some way. These nodes may run
   software such as etcd or databases that contribute to the health of the
   overall Kubernetes cluster.
   Note that there is also the Genesis host, used to bootstrap the Airship
   platform. This node currently runs the Airship containers, including some
   that cannot yet be migrated to other nodes (e.g. the MAAS rack
   controller) and some whose relocation causes disruption (e.g. PostgreSQL).
 .. important::
   Use of redeploy_server on the Airship Genesis host/node is not supported,
   and will result in serious disruption.
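Because targeting the Genesis host is unsupported, an operator may want a guard in whatever tooling submits the action. The following is a minimal sketch of such a pre-check. The function name `check_targets` and the idea of passing the Genesis hostname explicitly are assumptions for illustration. Shipyard itself is not stated to perform this check.

```python
def check_targets(target_nodes, genesis_host):
    """Return the cleaned list of targets, refusing the Genesis host.

    genesis_host is supplied by the operator; this sketch does not try
    to discover it from the cluster.
    """
    # Strip whitespace and drop empty entries.
    targets = [name.strip() for name in target_nodes if name.strip()]
    if not targets:
        raise ValueError("target_nodes must name at least one node")
    if genesis_host in targets:
        raise ValueError(
            "redeploy_server is not supported on the Genesis host: "
            + genesis_host
        )
    return targets
```

Calling `check_targets(["node01", "genesis"], "genesis")` raises `ValueError`, while a list without the Genesis host is returned unchanged apart from whitespace cleanup.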
 Yes
  Recommended step for this node type
 No
  Generally not necessary for this node type
 N/A
  Not applicable for this node type
 +----------------------------------------+--------+--------+---------+
 | Action                                 | Broken | Worker | Control |
 +========================================+========+========+=========+
 | Coordinate workload impacts with users | Yes    | Yes    | No      |
 | [*]_                                   |        |        |         |
 +----------------------------------------+--------+--------+---------+
 |                                                                    |
 +----------------------------------------+--------+--------+---------+
 | Clear Kubernetes labels from the node  | N/A    | Yes    | Yes     |
 | (for each label)                       |        |        |         |
 +----------------------------------------+--------+--------+---------+
 | ``$ kubectl label nodes <node> <label>-``                          |
 +----------------------------------------+--------+--------+---------+
 | Etcd - check for cluster health        | N/A    | N/A    | Yes     |
 +----------------------------------------+--------+--------+---------+
 | ``$ kubectl -n kube-system exec kubernetes-etcd-<hostname> etcdctl |
 | member list``                                                      |
 +----------------------------------------+--------+--------+---------+
 | Drain Kubernetes node                  | N/A    | Yes    | Yes     |
 +----------------------------------------+--------+--------+---------+
 | ``$ kubectl drain <node>``                                         |
 +----------------------------------------+--------+--------+---------+
 | Disable the kubelet service            | N/A    | Yes    | Yes     |
 +----------------------------------------+--------+--------+---------+
 | ``$ systemctl stop kubelet``                                       |
 |                                                                    |
 | ``$ systemctl disable kubelet``                                    |
 +----------------------------------------+--------+--------+---------+
 | Remove node from Kubernetes            | Yes    | Yes    | Yes     |
 +----------------------------------------+--------+--------+---------+
 | ``$ kubectl delete node <node>``                                   |
 +----------------------------------------+--------+--------+---------+
 | Back up disks (processes vary) [*]_    | Yes    | Yes    | Yes     |
 +----------------------------------------+--------+--------+---------+
 |                                                                    |
 +----------------------------------------+--------+--------+---------+
 .. [*] It is up to the infrastructure operator whether to coordinate with
   their users. This guide assumes client or user communication as a common
   courtesy.
 .. [*] Server redeployment will erase all disks (using a quick erase) during
   the process, but desired enhancements to redeploy_server may include
   options for disk handling. Depending on the situation, it may not be
   necessary to back up disks if the underlying implementation already
   provides the needed resiliency and redundancy.
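The table above can also be read as a per-node-type checklist. The sketch below transcribes the table into a data structure and returns only the steps marked "Yes" for a given node type, in table order. The names `STEPS` and `checklist` are hypothetical, and the commands are copied from the table with only the `<node>` placeholder filled in. Other placeholders such as `<label>` and `<hostname>` are left for the operator. Nothing here runs the commands.

```python
# Each entry: (action, commands, recommendation per node type),
# transcribed from the preparation table.
STEPS = [
    ("Coordinate workload impacts with users", [],
     {"broken": "Yes", "worker": "Yes", "control": "No"}),
    ("Clear Kubernetes labels from the node (for each label)",
     ["kubectl label nodes <node> <label>-"],
     {"broken": "N/A", "worker": "Yes", "control": "Yes"}),
    ("Etcd - check for cluster health",
     ["kubectl -n kube-system exec kubernetes-etcd-<hostname> "
      "etcdctl member list"],
     {"broken": "N/A", "worker": "N/A", "control": "Yes"}),
    ("Drain Kubernetes node", ["kubectl drain <node>"],
     {"broken": "N/A", "worker": "Yes", "control": "Yes"}),
    ("Disable the kubelet service",
     ["systemctl stop kubelet", "systemctl disable kubelet"],
     {"broken": "N/A", "worker": "Yes", "control": "Yes"}),
    ("Remove node from Kubernetes", ["kubectl delete node <node>"],
     {"broken": "Yes", "worker": "Yes", "control": "Yes"}),
    ("Back up disks (processes vary)", [],
     {"broken": "Yes", "worker": "Yes", "control": "Yes"}),
]


def checklist(node_type, node):
    """Return (action, commands) pairs recommended for one node."""
    key = node_type.lower()
    if key not in ("broken", "worker", "control"):
        raise ValueError("unknown node type: " + node_type)
    result = []
    for action, commands, applies in STEPS:
        if applies[key] == "Yes":
            filled = [cmd.replace("<node>", node) for cmd in commands]
            result.append((action, filled))
    return result
```

For a node that is both a Worker and a Control node, one conservative reading is to follow every step recommended for either type. That choice is left to the operator.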
 Future actions
 ~~~~~~~~~~~~~~
--- a/src/bin/shipyard_airflow/etc/shipyard/policy.yaml.sample
+++ b/src/bin/shipyard_airflow/etc/shipyard/policy.yaml.sample
@@ -1,6 +1,9 @@
 # Actions requiring admin authority
 #"admin_required": "role:admin"
 # Rule to deny all access. Used for default denial
 #"deny_all": "!"
 # List workflow actions invoked by users
 # GET  /api/v1.0/actions
 #"workflow_orchestrator:list_actions": "rule:admin_required"
@@ -78,3 +81,4 @@
 # Create a workflow action to redeploy target servers
 # POST  /api/v1.0/actions
 #"workflow_orchestrator:action_redeploy_server": "rule:admin_required"