diff --git a/doc/source/airship1/airship-1-flow.png b/doc/source/airship1/airship-1-flow.png
new file mode 100644
index 0000000..95d54db
Binary files /dev/null and b/doc/source/airship1/airship-1-flow.png differ
diff --git a/doc/source/airship1/airship-1-flow.rst b/doc/source/airship1/airship-1-flow.rst
new file mode 100644
index 0000000..360c259
--- /dev/null
+++ b/doc/source/airship1/airship-1-flow.rst
@@ -0,0 +1,276 @@
+****************************
+Airship 1.0 Deployment Flows
+****************************
+
+.. |vspace| raw:: latex
+
+   \vspace{5mm}
+
+.. image:: airship-1-flow.png
+
+Airship 1.0 Deploy and Update Site Flow
+#######################################
+
+
+1. Pegleg facilitates cloning the repositories necessary to interact with
+   a site. Each site has a single site-definition.yaml which lists
+   the repositories that “compose” that site. These may be global
+   repositories, type-level repositories (e.g. cruisers or cloud harbor),
+   and finally site-level repositories. These may be entirely different
+   repositories with different permissions. Pegleg facilitates cloning
+   all of these at the correct revisions according to the definition for
+   that site. Pegleg can be driven via a Jenkins pipeline, which can be
+   further abstracted in something like an NC3C dashboard, or it can be
+   driven on the command line directly by imitating the behavior in the
+   pipeline.
+
+|vspace|
+
+2. Pegleg wears several different hats. The CI/CD workflows leverage
+   different pipelines in order to call upon these hats, but under the
+   hood it’s really just different command line flags on the pegleg CLI
+   command depending on what type of action is occurring. Pegleg can:
+
+   a. Generate (and re-generate/rotate) new secure secrets for a site
+      according to each secret’s requirement (e.g. length, type, and so
+      on). For instance: UUIDs, passwords, keys, and so on.
+
+   |vspace|
+
+   b. Encrypt and decrypt secrets. When secrets are encrypted,
+      they are wrapped in a YAML envelope containing metadata for each
+      secret. This makes it possible to tell when secrets are going to
+      expire, when they were last rotated, and so on. All deployment and
+      update pipelines, for instance, leverage the decrypt functionality
+      in order to render the documents successfully.
+
+   |vspace|
+
+   c. Lint the YAML to ensure it is valid, meets certain basic syntax
+      criteria, and that Deckhand has no issue processing the rules it
+      encounters. For instance, development gating pipelines that validate
+      changes to YAML would invoke Pegleg in this way.
+
+   |vspace|
+
+   d. Render the documents, processing them through the Deckhand
+      library, which performs substitutions and pulls in the secrets that
+      are referenced from the configuration YAML so you can see the target
+      document locally. This is effectively a very in-depth linting process
+      and again would be used in development gates and potentially to
+      fail fast in deployment and update pipelines if there is an issue.
+
+   |vspace|
+
+   e. Collect the documents, bundling them up without rendering them.
+      This is appropriate for deployment and update pipelines: the
+      documents are sent through raw (but presumably with decrypted
+      secrets) because each cloud site runs its own Deckhand instance,
+      which maintains its own revision history and can render the
+      documents in-site. It is used in every deployment and update pipeline
+      as the results of collect are what is sent to Shipyard.
+
+|vspace|
+
+3. Once Pegleg has decrypted the secrets in the document set within an
+   ephemeral Jenkins pipeline, pegleg collect is called to assemble them
+   all, and finally that is piped to the Shipyard client which will
+   publish them via REST API to a Shipyard API service running within the
+   site. There are two scenarios under which Shipyard may be running in
+   the site.
+
+   a. On the genesis host, which is a single node running Kubernetes in a
+      greenfield site that will be expanded to a full cluster once more
+      nodes are provisioned.
+
+   |vspace|
+
+   b. On the control plane of an existing site, receiving a site update
+      or expansion.
+
+   Simply put, the entire Shipyard workflow can be summarized as follows:
+
+   * Initial region/site data will be passed to Shipyard from either a
+     human operator or Jenkins.
+   * The data (in YAML format) will be sent to Deckhand for validation and storage.
+   * Shipyard will make use of the post-processed data from Deckhand to
+     interact with Drydock.
+   * Drydock will interact with Promenade to provision and deploy bare metal
+     nodes using Ubuntu MAAS, and a resilient Kubernetes cluster will be created
+     at the end of the process.
+   * Once the Kubernetes cluster is up and validated to be working properly,
+     Shipyard will interact with Armada to deploy OpenStack using OpenStack Helm.
+   * Once the OpenStack cluster is deployed, Shipyard will trigger a workflow to
+     perform basic sanity health checks on the cluster.
+
+|vspace|
+
+4. Shipyard will do a number of pre-validations before delivering the
+   document set to Deckhand. These include a concurrency check, to
+   ensure we don’t try to run updates in parallel unaware of each other.
+   It will also run a number of fail-fast validation checks.
+
+|vspace|
+
+5. Shipyard will leverage the Deckhand client library to deliver the
+   documents to Deckhand over its REST API. Deckhand will again validate
+   them and render them (which again involves performing all layering,
+   substitution, secret interpolation, and so on) and publish a
+   document revision, so that there is an on-site record of every change
+   that has ever been requested. This fully rendered document revision
+   will be available at a Deckhand REST API URL that can be
+   retrieved by various Airship sub-components.
+
+|vspace|
+
+6. Deckhand will store secrets within Barbican so that they are not
+   stored in clear text within a database, and the rendered document set
+   revision itself is stored directly in a database. Deckhand will
+   change every secret to a Barbican reference which will be rendered
+   on-demand by Deckhand whenever someone asks for that document revision
+   through the API.
+
+|vspace|
+
+7. At this point, with the documents stored in Deckhand, Shipyard will
+   perform another fail-fast step and ask each of the components
+   highlighted in yellow to perform a dry-run no-op validation of the
+   entire document set from their perspective. This means that Drydock,
+   for instance, would be validating and acknowledging it would not have
+   any issue processing the document set it sees in Deckhand. This
+   helps ensure we do not encounter updates that fail in the middle
+   of the process. If a component is unhappy with the document set
+   we want to know early and fail before making any changes.
+
+|vspace|
+
+8. Shipyard will now invoke Drydock to provision baremetal hosts that
+   have not already been provisioned and continue to call back or poll
+   for when Drydock has completed this process. Airship has a concept
+   called deployment strategies because the hardware aspect of
+   deployment is not guaranteed or reliable, and we don’t always want
+   failures here to block every other process in the stack. For
+   example, our deployment strategies require that 100% of nodes marked
+   as control plane nodes be provisioned successfully to
+   continue, but allow a certain percentage of each rack of workers
+   to fail while still continuing past the hardware
+   provisioning steps successfully. This is where we
+   introduce a threshold of failure.
+
+|vspace|
+
+9. Shipyard will send Drydock the Deckhand URL so that Drydock can obtain
+   the document set for this update itself. Drydock will retrieve the entire
+   document set from Deckhand but will only process the documents it cares
+   about.
+
+|vspace|
+
+10. Drydock will process any Drydock/BootAction documents that have
+    external references in them to render those upfront before writing an
+    operating system to the physical host. Most importantly, this allows
+    Promenade to construct a host-specific join script. In other words,
+    Drydock calls out to the Promenade REST API to construct a join shell
+    script for each host and this is driven by Drydock/BootAction
+    documents.
+
+|vspace|
+
+11. Drydock will orchestrate MaaS based on the document set. It does this
+    through several internal tasks: prepare_site, prepare_nodes, and
+    deploy_nodes. Within prepare_site, upfront orchestration of MaaS
+    occurs, setting non-host-specific settings via the MaaS API, such as
+    CIDRs and VLANs. Within prepare_nodes, we identify hosts that
+    haven’t already been provisioned, power cycle them, wait for
+    them to be discovered by MaaS, and then align and rename them to
+    hosts in our static inventory. Then the host configuration is
+    orchestrated in MaaS so they have the proper networking and storage
+    configuration as well as receive the correct static overlays, like
+    Kubernetes join scripts, the correct drivers, and so on, on
+    first boot. Finally, within Drydock’s deploy_nodes task, we orchestrate
+    several MaaS flows to actually provision the nodes with an operating
+    system where they execute any additional static scripts delivered on
+    first boot.
+
+|vspace|
+
+12. During the deploy_nodes phase of Drydock, MaaS is effectively writing
+    an operating system to the baremetal nodes.
+
+|vspace|
+
+13. Driven by cloud-init on first boot after provisioning, the nodes will
+    make a REST call back to the MaaS API to inform it that provisioning
+    has completed and that they have booted successfully with
+    functional networking. Drydock can use this status
+    within MaaS to understand that the nodes were provisioned
+    successfully.
+
+|vspace|
+
+14. The nodes run the Promenade-generated shell script to join them to
+    Kubernetes. This host-specific script installs the appropriate
+    dependencies and joins the node as a Kubernetes node, either as a
+    worker or as a control plane host, depending on the host’s profile in
+    the YAML inventory.
+
+|vspace|
+
+15. Shipyard has been polling Drydock for completion of processing the
+    site update. Once the polling for Drydock provisioning completes,
+    Shipyard will move on to performing a similar request to Armada.
+    Armada is asked to update the site and given a Deckhand URL and
+    revision to pull from.
+
+|vspace|
+
+16. Armada pulls the rendered document set from Deckhand.
+
+|vspace|
+
+17. Armada then proceeds to help orchestrate any Helm installs or upgrades
+    necessary in the site, and helps do this across a vast number of
+    charts, their ordering, and dependencies. Armada also supports
+    fetching Helm chart source and then building charts from source from
+    various local and remote locations, such as Git endpoints, tarballs or
+    local directories. It will also give the operator some indication of
+    what is about to change by assisting with diffs for values,
+    values overrides, and actual template changes. Its functionality
+    extends beyond Helm, assisting in interacting with Kubernetes directly
+    to perform basic pre- and post-steps, such as removing completed or
+    failed jobs, running backup jobs, blocking on chart readiness, or
+    deleting resources that do not support upgrades. However, primarily,
+    it is an interface to support orchestrating Helm.
+
+|vspace|
+
+18. Armada effectively interacts with Tiller for installation (although it
+    may interact with k8s directly to poll, wait, remove jobs, and
+    otherwise help protect Helm from failures). Tiller will then interact
+    with k8s to perform Helm chart installations or upgrades.
+
+Airship 1.0 Update Software Flow
+################################
+
+The Update Software flow (or “action” in Shipyard -- depicted with
+green numbers in the image) is effectively a subset of the above flow.
+It is used primarily to speed the process up by bypassing the Drydock
+flow entirely. There are two reasons for this: speed, as interacting with
+MaaS is slow, and the times where you want to avoid trying to
+process hardware requests (e.g. waiting for Drydock to try and
+provision a piece of failed hardware only to ultimately time out some
+time later before moving on to the next step because the deployment
+strategy allows it).
+
+Further Documentation
+#####################
+
+* https://airshipit.readthedocs.io/projects/shipyard/en/latest/
+* https://airshipit.readthedocs.io/projects/pegleg/en/latest/
+* https://airshipit.readthedocs.io/projects/armada/en/latest/
+* https://airshipit.readthedocs.io/projects/promenade/en/latest/
+* https://airshipit.readthedocs.io/projects/drydock/en/latest/
+* https://airshipit.readthedocs.io/projects/deckhand/en/latest/
+* https://airshipit.readthedocs.io/en/latest/
+
+
diff --git a/doc/source/index.rst b/doc/source/index.rst
index eb1d436..22e66a1 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -75,6 +75,7 @@ developers.
    Seaworthy: Production-grade Airship
    develop/airship1-developers.rst
    develop/conventions.rst
+   airship1/airship-1-flow.rst
 
 Other Resources
 ---------------
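
Step 8 of the deploy flow in airship-1-flow.rst describes deployment strategies: every control plane node must provision successfully, while each rack of workers may tolerate some failures. The per-group threshold idea can be sketched roughly as below. This is only an illustration of the reasoning, not Shipyard's or Drydock's actual schema or API; `NodeGroup`, `group_passes`, `may_continue`, and the 80% worker threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeGroup:
    """Hypothetical grouping of nodes judged together (control plane, one rack, ...)."""
    name: str
    total: int                  # nodes that provisioning was attempted on
    succeeded: int              # nodes that reported successful provisioning
    min_success_percent: float  # assumed threshold for this group


def group_passes(group: NodeGroup) -> bool:
    """Return True when the group's success rate meets its own threshold."""
    if group.total == 0:
        return True  # nothing was attempted, so nothing can have failed
    success_percent = 100.0 * group.succeeded / group.total
    return success_percent >= group.min_success_percent


def may_continue(groups: List[NodeGroup]) -> bool:
    """The flow proceeds past hardware provisioning only if every group passes."""
    return all(group_passes(g) for g in groups)


groups = [
    # Control plane: 100% must succeed.
    NodeGroup("control-plane", total=3, succeeded=3, min_success_percent=100.0),
    # Worker racks: an illustrative 80% threshold per rack.
    NodeGroup("rack-1-workers", total=10, succeeded=9, min_success_percent=80.0),
    NodeGroup("rack-2-workers", total=10, succeeded=8, min_success_percent=80.0),
]
print(may_continue(groups))
```

Evaluating each group against its own threshold, rather than one site-wide percentage, mirrors the distinction the text draws between control plane nodes and racks of workers: a single failed control plane node stops the flow even when every worker succeeded.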