Update rolling upgrade steps from upgrades documentation

Modify rolling upgrade steps from exisitng nova upgrades documentation. Also, add pre-requisites required for the zero downtime upgrade under newly created "Plan your upgrade" section. Change-Id: I4e884ac051614d25258734c35208b3abfe5fd3a4 Co-Authored-By: Sivasathurappan Radhakrishnan <siva.radhakrishnan@intel.com>
2016-09-20 22:25:40 +00:00 · 2016-09-20 22:25:40 +00:00 · c0e8aa85e7
commit c0e8aa85e7
parent 3330d490dd
1 changed files with 104 additions and 35 deletions
--- a/doc/source/upgrade.rst
+++ b/doc/source/upgrade.rst
@ -30,6 +30,110 @@ Once we have introduced the key concepts relating to upgrade, we will
 introduce the process needed for a no downtime upgrade of nova.


+Minimal Downtime Upgrade Process
+--------------------------------
+
+
+Plan your upgrade
+'''''''''''''''''
+
+* Read and ensure you understand the release notes for the next release.
+
+* You should ensure all required steps from the previous upgrade have been
+  completed, such as data migrations.
+
+* Make a backup of your database. Nova does not support downgrading of the
+  database. Hence, in case of upgrade failure, restoring database from backup
+  is the only choice.
+
+* During upgrade be aware that there will be additional load on nova-conductor.
+  You may find you need to add extra nova-conductor workers to deal with the
+  additional upgrade related load.
+
+
+Rolling upgrade process
+'''''''''''''''''''''''
+
+To reduce downtime, the services can be upgraded in a rolling fashion. It
+means upgrading a few services at a time. This results in a condition where
+both old (N) and new (N+1) nova-compute services co-exist for a certain time
+period. Note that, there is no upgrade of the hypervisor here, this is just
+upgrading the nova services.
+
+#. Before maintenance window:
+
+   * Start the process with the controller node. Install the code for the next
+     version of Nova, either in a venv or a separate control plane node,
+     including all the python dependencies.
+
+   * Using the newly installed nova code, run the DB sync.
+     (``nova-manage db sync``; ``nova-manage api_db sync``). These schema
+     change operations should have minimal or no effect on performance, and
+     should not cause any operations to fail.
+
+   * At this point, new columns and tables may exist in the database. These
+     DB schema changes are done in a way that both the N and N+1 release can
+     perform operations against the same schema.
+
+#. During maintenance window:
+
+   * For maximum safety (no failed API operations), gracefully shutdown all
+     the services (i.e. SIG_TERM) except nova-compute.
+
+   * Start all services, with ``[upgrade_levels]compute=auto`` in nova.conf.
+     It is safest to start nova-conductor first and nova-api last. Note that
+     for older releases, before Liberty, you need to use a static alias name
+     instead of ``auto``, such as ``[upgrade_levels]compute=mitaka``
+
+   * In small batches gracefully shutdown nova-compute (i.e. SIG_TERM), then
+     start the new version of the code with: ``[upgrade_levels]compute=auto``
+     Note this is done in batches so only a few compute nodes will have any
+     delayed API actions, and to ensure there is enough capacity online to
+     service any boot requests that happen during this time.
+
+#. After maintenance window:
+
+   * Once all services are running the new code, double check in the DB that
+     there are no old orphaned service records.
+
+   * Now all services are upgraded, we need to send the SIG_HUP signal, so all
+     the services clear any cached service version data. When a new service
+     starts, it automatically detects which version of the compute RPC protocol
+     to use, and it can decide if it is safe to do any online data migrations.
+     Note, if you used a static value for the upgrade_level, such as
+     ``[upgrade_levels]compute=mitaka``, you must update nova.conf to remove
+     that configuration value before you send the SIG_HUP signal.
+
+   * Now all the services are upgraded, the system is able to use the latest
+     version of the RPC protocol and so get access to all the new features in
+     the new release.
+
+   * Now all the services are running the latest version of the code, and all
+     the services are aware they all have been upgraded, it is safe to
+     transform the data in the database into its new format. While this
+     happens on demand when the system reads a database row that needs
+     updating, we must get all the data transformed into the current version
+     before we next upgrade.
+
+   * This process can also put significant extra write load on the database.
+     Complete all online data migrations using:
+     ``nova-manage db online_data_migrations --limit <number>``. Note that you
+     can use the limit argument to reduce the load this operation will place
+     on the database.
+
+   * The limit argument in online data migrations allows you to run a small
+     chunk of upgrades until all of the work is done. Each time it is run, it
+     will show summary of completed and remaining records. You run this command
+     until you see completed and remaining records as zeros. The size of
+     chunks you should use depend on your infrastructure.
+
+   * At this point, you must also ensure you update the configuration, to stop
+     using any deprecated features or options, and perform any required work
+     to transition to alternative features. All the deprecated options should
+     be supported for one cycle, but should be removed before your next
+     upgrade is performed.
+
+
 Current Database Upgrade Types
 ------------------------------

@ -199,41 +303,6 @@ nova-conductor object backports
    can understand.


-Process
-------
-
-NOTE:
-    This still requires much work before it can become reality.
-    This is more an aspirational plan that helps describe how all the
-    pieces of the jigsaw fit together.
-
-This is the planned process for a zero downtime upgrade:
-
-#. Prune deleted DB rows, check previous migrations are complete
-
-#. Expand DB schema (e.g. add new column)
-
-#. Pin RPC versions for all services that are upgraded from this point,
-   using the current version
-
-#. Upgrade all nova-conductor nodes (to do object backports)
-
-#. Upgrade all other services, except nova-compute and nova-api,
-   using graceful shutdown
-
-#. Upgrade nova-compute nodes (this is the bulk of the work).
-
-#. Unpin RPC versions
-
-#. Add new API nodes, and enable new features, while using a load balancer
-   to "drain" the traffic from old API nodes
-
-#. Run the new nova-manage command that ensures all DB records are "upgraded"
-   to new data version
-
-#. "Contract" DB schema (e.g. drop unused columns)
-
-
 Testing
 -------