Update rolling upgrade steps from upgrades documentation
Modify rolling upgrade steps from exisitng nova upgrades documentation. Also, add pre-requisites required for the zero downtime upgrade under newly created "Plan your upgrade" section. Change-Id: I4e884ac051614d25258734c35208b3abfe5fd3a4 Co-Authored-By: Sivasathurappan Radhakrishnan <siva.radhakrishnan@intel.com>
This commit is contained in:
parent
3330d490dd
commit
c0e8aa85e7
@ -30,6 +30,110 @@ Once we have introduced the key concepts relating to upgrade, we will
|
||||
introduce the process needed for a no downtime upgrade of nova.
|
||||
|
||||
|
||||
Minimal Downtime Upgrade Process
|
||||
--------------------------------
|
||||
|
||||
|
||||
Plan your upgrade
|
||||
'''''''''''''''''
|
||||
|
||||
* Read and ensure you understand the release notes for the next release.
|
||||
|
||||
* You should ensure all required steps from the previous upgrade have been
|
||||
completed, such as data migrations.
|
||||
|
||||
* Make a backup of your database. Nova does not support downgrading of the
|
||||
database. Hence, in case of upgrade failure, restoring database from backup
|
||||
is the only choice.
|
||||
|
||||
* During upgrade be aware that there will be additional load on nova-conductor.
|
||||
You may find you need to add extra nova-conductor workers to deal with the
|
||||
additional upgrade related load.
|
||||
|
||||
|
||||
Rolling upgrade process
|
||||
'''''''''''''''''''''''
|
||||
|
||||
To reduce downtime, the services can be upgraded in a rolling fashion. It
|
||||
means upgrading a few services at a time. This results in a condition where
|
||||
both old (N) and new (N+1) nova-compute services co-exist for a certain time
|
||||
period. Note that, there is no upgrade of the hypervisor here, this is just
|
||||
upgrading the nova services.
|
||||
|
||||
#. Before maintenance window:
|
||||
|
||||
* Start the process with the controller node. Install the code for the next
|
||||
version of Nova, either in a venv or a separate control plane node,
|
||||
including all the python dependencies.
|
||||
|
||||
* Using the newly installed nova code, run the DB sync.
|
||||
(``nova-manage db sync``; ``nova-manage api_db sync``). These schema
|
||||
change operations should have minimal or no effect on performance, and
|
||||
should not cause any operations to fail.
|
||||
|
||||
* At this point, new columns and tables may exist in the database. These
|
||||
DB schema changes are done in a way that both the N and N+1 release can
|
||||
perform operations against the same schema.
|
||||
|
||||
#. During maintenance window:
|
||||
|
||||
* For maximum safety (no failed API operations), gracefully shutdown all
|
||||
the services (i.e. SIG_TERM) except nova-compute.
|
||||
|
||||
* Start all services, with ``[upgrade_levels]compute=auto`` in nova.conf.
|
||||
It is safest to start nova-conductor first and nova-api last. Note that
|
||||
for older releases, before Liberty, you need to use a static alias name
|
||||
instead of ``auto``, such as ``[upgrade_levels]compute=mitaka``
|
||||
|
||||
* In small batches gracefully shutdown nova-compute (i.e. SIG_TERM), then
|
||||
start the new version of the code with: ``[upgrade_levels]compute=auto``
|
||||
Note this is done in batches so only a few compute nodes will have any
|
||||
delayed API actions, and to ensure there is enough capacity online to
|
||||
service any boot requests that happen during this time.
|
||||
|
||||
#. After maintenance window:
|
||||
|
||||
* Once all services are running the new code, double check in the DB that
|
||||
there are no old orphaned service records.
|
||||
|
||||
* Now all services are upgraded, we need to send the SIG_HUP signal, so all
|
||||
the services clear any cached service version data. When a new service
|
||||
starts, it automatically detects which version of the compute RPC protocol
|
||||
to use, and it can decide if it is safe to do any online data migrations.
|
||||
Note, if you used a static value for the upgrade_level, such as
|
||||
``[upgrade_levels]compute=mitaka``, you must update nova.conf to remove
|
||||
that configuration value before you send the SIG_HUP signal.
|
||||
|
||||
* Now all the services are upgraded, the system is able to use the latest
|
||||
version of the RPC protocol and so get access to all the new features in
|
||||
the new release.
|
||||
|
||||
* Now all the services are running the latest version of the code, and all
|
||||
the services are aware they all have been upgraded, it is safe to
|
||||
transform the data in the database into its new format. While this
|
||||
happens on demand when the system reads a database row that needs
|
||||
updating, we must get all the data transformed into the current version
|
||||
before we next upgrade.
|
||||
|
||||
* This process can also put significant extra write load on the database.
|
||||
Complete all online data migrations using:
|
||||
``nova-manage db online_data_migrations --limit <number>``. Note that you
|
||||
can use the limit argument to reduce the load this operation will place
|
||||
on the database.
|
||||
|
||||
* The limit argument in online data migrations allows you to run a small
|
||||
chunk of upgrades until all of the work is done. Each time it is run, it
|
||||
will show summary of completed and remaining records. You run this command
|
||||
until you see completed and remaining records as zeros. The size of
|
||||
chunks you should use depend on your infrastructure.
|
||||
|
||||
* At this point, you must also ensure you update the configuration, to stop
|
||||
using any deprecated features or options, and perform any required work
|
||||
to transition to alternative features. All the deprecated options should
|
||||
be supported for one cycle, but should be removed before your next
|
||||
upgrade is performed.
|
||||
|
||||
|
||||
Current Database Upgrade Types
|
||||
------------------------------
|
||||
|
||||
@ -199,41 +303,6 @@ nova-conductor object backports
|
||||
can understand.
|
||||
|
||||
|
||||
Process
|
||||
-------
|
||||
|
||||
NOTE:
|
||||
This still requires much work before it can become reality.
|
||||
This is more an aspirational plan that helps describe how all the
|
||||
pieces of the jigsaw fit together.
|
||||
|
||||
This is the planned process for a zero downtime upgrade:
|
||||
|
||||
#. Prune deleted DB rows, check previous migrations are complete
|
||||
|
||||
#. Expand DB schema (e.g. add new column)
|
||||
|
||||
#. Pin RPC versions for all services that are upgraded from this point,
|
||||
using the current version
|
||||
|
||||
#. Upgrade all nova-conductor nodes (to do object backports)
|
||||
|
||||
#. Upgrade all other services, except nova-compute and nova-api,
|
||||
using graceful shutdown
|
||||
|
||||
#. Upgrade nova-compute nodes (this is the bulk of the work).
|
||||
|
||||
#. Unpin RPC versions
|
||||
|
||||
#. Add new API nodes, and enable new features, while using a load balancer
|
||||
to "drain" the traffic from old API nodes
|
||||
|
||||
#. Run the new nova-manage command that ensures all DB records are "upgraded"
|
||||
to new data version
|
||||
|
||||
#. "Contract" DB schema (e.g. drop unused columns)
|
||||
|
||||
|
||||
Testing
|
||||
-------
|
||||
|
||||
|
Loading…
x
Reference in New Issue
Block a user