Add contributor doc for resize and cold migrate

Resize and cold migrate can be confusing since they are
very similar operations and share mostly the same code paths
but there are some notable differences. This adds a contributor
doc, similar to the evacuate-vs-rebuild contributor doc, to
try and explain things at a high level and provide enough links
so contributors can follow along and see where things fit into
the puzzle.

A sequence diagram is sorely needed for this as well but that will
come in a separate change.

Change-Id: I11b401a3f874226fdc20a0ee0bd518192f70fa1d
This commit is contained in:
Matt Riedemann 2019-11-21 19:43:25 -05:00
parent 1cd5563f2d
commit 1e781f6f34
2 changed files with 134 additions and 0 deletions

View File

@ -162,6 +162,8 @@ diving in.
* :doc:`/contributor/evacuate-vs-rebuild`: Describes the differences between
the often-confused evacuate and rebuild operations.
* :doc:`/contributor/resize-and-cold-migrate`: Describes the differences and
similarities between resize and cold migrate operations.
.. # NOTE(amotoki): toctree needs to be placed at the end of the secion to
# keep the document structure in the PDF doc.
@ -169,3 +171,4 @@ diving in.
:hidden:
evacuate-vs-rebuild
resize-and-cold-migrate

View File

@ -0,0 +1,131 @@
=======================
Resize and cold migrate
=======================
The `resize API`_ and `cold migrate API`_ are commonly confused in nova because
the internal `API code`_, `conductor code`_ and `compute code`_ use the same
methods. This document explains some of the differences in what
happens between a resize and cold migrate operation.
High level
~~~~~~~~~~
:doc:`Cold migrate </admin/migration>` is an operation performed by an
administrator to power off and move a server from one host to a **different**
host using the **same** flavor. Volumes and network interfaces are disconnected
from the source host and connected on the destination host. The type of file
system between the hosts and image backend determine if the server files and
disks have to be copied. If copy is necessary then root and ephemeral disks are
copied and swap disks are re-created.
:doc:`Resize </user/resize>` is an operation which can be performed by a
non-administrative owner of the server (the user) with a **different** flavor.
The new flavor can change certain aspects of the server such as the number of
CPUS, RAM and disk size. Otherwise for the most part the internal details are
the same as a cold migration.
Scheduling
~~~~~~~~~~
Depending on how the API is configured for
:oslo.config:option:`allow_resize_to_same_host`, the server may be able to be
resized on the current host. *All* compute drivers support *resizing* to the
same host but *only* the vCenter driver supports *cold migrating* to the same
host. Enabling resize to the same host is necessary for features such as
strict affinity server groups where there are more than one server in the same
affinity group.
Starting with `microversion 2.56`_ an administrator can specify a destination
host for the cold migrate operation. Resize does not allow specifying a
destination host.
Flavor
~~~~~~
As noted above, with resize the flavor *must* change and with cold migrate the
flavor *will not* change.
Resource claims
~~~~~~~~~~~~~~~
Both resize and cold migration perform a `resize claim`_ on the destination
node. Historically the resize claim was meant as a safety check on the selected
node to work around race conditions in the scheduler. Since the scheduler
started `atomically claiming`_ VCPU, MEMORY_MB and DISK_GB allocations using
Placement the role of the resize claim has been reduced to detecting the same
conditions but for resources like PCI devices and NUMA topology which, at least
as of the 20.0.0 (Train) release, are not modeled in Placement and as such are
not atomic.
If this claim fails, the operation can be rescheduled to an alternative
host, if there are any. The number of possible alternative hosts is determined
by the :oslo.config:option:`scheduler.max_attempts` configuration option.
Allocations
~~~~~~~~~~~
Since the 16.0.0 (Pike) release, the scheduler uses the `placement service`_
to filter compute nodes (resource providers) based on information in the flavor
and image used to build the server. Once the scheduler runs through its filters
and weighers and picks a host, resource class `allocations`_ are atomically
consumed in placement with the server as the consumer.
During both resize and cold migrate operations, the allocations held by the
server consumer against the source compute node resource provider are `moved`_
to a `migration record`_ and the scheduler will create allocations, held by the
instance consumer, on the selected destination compute node resource provider.
This is commonly referred to as `migration-based allocations`_ which were
introduced in the 17.0.0 (Queens) release.
If the operation is successful and confirmed, the source node allocations held
by the migration record are `dropped`_. If the operation fails or is reverted,
the source compute node resource provider allocations held by the migration
record are `reverted`_ back to the instance consumer and the allocations
against the destination compute node resource provider are dropped.
Summary of differences
~~~~~~~~~~~~~~~~~~~~~~
.. list-table::
:header-rows: 1
* -
- Resize
- Cold migrate
* - New flavor
- Yes
- No
* - Authorization (default)
- Admin or owner (user)
Policy rule: ``os_compute_api:servers:resize``
- Admin only
Policy rule: ``os_compute_api:os-migrate-server:migrate``
* - Same host
- Maybe
- Only vCenter
* - Can specify target host
- No
- Yes (microversion >= 2.56)
Sequence Diagram
~~~~~~~~~~~~~~~~
.. todo:: Add something like the :doc:`/reference/live-migration` diagram.
.. _resize API: https://docs.openstack.org/api-ref/compute/#resize-server-resize-action
.. _cold migrate API: https://docs.openstack.org/api-ref/compute/#migrate-server-migrate-action
.. _API code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/api.py#L3568
.. _conductor code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/conductor/manager.py#L297
.. _compute code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4445
.. _microversion 2.56: https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id52
.. _resize claim: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/resource_tracker.py#L248
.. _atomically claiming: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/scheduler/filter_scheduler.py#L239
.. _moved: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/conductor/tasks/migrate.py#L28
.. _placement service: https://docs.openstack.org/placement/latest/
.. _allocations: https://docs.openstack.org/api-ref/placement/#allocations
.. _migration record: https://docs.openstack.org/api-ref/compute/#migrations-os-migrations
.. _migration-based allocations: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html
.. _dropped: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4048
.. _reverted: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4233