=======================
Resize and cold migrate
=======================

The `resize API`_ and `cold migrate API`_ are commonly confused in nova because
the internal `API code`_, `conductor code`_ and `compute code`_ use the same
methods. This document explains some of the differences in what
happens between a resize and cold migrate operation.

For the most part this document describes
:term:`same-cell resize <Same-Cell Resize>`.
For details on :term:`cross-cell resize <Cross-Cell Resize>`, refer to
:doc:`/admin/configuration/cross-cell-resize`.


High level
~~~~~~~~~~

:doc:`Cold migrate </admin/migration>` is an operation performed by an
administrator to power off and move a server from one host to a **different**
host using the **same** flavor. Volumes and network interfaces are disconnected
from the source host and connected on the destination host. The type of file
system between the hosts and the image backend determine whether the server
files and disks have to be copied. If a copy is necessary then root and
ephemeral disks are copied and swap disks are re-created.

:doc:`Resize </user/resize>` is an operation which can be performed by a
non-administrative owner of the server (the user) with a **different** flavor.
The new flavor can change certain aspects of the server such as the number of
CPUs, the amount of RAM and the disk size. Otherwise, for the most part, the
internal details are the same as a cold migration.


Scheduling
~~~~~~~~~~

Depending on how the API is configured for
:oslo.config:option:`allow_resize_to_same_host`, it may be possible to resize
the server on its current host. *All* compute drivers support *resizing* to the
same host but *only* the vCenter driver supports *cold migrating* to the same
host. Enabling resize to the same host is necessary for features such as
strict affinity server groups where there is more than one server in the same
affinity group.

Starting with `microversion 2.56`_ an administrator can specify a destination
host for the cold migrate operation. Resize does not allow specifying a
destination host.

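
As a rough illustration of how the two operations differ at the API level, the
sketch below builds the JSON bodies for the two server actions. The helper
names are hypothetical and are not part of nova or any client library, and the
flavor ID and host name used with them are placeholders. The ``flavorRef`` and
``host`` fields are the ones described in the `resize API`_ and
`cold migrate API`_ references.

.. code-block:: python

   import json


   def resize_body(flavor_ref):
       """Body for the resize action; a new flavor is always required."""
       return {"resize": {"flavorRef": flavor_ref}}


   def cold_migrate_body(host=None):
       """Body for the migrate action.

       Without a host the body is ``{"migrate": null}`` and the scheduler
       picks the destination. Naming a host requires microversion 2.56 or
       later.
       """
       if host is None:
           return {"migrate": None}
       return {"migrate": {"host": host}}


   def microversion_header(version):
       """Header that asks the compute API for a specific microversion."""
       return {"OpenStack-API-Version": "compute %s" % version}


   # Hypothetical usage: both bodies are sent to the same server action URL.
   print(json.dumps(resize_body("2")))
   print(json.dumps(cold_migrate_body()))
   print(json.dumps(cold_migrate_body("compute-02")), microversion_header("2.56"))

Both bodies go to the same server action endpoint, which is part of why the two
operations are easy to confuse.
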

Flavor
~~~~~~

As noted above, with resize the flavor *must* change and with cold migrate the
flavor *will not* change.


Resource claims
~~~~~~~~~~~~~~~

Both resize and cold migration perform a `resize claim`_ on the destination
node. Historically the resize claim was meant as a safety check on the selected
node to work around race conditions in the scheduler. Since the scheduler
started `atomically claiming`_ VCPU, MEMORY_MB and DISK_GB allocations using
Placement, the role of the resize claim has been reduced to detecting the same
race conditions for resources such as PCI devices and NUMA topology which, at
least as of the 20.0.0 (Train) release, are not modeled in Placement and as
such are not claimed atomically.

If this claim fails, the operation can be rescheduled to an alternative
host, if there are any. The number of possible alternative hosts is determined
by the :oslo.config:option:`scheduler.max_attempts` configuration option.

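
The claim-then-reschedule behaviour can be pictured with a toy model. This is a
minimal sketch and not nova's resource tracker: the ``claim`` function, the
``ClaimFailed`` exception and the per-host count of free PCI devices are all
assumptions made for illustration, and the length of the ``alternates`` list
stands in for the limit imposed by
:oslo.config:option:`scheduler.max_attempts`.

.. code-block:: python

   class ClaimFailed(Exception):
       """Raised by the toy claim when a host cannot satisfy the request."""


   def claim(host, free_pci_devices, requested):
       """Consume ``requested`` devices on ``host`` or raise ClaimFailed."""
       if free_pci_devices.get(host, 0) < requested:
           raise ClaimFailed(host)
       free_pci_devices[host] -= requested


   def claim_with_reschedule(selected, alternates, free_pci_devices, requested):
       """Try the selected host first, then each alternate in order.

       Returns the host on which the claim succeeded, or None if every
       candidate failed.
       """
       for host in [selected] + list(alternates):
           try:
               claim(host, free_pci_devices, requested)
           except ClaimFailed:
               continue  # reschedule to the next alternate, if any
           return host
       return None


   # Hypothetical hosts: "a" lost the race for its last device.
   free = {"a": 0, "b": 1, "c": 2}
   print(claim_with_reschedule("a", ["b", "c"], free, 1))

In the example the claim on host ``a`` fails, so the operation lands on the
first alternate that can still satisfy it.
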

Allocations
~~~~~~~~~~~

Since the 16.0.0 (Pike) release, the scheduler uses the `placement service`_
to filter compute nodes (resource providers) based on information in the flavor
and image used to build the server. Once the scheduler runs through its filters
and weighers and picks a host, resource class `allocations`_ are atomically
consumed in placement with the server as the consumer.

During both resize and cold migrate operations, the allocations held by the
server consumer against the source compute node resource provider are `moved`_
to a `migration record`_ and the scheduler will create allocations, held by the
instance consumer, on the selected destination compute node resource provider.
This is commonly referred to as `migration-based allocations`_, which were
introduced in the 17.0.0 (Queens) release.

If the operation is successful and confirmed, the source node allocations held
by the migration record are `dropped`_. If the operation fails or is reverted,
the source compute node resource provider allocations held by the migration
record are `reverted`_ back to the instance consumer and the allocations
against the destination compute node resource provider are dropped.

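
The bookkeeping described above can be sketched with a plain dictionary that
maps each consumer to its allocations per resource provider. This is a toy
in-memory model, not the placement API or nova's code: the function names, the
consumer and provider identifiers and the resource amounts are all hypothetical.

.. code-block:: python

   def start_move(allocations, instance, migration, dest_node, new_resources):
       """Hand the source allocations to the migration record and give the
       instance fresh allocations on the destination node."""
       allocations[migration] = allocations.pop(instance)
       allocations[instance] = {dest_node: dict(new_resources)}


   def confirm(allocations, migration):
       """On confirm, drop the source allocations held by the migration."""
       del allocations[migration]


   def revert(allocations, instance, migration):
       """On failure or revert, give the source allocations back to the
       instance, which also discards its destination allocations."""
       allocations[instance] = allocations.pop(migration)


   # Hypothetical resize from 2 to 4 VCPUs, moving from "src" to "dst".
   allocs = {"inst": {"src": {"VCPU": 2}}}
   start_move(allocs, "inst", "mig", "dst", {"VCPU": 4})
   print(allocs)
   confirm(allocs, "mig")
   print(allocs)

While the server is waiting for confirmation, both nodes hold allocations: the
source through the migration consumer and the destination through the instance
consumer.
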

Summary of differences
~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1

   * -
     - Resize
     - Cold migrate
   * - New flavor
     - Yes
     - No
   * - Authorization (default)
     - Admin or owner (user)

       Policy rule: ``os_compute_api:servers:resize``
     - Admin only

       Policy rule: ``os_compute_api:os-migrate-server:migrate``
   * - Same host
     - Maybe
     - Only vCenter
   * - Can specify target host
     - No
     - Yes (microversion >= 2.56)


Sequence Diagrams
~~~~~~~~~~~~~~~~~

The following diagrams are current as of the 21.0.0 (Ussuri) release.

Resize
------

This is the sequence of calls to get the server to ``VERIFY_RESIZE`` status.

.. image:: /_static/images/resize/resize.svg
   :alt: Resize standard workflow


Confirm resize
--------------

This is the sequence of calls when confirming `or deleting`_ a server in
``VERIFY_RESIZE`` status.

Note that in the diagram below, if a resize is confirmed while deleting a
server, the API synchronously calls the source compute service.

.. image:: /_static/images/resize/resize_confirm.svg
   :alt: Resize confirm workflow


Revert resize
-------------

This is the sequence of calls when reverting a server in ``VERIFY_RESIZE``
status.

.. image:: /_static/images/resize/resize_revert.svg
   :alt: Resize revert workflow

.. _resize API: https://docs.openstack.org/api-ref/compute/#resize-server-resize-action
.. _cold migrate API: https://docs.openstack.org/api-ref/compute/#migrate-server-migrate-action
.. _API code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/api.py#L3568
.. _conductor code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/conductor/manager.py#L297
.. _compute code: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4445
.. _microversion 2.56: https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id52
.. _resize claim: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/resource_tracker.py#L248
.. _atomically claiming: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/scheduler/filter_scheduler.py#L239
.. _moved: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/conductor/tasks/migrate.py#L28
.. _placement service: https://docs.openstack.org/placement/latest/
.. _allocations: https://docs.openstack.org/api-ref/placement/#allocations
.. _migration record: https://docs.openstack.org/api-ref/compute/#migrations-os-migrations
.. _migration-based allocations: https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/migration-allocations.html
.. _dropped: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4048
.. _reverted: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/manager.py#L4233
.. _or deleting: https://opendev.org/openstack/nova/src/tag/19.0.0/nova/compute/api.py#L2135