Deprecate CONF.workarounds.enable_numa_live_migration
Once a deployment has been fully upgraded to Train, the
CONF.workarounds.enable_numa_live_migration config option is no longer
necessary. This patch changes the conductor check to apply only if the
cell's minimum service version is old; the check is per-cell because
cross-cell live migration isn't supported.

Implements blueprint numa-aware-live-migration
Change-Id: If649218db86a04db744990ec0139b4f0b1e79ad6
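In essence, the conductor pre-check becomes version-gated. A condensed,
hypothetical sketch of the logic (names and dependencies simplified; the real
check lives on LiveMigrationTask and uses nova.objects and oslo.config, as
shown in the diff below):

def check_numa_live_migration(context, instance, conf, min_version):
    """Raise if a legacy (pre-Train) cell can't safely live migrate."""
    if not instance.numa_topology:
        return
    # Service version 40 is the first with NUMA-aware live migration; if
    # every nova-compute in the cell is at or above it, neither the legacy
    # check nor the workaround option applies.
    if min_version(context, 'nova-compute') >= 40:
        return
    if conf.workarounds.enable_numa_live_migration:
        print('warning: live migration will not be NUMA-aware')
    else:
        raise RuntimeError('refusing non-NUMA-aware live migration')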
parent b335d0c157
commit 083bafc353
@@ -1,10 +1,12 @@
 .. important::
 
-   Unless :oslo.config:option:`specifically enabled
-   <workarounds.enable_numa_live_migration>`, live migration is not currently
-   possible for instances with a NUMA topology when using the libvirt driver.
-   A NUMA topology may be specified explicitly or can be added implicitly due
-   to the use of CPU pinning or huge pages. Refer to `bug #1289064`__ for more
-   information.
+   In deployments older than Train, or in mixed Stein/Train deployments with a
+   rolling upgrade in progress, unless :oslo.config:option:`specifically
+   enabled <workarounds.enable_numa_live_migration>`, live migration is not
+   possible for instances with a NUMA topology when using the libvirt
+   driver. A NUMA topology may be specified explicitly or can be added
+   implicitly due to the use of CPU pinning or huge pages. Refer to `bug
+   #1289064`__ for more information. As of Train, live migration of instances
+   with a NUMA topology when using the libvirt driver is fully supported.
 
    __ https://bugs.launchpad.net/nova/+bug/1289064
@@ -175,7 +175,9 @@ class LiveMigrationTask(base.TaskBase):
                 method='live migrate')
 
     def _check_instance_has_no_numa(self):
-        """Prevent live migrations of instances with NUMA topologies."""
+        """Prevent live migrations of instances with NUMA topologies.
+        TODO(artom) Remove this check in compute RPC 6.0.
+        """
         if not self.instance.numa_topology:
             return
 
@@ -189,17 +191,32 @@ class LiveMigrationTask(base.TaskBase):
         if hypervisor_type.lower() != obj_fields.HVType.QEMU:
             return
 
-        msg = ('Instance has an associated NUMA topology. '
-               'Instance NUMA topologies, including related attributes '
-               'such as CPU pinning, huge page and emulator thread '
-               'pinning information, are not currently recalculated on '
-               'live migration. See bug #1289064 for more information.'
-               )
+        # We're fully upgraded to a version that supports NUMA live
+        # migration, carry on.
+        if objects.Service.get_minimum_version(
+                self.context, 'nova-compute') >= 40:
+            return
 
         if CONF.workarounds.enable_numa_live_migration:
-            LOG.warning(msg, instance=self.instance)
+            LOG.warning(
+                'Instance has an associated NUMA topology, cell contains '
+                'compute nodes older than train, but the '
+                'enable_numa_live_migration workaround is enabled. Live '
+                'migration will not be NUMA-aware. The instance NUMA '
+                'topology, including related attributes such as CPU pinning, '
+                'huge page and emulator thread pinning information, will not '
+                'be recalculated. See bug #1289064 for more information.',
+                instance=self.instance)
         else:
-            raise exception.MigrationPreCheckError(reason=msg)
+            raise exception.MigrationPreCheckError(
+                reason='Instance has an associated NUMA topology, cell '
+                       'contains compute nodes older than train, and the '
+                       'enable_numa_live_migration workaround is disabled. '
+                       'Refusing to perform the live migration, as the '
+                       'instance NUMA topology, including related attributes '
+                       'such as CPU pinning, huge page and emulator thread '
+                       'pinning information, cannot be recalculated. See '
+                       'bug #1289064 for more information.')
 
     def _check_can_migrate_pci(self, src_host, dest_host):
         """Checks that an instance can migrate with PCI requests.
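To summarize the behavior above for an instance with a NUMA topology on a
QEMU host:

min nova-compute version | enable_numa_live_migration | outcome
-------------------------+----------------------------+--------------------------
>= 40 (cell fully Train) | ignored                    | proceed, NUMA-aware
< 40 (mixed or old cell) | True                       | proceed with a warning, not NUMA-aware
< 40 (mixed or old cell) | False                      | MigrationPreCheckError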
@@ -157,14 +157,25 @@ Related options:
     cfg.BoolOpt(
         'enable_numa_live_migration',
         default=False,
+        deprecated_for_removal=True,
+        deprecated_since='20.0.0',
+        deprecated_reason="""This option was added to mitigate known issues
+when live migrating instances with a NUMA topology with the libvirt driver.
+Those issues are resolved in Train. Clouds using the libvirt driver and fully
+upgraded to Train support NUMA-aware live migration. This option will be
+removed in a future release.
+""",
         help="""
 Enable live migration of instances with NUMA topologies.
 
-Live migration of instances with NUMA topologies is disabled by default
-when using the libvirt driver. This includes live migration of instances with
-CPU pinning or hugepages. CPU pinning and huge page information for such
-instances is not currently re-calculated, as noted in `bug #1289064`_. This
-means that if instances were already present on the destination host, the
+Live migration of instances with NUMA topologies when using the libvirt driver
+is only supported in deployments that have been fully upgraded to Train. In
+previous versions, or in mixed Stein/Train deployments with a rolling upgrade
+in progress, live migration of instances with NUMA topologies is disabled by
+default when using the libvirt driver. This includes live migration of
+instances with CPU pinning or hugepages. CPU pinning and huge page information
+for such instances is not currently re-calculated, as noted in `bug #1289064`_.
+This means that if instances were already present on the destination host, the
 migrated instance could be placed on the same dedicated cores as these
 instances or use hugepages allocated for another instance. Alternately, if the
 host platforms were not homogeneous, the instance could be assigned to
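For context, oslo.config honors a deprecated-for-removal option's value but
emits a deprecation warning at runtime when the option is set in a
configuration file. A minimal standalone sketch (assuming plain oslo.config
usage, outside nova's actual option-registration machinery):

from oslo_config import cfg

CONF = cfg.ConfigOpts()
CONF.register_opts([
    cfg.BoolOpt('enable_numa_live_migration',
                default=False,
                deprecated_for_removal=True,
                deprecated_since='20.0.0',
                deprecated_reason='Issues resolved in Train; see bug #1289064.'),
], group='workarounds')

CONF([])  # parse an (empty) command line
print(CONF.workarounds.enable_numa_live_migration)  # False by default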
@@ -187,7 +187,7 @@ class LiveMigrationTaskTestCase(test.NoDBTestCase):
         self.flags(enable_numa_live_migration=False, group='workarounds')
         self.task.instance.numa_topology = None
         mock_get.return_value = objects.ComputeNode(
-            uuid=uuids.cn1, hypervisor_type='kvm')
+            uuid=uuids.cn1, hypervisor_type='qemu')
         self.task._check_instance_has_no_numa()
 
     @mock.patch.object(objects.ComputeNode, 'get_by_host_and_nodename')
@@ -201,25 +201,47 @@ class LiveMigrationTaskTestCase(test.NoDBTestCase):
         self.task._check_instance_has_no_numa()
 
     @mock.patch.object(objects.ComputeNode, 'get_by_host_and_nodename')
-    def test_check_instance_has_no_numa_passes_workaround(self, mock_get):
+    @mock.patch.object(objects.Service, 'get_minimum_version',
+                       return_value=39)
+    def test_check_instance_has_no_numa_passes_workaround(
+            self, mock_get_min_ver, mock_get):
         self.flags(enable_numa_live_migration=True, group='workarounds')
         self.task.instance.numa_topology = objects.InstanceNUMATopology(
             cells=[objects.InstanceNUMACell(id=0, cpuset=set([0]),
                                             memory=1024)])
         mock_get.return_value = objects.ComputeNode(
-            uuid=uuids.cn1, hypervisor_type='kvm')
+            uuid=uuids.cn1, hypervisor_type='qemu')
         self.task._check_instance_has_no_numa()
+        mock_get_min_ver.assert_called_once_with(self.context, 'nova-compute')
 
     @mock.patch.object(objects.ComputeNode, 'get_by_host_and_nodename')
-    def test_check_instance_has_no_numa_fails(self, mock_get):
+    @mock.patch.object(objects.Service, 'get_minimum_version',
+                       return_value=39)
+    def test_check_instance_has_no_numa_fails(self, mock_get_min_ver,
+                                              mock_get):
         self.flags(enable_numa_live_migration=False, group='workarounds')
         mock_get.return_value = objects.ComputeNode(
-            uuid=uuids.cn1, hypervisor_type='QEMU')
+            uuid=uuids.cn1, hypervisor_type='qemu')
         self.task.instance.numa_topology = objects.InstanceNUMATopology(
             cells=[objects.InstanceNUMACell(id=0, cpuset=set([0]),
                                             memory=1024)])
         self.assertRaises(exception.MigrationPreCheckError,
                           self.task._check_instance_has_no_numa)
+        mock_get_min_ver.assert_called_once_with(self.context, 'nova-compute')
+
+    @mock.patch.object(objects.ComputeNode, 'get_by_host_and_nodename')
+    @mock.patch.object(objects.Service, 'get_minimum_version',
+                       return_value=40)
+    def test_check_instance_has_no_numa_new_svc_passes(self, mock_get_min_ver,
+                                                       mock_get):
+        self.flags(enable_numa_live_migration=False, group='workarounds')
+        mock_get.return_value = objects.ComputeNode(
+            uuid=uuids.cn1, hypervisor_type='qemu')
+        self.task.instance.numa_topology = objects.InstanceNUMATopology(
+            cells=[objects.InstanceNUMACell(id=0, cpuset=set([0]),
+                                            memory=1024)])
+        self.task._check_instance_has_no_numa()
+        mock_get_min_ver.assert_called_once_with(self.context, 'nova-compute')
 
     @mock.patch.object(objects.Service, 'get_by_compute_host')
     @mock.patch.object(servicegroup.API, 'service_is_up')
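The mock argument order in these tests follows the standard unittest.mock
rule: stacked mock.patch decorators are applied bottom-up, so the decorator
nearest the function supplies the first mock argument. A hypothetical,
minimal demonstration (stub classes stand in for nova's objects):

from unittest import mock


class Service:
    @staticmethod
    def get_minimum_version(ctxt, binary):
        raise NotImplementedError


class ComputeNode:
    @staticmethod
    def get_by_host_and_nodename(ctxt, host, node):
        raise NotImplementedError


@mock.patch.object(ComputeNode, 'get_by_host_and_nodename')
@mock.patch.object(Service, 'get_minimum_version', return_value=40)
def run_check(mock_get_min_ver, mock_get):
    # mock_get_min_ver comes first because its decorator is closest to
    # the function definition.
    assert Service.get_minimum_version(None, 'nova-compute') == 40
    mock_get_min_ver.assert_called_once_with(None, 'nova-compute')


run_check()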
@ -0,0 +1,46 @@
|
|||||||
|
---
|
||||||
|
features:
|
||||||
|
- |
|
||||||
|
With the libvirt driver, live migration now works correctly for instances
|
||||||
|
that have a NUMA topology. Previously, the instance was naively moved to
|
||||||
|
the destination host, without updating any of the underlying NUMA guest to
|
||||||
|
host mappings or the resource usage. With the new NUMA-aware live migration
|
||||||
|
feature, if the instance cannot fit on the destination the live migration
|
||||||
|
will be attempted on an alternate destination if the request is
|
||||||
|
setup to have alternates. If the instance can fit on the destination, the
|
||||||
|
NUMA guest to host mappings will be re-calculated to reflect its new
|
||||||
|
host, and its resource usage updated.
|
||||||
|
upgrade:
|
||||||
|
- |
|
||||||
|
For the libvirt driver, the NUMA-aware live migration feature requires the
|
||||||
|
conductor, source compute, and destination compute to be upgraded to Train.
|
||||||
|
It also requires the conductor and source compute to be able to send RPC
|
||||||
|
5.3 - that is, their ``[upgrade_levels]/compute`` configuration option must
|
||||||
|
not be set to less than 5.3 or a release older than "train".
|
||||||
|
|
||||||
|
In other words, NUMA-aware live migration with the libvirt driver is not
|
||||||
|
supported until:
|
||||||
|
|
||||||
|
* All compute and conductor services are upgraded to Train code.
|
||||||
|
* The ``[upgrade_levels]/compute`` RPC API pin is removed (or set to
|
||||||
|
"auto") and services are restarted.
|
||||||
|
|
||||||
|
If any of these requirements are not met, live migration of instances with
|
||||||
|
a NUMA topology with the libvirt driver will revert to the legacy naive
|
||||||
|
behavior, in which the instance was simply moved over without updating its
|
||||||
|
NUMA guest to host mappings or its resource usage.
|
||||||
|
|
||||||
|
.. note:: The legacy naive behavior is dependent on the value of the
|
||||||
|
``[workarounds]/enable_numa_live_migration`` option. Refer to the
|
||||||
|
Deprecations sections for more details.
|
||||||
|
deprecations:
|
||||||
|
- |
|
||||||
|
With the introduction of the NUMA-aware live migration feature for the
|
||||||
|
libvirt driver, ``[workarounds]/enable_numa_live_migration`` is
|
||||||
|
deprecated. Once a cell has been fully upgraded to Train, its value is
|
||||||
|
ignored.
|
||||||
|
|
||||||
|
.. note:: Even in a cell fully upgraded to Train, RPC pinning via
|
||||||
|
``[upgrade_levels]/compute`` can cause live migration of
|
||||||
|
instances with a NUMA topology to revert to the legacy naive
|
||||||
|
behavior. For more details refer to the Upgrade section.
|
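For illustration, a pin like the following in nova.conf (a hypothetical
operator configuration, not part of this change) would keep live migration
on the legacy behavior even after the code upgrade:

[upgrade_levels]
# Pinning compute RPC below 5.3 (for example during a rolling upgrade)
# prevents NUMA-aware live migration; remove the pin or set it to "auto"
# once all services run Train.
compute = 5.2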