Update driver to map the targeted address for SR-IOV PCI devices

This patch checks the QEMU and libvirt versions to ensure support
for VFIO SR-IOV device migration.
It also updates get_updated_guest_xml(), which is called from
_live_migration_operation(), to map source PCI addresses
to destination addresses in the destination XML, using the data
provided by the LiveMigrateData object.

The goal of this patch series is to enable migration of VFIO devices
with kernel variant drivers.

Partially-Implements: blueprint migrate-vfio-devices-using-kernel-variant-drivers
Change-Id: I62ec475988eab8de948498f50d8d4c0d47321102
René Ribaud 2025-02-18 21:51:09 +01:00
parent b227efd967
commit fd656f3943
6 changed files with 1407 additions and 7 deletions

@ -36,7 +36,24 @@ to use move operations, for each ``nova-compute`` service.
Possible Values:
* A dictionary of JSON values which describe the aliases. For example::
* A JSON dictionary which describes a PCI device. It should take
the following format::
alias = {
"name": "<name>",
["product_id": "<id>"],
["vendor_id": "<id>"],
"device_type": "<type>",
["numa_policy": "<policy>"],
["resource_class": "<resource_class>"],
["traits": "<traits>"],
["live_migratable": "<live_migratable>"]
}
Where ``[`` indicates zero or one occurrences, ``{`` indicates zero or
multiple occurrences, and ``|`` indicates mutually exclusive options.
For example::
alias = {
"name": "QuickAssist",
@ -46,8 +63,17 @@ Possible Values:
"numa_policy": "required"
}
This defines an alias for the Intel QuickAssist card. (multi valued). Valid
key values are :
This defines an alias for the Intel QuickAssist card. (multi valued).
Another example::
alias = {
"name": "A16_16A",
"device_type": "type-VF",
"resource_class": "CUSTOM_A16_16A"
}
Valid key values are :
``name``
Name of the PCI alias.
@ -97,6 +123,22 @@ Possible Values:
scheduling the request. This field can be used only if
``[filter_scheduler]pci_in_placement`` is enabled.
``live_migratable``
Specify whether live-migratable devices are desired.
Accepts the case-insensitive, boolean-like string values
"yes" or "no".
- ``live_migratable='yes'`` means that the user wants one or more devices
that can be live migrated to similar devices on another host.
- ``live_migratable='no'`` explicitly indicates that the user
requires a non-live-migratable device, making migration impossible.
- If not specified, the default is ``live_migratable=None``, meaning that
either a live migratable or non-live migratable device will be picked
automatically. However, in such cases, migration will **not** be
possible.
* Supports multiple aliases by repeating the option (not by specifying
a list value)::
@ -112,7 +154,8 @@ Possible Values:
"product_id": "0444",
"vendor_id": "8086",
"device_type": "type-PCI",
"numa_policy": "required"
"numa_policy": "required",
"live_migratable": "yes"
}
"""),
cfg.MultiStrOpt('device_spec',
@ -165,7 +208,9 @@ Possible values:
Supported ``<tag>`` values are :
- ``physical_network``
- ``trusted``
- ``remote_managed`` - a VF is managed remotely by an off-path networking
backend. Accepts case-insensitive, boolean-like string values:
"true" or "false". By default, "false" is assumed for all devices.
@ -174,6 +219,7 @@ Possible values:
VPD capability with a card serial number (either on a VF itself or on
its corresponding PF), otherwise they will be ignored and not
available for allocation.
- ``managed`` - Specify if the PCI device is managed by libvirt.
Accepts case-insensitive, boolean-like string values:
"yes" or "no". By default, "yes" is assumed for all devices.
@ -189,6 +235,18 @@ Possible values:
Warning: Incorrect configuration of this parameter may result in compute
node crashes.
- ``live_migratable`` - Specify whether the PCI device can be live
migrated by libvirt.
Accepts case-insensitive, boolean-like string values:
"yes" or "no". By default, "no" is assumed for all devices.
- ``live_migratable='yes'`` means that the device can be live migrated.
Of course, this requires hardware support, as well as proper system
and hypervisor configuration.
- ``live_migratable='no'`` means that the device cannot be live migrated.
- ``resource_class`` - optional Placement resource class name to be used
to track the matching PCI devices in Placement when
[pci]report_in_placement is True.
@ -202,6 +260,7 @@ Possible values:
device's ``vendor_id`` and ``product_id`` in the form of
``CUSTOM_PCI_{vendor_id}_{product_id}``.
The ``resource_class`` can be requested from a ``[pci]alias``
- ``traits`` - optional comma separated list of Placement trait names to
report on the resource provider that will represent the matching PCI
device. Each trait can be a standard trait from ``os-traits`` lib or can
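
To make the ``live_migratable`` alias semantics documented above more concrete, here is a minimal sketch that parses an alias JSON string and normalizes the flag. The helper name ``parse_alias_live_migratable`` is hypothetical and is not Nova's parsing code; the sketch only assumes the documented case-insensitive "yes"/"no" values and the ``None`` default when the key is absent.

```python
import json


def parse_alias_live_migratable(alias_json):
    """Return True, False, or None for the alias's live_migratable key.

    Hypothetical helper for illustration only; not part of Nova.
    """
    alias = json.loads(alias_json)
    value = alias.get("live_migratable")
    if value is None:
        # Key absent: either kind of device may be picked, but per the
        # documentation above, migration will not be possible.
        return None
    normalized = str(value).strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    raise ValueError(f"invalid live_migratable value: {value!r}")
```

Because the alias is parsed as JSON, keys must be quoted and a trailing comma after the last entry would be rejected by ``json.loads``.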

File diff suppressed because it is too large

@ -35,6 +35,14 @@ from nova.virt.libvirt import host
from nova.virt.libvirt import migration
def _normalize(xml_str):
return etree.tostring(
etree.fromstring(xml_str),
pretty_print=True,
encoding="unicode",
).strip()
class UtilityMigrationTestCase(test.NoDBTestCase):
def test_graphics_listen_addrs(self):
@ -278,6 +286,193 @@ class UtilityMigrationTestCase(test.NoDBTestCase):
self.assertRaises(exception.NovaException,
migration._update_mdev_xml, doc, data.target_mdevs)
def test_update_pci_dev_xml(self):
xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
</devices>
</domain>"""
expected_xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x26' slot='0x01' function='0x5'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
</devices>
</domain>"""
data = objects.LibvirtLiveMigrateData(
pci_dev_map_src_dst={"0000:25:00.4": "0000:26:01.5"})
doc = etree.fromstring(xml_pattern)
res = migration._update_pci_dev_xml(doc, data.pci_dev_map_src_dst)
self.assertEqual(
_normalize(expected_xml_pattern),
etree.tostring(res, encoding="unicode", pretty_print=True).strip(),
)
def test_update_pci_dev_xml_with_2_hostdevs(self):
xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x01' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x06' function='0x0'/>
</hostdev>
</devices>
</domain>"""
expected_xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x26' slot='0x01' function='0x5'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x26' slot='0x01' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x06' function='0x0'/>
</hostdev>
</devices>
</domain>"""
data = objects.LibvirtLiveMigrateData(
pci_dev_map_src_dst={
"0000:25:00.4": "0000:26:01.5",
"0000:25:01.4": "0000:26:01.4",
}
)
doc = etree.fromstring(xml_pattern)
res = migration._update_pci_dev_xml(doc, data.pci_dev_map_src_dst)
self.assertEqual(
_normalize(expected_xml_pattern),
etree.tostring(res, encoding="unicode", pretty_print=True).strip(),
)
def test_update_pci_dev_xml_with_2_hostdevs_second_one_not_in_map(self):
xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x01' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x06' function='0x0'/>
</hostdev>
</devices>
</domain>"""
expected_xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x26' slot='0x01' function='0x5'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x01' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x06' function='0x0'/>
</hostdev>
</devices>
</domain>"""
data = objects.LibvirtLiveMigrateData(
pci_dev_map_src_dst={
"0000:25:00.4": "0000:26:01.5",
}
)
doc = etree.fromstring(xml_pattern)
res = migration._update_pci_dev_xml(doc, data.pci_dev_map_src_dst)
self.assertEqual(
_normalize(expected_xml_pattern),
etree.tostring(res, encoding="unicode", pretty_print=True).strip(),
)
def test_update_pci_dev_xml_fails_not_found_src_address(self):
xml_pattern = """<domain>
<devices>
<hostdev mode='subsystem' type='pci' managed='no'>
<driver name='vfio'/>
<source>
<address domain='0x0000' bus='0x25' slot='0x00' function='0x4'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</hostdev>
</devices>
</domain>"""
data = objects.LibvirtLiveMigrateData(
pci_dev_map_src_dst={"0000:25:00.5": "0000:26:01.5"})
doc = etree.fromstring(xml_pattern)
exc = self.assertRaises(
exception.NovaException,
migration._update_pci_dev_xml,
doc,
data.pci_dev_map_src_dst,
)
norm = _normalize(xml_pattern)
self.assertIn(
'Unable to find the hostdev '
'to replace for this source PCI address: 0000:25:00.5 '
f'in the xml: {norm}',
str(exc),
)
def test_update_cpu_shared_set_xml(self):
doc = etree.fromstring("""
<domain>

@ -97,6 +97,7 @@ from nova.objects import diagnostics as diagnostics_obj
from nova.objects import fields
from nova.objects import migrate_data as migrate_data_obj
from nova.pci import utils as pci_utils
from nova.pci import whitelist
import nova.privsep.libvirt
import nova.privsep.path
import nova.privsep.utils
@ -266,6 +267,10 @@ MIN_LIBVIRT_STATELESS_FIRMWARE = (8, 6, 0)
MIN_IGB_LIBVIRT_VERSION = (9, 3, 0)
MIN_IGB_QEMU_VERSION = (8, 0, 0)
# Minimum versions supporting vfio-pci variant driver.
MIN_VFIO_PCI_VARIANT_LIBVIRT_VERSION = (10, 0, 0)
MIN_VFIO_PCI_VARIANT_QEMU_VERSION = (8, 2, 2)
REGISTER_IMAGE_PROPERTY_DEFAULTS = [
'hw_machine_type',
'hw_cdrom_bus',
@ -902,10 +907,35 @@ class LibvirtDriver(driver.ComputeDriver):
self._check_multipath()
# Even if we already checked the whitelist at startup, this driver
# needs to check specific hypervisor versions
self._check_pci_whitelist()
# Set REGISTER_IMAGE_PROPERTY_DEFAULTS in the instance system_metadata
# to default values for properties that have not already been set.
self._register_all_undefined_instance_details()
def _check_pci_whitelist(self):
need_specific_version = False
if CONF.pci.device_spec:
pci_whitelist = whitelist.Whitelist(CONF.pci.device_spec)
for spec in pci_whitelist.specs:
if spec.tags.get("live_migratable"):
need_specific_version = True
if need_specific_version and not self._host.has_min_version(
lv_ver=MIN_VFIO_PCI_VARIANT_LIBVIRT_VERSION,
hv_ver=MIN_VFIO_PCI_VARIANT_QEMU_VERSION,
hv_type=host.HV_DRIVER_QEMU,
):
msg = _(
"PCI device spec is configured for "
"live_migratable but it's not supported by libvirt."
)
raise exception.InvalidConfiguration(msg)
def _update_host_specific_capabilities(self) -> None:
"""Update driver capabilities based on capabilities of the host."""
# TODO(stephenfin): We should also be reporting e.g. SEV functionality
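
The version gate in ``_check_pci_whitelist`` above can be pictured as a plain tuple comparison. The sketch below is a simplified stand-in, not the driver's ``has_min_version`` implementation: the function ``supports_vfio_variant`` is hypothetical, and it assumes versions are already available as ``(major, minor, micro)`` tuples.

```python
# Minimum versions taken from the constants added in this patch.
MIN_VFIO_PCI_VARIANT_LIBVIRT_VERSION = (10, 0, 0)
MIN_VFIO_PCI_VARIANT_QEMU_VERSION = (8, 2, 2)


def supports_vfio_variant(libvirt_version, qemu_version):
    """Hypothetical check: both versions must meet their minimum.

    Python compares tuples element by element, so (10, 1, 0) >= (10, 0, 0)
    and (8, 2, 1) < (8, 2, 2).
    """
    return (
        tuple(libvirt_version) >= MIN_VFIO_PCI_VARIANT_LIBVIRT_VERSION
        and tuple(qemu_version) >= MIN_VFIO_PCI_VARIANT_QEMU_VERSION
    )
```

Both conditions must hold: a new enough libvirt paired with an older QEMU still fails the check.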

@ -16,7 +16,6 @@
"""Utility methods to manage guests migration
"""
from collections import deque
from lxml import etree
@ -88,6 +87,11 @@ def get_updated_guest_xml(instance, guest, migrate_data, get_volume_config,
xml_doc = _update_numa_xml(xml_doc, migrate_data)
if 'target_mdevs' in migrate_data:
xml_doc = _update_mdev_xml(xml_doc, migrate_data.target_mdevs)
if "pci_dev_map_src_dst" in migrate_data:
xml_doc = _update_pci_dev_xml(
xml_doc, migrate_data.pci_dev_map_src_dst
)
if new_resources:
xml_doc = _update_device_resources_xml(xml_doc, new_resources)
return etree.tostring(xml_doc, encoding='unicode')
@ -149,6 +153,77 @@ def _update_mdev_xml(xml_doc, target_mdevs):
return xml_doc
def _update_pci_dev_xml(xml_doc, pci_dev_map_src_dst):
hostdevs = xml_doc.findall('./devices/hostdev')
for src_addr, dst_addr in pci_dev_map_src_dst.items():
src_fields = _get_pci_address_fields_with_prefix(src_addr)
dst_fields = _get_pci_address_fields_with_prefix(dst_addr)
if not _update_hostdev_address(hostdevs, src_fields, dst_fields):
_raise_hostdev_not_found_exception(xml_doc, src_addr)
LOG.debug(
'_update_pci_dev_xml output xml=%s',
etree.tostring(xml_doc, encoding='unicode', pretty_print=True)
)
return xml_doc
def _get_pci_address_fields_with_prefix(addr):
(domain, bus, slot, func) = nova.pci.utils.get_pci_address_fields(addr)
return (f"0x{domain}", f"0x{bus}", f"0x{slot}", f"0x{func}")
def _update_hostdev_address(hostdevs, src_fields, dst_fields):
src_domain, src_bus, src_slot, src_function = src_fields
dst_domain, dst_bus, dst_slot, dst_function = dst_fields
for hostdev in hostdevs:
if hostdev.get('type') != 'pci':
continue
address_tag = hostdev.find('./source/address')
if address_tag is None:
continue
if _address_matches(
address_tag, src_domain, src_bus, src_slot, src_function
):
_set_address_fields(
address_tag, dst_domain, dst_bus, dst_slot, dst_function
)
return True
return False
def _address_matches(address_tag, domain, bus, slot, function):
return (
address_tag.get('domain') == domain and
address_tag.get('bus') == bus and
address_tag.get('slot') == slot and
address_tag.get('function') == function
)
def _set_address_fields(address_tag, domain, bus, slot, function):
address_tag.set('domain', domain)
address_tag.set('bus', bus)
address_tag.set('slot', slot)
address_tag.set('function', function)
def _raise_hostdev_not_found_exception(xml_doc, src_addr):
xml = etree.tostring(
xml_doc, encoding="unicode", pretty_print=True
).strip()
raise exception.NovaException(
'Unable to find the hostdev to replace for this source PCI '
f'address: {src_addr} in the xml: {xml}'
)
def _update_cpu_shared_set_xml(xml_doc, migrate_data):
LOG.debug('_update_cpu_shared_set_xml input xml=%s',
etree.tostring(xml_doc, encoding='unicode', pretty_print=True))
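
The source-to-destination address rewrite performed by ``_update_pci_dev_xml`` above can be sketched in a self-contained form. This is an illustration only: it uses the standard library's ``xml.etree.ElementTree`` instead of ``lxml``, the names ``split_pci_address`` and ``remap_hostdev_addresses`` are hypothetical, and it raises ``LookupError`` where the patch raises ``NovaException``.

```python
import xml.etree.ElementTree as ET

ADDRESS_KEYS = ("domain", "bus", "slot", "function")


def split_pci_address(addr):
    """Split 'dddd:bb:ss.f' into 0x-prefixed (domain, bus, slot, function)."""
    domain, bus, slot_func = addr.split(":")
    slot, func = slot_func.split(".")
    return tuple(f"0x{part}" for part in (domain, bus, slot, func))


def remap_hostdev_addresses(xml_str, addr_map):
    """Rewrite PCI hostdev source addresses according to addr_map (src -> dst)."""
    root = ET.fromstring(xml_str)
    xpath = "./devices/hostdev[@type='pci']/source/address"
    for src, dst in addr_map.items():
        src_fields = dict(zip(ADDRESS_KEYS, split_pci_address(src)))
        dst_fields = dict(zip(ADDRESS_KEYS, split_pci_address(dst)))
        for tag in root.findall(xpath):
            if all(tag.get(k) == v for k, v in src_fields.items()):
                # First matching hostdev wins; update it in place.
                for key, value in dst_fields.items():
                    tag.set(key, value)
                break
        else:
            # No hostdev carries this source address.
            raise LookupError(f"no hostdev found for source address {src}")
    return root
```

As in the tests above, hostdevs whose source address is not a key in the map are left untouched, while a mapped source address that matches no hostdev is treated as an error.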

@ -0,0 +1,8 @@
---
features:
- |
This release adds support for migrating SR-IOV devices
using the new kernel VFIO SR-IOV variant driver interface.
See the `OpenStack configuration documentation`__ for more details.
.. __: https://docs.openstack.org/nova/latest/configuration/config.html#pci