
Attaching physical PCI devices to guests
The PCI passthrough feature in OpenStack allows full access and direct control of a physical PCI device in guests. This mechanism is generic for any kind of PCI device and works with a Network Interface Card (NIC), Graphics Processing Unit (GPU), or any other device that can be attached to a PCI bus. Correct driver installation is the only requirement for the guest to properly use the devices.
Some PCI devices provide Single Root I/O Virtualization and Sharing (SR-IOV) capabilities. When SR-IOV is used, a physical device is virtualized and appears as multiple PCI devices. Virtual PCI devices are assigned to the same or different guests. In the case of PCI passthrough, the full physical device is assigned to only one guest and cannot be shared.
PCI devices are requested through flavor extra specs, specifically via the pci_passthrough:alias flavor extra spec. This guide demonstrates how to enable PCI passthrough for a type of PCI device with a vendor ID of 8086 and a product ID of 154d - an Intel X520 Network Adapter - by mapping them to the alias a1. You should adjust the instructions for other devices with potentially different capabilities.
Note
For information on creating servers with SR-IOV network interfaces, refer to the Networking Guide.
Limitations
- Attaching SR-IOV ports to existing servers was not supported until the 22.0.0 Victoria release. Due to various bugs in libvirt and qemu, we recommend using at least libvirt version 6.0.0 and at least qemu version 4.2.
- Cold migration (resize) of servers with SR-IOV devices attached was not supported until the 14.0.0 Newton release, see bug 1512800 for details.
Note
Nova only supports PCI addresses where the fields are restricted to the following maximum values:
- domain - 0xFFFF
- bus - 0xFF
- slot - 0x1F
- function - 0x7
Nova will ignore PCI devices reported by the hypervisor if the address is outside of these ranges.
Enabling PCI passthrough
Configure compute host
To enable PCI passthrough on an x86, Linux-based compute node, the following are required:
- VT-d enabled in the BIOS
- IOMMU enabled on the host OS, e.g. by adding the intel_iommu=on or amd_iommu=on parameter to the kernel parameters (see the sketch after this list)
- Assignable PCIe devices
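As an illustration of the IOMMU requirement above, the following is a minimal sketch for a GRUB-based Linux distribution; the file location, the grub2-mkconfig invocation and the verification command are assumptions that vary between distributions, so adapt them to your platform.
# After adding intel_iommu=on (or amd_iommu=on on AMD hosts) to
# GRUB_CMDLINE_LINUX in /etc/default/grub, regenerate the GRUB
# configuration and reboot. Some distributions use grub-mkconfig or
# update-grub instead.
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
$ sudo reboot
# Verify that the IOMMU is active after the reboot.
$ dmesg | grep -i -e DMAR -e IOMMU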
To enable PCI passthrough on a Hyper-V compute node, the following are required:
- Windows 10 or Windows / Hyper-V Server 2016 or newer
- VT-d enabled on the host
- Assignable PCI devices
To check the requirements above and to list any assignable PCI devices, run the following PowerShell commands:
Start-BitsTransfer https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/hyperv-samples/benarm-powershell/DDA/survey-dda.ps1
.\survey-dda.ps1
If the compute node passes all the requirements, the desired assignable PCI devices must be disabled and dismounted from the host in order to be assignable by Hyper-V. For more details, see: Hyper-V PCI passthrough.
Configure nova-compute
Once PCI passthrough has been configured for the host, nova-compute must be configured to allow the PCI device to pass through to VMs. This is done using the [pci] passthrough_whitelist option. For example, assuming our sample PCI device has a PCI address of 41:00.0 on each host:
[pci]
passthrough_whitelist = { "address": "0000:41:00.0" }
Refer to the [pci] passthrough_whitelist documentation for syntax information.
Alternatively, to enable passthrough of all devices with the same product and vendor ID:
[pci]
passthrough_whitelist = { "vendor_id": "8086", "product_id": "154d" }
If using vendor and product IDs, all PCI devices matching the vendor_id and product_id are added to the pool of PCI devices available for passthrough to VMs.
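If you need to discover the address or the vendor and product IDs of a device on a compute host, lspci can report them; the grep pattern below is only an example and should be adjusted to your device.
# List PCI devices with numeric IDs; the address is the first column and
# the vendor and product IDs appear in brackets as [vendor:product].
$ lspci -nn | grep -i ethernet
# Alternatively, list only devices matching a known vendor/product pair.
$ lspci -nn -d 8086:154d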
In addition, it is necessary to configure the [pci] alias option, which is a JSON-style configuration option that allows you to map a given device type, identified by the standard PCI vendor_id and (optional) product_id fields, to an arbitrary name or alias. This alias can then be used to request a PCI device using the pci_passthrough:alias flavor extra spec, as discussed previously. For our sample device with a vendor ID of 0x8086 and a product ID of 0x154d, this would be:
[pci]
alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }
It's important to note the addition of the device_type field. This is necessary because this PCI device supports SR-IOV. The nova-compute service categorizes devices into one of three types, depending on the capabilities the devices report:
- type-PF - The device supports SR-IOV and is the parent or root device.
- type-VF - The device is a child device of a device that supports SR-IOV.
- type-PCI - The device does not support SR-IOV.
By default, it is only possible to attach type-PCI devices using PCI passthrough. If you wish to attach type-PF or type-VF devices, you must specify the device_type field in the config option. If the device did not support SR-IOV, the device_type field could be omitted.
Refer to the [pci] alias documentation for syntax information.
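The alias option may be specified multiple times if more than one device type should be made requestable. A minimal sketch follows; the second entry describes a hypothetical non-SR-IOV device and is included purely for illustration.
[pci]
alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }
# hypothetical second device that does not support SR-IOV
alias = { "vendor_id":"1234", "product_id":"5678", "device_type":"type-PCI", "name":"a2" }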
Important
This option must also be configured on controller nodes. This is discussed later in this document.
Once configured, restart the nova-compute
service.
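The exact service unit name is distribution- and deployment-specific; the examples below are common conventions rather than authoritative names.
# e.g. on RDO-based hosts
$ sudo systemctl restart openstack-nova-compute
# e.g. on Debian/Ubuntu-based hosts
$ sudo systemctl restart nova-compute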
Configure nova-scheduler
The nova-scheduler service must be configured to enable the PciPassthroughFilter. To do this, add this filter to the list of filters specified in [filter_scheduler] enabled_filters and set [filter_scheduler] available_filters to the default of nova.scheduler.filters.all_filters. For example:
[filter_scheduler]
enabled_filters = ...,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
Once done, restart the nova-scheduler
service.
Configure nova-api
It is necessary to also configure the [pci] alias config option on the controller. This configuration should match the configuration found on the compute nodes. For example:
[pci]
alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1", "numa_policy":"preferred" }
Refer to the [pci] alias documentation for syntax information. Refer to PCI-NUMA affinity policies below for information on the numa_policy field.
Once configured, restart the nova-api
service.
Configuring a flavor or image
Once the alias has been configured, it can be used for a flavor extra spec. For example, to request two of the PCI devices referenced by alias a1, run:
$ openstack flavor set m1.large --property "pci_passthrough:alias"="a1:2"
For more information about the syntax for pci_passthrough:alias, refer to the flavor extra spec documentation.
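Once the flavor has been updated, servers booted from it will only be scheduled to hosts that can satisfy the PCI request. A minimal sketch follows; the image and network names are placeholders chosen for illustration.
$ openstack server create --flavor m1.large --image $IMAGE \
  --network $NETWORK test-pci-server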
PCI-NUMA affinity policies
By default, the libvirt driver enforces strict NUMA affinity for PCI devices, be they PCI passthrough devices or neutron SR-IOV interfaces. This means that by default a PCI device must be allocated from the same host NUMA node as at least one of the instance's CPUs. This isn't always necessary, however, and you can configure this policy using the hw:pci_numa_affinity_policy flavor extra spec or equivalent image metadata property. There are four possible values allowed:
- required - This policy means that nova will boot instances with PCI devices only if at least one of the NUMA nodes of the instance is associated with these PCI devices. It means that if NUMA node info for some PCI devices could not be determined, those PCI devices wouldn't be consumable by the instance. This provides maximum performance.
- socket - This policy means that the PCI device must be affined to the same host socket as at least one of the guest NUMA nodes. For example, consider a system with two sockets, each with two NUMA nodes, numbered node 0 and node 1 on socket 0, and node 2 and node 3 on socket 1. There is a PCI device affined to node 0. An instance with two guest NUMA nodes and the socket policy can be affined to either:
- node 0 and node 1
- node 0 and node 2
- node 0 and node 3
- node 1 and node 2
- node 1 and node 3
The instance cannot be affined to node 2 and node 3, as neither of those are on the same socket as the PCI device. If the other nodes are consumed by other instances and only nodes 2 and 3 are available, the instance will not boot.
- preferred - This policy means that nova-scheduler will choose a compute host with minimal consideration for the NUMA affinity of PCI devices. nova-compute will attempt a best effort selection of PCI devices based on NUMA affinity; however, if this is not possible then nova-compute will fall back to scheduling on a NUMA node that is not associated with the PCI device.
- legacy - This is the default policy and it describes the current nova behavior. Usually we have information about the association of PCI devices with NUMA nodes. However, some PCI devices do not provide such information. The legacy value means that nova will boot instances with a PCI device if either:
- The PCI device is associated with at least one NUMA node on which the instance will be booted
- There is no information about PCI-NUMA affinity available
For example, to configure a flavor to use the preferred
PCI NUMA affinity policy for any neutron SR-IOV interfaces attached by
the user:
$ openstack flavor set $FLAVOR \
--property hw:pci_numa_affinity_policy=preferred
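The equivalent image metadata property can be set in a similar way; the property name hw_pci_numa_affinity_policy is assumed here and should be checked against the image property reference for your release.
$ openstack image set $IMAGE \
  --property hw_pci_numa_affinity_policy=preferred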
You can also configure this for PCI passthrough devices by specifying the policy in the alias configuration via [pci] alias. For more information, refer to the [pci] alias documentation.