
========================================
Attaching physical PCI devices to guests
========================================

The PCI passthrough feature in OpenStack allows full access and direct control
of a physical PCI device in guests. This mechanism is generic for any kind of
PCI device, and works with a Network Interface Card (NIC), Graphics Processing
Unit (GPU), or any other device that can be attached to a PCI bus. Correct
driver installation is the only requirement for the guest to properly use the
devices.

Some PCI devices provide Single Root I/O Virtualization and Sharing (SR-IOV)
capabilities. When SR-IOV is used, a physical device is virtualized and appears
as multiple PCI devices. Virtual PCI devices are assigned to the same or
different guests. In the case of PCI passthrough, the full physical device is
assigned to only one guest and cannot be shared.

PCI devices are requested through flavor extra specs, specifically via the
:nova:extra-spec:`pci_passthrough:alias` flavor extra spec.
This guide demonstrates how to enable PCI passthrough for a type of PCI device
with a vendor ID of ``8086`` and a product ID of ``154d`` - an Intel X520
Network Adapter - by mapping it to the alias ``a1``.
You should adjust the instructions for other devices with potentially different
capabilities.

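If you are unsure of the IDs for your device, they can usually be discovered on
the compute host with ``lspci``. The following is a rough sketch; the PCI
address and device description in the output will vary by system, but the
values in square brackets are the vendor and product IDs:

.. code-block:: console

   # lspci -nn | grep -i ethernet
   41:00.0 Ethernet controller [0200]: Intel Corporation Ethernet 10G 2P X520 Adapter [8086:154d]
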
.. note::

   For information on creating servers with SR-IOV network interfaces, refer to
   the :neutron-doc:`Networking Guide <admin/config-sriov>`.

**Limitations**

* Attaching SR-IOV ports to existing servers was not supported until the
  22.0.0 Victoria release. Due to various bugs in libvirt and qemu we
  recommend using at least libvirt version 6.0.0 and at least qemu version
  4.2.

* Cold migration (resize) of servers with SR-IOV devices attached was not
  supported until the 14.0.0 Newton release, see
  `bug 1512880 <https://bugs.launchpad.net/nova/+bug/1512880>`_ for details.

.. note::

   Nova only supports PCI addresses where the fields are restricted to the
   following maximum values:

   * domain - 0xFFFF
   * bus - 0xFF
   * slot - 0x1F
   * function - 0x7

   Nova will ignore PCI devices reported by the hypervisor if the address is
   outside of these ranges.

Enabling PCI passthrough
------------------------

Configure compute host
~~~~~~~~~~~~~~~~~~~~~~

To enable PCI passthrough on an x86, Linux-based compute node, the following
are required:

* VT-d enabled in the BIOS
* IOMMU enabled on the host OS, e.g. by adding the ``intel_iommu=on`` or
  ``amd_iommu=on`` parameter to the kernel parameters (see the sketch below)
* Assignable PCIe devices

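How the kernel parameter is added depends on the distribution and bootloader.
The following is a minimal sketch for a GRUB-based Intel host (the config
regeneration command differs between distributions, e.g. ``update-grub`` on
Debian/Ubuntu), followed by a check that the IOMMU is active after the reboot:

.. code-block:: console

   # grep GRUB_CMDLINE_LINUX /etc/default/grub
   GRUB_CMDLINE_LINUX="... intel_iommu=on"
   # grub2-mkconfig -o /boot/grub2/grub.cfg
   # reboot
   # dmesg | grep -e DMAR -e IOMMU
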
To enable PCI passthrough on a Hyper-V compute node, the following are
required:

* Windows 10 or Windows / Hyper-V Server 2016 or newer
* VT-d enabled on the host
* Assignable PCI devices

To check the requirements above and to identify any assignable PCI devices, run
the following PowerShell commands:

.. code-block:: console

   Start-BitsTransfer https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/hyperv-samples/benarm-powershell/DDA/survey-dda.ps1
   .\survey-dda.ps1

If the compute node passes all the requirements, the desired assignable PCI
devices must be disabled and dismounted from the host in order to be assignable
by Hyper-V. For more details, see `Hyper-V PCI passthrough`__.

.. __: https://devblogs.microsoft.com/scripting/passing-through-devices-to-hyper-v-vms-by-using-discrete-device-assignment/

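As a rough illustration of that step, the linked article uses the
``Disable-PnpDevice`` and ``Dismount-VMHostAssignableDevice`` cmdlets; the
instance ID and location path below are placeholders for values reported by the
survey script:

.. code-block:: console

   Disable-PnpDevice -InstanceId $instanceId -Confirm:$false
   Dismount-VMHostAssignableDevice -LocationPath $locationPath -Force
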
Configure ``nova-compute``
~~~~~~~~~~~~~~~~~~~~~~~~~~

Once PCI passthrough has been configured for the host, :program:`nova-compute`
must be configured to allow the PCI device to pass through to VMs. This is done
using the :oslo.config:option:`pci.passthrough_whitelist` option. For example,
assuming our sample PCI device has a PCI address of ``41:00.0`` on each host:

.. code-block:: ini

   [pci]
   passthrough_whitelist = { "address": "0000:41:00.0" }

Refer to :oslo.config:option:`pci.passthrough_whitelist` for syntax information.

Alternatively, to enable passthrough of all devices with the same product and
vendor ID:

.. code-block:: ini

   [pci]
   passthrough_whitelist = { "vendor_id": "8086", "product_id": "154d" }

If using vendor and product IDs, all PCI devices matching the ``vendor_id`` and
``product_id`` are added to the pool of PCI devices available for passthrough
to VMs.

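The option is multi-valued, so it may be repeated to make several unrelated
devices available. As a sketch, the second entry below refers to a hypothetical
additional device address:

.. code-block:: ini

   [pci]
   passthrough_whitelist = { "vendor_id": "8086", "product_id": "154d" }
   passthrough_whitelist = { "address": "0000:42:00.0" }
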
In addition, it is necessary to configure the :oslo.config:option:`pci.alias`
option, which is a JSON-style configuration option that allows you to map a
given device type, identified by the standard PCI ``vendor_id`` and (optional)
``product_id`` fields, to an arbitrary name or *alias*. This alias can then be
used to request a PCI device using the :nova:extra-spec:`pci_passthrough:alias`
flavor extra spec, as discussed previously.
For our sample device with a vendor ID of ``0x8086`` and a product ID of
``0x154d``, this would be:

.. code-block:: ini

   [pci]
   alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }

It's important to note the addition of the ``device_type`` field. This is
necessary because this PCI device supports SR-IOV. The ``nova-compute`` service
categorizes devices into one of three types, depending on the capabilities the
devices report:

``type-PF``
  The device supports SR-IOV and is the parent or root device.

``type-VF``
  The device is a child device of a device that supports SR-IOV.

``type-PCI``
  The device does not support SR-IOV.

By default, it is only possible to attach ``type-PCI`` devices using PCI
passthrough. If you wish to attach ``type-PF`` or ``type-VF`` devices, you must
specify the ``device_type`` field in the config option. If the device did not
support SR-IOV, the ``device_type`` field could be omitted.

Refer to :oslo.config:option:`pci.alias` for syntax information.

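Like the whitelist, :oslo.config:option:`pci.alias` may be given multiple times
to define several aliases. As a sketch, the second entry below maps a
hypothetical non-SR-IOV device (placeholder vendor and product IDs), so its
``device_type`` is omitted:

.. code-block:: ini

   [pci]
   alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1" }
   alias = { "vendor_id":"10de", "product_id":"1eb8", "name":"a2" }
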
.. important::

   This option must also be configured on controller nodes. This is discussed
   later in this document.

Once configured, restart the :program:`nova-compute` service.

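The exact service name depends on how nova is packaged; on a typical
systemd-based host this is something like:

.. code-block:: console

   # systemctl restart openstack-nova-compute   # RDO/RHEL-style packaging
   # systemctl restart nova-compute             # Debian/Ubuntu-style packaging
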
Configure ``nova-scheduler``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :program:`nova-scheduler` service must be configured to enable the
``PciPassthroughFilter``. To do this, add this filter to the list of filters
specified in :oslo.config:option:`filter_scheduler.enabled_filters` and set
:oslo.config:option:`filter_scheduler.available_filters` to the default of
``nova.scheduler.filters.all_filters``. For example:

.. code-block:: ini

   [filter_scheduler]
   enabled_filters = ...,PciPassthroughFilter
   available_filters = nova.scheduler.filters.all_filters

Once done, restart the :program:`nova-scheduler` service.

Configure ``nova-api``
~~~~~~~~~~~~~~~~~~~~~~

It is necessary to also configure the :oslo.config:option:`pci.alias` config
option on the controller. This configuration should match the configuration
found on the compute nodes. For example:

.. code-block:: ini

   [pci]
   alias = { "vendor_id":"8086", "product_id":"154d", "device_type":"type-PF", "name":"a1", "numa_policy":"preferred" }

Refer to :oslo.config:option:`pci.alias` for syntax information.
Refer to :ref:`Affinity <pci-numa-affinity-policy>` for ``numa_policy``
information.

Once configured, restart the :program:`nova-api` service.

Configuring a flavor or image
-----------------------------

Once the alias has been configured, it can be used in a flavor extra spec.
For example, to request two of the PCI devices referenced by alias ``a1``, run:

.. code-block:: console

   $ openstack flavor set m1.large --property "pci_passthrough:alias"="a1:2"

For more information about the syntax for ``pci_passthrough:alias``, refer to
:doc:`the documentation </configuration/extra-specs>`.

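As a quick end-to-end sketch (the image and network names below are
placeholders), a server booted with this flavor should expose the
passed-through devices inside the guest:

.. code-block:: console

   $ openstack server create --flavor m1.large --image my-image \
       --network my-network pci-test

Once the server is ``ACTIVE``, the devices should be visible from inside the
guest, e.g. with ``lspci -nn | grep 154d``.
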
.. _pci-numa-affinity-policy:

PCI-NUMA affinity policies
--------------------------

By default, the libvirt driver enforces strict NUMA affinity for PCI devices,
be they PCI passthrough devices or neutron SR-IOV interfaces. This means that
by default a PCI device must be allocated from the same host NUMA node as at
least one of the instance's CPUs. This isn't always necessary, however, and you
can configure this policy using the
:nova:extra-spec:`hw:pci_numa_affinity_policy` flavor extra spec or equivalent
image metadata property. The following values are allowed:

**required**
  This policy means that nova will boot instances with PCI devices **only**
  if at least one of the NUMA nodes of the instance is associated with these
  PCI devices. It means that if NUMA node info for some PCI devices could not
  be determined, those PCI devices wouldn't be consumable by the instance.
  This provides maximum performance.

**socket**
  This policy means that the PCI device must be affined to the same host
  socket as at least one of the guest NUMA nodes. For example, consider a
  system with two sockets, each with two NUMA nodes, numbered node 0 and node
  1 on socket 0, and node 2 and node 3 on socket 1. There is a PCI device
  affined to node 0. An instance with two guest NUMA nodes and the
  ``socket`` policy can be affined to either:

  * node 0 and node 1
  * node 0 and node 2
  * node 0 and node 3
  * node 1 and node 2
  * node 1 and node 3

  The instance cannot be affined to node 2 and node 3, as neither of those
  are on the same socket as the PCI device. If the other nodes are consumed
  by other instances and only nodes 2 and 3 are available, the instance
  will not boot.

**preferred**
  This policy means that ``nova-scheduler`` will choose a compute host
  with minimal consideration for the NUMA affinity of PCI devices.
  ``nova-compute`` will attempt a best effort selection of PCI devices
  based on NUMA affinity, however, if this is not possible then
  ``nova-compute`` will fall back to scheduling on a NUMA node that is not
  associated with the PCI device.

**legacy**
  This is the default policy and it describes the current nova behavior.
  Usually we have information about association of PCI devices with NUMA
  nodes. However, some PCI devices do not provide such information. The
  ``legacy`` value means that nova will boot instances with PCI devices
  if either:

  * The PCI device is associated with at least one NUMA node on which the
    instance will be booted
  * There is no information about PCI-NUMA affinity available

For example, to configure a flavor to use the ``preferred`` PCI NUMA affinity
policy for any neutron SR-IOV interfaces attached by the user:

.. code-block:: console

   $ openstack flavor set $FLAVOR \
       --property hw:pci_numa_affinity_policy=preferred

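The equivalent image metadata property (``hw_pci_numa_affinity_policy``) can be
set in a similar way; as a sketch, with ``$IMAGE`` standing in for your image:

.. code-block:: console

   $ openstack image set $IMAGE \
       --property hw_pci_numa_affinity_policy=preferred
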
You can also configure this for PCI passthrough devices by specifying the
policy in the alias configuration via :oslo.config:option:`pci.alias`. For more
information, refer to :oslo.config:option:`the documentation <pci.alias>`.