
The collect tool executes various Linux and system commands to gather data into an archived collect bundle.

System administrators often need to run collect on busy, in-service systems. During such operations, they have reported excessive CPU usage, which can lead to undesirable CPU spikes caused by certain collect operations.

While the collect tool already employs throttling for ssh and scp, its data collection and archiving commands currently lack similar safeguards.

This update introduces the following enhancements to mitigate CPU spikes and improve performance on heavily loaded in-service servers:

- removed one unnecessary tar archive operation.
- add tar archive checkpoint option support with action handler
- removed one redundant kubelet api-resources call in the containerization plugin.
- add --chunk-size=50 support to all in one kubelet get api-resources command to help throttle this long running heavyweight command. 50 seems to yield the lowest k8s api latency as measured with the k8smetrics tool.
- launch collect plugins with 'nice' and 'ionice' attributes.
- add 'nice' and 'ionice' attributes to select commands.
- add sleep delays after known cpu intensive data collection commands.
- remove unnecessary -v (verbose) option to all tar commands.
- add a run_command utility that times the execution of commands and adaptively adds a small post execution delay based on how long that command took to run.
- reduce the cpu impact of the containerization plugin by adding periodic delays.
- added a few periodic delays in long running or cpu intensive plugins
- create a collect command timing log that is added to each the host collect tarball.
- timing log file records how long it took for each plugin to run as well as commands called with the new run_command function.
- fixed issue in networking plugin.
- added a 60 second timeout for the 'lsof' heavyweight command.
- fixed delimiter string hostname in all plugins.
- increase the default global timeout from 20 to 30 minutes.
- increase the default collect_host timeout from 600 to 900 seconds.
- incremented tool minor version.

These improvements aim to minimize the performance impact of running collect on busy in-service systems.

Note: When a process is started with nice, its CPU priority is inherited by all threads spawned by that process. However, it does not restrict the total CPU time a process or its threads can use when no contention exists.

Test Plan:

PASS: Verify build and install of collect package.
PASS: Verify collect runtime is not substantially longer.
PASS: Verify tar checkpoint handling on busy system where checkpoint action handler detects and invokes system overload handling.
PASS: Verify some CPU spike reduction compared to before update.

Regression:

PASS: Compare collect bundle size and contents before and after update.
PASS: Soak collect on busy/overloaded AIO SX system.
PASS: Verify report tool reports the same data before/after update.
PASS: Verify multi-node collect

Closes-Bug: 2090923
Change-Id: If698d5f275f4482de205fa4a37e0398b19800777
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
utilities
This file documents the components and features included in the utilities repository.
PCI IRQ Affinity Agent
OpenStack allows instances to use PCI devices, but the interrupts generated by these devices may be handled by host CPUs that are unrelated to the instance. This can result in lower performance than if the device interrupts were handled by the instance's CPUs.
The agent only acts over instances with dedicated vCPUs. For instances using shared vCPUs no action will be taken by the agent.
The expected outcome of the agent's operation is higher performance: the instance's cores are assigned to handle the interrupts from the PCI devices used by that instance, which keeps those interrupts from consuming excessive cycles on the platform cores.
Agent operation
The agent operates by listening to RabbitMQ notifications from Nova. When an instance is created or moved to the host, the agent checks for a specific flavor spec (detailed below); if the spec is present, the agent queries libvirt to map the instance vCPUs to pCPUs on the host.
Once the agent has the CPU mapping, it determines the IRQs for each PCI device used by the instance. It then loops over all PCI devices and, for each one, determines which host NUMA node is associated with the device and which pCPUs are associated with that NUMA node, and finally sets the CPU affinity for the device's IRQs based on the pCPU list.
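The final step above, setting the affinity of an IRQ from a pCPU list, can be sketched as follows. This is a minimal illustration and not the agent's actual code: the function names `format_cpulist` and `set_irq_affinity` and the `proc_irq_root` parameter are hypothetical. It assumes the standard Linux interface in which writing a CPU list to `/proc/irq/<irq>/smp_affinity_list` sets the affinity of that IRQ, which requires root privileges on a real host.

```python
import os


def format_cpulist(cpus):
    """Collapse CPU ids into kernel cpulist syntax, e.g. {1, 2, 3, 6} -> "1-3,6"."""
    ranges = []
    start = prev = None
    for cpu in sorted(set(cpus)):
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu  # extend the current contiguous range
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def set_irq_affinity(irq, pcpus, proc_irq_root="/proc/irq"):
    """Write the pCPU list to the IRQ's smp_affinity_list file.

    proc_irq_root is a hypothetical parameter that exists only so the
    sketch can be exercised against a scratch directory.
    """
    path = os.path.join(proc_irq_root, str(irq), "smp_affinity_list")
    with open(path, "w") as affinity_file:
        affinity_file.write(format_cpulist(pcpus))
```

Keeping the formatting separate from the file write makes the range-collapsing logic easy to check without touching `/proc`.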
There is also a periodic audit that runs every minute and loops over the existing IRQs. If there are new IRQs that weren't mapped before, the agent maps them; if there are PCI devices that are no longer associated with the instance they previously belonged to, their IRQ affinity is reset to the default value.
Flavor spec
The PCI IRQ Affinity Agent uses a specific flavor spec for PCI interrupt affinity. It determines which of the vCPUs assigned to the instance must handle the interrupts from the PCI devices:
hw:pci_irq_affinity_mask=<vcpus_cpulist>
Where vcpus_cpulist is a comma-separated list of values, each of which can be expressed as:
- int: the vCPU expressed by int will be assigned to handle the interrupts from the PCI devices
- int1-int2: the vCPUs between int1 and int2 (inclusive) will be used to handle the interrupts from the PCI devices
- ^int: the vCPU expressed by int will not be assigned to handle the interrupts from the PCI devices; use it to exclude a vCPU that was included in a previous range
NOTE: int must be a value between 0 and flavor.vcpus - 1
Example: hw:pci_irq_affinity_mask=1-4,^3,6 means that the vCPUs with indexes 1, 2, 4 and 6 from the vCPU list that Nova allocates to the instance will be assigned to handle interrupts from the PCI devices.
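The mask syntax described above can be illustrated with a small parser. This is a sketch under stated assumptions, not the agent's implementation: the function name `parse_affinity_mask` and its `flavor_vcpus` parameter are hypothetical, values are processed left to right, and `^` is accepted only in front of a single integer, as in the description.

```python
def parse_affinity_mask(vcpus_cpulist, flavor_vcpus):
    """Return the sorted vCPU indexes selected by a mask such as "1-4,^3,6".

    Raises ValueError for malformed values or indexes outside
    0 .. flavor_vcpus - 1.
    """
    selected = set()
    for token in vcpus_cpulist.split(","):
        token = token.strip()
        exclude = token.startswith("^")
        if exclude:
            token = token[1:]
        if "-" in token:
            if exclude:
                # Assumption: exclusions apply to single vCPUs only.
                raise ValueError("exclusion of a range is not supported")
            low_text, high_text = token.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {token}")
            values = range(low, high + 1)
        else:
            values = [int(token)]  # int() raises ValueError on bad input
        for value in values:
            if not 0 <= value < flavor_vcpus:
                raise ValueError(f"vCPU index out of bounds: {value}")
        if exclude:
            selected.difference_update(values)
        else:
            selected.update(values)
    return sorted(selected)
```

With a flavor of 8 vCPUs, `parse_affinity_mask("1-4,^3,6", 8)` returns `[1, 2, 4, 6]`, matching the example above.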
Limitations
- No CPU affining is performed for instances using shared CPUs (i.e., when using flavor spec hw:cpu_policy=shared)
- No CPU affining is performed when invalid ranges are specified in the flavor spec; instead, the agent logs error messages indicating the problem
Agent packaging
The agent code resides in the starlingx/utilities repo, along with the spec and docker_image files that are used to build a CentOS image with the agent wheel installed on it.
The agent is deployed by Armada along with the other OpenStack helm charts; refer to the PCI IRQ Affinity Agent helm chart in the starlingx/openstack-armada-app repository.