Reduce CPU spikes during collect
The collect tool executes various Linux and system commands to gather
data into an archived collect bundle.

System administrators often need to run collect on busy, in-service
systems. During such operations, they have reported excessive CPU
usage, which can lead to undesirable CPU spikes caused by certain
collect operations.

While the collect tool already employs throttling for ssh and scp,
its data collection and archiving commands currently lack similar
safeguards.

This update introduces the following enhancements to mitigate CPU
spikes and improve performance on heavily loaded in-service servers:

 - remove one unnecessary tar archive operation.
 - add tar archive checkpoint option support with an action handler
   (see the sketch after this list).
 - remove a redundant kubelet api-resources call in the
   containerization plugin.
 - add --chunk-size=50 support to the all-in-one kubelet get
   api-resources command to help throttle this long-running,
   heavyweight command. A chunk size of 50 yielded the lowest k8s API
   latency as measured with the k8smetrics tool.
 - launch collect plugins with 'nice' and 'ionice' attributes.
 - add 'nice' and 'ionice' attributes to select commands.
 - add sleep delays after known CPU-intensive data collection commands.
 - remove the unnecessary -v (verbose) option from all tar commands.
 - add a run_command utility that times the execution of commands and
   adaptively adds a small post-execution delay based on how long the
   command took to run (see the sketch after this list).
 - reduce the CPU impact of the containerization plugin by adding
   periodic delays.
 - add periodic delays to other long-running or CPU-intensive plugins.
 - create a collect command timing log that is added to each host
   collect tarball. The log records how long each plugin took to run,
   as well as the commands called through the new run_command function.
 - fix an issue in the networking plugin.
 - add a 60 second timeout for the heavyweight 'lsof' command.
 - fix the delimiter string hostname in all plugins.
 - increase the default global timeout from 20 to 30 minutes.
 - increase the default collect_host timeout from 600 to 900 seconds.
 - increment the tool's minor version.
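
As an illustration, the checkpoint handling might look like the
following sketch. GNU tar's --checkpoint and --checkpoint-action=exec
options are real; the handler path, the 1000-record interval, the
load threshold, and the ARCHIVE/SOURCE_DIR variables are illustrative
assumptions, not the actual collect implementation.

    # archive at low CPU and IO priority; every 1000 records tar
    # invokes the checkpoint action, giving a busy system time to recover
    nice -n 19 ionice -c 3 tar -cf "${ARCHIVE}" \
        --checkpoint=1000 \
        --checkpoint-action=exec=/usr/local/sbin/overload_handler \
        "${SOURCE_DIR}"

A hypothetical overload handler could then back off while the system
is busy:

    #!/bin/bash
    # overload_handler (hypothetical): pause while the 1-minute load
    # average exceeds the number of online cores
    load=$(cut -d ' ' -f 1 /proc/loadavg)
    cores=$(nproc)
    while [ "$(echo "${load} > ${cores}" | bc -l)" = "1" ]; do
        sleep 1
        load=$(cut -d ' ' -f 1 /proc/loadavg)
    done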
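
The run_command utility could be sketched roughly as below; the
COLLECT_TIMING_LOG variable, the 10% delay factor, and the 2 second
cap are assumptions for illustration, not the tool's actual values.

    # time a command, append its duration to the timing log, then
    # yield the CPU for ~10% of the runtime (capped at 2 seconds)
    run_command()
    {
        local start end elapsed_ms delay_ms rc
        start=$(date +%s%N)
        "$@"
        rc=$?
        end=$(date +%s%N)
        elapsed_ms=$(( (end - start) / 1000000 ))
        echo "$(date '+%T') ${elapsed_ms} ms : $*" >> "${COLLECT_TIMING_LOG}"
        delay_ms=$(( elapsed_ms / 10 ))
        [ ${delay_ms} -gt 2000 ] && delay_ms=2000
        [ ${delay_ms} -gt 0 ] && sleep "$(awk "BEGIN{print ${delay_ms}/1000}")"
        return ${rc}
    }

    run_command timeout 60 lsof    # e.g. wrapping the heavyweight lsof

Delaying after execution, rather than slowing the command itself,
keeps individual runtimes unchanged while spreading collect's
aggregate CPU demand over time.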

These improvements aim to minimize the performance impact of running
collect on busy in-service systems.

Note: When a process is started with nice, its CPU priority is
      inherited by all threads spawned by that process. However,
      nice does not restrict the total CPU time a process or its
      threads can use when there is no contention.

Test Plan:

PASS: Verify build and install of collect package.
PASS: Verify collect runtime is not substantially longer.
PASS: Verify tar checkpoint handling on busy system where checkpoint
      action handler detects and invokes system overload handling.
PASS: Verify some CPU spike reduction compared to before update.

Regression:

PASS: Compare collect bundle size and contents before and after update.
PASS: Soak collect on busy/overloaded AIO SX system.
PASS: Verify report tool reports the same data before/after update.
PASS: Verify multi-node collect.

Closes-Bug: 2090923
Change-Id: If698d5f275f4482de205fa4a37e0398b19800777
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

utilities

This file serves as documentation for the components and features included in the utilities repository.

PCI IRQ Affinity Agent

While OpenStack makes it possible for instances to use PCI devices, the interrupts generated by these devices may be handled by host CPUs that are unrelated to the instance, which can result in lower performance than if the device interrupts were handled by the instance's own CPUs.

The agent only acts on instances with dedicated vCPUs; for instances using shared vCPUs, no action is taken.

The expected outcome of the agent's operation is higher performance: the instance's own cores are assigned to handle the interrupts from the PCI devices used by that instance, preventing those interrupts from consuming excessive cycles on the platform cores.

Agent operation

The agent operates by listening to RabbitMQ notifications from Nova. When an instance is created or moved to the host, the agent checks for a specific flavor spec (detailed below); if the spec is present, it queries libvirt to map the instance's vCPUs to host pCPUs.

Once the agent has the CPU mapping, it determines the IRQs for each PCI device used by the instance. It then loops over all of the PCI devices, determines which host NUMA node each device is associated with and the pCPUs that belong to that node, and finally sets the CPU affinity for the device's IRQs based on that pCPU list.

There is also a periodic audit that runs every minute and loops over the existing IRQs: new IRQs that were not previously mapped are mapped, and PCI devices that are no longer associated with an instance have their IRQ affinity reset to the default value.
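
At its core, the affinity update amounts to writing the instance's
pCPU list into the procfs entry of each IRQ owned by the device. A
minimal sketch of that kernel interface, with a hypothetical PCI
address and pCPU list:

    PCI_ADDR="0000:81:00.0"   # hypothetical PCI device used by the instance
    PCPU_LIST="4-7"           # hypothetical pCPUs mapped from its vCPUs

    # MSI/MSI-X IRQs assigned to the device are listed under its
    # sysfs node; point each one at the instance's pCPUs
    for irq in $(ls "/sys/bus/pci/devices/${PCI_ADDR}/msi_irqs/"); do
        echo "${PCPU_LIST}" > "/proc/irq/${irq}/smp_affinity_list"
    done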

Flavor spec

The PCI IRQ Affinity Agent uses a specific flavor spec for PCI interrupt affining, which determines which of the vCPUs assigned to the instance must handle the interrupts from the PCI devices:

  • hw:pci_irq_affinity_mask=<vcpus_cpulist>

Where vcpus_cpulist is a comma-separated list of values, each of which can be expressed as:

  • int: the vCPU expressed by int will be assigned to handle the interrupts from the PCI devices
  • int1-int2: the vCPUs between int1 and int2 (inclusive) will be used to handle the interrupts from the PCI devices
  • ^int: the vCPU expressed by int will not be assigned to handle the interrupts from the PCI devices; this form is used to exclude a vCPU that was included in a previous range

NOTE: int must be a value between 0 and flavor.vcpus - 1

Example: hw:pci_irq_affinity_mask=1-4,^3,6 means that vCPUs with indexes 1, 2, 4 and 6 from the vCPU list that Nova allocates to the instance will be assigned to handle the interrupts from the PCI devices.
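
For illustration, the mask expansion can be sketched as the shell
function below; this is a simplified stand-in, not the agent's actual
parser.

    # expand a mask such as "1-4,^3,6" into the resulting vCPU indexes
    expand_mask()
    {
        local mask="$1" tok i v
        local include=() exclude=()
        local IFS=','
        for tok in ${mask}; do
            case "${tok}" in
                ^*)  exclude+=("${tok#^}") ;;
                *-*) for ((i=${tok%-*}; i<=${tok#*-}; i++)); do
                         include+=("${i}")
                     done ;;
                *)   include+=("${tok}") ;;
            esac
        done
        for v in "${include[@]}"; do
            [[ " ${exclude[*]} " == *" ${v} "* ]] || printf '%s ' "${v}"
        done
        printf '\n'
    }

    expand_mask "1-4,^3,6"   # prints: 1 2 4 6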

Limitations

  • No CPU affining is performed for instances using shared CPUs (i.e., when using the flavor spec hw:cpu_policy=shared)
  • No CPU affining is performed when invalid ranges are specified in the flavor spec; instead, the agent logs error messages indicating the problem

Agent packaging

The agent code resides in the starlingx/utilities repo, along with the spec and docker_image files that are used to build a CentOS image with the agent wheel installed on it.

The agent is deployed by Armada along with the other OpenStack helm charts; refer to the PCI IRQ Affinity Agent helm chart in the starlingx/openstack-armada-app repository.
