Eric MacDonald 712187a496 Report Tool: Improve plugin handling
A recent update introduced an empty file (__init__.py) in the plugins
folder which was causing report traceback failures for off system runs.

Also, the current handling of the --plugin option is broken.
The fix for that issue led to a few additional, more general plugin
handling improvements.

Test Plan:

PASS: Verify ignore handling of empty plugin files.
PASS: Verify all python file permissions set to executable
      on fresh pull in git and after on-system package install.
PASS: Verify all plugin file permissions are not executable
      on fresh pull in git and after on-system package install.
PASS: Verify general handling of the --plugin option with space
      delimited plugins that follow.
PASS: Verify correlator is not run if there is no plugin data
      to correlate.
PASS: Verify missing plugin output log files do not lead to a
      file not found error on the console.
PASS: Verify refactored plugin search handling success and
      error paths.
PASS: Verify refactored plugin search handling finds and adds
      built-in and localhost plugins with and without the --plugin
      option specified.
PASS: Verify that previous plugin data is removed prior to a rerun
      of the tool. This is helpful for localhost plugin development.
PASS: Verify handling of adding multiple plugins that span both
      built-in and localhost locations.
PASS: Verify handling of missing plugin(s) when specified with
      the --plugin option.

Regression:

PASS: Verify collector package builds and passes tox.
PASS: Verify both on-system and off-system Report handling.
PASS: Verify collect all using --report option
PASS: Verify logging with and without --debug option.
PASS: Verify no pep8 errors or warnings.

Story: 2010533
Task: 48433
Task: 48432
Task: 48443

Change-Id: I42616daad2de6b0785f11736ef20b11e19f19869
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
2023-07-21 16:51:54 +00:00

The Report tool gathers relevant logs, events and information
about the system from a collect bundle and presents that data
for quick and easy issue analysis.

Report can run directly from a cloned StarlingX utilities git repository:

    ${MY_REPO}/stx/utilities/tools/collector/debian-scripts/report/report.py {options}

Report is installed and can be run on any 22.12 POSTGA system node.

    /usr/local/bin/report/report.py --directory /scratch

Report can also be run automatically during a collect operation:

    collect all --report

See Report's --help option for additional optional command line arguments.

    report.py --help

Selecting the right command option for your collect bundle:

   Report is designed to analyze a host or subcloud 'collect bundle'.
   Report needs to be told where to find the collect bundle to analyze
   using one of the following three options:

   Analyze Host Bundle: --bundle or -b option
   -------------------

      Use this option to point to a 'directory' that 'contains'
      host tarball files.

          report.py --bundle /scratch/ALL_NODES_YYYYMMDD_hhmmss

      Such a directory contains the per-host tarballs, each
      ending in .tgz:

          /scratch/ALL_NODES_YYYYMMDD_hhmmss
          ├── controller-0_YYYYMMDD_hhmmss.tgz
          └── controller-1_YYYYMMDD_hhmmss.tgz

      This is the option collect uses to auto analyze a just
      collected bundle with the collect --report option.

    Analyze Directory: --directory or -d option
    -----------------

       Use this option when a collect bundle 'tar file' is in a
       specific 'directory'. If there are multiple collect
       bundles in that directory then the tool will prompt the
       user to select one from a list.

           report.py --directory /scratch

           0 - exit
           1 - ALL_NODES_20230608.235225
           2 - ALL_NODES_20230609.004604
           Please select bundle to analyze:

       Analysis proceeds automatically if there is only a
       single collect bundle found.

    Analyze Specific Collect Bundle tar file: --file or -f option
    ----------------------------------------

        Use this option to point to a specific collect bundle
        tar file to analyze.

            report.py --file /scratch/ALL_NODES_YYYYMMDD_hhmmss.tar
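
The three location options above could be modeled with argparse as in
the sketch below. This is not report.py's actual parser: the function
names, the metavar strings and the assumption that exactly one location
option is required per run are illustrative only.

```python
import argparse


def build_parser():
    """Hypothetical parser; report.py's real argument handling may differ."""
    parser = argparse.ArgumentParser(prog="report.py")
    # Assumption: exactly one bundle location option is given per run.
    where = parser.add_mutually_exclusive_group(required=True)
    where.add_argument("--bundle", "-b", metavar="DIR",
                       help="directory that contains host tarball files")
    where.add_argument("--directory", "-d", metavar="DIR",
                       help="directory that contains collect bundle tar file(s)")
    where.add_argument("--file", "-f", metavar="TAR",
                       help="specific collect bundle tar file")
    return parser


def bundle_source(args):
    """Return (kind, path) for whichever location option was supplied."""
    for kind in ("bundle", "directory", "file"):
        path = getattr(args, kind)
        if path is not None:
            return kind, path
    raise ValueError("no bundle location given")


if __name__ == "__main__":
    parsed = build_parser().parse_args(["--directory", "/scratch"])
    print(bundle_source(parsed))
```

Keeping the options in one mutually exclusive group lets argparse reject
a command line that names two locations at once, so the rest of the code
only has to handle a single (kind, path) pair.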

Host vs Subcloud Collect Bundles:

Expected Host Bundle Format:

    ├── SELECT_NODES_YYYYMMDD.hhmmss.tar
    └── SELECT_NODES_YYYYMMDD.hhmmss
        ├── controller-0_YYYYMMDD.hhmmss
        ├── controller-0_YYYYMMDD.hhmmss.tgz
        ├── controller-1_YYYYMMDD.hhmmss
        ├── controller-1_YYYYMMDD.hhmmss.tgz
        ├── worker-0_YYYYMMDD.hhmmss
        └── worker-1_YYYYMMDD.hhmmss.tgz

Expected Subcloud Bundle Format:

    ├── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
    └── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss
        ├── subcloudX_YYYYMMDD.hhmmss.tar
        ├── subcloudX_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss.tgz
        │   ├── report_analysis
        │   └── report_tool.tgz
        ├── subcloudY_YYYYMMDD.hhmmss.tar
        ├── subcloudY_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss.tgz
        │   ├── report_analysis
        │   └── report_tool.tgz
        ├── subcloudZ_YYYYMMDD.hhmmss.tar
        └── subcloudZ_YYYYMMDD.hhmmss
            ├── controller-0_YYYYMMDD.hhmmss
            └── controller-0_YYYYMMDD.hhmmss.tgz
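
One way to tell the two layouts apart is to look at the file names
directly inside an extracted bundle directory. The sketch below assumes
that a host bundle holds hostname .tgz files and a subcloud bundle holds
per-subcloud .tar files, as in the trees above. The function name and
the extension-based rule are hypothetical; Report itself may decide
differently.

```python
import os


def bundle_kind(bundle_dir):
    """Guess 'host' or 'subcloud' from names in an extracted bundle."""
    names = os.listdir(bundle_dir)
    if any(name.endswith(".tgz") for name in names):
        return "host"      # hostname tarballs sit directly in the bundle
    if any(name.endswith(".tar") for name in names):
        return "subcloud"  # per-subcloud tar files sit directly in the bundle
    return "unknown"       # assumption: anything else is left undecided
```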

If multiple bundles are found at the specified --directory then
a list is displayed and the user is prompted to select a bundle
from that list.

This is typical when analyzing a selected subcloud collect
bundle, as in the example below:

        $ report -d /localdisk/issues/SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar

    Report extracts the subcloud bundle tar file and, if it finds
    more than one tar file inside, prompts the user to select
    which one to analyze:

        0 - exit
        1 - subcloudX_YYYYMMDD.hhmmss
        2 - subcloudY_YYYYMMDD.hhmmss
        3 - subcloudZ_YYYYMMDD.hhmmss
        Please select the bundle to analyze:
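
The selection prompt shown above can be sketched as a small function.
This is an illustration, not the tool's code: the function name, the
injectable read/write callables and the 'invalid selection' message are
assumptions made so the example can be exercised without a terminal.

```python
def choose_bundle(bundles, read=input, write=print):
    """Return the chosen bundle name, or None when the user picks 0."""
    if not bundles:
        return None
    if len(bundles) == 1:
        return bundles[0]  # a single bundle is analyzed without prompting
    write("0 - exit")
    for index, name in enumerate(bundles, start=1):
        write(f"{index} - {name}")
    while True:
        answer = read("Please select bundle to analyze: ").strip()
        try:
            choice = int(answer)
        except ValueError:
            choice = -1  # non-numeric input is treated as invalid
        if 0 <= choice <= len(bundles):
            return None if choice == 0 else bundles[choice - 1]
        write("invalid selection")  # hypothetical message
```

Passing read and write as parameters keeps the menu logic separate from
the console, which makes the exit path and the invalid-input path easy
to check.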

Refer to the report.py file header for a description of the tool.

Report places the report analysis in the bundle itself.
Consider the following collect bundle structure and notice
the 'report_analysis' folder which contains the Report analysis.

    SELECT_NODES_20220527.193605
    ├── controller-0_20220527.193605
    │   ├── etc
    │   ├── root
    │   └── var
    ├── controller-1_20220527.193605
    │   ├── etc
    │   ├── root
    │   └── var
    └── report_analysis (where the output files will be placed)

Pass a collect bundle to Report's CLI for two phases of processing ...

    Phase 1: Process algorithm-specific plugins to collect
             plugin-specific 'report logs'. These are mainly fault,
             event, alarm and state change logs.

    Phase 2: Run the correlator against the 'report logs' found by
             the plugins to produce descriptive strings that
             represent failures found in the collect bundle and to
             summarize the events, alarms and state change data.

Report then produces a report analysis that gets stored with
the original bundle.
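
The two phases can be pictured with the minimal sketch below. It is a
toy model and not Report's implementation: the plugins are hypothetical
substring filters, and the stand-in correlator only counts lines. The
early return mirrors the commit's test plan item that the correlator is
not run when there is no plugin data to correlate.

```python
def run_plugins(plugins, bundle_lines):
    """Phase 1: each plugin returns the 'report log' lines it found."""
    return {name: plugin(bundle_lines) for name, plugin in plugins.items()}


def correlate(report_logs):
    """Phase 2 stand-in: count the lines each plugin reported."""
    return {name: len(lines) for name, lines in report_logs.items() if lines}


def analyze(plugins, bundle_lines):
    """Run phase 1, then phase 2 only if some plugin produced data."""
    report_logs = run_plugins(plugins, bundle_lines)
    if not any(report_logs.values()):
        return {}  # no plugin data, so the correlator is skipped
    return correlate(report_logs)


# Hypothetical plugins: simple substring filters over log lines.
PLUGINS = {
    "alarm": lambda lines: [line for line in lines if "alarm" in line],
    "swact": lambda lines: [line for line in lines if "swact" in line],
}
```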

Example Analysis:

$ report -d /localdisk/CGTS-44887

extracting /localdisk/CGTS-44887/ALL_NODES_20230307.183540.tar

Report: /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis

extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-1_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-1_20230307.183540.tgz

Active Ctrl: controller-1
System Type: All-in-one
S/W Version: 22.12
System Mode: duplex
DC Role    : systemcontroller
Node Type  : controller
subfunction: controller,worker
Mgmt Iface : vlan809
Clstr Iface: vlan909
OAM Iface  : eno8403
OS Release : Debian GNU/Linux 11 (bullseye)
Build Type : Formal
Build Date : 2023-03-01 23:00:06 +0000
controllers: controller-1,controller-0
workers    : compute-1,compute-0

Plugin Results:

  621 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/log
  221 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/swact_activity
  132 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/alarm
   85 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-0
   60 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/system_info
   54 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/maintenance_errors
   36 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/heartbeat_loss
   26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/process_failures
   16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/state_changes
   13 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-1
    2 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/puppet_errors

... nothing found by plugins: daemon_failures

Correlated Results:

Events       : 8  /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/events
Alarms       : 26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/alarms
State Changes: 16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/state_changes
Failures     : 4  /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/failures
2023-03-07T05:00:11 controller-0 uncontrolled swact
2023-03-07T05:01:52 controller-0 heartbeat loss failure
2023-03-07T17:42:35 controller-0 configuration failure
2023-03-07T17:58:06 controller-0 goenabled failure
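
Each failure line printed above reads as a timestamp, a hostname and a
description. Assuming that layout holds for every line (this is inferred
from the example output only), a line can be split for sorting or
filtering with a sketch such as the following. The function name is
hypothetical.

```python
from datetime import datetime


def parse_failure(line):
    """Split 'TIMESTAMP HOST DESCRIPTION' into (datetime, host, text)."""
    stamp, host, description = line.split(maxsplit=2)
    return datetime.fromisoformat(stamp), host, description
```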

Inspect the Correlated and Plugin results files for failures,
alarms, events and state changes.
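
When inspecting the plugin results, it can help to rank the files in
report_analysis/plugins by size, similar to the 'Plugin Results' listing
in the example. The sketch below assumes the number shown next to each
path is a line count, which the example output suggests but does not
state. The function name is hypothetical.

```python
import os


def plugin_line_counts(plugins_dir):
    """Return ([(lines, name), ...] largest first, [names of empty files])."""
    found, empty = [], []
    for name in sorted(os.listdir(plugins_dir)):
        path = os.path.join(plugins_dir, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        with open(path, errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count:
            found.append((count, name))
        else:
            empty.append(name)
    # The sort is stable, so files with equal counts stay in name order.
    found.sort(key=lambda item: item[0], reverse=True)
    return found, empty
```

The second list corresponds to the plugins that found nothing, such as
daemon_failures in the example analysis.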