A recent update introduced an empty file (__init__.py) in the plugins
folder, which was causing report traceback failures for off-system runs.
The current handling of the --plugin option is also broken.
The fix for that issue led to a few additional, more general plugin
handling improvements.
Test Plan:
PASS: Verify ignore handling of empty plugin files.
PASS: Verify all python file permissions set to executable
on fresh pull in git and after on-system package install.
PASS: Verify all plugin file permissions are not executable
on fresh pull in git and after on-system package install.
PASS: Verify general handling of the --plugin option with space
delimited plugins that follow.
PASS: Verify correlator is not run if there is no plugin data
to correlate.
PASS: Verify missing plugin output log files do not lead to a
file not found error on the console.
PASS: Verify refactored plugin search handling success and
error paths.
PASS: Verify refactored plugin search handling finds and adds
built-in and localhost plugins with and without the --plugin
option specified.
PASS: Verify that previous plugin data is removed prior to a rerun
of the tool. This is helpful for localhost plugin development.
PASS: Verify handling of adding multiple plugins that span both
built-in and localhost locations.
PASS: Verify handling of missing plugin(s) when specified with
the --plugin option.
Regression:
PASS: Verify collector package builds and passes tox.
PASS: Verify both on-system and off-system Report handling.
PASS: Verify collect all using --report option
PASS: Verify logging with and without --debug option.
PASS: Verify no pep8 errors or warnings.
Story: 2010533
Task: 48433
Task: 48432
Task: 48443
Change-Id: I42616daad2de6b0785f11736ef20b11e19f19869
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>
The Report tool is used to gather relevant logs, events
and information about the system from a collect bundle
and present that data for quick / easy issue analysis.
Report can be run directly from a cloned StarlingX utilities git repository
${MY_REPO}/stx/utilities/tools/collector/debian-scripts/report/report.py {options}
Report is installed and can be run on any 22.12 POSTGA system node.
/usr/local/bin/report/report.py --directory /scratch
Report can also be commanded to automatically run during a collect operation
collect all --report
See Report's --help option for additional optional command line arguments.
report.py --help
Selecting the right command option for your collect bundle:
Report is designed to analyze a host or subcloud 'collect bundle'.
Report needs to be told where to find the collect bundle to analyze
using one of three options
Analyze Host Bundle: --bundle or -b option
-------------------
Use this option to point to a 'directory' that 'contains'
host tarball files.
report.py --bundle /scratch/ALL_NODES_YYYYMMDD_hhmmss
Such a directory contains one tarball per host, each ending in .tgz
/scratch/ALL_NODES_YYYYMMDD_hhmmss
├── controller-0_YYYYMMDD_hhmmss.tgz
└── controller-1_YYYYMMDD_hhmmss.tgz
This is the option collect uses to auto analyze a just
collected bundle with the collect --report option.
Analyze Directory: --directory or -d option
-----------------
Use this option when a collect bundle 'tar file' is in a
specific 'directory'. If there are multiple collect
bundles in that directory then the tool will prompt the
user to select one from a list.
report.py --directory /scratch
0 - exit
1 - ALL_NODES_20230608.235225
2 - ALL_NODES_20230609.004604
Please select bundle to analyze:
Analysis proceeds automatically if there is only a
single collect bundle found.
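
The selection behavior described above can be sketched as follows. This is a minimal illustration, not the code in report.py: the function name select_bundle and its ask parameter are hypothetical, and only the menu text is taken from the example output.

```python
from typing import Callable, Optional, Sequence


def select_bundle(bundles: Sequence[str],
                  ask: Callable[[str], str] = input) -> Optional[str]:
    """Return the chosen bundle name, or None for 'exit' or no bundles.

    'ask' is a hypothetical hook so the prompt can be replaced in tests.
    """
    if not bundles:
        return None
    if len(bundles) == 1:
        # A single bundle needs no prompt; analysis proceeds automatically.
        return bundles[0]

    print("0 - exit")
    for index, name in enumerate(bundles, start=1):
        print(f"{index} - {name}")

    while True:
        reply = ask("Please select bundle to analyze: ").strip()
        if reply.isdecimal() and 0 <= int(reply) <= len(bundles):
            choice = int(reply)
            return None if choice == 0 else bundles[choice - 1]
        # Anything else is rejected and the user is asked again.
        print(f"Invalid selection: {reply!r}")
```

Passing the prompt function as a parameter keeps the sketch testable without a terminal; the real tool may handle invalid input differently.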
Analyze Specific Collect Bundle tar file: --file or -f option
----------------------------------------
Use this option to point to a specific collect bundle
tar file to analyze.
report.py --file /scratch/ALL_NODES_YYYYMMDD_hhmmss.tar
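
The three location options could be modeled with argparse roughly as shown here. This is a sketch under stated assumptions, not the parser used by report.py: the option names come from the text above, while the help strings, the metavar names, and the decision to make the options mutually exclusive and required are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one bundle location option."""
    parser = argparse.ArgumentParser(prog="report.py")
    # Assumption: exactly one of the three options must be given.
    where = parser.add_mutually_exclusive_group(required=True)
    where.add_argument("-b", "--bundle", metavar="DIR",
                       help="directory that contains host tarball files")
    where.add_argument("-d", "--directory", metavar="DIR",
                       help="directory that holds one or more bundle tar files")
    where.add_argument("-f", "--file", metavar="TAR",
                       help="specific collect bundle tar file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this layout, passing both --directory and --file is rejected by argparse before any analysis starts.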
Host vs Subcloud Collect Bundles:
Expected Host Bundle Format:
├── SELECT_NODES_YYYYMMDD.hhmmss.tar
├── SELECT_NODES_YYYYMMDD.hhmmss
├── controller-0_YYYYMMDD.hhmmss
├── controller-0_YYYYMMDD.hhmmss.tgz
├── controller-1_YYYYMMDD.hhmmss
├── controller-1_YYYYMMDD.hhmmss.tgz
├── worker-0_YYYYMMDD.hhmmss
└── worker-1_YYYYMMDD.hhmmss.tgz
Expected Subcloud Bundle Format:
├── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
└── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss
├── subcloudX_YYYYMMDD.hhmmss.tar
├── subcloudX_YYYYMMDD.hhmmss
│ ├── controller-0_YYYYMMDD.hhmmss
│ ├── controller-0_YYYYMMDD.hhmmss.tgz
│ ├── report_analysis
│ └── report_tool.tgz
├── subcloudY_YYYYMMDD.hhmmss.tar
├── subcloudY_YYYYMMDD.hhmmss
│ ├── controller-0_YYYYMMDD.hhmmss
│ ├── controller-0_YYYYMMDD.hhmmss.tgz
│ ├── report_analysis
│ └── report_tool.tgz
├── subcloudZ_YYYYMMDD.hhmmss.tar
└── subcloudZ_YYYYMMDD.hhmmss
├── controller-0_YYYYMMDD.hhmmss
└── controller-0_YYYYMMDD.hhmmss.tgz
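
The names in both layouts follow a name_YYYYMMDD.hhmmss pattern, optionally ending in .tgz or .tar. A minimal sketch of splitting such a name into its parts is shown below; the helper split_bundle_name is hypothetical, and accepting either '.' or '_' between date and time is an assumption based on the examples in this file.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# name, then _YYYYMMDD, a '.' or '_' separator, hhmmss, optional .tgz/.tar
_NAME = re.compile(
    r"^(?P<name>.+)_(?P<date>\d{8})[._](?P<time>\d{6})(?:\.(?:tgz|tar))?$"
)


def split_bundle_name(filename: str) -> Optional[Tuple[str, datetime]]:
    """Return (name, timestamp) or None if the name does not match."""
    match = _NAME.match(filename)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match["date"] + match["time"],
                                  "%Y%m%d%H%M%S")
    except ValueError:
        # Digits were present but do not form a valid date/time.
        return None
    return match["name"], stamp
```

Entries such as report_analysis or report_tool.tgz do not match and return None, which makes the helper usable as a filter when walking an extracted bundle.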
If there are multiple bundles found at the specified --directory
then the list is displayed and the user is prompted to select a
bundle from the list.
This would be typical when analyzing a selected subcloud collect
bundle like in the example below
$ report -d /localdisk/issues/SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
Report will extract the subcloud tar file and if it sees more
than one tar file it will prompt the user to select which one
to analyze
0 - exit
1 - subcloudX_YYYYMMDD.hhmmss
2 - subcloudY_YYYYMMDD.hhmmss
3 - subcloudZ_YYYYMMDD.hhmmss
Please select the bundle to analyze:
Refer to the report.py file header for a description of the tool.
Report places the report analysis in the bundle itself.
Consider the following collect bundle structure and notice
the 'report_analysis' folder which contains the Report analysis.
SELECT_NODES_20220527.193605
├── controller-0_20220527.193605
│ ├── etc
│ ├── root
│ └── var
├── controller-1_20220527.193605
│ ├── etc
│ ├── root
│ └── var
└── report_analysis (where the output files will be placed)
Pass a collect bundle to Report's CLI for two phases of processing ...
Phase 1: Process algorithm specific plugins to collect plugin
specific 'report logs'. Basically fault, event,
alarm and state change logs.
Phase 2: Run the correlator against the plugin found 'report logs'
to produce descriptive strings that represent failures
that were found in the collect bundle and to summarize
the events, alarms and state change data.
Report then produces a report analysis that gets stored with
the original bundle.
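
The two phases can be pictured with the following sketch. It is illustrative only: the Plugin type, run_plugins, correlate and analyze are hypothetical names, and the correlator here is a stand-in that only counts entries rather than producing real failure strings. Skipping the correlator when no plugin produced data mirrors the behavior described for this change.

```python
from typing import Callable, Dict, List

# Hypothetical plugin shape: takes a bundle path, returns 'report log' lines.
Plugin = Callable[[str], List[str]]


def run_plugins(bundle: str, plugins: Dict[str, Plugin]) -> Dict[str, List[str]]:
    """Phase 1: run each plugin and keep only those that found something."""
    report_logs = {}
    for name, plugin in plugins.items():
        lines = plugin(bundle)
        if lines:
            report_logs[name] = lines
    return report_logs


def correlate(report_logs: Dict[str, List[str]]) -> List[str]:
    """Phase 2 stand-in: summarize how many entries each plugin produced."""
    return [f"{name}: {len(lines)} entries"
            for name, lines in sorted(report_logs.items())]


def analyze(bundle: str, plugins: Dict[str, Plugin]) -> List[str]:
    """Run phase 1, then phase 2 only if there is plugin data to correlate."""
    report_logs = run_plugins(bundle, plugins)
    if not report_logs:
        return []
    return correlate(report_logs)
```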
Example Analysis:
$ report -d /localdisk/CGTS-44887
extracting /localdisk/CGTS-44887/ALL_NODES_20230307.183540.tar
Report: /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-1_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-1_20230307.183540.tgz
Active Ctrl: controller-1
System Type: All-in-one
S/W Version: 22.12
System Mode: duplex
DC Role : systemcontroller
Node Type : controller
subfunction: controller,worker
Mgmt Iface : vlan809
Clstr Iface: vlan909
OAM Iface : eno8403
OS Release : Debian GNU/Linux 11 (bullseye)
Build Type : Formal
Build Date : 2023-03-01 23:00:06 +0000
controllers: controller-1,controller-0
workers : compute-1,compute-0
Plugin Results:
621 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/log
221 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/swact_activity
132 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/alarm
85 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-0
60 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/system_info
54 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/maintenance_errors
36 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/heartbeat_loss
26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/process_failures
16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/state_changes
13 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-1
2 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/puppet_errors
... nothing found by plugins: daemon_failures
Correlated Results:
Events : 8 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/events
Alarms : 26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/alarms
State Changes: 16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/state_changes
Failures : 4 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/failures
2023-03-07T05:00:11 controller-0 uncontrolled swact
2023-03-07T05:01:52 controller-0 heartbeat loss failure
2023-03-07T17:42:35 controller-0 configuration failure
2023-03-07T17:58:06 controller-0 goenabled failure
Inspect the Correlated and Plugin results files for failures,
alarms, events and state changes.
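
As a starting point for that inspection, the sketch below lists the files in a results folder with their line counts, largest first. It assumes the numbers shown under 'Plugin Results' are line counts, which the example output suggests but does not state; the function name result_line_counts is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def result_line_counts(results_dir: str) -> List[Tuple[int, str]]:
    """Return (line count, file name) pairs, largest count first."""
    counts = []
    for path in Path(results_dir).iterdir():
        if path.is_file():
            # errors="replace" keeps odd bytes in logs from aborting the count.
            with path.open(errors="replace") as handle:
                counts.append((sum(1 for _ in handle), path.name))
    # Sort by descending count, then by name for a stable order.
    return sorted(counts, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    import sys
    for count, name in result_line_counts(sys.argv[1]):
        print(f"{count:6d} {name}")
```

Pointing it at the report_analysis/plugins folder of an analyzed bundle gives a quick view of which plugins found the most data.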