The Report tool is used to gather relevant logs, events
and information about the system from a collect bundle
and present that data for quick and easy issue analysis.

Report can be run directly from a cloned StarlingX utilities git repository:

    ${MY_REPO}/stx/utilities/tools/collector/debian-scripts/report/report.py {options}

Report is installed on, and can be run from, any 22.12 POSTGA system node:

    /usr/local/bin/report/report.py --directory /scratch

Report can also be commanded to run automatically during a collect operation:

    collect all --report

See Report's --help option for additional command line arguments:

    report.py --help

Selecting the right command option for your collect bundle:

   Report is designed to analyze a host or subcloud 'collect bundle'.
   Report needs to be told where to find the collect bundle to analyze
   using one of three options:

   Analyze Host Bundle: --bundle or -b option
   -------------------

      Use this option to point to a 'directory' that 'contains'
      host tarball files.

          report.py --bundle /scratch/ALL_NODES_YYYYMMDD_hhmmss

      Such a directory contains one tarball per host, each ending in .tgz:

          /scratch/ALL_NODES_YYYYMMDD_hhmmss
          ├── controller-0_YYYYMMDD_hhmmss.tgz
          └── controller-1_YYYYMMDD_hhmmss.tgz

      This is the option collect uses to automatically analyze a
      just-collected bundle when run with the --report option.
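
As a rough illustration of the layout above, the following sketch lists the host tarballs in a bundle directory and derives each hostname from the file name. The helper name `list_host_tarballs` is hypothetical and is not taken from report.py. It also assumes that hostnames contain no underscore, so the hostname is whatever precedes the first underscore.

```python
from pathlib import Path


def list_host_tarballs(bundle_dir):
    """Return {hostname: tarball path} for every *.tgz file in bundle_dir.

    Assumption: the hostname is the part of the file name before the
    first underscore, as in 'controller-0_<date and time>.tgz'.
    """
    hosts = {}
    for tarball in sorted(Path(bundle_dir).glob("*.tgz")):
        hostname = tarball.name.split("_", 1)[0]
        hosts[hostname] = tarball
    return hosts
```

Files that do not end in .tgz are ignored, so a bundle directory that also holds extracted host folders or other files still yields only the host tarballs.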

    Analyze Directory: --directory or -d option
    -----------------

       Use this option when a collect bundle 'tar file' is in a
       specific 'directory'. If there are multiple collect bundles
       in that directory then the tool will prompt the user to
       select one from a list.

           report.py --directory /scratch

           0 - exit
           1 - ALL_NODES_20230608.235225
           2 - ALL_NODES_20230609.004604
           Please select bundle to analyze:

       Analysis proceeds automatically if there is only a
       single collect bundle found.
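
The selection behaviour described above can be sketched as follows. This is a minimal illustration, not the code used by report.py: the function names are hypothetical, and it assumes that every '*.tar' file in the directory is a collect bundle.

```python
from pathlib import Path


def find_bundles(directory):
    """Return the '*.tar' files found in directory, sorted by name."""
    return sorted(Path(directory).glob("*.tar"))


def build_menu(bundles):
    """Build the menu lines; entry 0 is always 'exit'."""
    lines = ["0 - exit"]
    for index, bundle in enumerate(bundles, start=1):
        # Path.stem drops the trailing '.tar' from the file name
        lines.append("%d - %s" % (index, bundle.stem))
    return lines


def select_bundle(bundles, ask=input):
    """Return the chosen bundle, or None if there is none or 0 is chosen.

    With a single bundle there is nothing to choose, so it is returned
    without prompting.
    """
    if not bundles:
        return None
    if len(bundles) == 1:
        return bundles[0]
    print("\n".join(build_menu(bundles)))
    while True:
        answer = ask("Please select bundle to analyze: ").strip()
        if answer.isdecimal() and int(answer) <= len(bundles):
            choice = int(answer)
            return None if choice == 0 else bundles[choice - 1]
```

Passing the prompt function in as `ask` keeps the selection logic testable without a terminal. Invalid input simply causes the prompt to be repeated.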

    Analyze Specific Collect Bundle tar file: --file or -f option
    ----------------------------------------

        Use this option to point to a specific collect bundle
        tar file to analyze.

            report.py --file /scratch/ALL_NODES_YYYYMMDD_hhmmss.tar
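
For readers who want to script around these options, the sketch below shows one way the three bundle location options could be modelled with Python's argparse. Only the option names (-b/--bundle, -d/--directory, -f/--file) come from this document. Treating them as a required, mutually exclusive group is an assumption and may differ from what report.py actually does.

```python
import argparse


def build_parser():
    """Build a parser that accepts exactly one bundle location option."""
    parser = argparse.ArgumentParser(prog="report.py")
    # Assumption: exactly one of the three options must be given.
    location = parser.add_mutually_exclusive_group(required=True)
    location.add_argument(
        "-b", "--bundle", metavar="DIR",
        help="directory that contains host tarball files")
    location.add_argument(
        "-d", "--directory", metavar="DIR",
        help="directory that contains a collect bundle tar file")
    location.add_argument(
        "-f", "--file", metavar="TAR",
        help="specific collect bundle tar file")
    return parser
```

With this sketch, `build_parser().parse_args(["--directory", "/scratch"])` sets `directory` and leaves `bundle` and `file` as None.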

Host vs Subcloud Collect Bundles:

Expected Host Bundle Format:

    ├── SELECT_NODES_YYYYMMDD.hhmmss.tar
    └── SELECT_NODES_YYYYMMDD.hhmmss
        ├── controller-0_YYYYMMDD.hhmmss
        ├── controller-0_YYYYMMDD.hhmmss.tgz
        ├── controller-1_YYYYMMDD.hhmmss
        ├── controller-1_YYYYMMDD.hhmmss.tgz
        ├── worker-0_YYYYMMDD.hhmmss
        └── worker-1_YYYYMMDD.hhmmss.tgz

Expected Subcloud Bundle Format:

    ├── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar
    └── SELECT_SUBCLOUDS_YYYYMMDD.hhmmss
        ├── subcloudX_YYYYMMDD.hhmmss.tar
        ├── subcloudX_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss.tgz
        │   ├── report_analysis
        │   └── report_tool.tgz
        ├── subcloudY_YYYYMMDD.hhmmss.tar
        ├── subcloudY_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss
        │   ├── controller-0_YYYYMMDD.hhmmss.tgz
        │   ├── report_analysis
        │   └── report_tool.tgz
        ├── subcloudZ_YYYYMMDD.hhmmss.tar
        └── subcloudZ_YYYYMMDD.hhmmss
            ├── controller-0_YYYYMMDD.hhmmss
            └── controller-0_YYYYMMDD.hhmmss.tgz

If there are multiple bundles found at the specified --directory
then the list is displayed and the user is prompted to select a
bundle from the list.

This would be typical when analyzing a selected subcloud collect
bundle, as in the example below:

        $ report -d /localdisk/issues/SELECT_SUBCLOUDS_YYYYMMDD.hhmmss.tar

Report will extract the subcloud tar file and, if it finds more
than one tar file, prompt the user to select which one to analyze:

        0 - exit
        1 - subcloudX_YYYYMMDD.hhmmss
        2 - subcloudY_YYYYMMDD.hhmmss
        3 - subcloudZ_YYYYMMDD.hhmmss
        Please select the bundle to analyze:

Refer to the report.py file header for a description of the tool.

Report places the report analysis in the bundle itself.
Consider the following collect bundle structure and notice
the 'report_analysis' folder, which contains the Report analysis.

    SELECT_NODES_20220527.193605
    ├── controller-0_20220527.193605
    │   ├── etc
    │   ├── root
    │   └── var
    ├── controller-1_20220527.193605
    │   ├── etc
    │   ├── root
    │   └── var
    └── report_analysis (where the output files will be placed)

Pass a collect bundle to Report's CLI for two phases of processing ...

    Phase 1: Process algorithm specific plugins to collect plugin
             specific 'report logs'. Basically fault, event,
             alarm and state change logs.

    Phase 2: Run the correlator against the plugin found 'report logs'
             to produce descriptive strings that represent failures
             that were found in the collect bundle and to summarize
             the events, alarms and state change data.

Report then produces a report analysis that gets stored with
the original bundle.
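
The two phases can be pictured with the toy sketch below. It is a simplified stand-in and not the actual plugin or correlator code: the function names are hypothetical, plugins are reduced to plain substring searches, and each matching log line is assumed to start with a timestamp followed by a hostname.

```python
def run_plugins(log_lines, plugins):
    """Phase 1: collect 'report logs' per plugin.

    plugins maps a plugin name to the substrings it searches for
    (a stand-in for the real, algorithm-specific plugins).
    """
    report_logs = {}
    for name, substrings in plugins.items():
        report_logs[name] = [
            line for line in log_lines
            if any(substring in line for substring in substrings)
        ]
    return report_logs


def correlate(report_logs, labels):
    """Phase 2: turn plugin results into short descriptive strings.

    labels maps a plugin name to a description. Each report log line is
    assumed to start with '<timestamp> <hostname> '.
    """
    results = []
    for name, lines in report_logs.items():
        for line in lines:
            timestamp, hostname = line.split()[:2]
            results.append("%s %s %s" % (timestamp, hostname, labels[name]))
    # ISO-style timestamps sort chronologically as plain strings
    return sorted(results)
```

Keeping the phases separate means the plugin output can be stored and inspected on its own, which matches the separate Plugin Results and Correlated Results shown in the example analysis.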

Example Analysis:

$ report -d /localdisk/CGTS-44887

extracting /localdisk/CGTS-44887/ALL_NODES_20230307.183540.tar

Report: /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis

extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-1_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/controller-0_20230307.183540.tgz
extracting : /localdisk/CGTS-44887/ALL_NODES_20230307.183540/compute-1_20230307.183540.tgz

Active Ctrl: controller-1
System Type: All-in-one
S/W Version: 22.12
System Mode: duplex
DC Role    : systemcontroller
Node Type  : controller
subfunction: controller,worker
Mgmt Iface : vlan809
Clstr Iface: vlan909
OAM Iface  : eno8403
OS Release : Debian GNU/Linux 11 (bullseye)
Build Type : Formal
Build Date : 2023-03-01 23:00:06 +0000
controllers: controller-1,controller-0
workers    : compute-1,compute-0

Plugin Results:

  621 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/log
  221 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/swact_activity
  132 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/alarm
   85 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-0
   60 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/system_info
   54 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/maintenance_errors
   36 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/heartbeat_loss
   26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/process_failures
   16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/state_changes
   13 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/substring_controller-1
    2 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/plugins/puppet_errors

... nothing found by plugins: daemon_failures

Correlated Results:

Events       : 8  /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/events
Alarms       : 26 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/alarms
State Changes: 16 /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/state_changes
Failures     : 4  /localdisk/CGTS-44887/ALL_NODES_20230307.183540/report_analysis/failures
2023-03-07T05:00:11 controller-0 uncontrolled swact
2023-03-07T05:01:52 controller-0 heartbeat loss failure
2023-03-07T17:42:35 controller-0 configuration failure
2023-03-07T17:58:06 controller-0 goenabled failure

Inspect the Correlated and Plugin results files for failures,
alarms, events and state changes.
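
When inspecting the results programmatically, lines such as the four failure lines shown above can be split into fields. The sketch below assumes that the failures file holds one '<timestamp> <hostname> <description>' entry per line, as the example output suggests. The names `Failure`, `parse_failure_line` and `failures_by_host` are hypothetical.

```python
from collections import namedtuple
from datetime import datetime

Failure = namedtuple("Failure", "timestamp hostname description")


def parse_failure_line(line):
    """Parse '<timestamp> <hostname> <description>' into a Failure.

    Assumption: the fields are separated by whitespace and the
    timestamp looks like 2023-03-07T05:00:11.
    """
    stamp, hostname, description = line.strip().split(None, 2)
    timestamp = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    return Failure(timestamp, hostname, description)


def failures_by_host(lines):
    """Group failures by hostname, skipping blank lines."""
    grouped = {}
    for line in lines:
        if line.strip():
            failure = parse_failure_line(line)
            grouped.setdefault(failure.hostname, []).append(failure)
    return grouped
```

A line that does not have all three fields raises ValueError, which makes a format mismatch visible instead of silently dropping the entry.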

The report analysis and collect bundle can be viewed in a web browser
by loading the index.html file that is created in the report_analysis
folder when the report tool is run.
The rendering tool is displayed with a menu-content layout.
There are four sections:
System Information, Correlated Results, Plugin Results, Collect Bundle.
System Information lists the controller, storage and worker hosts.
controller-0 is shown by default.
Users can click '+'/'-' in the menu to show/hide system info contents
in the content panel.

System Info        controller-0
                   ------------
- controller-0     System Type: All-in-one
+ controller-1     S/W Version: 22.12
------------       System Mode: duplex
- Storage          DC Role    : systemcontroller
+ storage-0        Node Type  : Controller
+ storage-1        subfunction: controller
------------       Mgmt Iface : vlan166
- Workers          Clstr Iface: vlan167
+ compute-0        Build Type : formal
+ compute-1        Build Date : 2022-12-19 07:22:00 +0000
                   controllers: controller-1, controller-0
                   workers    : compute-0, compute-1
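
The index.html file mentioned above can also be opened from a script. The sketch below uses Python's standard webbrowser module. The helper names are hypothetical, and the only thing assumed about the layout is that index.html sits directly in the bundle's report_analysis folder, as described above.

```python
import webbrowser
from pathlib import Path


def index_uri(bundle_dir):
    """Return the file:// URI of report_analysis/index.html in bundle_dir."""
    index = Path(bundle_dir).resolve() / "report_analysis" / "index.html"
    return index.as_uri()


def open_report(bundle_dir):
    """Open the rendered report in the default web browser."""
    return webbrowser.open(index_uri(bundle_dir))
```

The path is resolved to an absolute path first because `Path.as_uri()` only accepts absolute paths.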

The Result section contains Correlated Results and Plugin Results,
each with its own subitems.
Collect Bundle is shown after the Result section.
Clicking a menu item shows its content in the right panel.

Menu Default
- Correlated Results
  failures
  state_changes
  events
  alarms
--------
+ Plugin Results
--------
+ Collect Bundle

Menu Expanded
- Correlated Results
  failures
  state_changes
  events
  alarms
--------
- Plugin Results
  substring_controller-0
  puppet_errors
  log
  state_changes
  alarm
  daemon_failures
  heartbeat_loss
  substring_controller-1
  process_failures
  maintenance_errors
  swact_activity
--------
- Collect Bundle
  controller-0_20231214.180318
  controller-1_20231214.180318
  storage-0_20231214.180318
  storage-1_20231214.180318
  compute-0_20231214.180318
  compute-1_20231214.180318

Inside Collect Bundle
controller-0_20231214.180318
+ var
+ root
+ etc

Opening the Collect Bundle menu shows all the collect bundle items.
Clicking a bundle opens a new tab corresponding to that bundle.
A folder or file that is empty, or that cannot be opened due to
permissions, is shown in grey.
Files that are not empty are shown in light green.
File content is shown in the right panel, as in the previous layouts.
For files that do not have a specific extension, a new tab is opened.
If they cannot be opened, a download popup is shown or the file is
downloaded directly, depending on the browser settings.