Eric MacDonald 4f14f3bbf4 Make collectd plugins use FM API V2
Using FM API V2 allows the collectd plugins to distinguish
between FM connection failures and empty alarm query results
on process startup, as well as to detect failures to clear or
assert alarms during runtime, so that such actions can be
retried on the next audit interval.

This makes the plugins more robust in their alarm
management, avoids leaving stuck alarms, and fixes
the following three reported stuck alarm bugs.

Closes-Bug: https://bugs.launchpad.net/starlingx/+bug/1802535
Closes-Bug: https://bugs.launchpad.net/starlingx/+bug/1813974
Closes-Bug: https://bugs.launchpad.net/starlingx/+bug/1814944

Additional improvements were made to each plugin to handle
failure paths better with the V2 API.

Additional changes made by this update include:

1. Fixed stale alarm handling for unmounted filesystems.
2. Percent usage alarm readings are now refreshed when the actual value changes.
3. Fixed threshold values.
4. Added 2 decimal point resolution to % usage alarm text.
5. Added commented-out FIT code to the mem, cpu and df plugins.
6. Reversed the True/False return polarity of interface plugin functions.

Test Plan:

Regression:
PASS: normal alarm handling with FM V2 API ; process startup
PASS: normal alarm handling with FM V2 API ; runtime alarm assert
PASS: normal alarm handling with FM V2 API ; runtime alarm clear

PASS: Verify alarms of unmounted fs gets automatically cleared
PASS: Verify interface alarm/clear operation

Robustness:
PASS: Verify general startup behavior of all plugins while FM
      is not running and only starts at some later time.
PASS: Verify alarm handling over process startup with existing
      cpu alarms while FM not running.
PASS: Verify alarm handling over process startup with existing
      mem alarms while FM not running.
PASS: Verify alarm handling over process startup with existing
      df alarms while FM not running.

PASS: Verify runtime cpu plugin alarm assertion retry handling
PASS: Verify runtime cpu plugin alarm clear retry handling
PASS: Verify runtime cpu plugin handling over process restart
PASS: Verify alarm handling over process startup with existing
      cpu alarms while FM initially not running and then started.

PASS: Verify runtime mem plugin alarm assertion retry handling
PASS: Verify runtime mem plugin alarm clear retry handling
PASS: Verify runtime mem plugin handling over process restart
PASS: Verify alarm handling over process startup with existing
      mem alarms while FM initially not running and then started.

PASS: Verify runtime df plugin alarm assertion retry handling
PASS: Verify runtime df plugin alarm clear retry handling
PASS: Verify runtime df plugin handling over process restart
PASS: Verify alarm handling over process startup with existing
      df alarms while FM initially not running and then started.

PASS: Verify alarm set/clear threshold boundaries for cpu plugin
PASS: Verify alarm set/clear threshold boundaries for memory plugin
PASS: Verify alarm set/clear threshold boundaries for df plugin

New Features: ... threshold exceeded ; threshold 80.00%, actual 80.33%
PASS: Verify percent usage alarms are refreshed with current value
PASS: Verify percent usage alarms show two decimal points

Change-Id: Ibe173617d11c17bdc4b41115e25bd8c18b49807e
Signed-off-by: Eric MacDonald <eric.macdonald@windriver.com>

#
# Copyright (c) 2018-2019 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#
# Version 1.0
#
############################################################################
#
# This file is the collectd 'FM Alarm' Notifier.
#
# This notifier manages raising and clearing alarms based on collectd
# notifications ; i.e. automatic collectd calls to this handler/notifier.
#
# Collectd process startup automatically calls this module's init_func which
# declares and initializes a PluginObject for each plugin type in preparation
# for periodic ongoing monitoring where collectd calls notifier_func for each
# plugin and instance of that plugin.
#
# All other class or common member functions implemented herein exist in
# support of that aforementioned initialization and periodic monitoring.
#
# Collectd provides information about each event as an object passed to the
# notification handler ; the notification object.
#
#    object.host            - the hostname.
#
#    object.plugin          - the name of the plugin aka resource.
#    object.plugin_instance - plugin instance string, i.e. the mountpoint
#                             for the df plugin or the numa node for memory.
#    object.type            - the unit, i.e. percent or absolute.
#    object.type_instance   - the attribute, i.e. free, used, etc.
#
#    object.severity        - an integer value ; 1=failure, 2=warning, 4=okay.
#    object.message         - a log-able message containing the above along
#                             with the value.
#
# This notifier uses the notification object to manage plugin/instance alarms.
#
# To avoid stuck alarms or missing alarms the plugin thresholds should be
# configured with Persist = true and PersistOK = true. These controls tell
# collectd to always send notifications regardless of state change ;
# notifications would only be sent on state change with these controls set
# to false.
#
# Persist = false ; only send notifications on 'okay' to 'not okay' change.
# PersistOK = false ; only send notifications on 'not okay' to 'okay' change.
#
# With both of these set to true in the plugin's threshold spec, collectd
# will call this notifier on every audit for each plugin/instance.
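#
# For illustration only ; an example df threshold stanza with these
# controls enabled (actual thresholds live in the per-plugin conf files
# under /etc/collectd.d/ and the values below are made up):
#
#   <Plugin "threshold">
#     <Plugin "df">
#       <Type "percent_bytes">
#         Instance "used"
#         WarningMax  80.00
#         FailureMax  90.00
#         Persist     true
#         PersistOK   true
#       </Type>
#     </Plugin>
#   </Plugin>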
#
# Collectd supports only 2 threshold severities ; warning and failure.
# The 'failure' maps to 'critical' while 'warning' maps to 'major' in FM.
#
# To avoid unnecessary load on FM, this notifier maintains current alarm
# state and only makes an FM call on alarm state changes. Current alarm state
# is queried on process startup, once FM connectivity is verified
# (see notifier_func).
#
# Current alarm state is maintained by two severity lists for each plugin,
# a warnings list and a failures list.
#
# When a failure is reported against a specific plugin then that resource's
# entity_id is added to that plugin's alarm object's failures list. Similarly,
# warning assertions get their entity id added to the plugin's alarm object's
# warnings list. Any entity id should only exist in one of the lists at a
# time, or in neither if the notification condition is 'okay' and the alarm
# is cleared.
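#
# For example (hostname and mountpoint are illustrative only), a critical
# df alarm against /var/log would leave the df plugin object with:
#
#   failures = ['host=controller-0.filesystem=/var/log']
#   warnings = []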
#
# Adding Plugins:
#
# To add new plugin support just search for ADD_NEW_PLUGIN and add the data
# requested in that area.
#
# Example commands to read samples from the influx database
#
# SELECT * FROM df_value WHERE instance='root' AND type='percent_bytes' AND
# type_instance='used'
# SELECT * FROM cpu_value WHERE type='percent' AND type_instance='used'
# SELECT * FROM memory_value WHERE type='percent' AND type_instance='used'
#
############################################################################
#
# Import list
# UT imports
import os
import re
import uuid
import collectd
from threading import RLock as Lock
from fm_api import constants as fm_constants
from fm_api import fm_api
import tsconfig.tsconfig as tsc
import plugin_common as pc
# only load influxdb on the controller
if tsc.nodetype == 'controller':
    from influxdb import InfluxDBClient
api = fm_api.FaultAPIsV2()
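# Note: FaultAPIsV2 raises exceptions on connection or request failures
# instead of returning empty results. The try/except handling around the
# api calls below uses this to distinguish 'FM unreachable' from 'no
# alarms found' and to retry failed set/clear actions on the next audit
# interval.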
# Debug control
debug = False
debug_lists = False
want_state_audit = False
want_vswitch = False
# number of notifier loops before the state is object dumped
DEBUG_AUDIT = 2
# write a 'value' log when a resource sample changes by more than this amount
LOG_STEP = 10
# Number of back to back database update misses
MAX_NO_UPDATE_B4_ALARM = 5
# This plugin name
PLUGIN = 'alarm notifier'
# Path to the plugin's drop dir
PLUGIN_PATH = '/etc/collectd.d/'
# the name of the collectd samples database
DATABASE_NAME = 'collectd samples'
READING_TYPE__PERCENT_USAGE = '% usage'
# Default invalid threshold value
INVALID_THRESHOLD = float(-1)
# collectd severity definitions ;
# Note: can't seem to pull them in symbolically with a header
NOTIF_FAILURE = 1
NOTIF_WARNING = 2
NOTIF_OKAY = 4
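# For reference, the mapping applied when raising FM alarms (see
# notifier_func below):
#   NOTIF_FAILURE -> FM_ALARM_SEVERITY_CRITICAL
#   NOTIF_WARNING -> FM_ALARM_SEVERITY_MAJOR
#   NOTIF_OKAY    -> FM_ALARM_STATE_CLEAR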
PASS = 0
FAIL = 1
# Some plugin_instances are mangled by collectd.
# The filesystem plugin is especially bad for this.
# For instance the "/var/log" MountPoint instance is
# reported as "var-log".
# The following is a list of mangled instances
# that need the '-' replaced with '/'.
#
# ADD_NEW_PLUGIN: if new file systems with subdirectories in their
# names are added then they will need to be added
# to this mangled list
mangled_list = {"dev-shm",
"var-log",
"var-run",
"var-lock",
"var-lib-rabbitmq",
"var-lib-postgresql",
"var-lib-ceph-mon",
"etc-nova-instances",
"opt-platform",
"opt-cgcs",
"opt-etcd",
"opt-extension",
"opt-backups"}
# ADD_NEW_PLUGIN: add new alarm id definition
ALARM_ID__CPU = "100.101"
ALARM_ID__MEM = "100.103"
ALARM_ID__DF = "100.104"
ALARM_ID__EXAMPLE = "100.113"
ALARM_ID__VSWITCH_CPU = "100.102"
ALARM_ID__VSWITCH_MEM = "100.115"
ALARM_ID__VSWITCH_PORT = "300.001"
ALARM_ID__VSWITCH_IFACE = "300.002"
# ADD_NEW_PLUGIN: add new alarm id to the list
ALARM_ID_LIST = [ALARM_ID__CPU,
ALARM_ID__MEM,
ALARM_ID__DF,
ALARM_ID__VSWITCH_CPU,
ALARM_ID__VSWITCH_MEM,
ALARM_ID__VSWITCH_PORT,
ALARM_ID__VSWITCH_IFACE,
ALARM_ID__EXAMPLE]
# ADD_NEW_PLUGIN: add plugin name definition
# WARNING: This must line up exactly with the plugin
# filename without the extension.
PLUGIN__DF = "df"
PLUGIN__CPU = "cpu"
PLUGIN__MEM = "memory"
PLUGIN__INTERFACE = "interface"
PLUGIN__NTP_QUERY = "ntpq"
PLUGIN__VSWITCH_PORT = "vswitch_port"
PLUGIN__VSWITCH_CPU = "vswitch_cpu"
PLUGIN__VSWITCH_MEM = "vswitch_mem"
PLUGIN__VSWITCH_IFACE = "vswitch_iface"
PLUGIN__EXAMPLE = "example"
# ADD_NEW_PLUGIN: add plugin name to list
PLUGIN_NAME_LIST = [PLUGIN__CPU,
PLUGIN__MEM,
PLUGIN__DF,
PLUGIN__VSWITCH_CPU,
PLUGIN__VSWITCH_MEM,
PLUGIN__VSWITCH_PORT,
PLUGIN__VSWITCH_IFACE,
PLUGIN__EXAMPLE]
# PluginObject Class
class PluginObject:
dbObj = None # shared database connection obj
host = None # saved hostname
lock = None # global lock for mread_func mutex
database_setup = False # state of database setup
database_setup_in_progress = False # connection mutex
# Set to True once FM connectivity is verified
# Used to ensure alarms are queried on startup
fm_connectivity = False
def __init__(self, id, plugin):
"""PluginObject Class constructor"""
# plugin specific static class members.
self.id = id # alarm id ; 100.1??
self.plugin = plugin # name of the plugin ; df, cpu, memory ...
self.plugin_instance = "" # the instance name for the plugin
self.resource_name = "" # The top level name of the resource
self.instance_name = "" # The instance name
# Instance specific learned static class members.
self.entity_id = "" # fm entity id host=<hostname>.<instance>
self.instance = "" # <plugin>_<instance>
# ['float value string', 'float threshold string']
self.values = []
self.value = float(0) # float value of reading
# This member is used to help log changed values using the
# LOG_STEP threshold constant
self.last_value = float(0)
# float value of threshold
self.threshold = float(INVALID_THRESHOLD)
# Common static class members.
self.reason_warning = ""
self.reason_failure = ""
self.repair = ""
self.alarm_type = fm_constants.FM_ALARM_TYPE_7 # OPERATIONAL
self.cause = fm_constants.ALARM_PROBABLE_CAUSE_50 # THRESHOLD CROSS
self.suppression = True
self.service_affecting = False
# default most reading types are usage
self.reading_type = READING_TYPE__PERCENT_USAGE
# Severity tracking lists.
# Maintains severity state between notifications.
# Each is a list of entity ids for severity asserted alarms.
# As alarms are cleared so is the entry in these lists.
# The entity id should only be in one list for any given raised alarm.
self.warnings = []
self.failures = []
# total notification count
self.count = 0
# Debug: state audit controls
self.audit_threshold = 0
self.audit_count = 0
# For plugins that have multiple instances like df (filesystem plugin)
# we need to create an instance of this object for each one.
# This dictionary is used to associate an instance with its object.
self.instance_objects = {}
def _ilog(self, string):
"""Create a collectd notifier info log with the string param"""
collectd.info('%s %s : %s' % (PLUGIN, self.plugin, string))
def _llog(self, string):
"""Create a collectd notifier info log with the string param if debug_lists"""
if debug_lists:
collectd.info('%s %s : %s' % (PLUGIN, self.plugin, string))
def _elog(self, string):
"""Create a collectd notifier error log with the string param"""
collectd.error('%s %s : %s' % (PLUGIN, self.plugin, string))
##########################################################################
#
# Name : _state_audit
#
# Purpose : Debug Tool to log plugin object info.
#
# Not called in production code.
#
# Only the severity lists are dumped for now.
# Other info can be added as needed.
# Can be run as an audit or called directly.
#
##########################################################################
def _state_audit(self, location):
"""Log the state of the specified object"""
if self.id == ALARM_ID__CPU:
_print_state()
self.audit_count += 1
if self.warnings:
collectd.info("%s AUDIT %d: %s warning list %s:%s" %
(PLUGIN,
self.audit_count,
self.plugin,
location,
self.warnings))
if self.failures:
collectd.info("%s AUDIT %d: %s failure list %s:%s" %
(PLUGIN,
self.audit_count,
self.plugin,
location,
self.failures))
##########################################################################
#
# Name : _manage_change
#
# Purpose : Manage sample value change.
#
# Handle no sample update case.
# Parse the notification log.
# Handle base object instances.
# Generate a log entry if the sample value changes more than
# step value.
#
##########################################################################
def _manage_change(self, nObject):
"""Log resource instance value on step state change"""
# filter out messages to ignore ; notifications that have no value
if "has not been updated for" in nObject.message:
collectd.info("%s %s %s (%s)" %
(PLUGIN,
self.entity_id,
nObject.message,
nObject.severity))
return "done"
# Get the value from the notification message.
# The location in the message is different based on the message type ;
# normal reading or overage reading
#
# message: Host controller-0, plugin memory type percent ... [snip]
# All data sources are within range again.
# Current value of "value" is 51.412038. <------
#
# message: Host controller-0, plugin df (instance scratch) ... [snip]
# Data source "value" is currently 97.464027. <------
# That is above the failure threshold of 90.000000. <------
# recognized strings ; the first signature yields the value only,
#                      the second yields the value and the threshold
value_sig_list = ['Current value of', 'is currently']
# list of parsed 'string version' float values ['value','threshold']
self.values = []
for sig in value_sig_list:
index = nObject.message.find(sig)
if index != -1:
self.values = \
re.findall(r"[-+]?\d*\.\d+|\d+", nObject.message[index:-1])
# contains string versions of the float values extracted from
# the notification message. The threshold value is included for
# readings that are out of threshold.
if len(self.values):
# validate the reading
try:
self.value = round(float(self.values[0]), 2)
# get the threshold if it's there.
if len(self.values) > 1:
self.threshold = float(self.values[1])
else:
self.threshold = float(INVALID_THRESHOLD) # invalid value
except ValueError as ex:
collectd.error("%s %s value not integer or float (%s) (%s)" %
(PLUGIN, self.entity_id, self.value, str(ex)))
return "done"
except TypeError as ex:
collectd.info("%s %s value has no type (%s)" %
(PLUGIN, self.entity_id, str(ex)))
return "done"
else:
collectd.info("%s %s reported no value (%s)" %
(PLUGIN, self.entity_id, nObject.message))
return "done"
# get the last reading
if self.last_value:
last = float(self.last_value)
else:
last = float(0)
# Determine if the change is large enough to log and save the new value
logit = False
if self.count == 0 or LOG_STEP == 0:
logit = True
elif self.reading_type == "connections":
if self.value != last:
logit = True
elif self.value > last:
if (last + LOG_STEP) < self.value:
logit = True
elif last > self.value:
if (self.value + LOG_STEP) < last:
logit = True
# Case on types.
#
# Note: only usage type so far
if logit:
resource = self.resource_name
# setup resource name for filesystem instance usage log
if self.plugin == PLUGIN__DF:
resource = self.instance
elif self.plugin == PLUGIN__MEM:
if self.instance_name:
if self.instance_name != 'platform':
resource += ' ' + self.instance_name
# setup resource name for vswitch process instance name
elif self.plugin == PLUGIN__VSWITCH_MEM:
resource += ' Processor '
resource += self.instance_name
if self.reading_type == READING_TYPE__PERCENT_USAGE:
tmp = str(self.value).split('.')
# add an extra space for single digit readings so log values line up
if len(tmp[0]) == 1:
pre = ':  '
else:
pre = ': '
collectd.info("%s reading%s%2.2f %s - %s" %
(PLUGIN,
pre,
self.value,
self.reading_type,
resource))
elif self.reading_type == "connections" and \
self.instance_objects and \
self.value != self.last_value:
if self.instance_objects:
collectd.info("%s monitor: %2d %s - %s" %
(PLUGIN,
self.value,
self.reading_type,
resource))
##########################################################################
#
# Name : _update_alarm
#
# Purpose : Compare current severity to instance severity lists to
# facilitate early 'do nothing' exit from a notification.
#
# Description: Avoid clearing an already cleared alarm.
# Refresh asserted alarm data for usage reading type alarms
#
# Returns : True if the alarm needs refresh, otherwise false.
#
##########################################################################
def _update_alarm(self, entity_id, severity, this_value, last_value):
"""Check for need to update alarm data"""
if entity_id in self.warnings:
self._llog(entity_id + " is already in warnings list")
current_severity_str = "warning"
elif entity_id in self.failures:
self._llog(entity_id + " is already in failures list")
current_severity_str = "failure"
else:
self._llog(entity_id + " is already OK")
current_severity_str = "okay"
# Compare the current severity to the previous severity.
# If they are the same then there is usually nothing to update.
if severity == current_severity_str:
if severity == "okay":
return False
if self.reading_type != READING_TYPE__PERCENT_USAGE:
return False
elif round(last_value, 2) == round(this_value, 2):
return False
return True
########################################################################
#
# Name : _manage_alarm
#
# Purpose : Alarm Severity Tracking
#
# This class member function accepts a severity level and entity id.
# It manages the content of the current alarm object's 'failures' and
# 'warnings' lists ; aka Severity Lists.
#
# These Severity Lists are used to record current alarmed state for
# each instance of a plugin.
# If an alarm is raised then its entity id is added to the appropriate
# severity list.
#
# A failure notification or critical alarm goes in the failures list.
# A warning notification or major alarm goes into the warnings list.
#
# These lists are used to avoid making unnecessary calls to FM.
#
# Startup Behavior:
#
# The collectd daemon runs the init function of every plugin on startup.
# That includes this notifier plugin. Once FM connectivity is verified,
# the first notification handled queries the FM database for any active
# alarms.
#
# This member function is called for any active alarms that are found.
# The entity id for active alarms is added to the appropriate
# Severity List. This way existing alarms are maintained over collectd
# process startup.
#
# Runtime Behavior:
#
# The current severity state is first queried and compared to the
# newly reported severity level. If they are the same then a "done"
# is returned telling the caller that there is no further work to do.
# Otherwise, the lists are managed in a way that has the entity id
# of a raised alarm in the corresponding severity list.
#
# See inline comments below for each specific severity and state
# transition case.
#
#########################################################################
def _manage_alarm(self, entity_id, severity):
"""Manage the alarm severity lists and report state change"""
collectd.debug("%s manage alarm %s %s %s" %
(PLUGIN,
self.id,
severity,
entity_id))
# Get the instance's current state
if entity_id in self.warnings:
current_severity_str = "warning"
elif entity_id in self.failures:
current_severity_str = "failure"
else:
current_severity_str = "okay"
# Compare the current state to the previous state.
# If they are the same then return done.
if severity == current_severity_str:
return "done"
# Otherwise, manage the severity lists ; case by case.
warnings_list_change = False
failures_list_change = False
# Case 1: Handle warning to failure severity change.
if severity == "warning" and current_severity_str == "failure":
if entity_id in self.failures:
self.failures.remove(entity_id)
failures_list_change = True
self._llog(entity_id + " is removed from failures list")
else:
self._elog(entity_id + " UNEXPECTEDLY not in failures list")
# Error detection
if entity_id in self.warnings:
self.warnings.remove(entity_id)
self._elog(entity_id + " UNEXPECTEDLY in warnings list")
self.warnings.append(entity_id)
warnings_list_change = True
self._llog(entity_id + " is added to warnings list")
# Case 2: Handle failure to warning alarm severity change.
elif severity == "failure" and current_severity_str == "warning":
if entity_id in self.warnings:
self.warnings.remove(entity_id)
warnings_list_change = True
self._llog(entity_id + " is removed from warnings list")
else:
self._elog(entity_id + " UNEXPECTEDLY not in warnings list")
# Error detection
if entity_id in self.failures:
self.failures.remove(entity_id)
self._elog(entity_id + " UNEXPECTEDLY in failures list")
self.failures.append(entity_id)
failures_list_change = True
self._llog(entity_id + " is added to failures list")
# Case 3: Handle new alarm.
elif severity != "okay" and current_severity_str == "okay":
if severity == "warning":
self.warnings.append(entity_id)
warnings_list_change = True
self._llog(entity_id + " added to warnings list")
elif severity == "failure":
self.failures.append(entity_id)
failures_list_change = True
self._llog(entity_id + " added to failures list")
# Case 4: Handle alarm clear.
else:
# plugin is okay, ensure this plugin's entity id
# is not in either list
if entity_id in self.warnings:
self.warnings.remove(entity_id)
warnings_list_change = True
self._llog(entity_id + " removed from warnings list")
if entity_id in self.failures:
self.failures.remove(entity_id)
failures_list_change = True
self._llog(entity_id + " removed from failures list")
if warnings_list_change is True:
if self.warnings:
collectd.info("%s %s warnings %s" %
(PLUGIN, self.plugin, self.warnings))
else:
collectd.info("%s %s no warnings" %
(PLUGIN, self.plugin))
if failures_list_change is True:
if self.failures:
collectd.info("%s %s failures %s" %
(PLUGIN, self.plugin, self.failures))
else:
collectd.info("%s %s no failures" %
(PLUGIN, self.plugin))
##########################################################################
#
# Name : _get_instance_object
#
# Purpose : Safely get an object from the self instance object list
# indexed by eid.
#
##########################################################################
def _get_instance_object(self, eid):
"""Safely get an object from the self instance object dict while locked
:param eid: the index for the instance object dictionary
:return: object or None
"""
try:
collectd.debug("%s %s Get Lock ..." % (PLUGIN, self.plugin))
with PluginObject.lock:
obj = self.instance_objects[eid]
return obj
except:
collectd.error("%s failed to get instance from %s object list" %
(PLUGIN, self.plugin))
return None
##########################################################################
#
# Name : _add_instance_object
#
# Purpose : Safely add an object to the self instance object list
# indexed by eid while locked. If the add fails it will be
# re-attempted on the next sample.
#
##########################################################################
def _add_instance_object(self, obj, eid):
"""Update self instance_objects list while locked
:param obj: the object to add
:param eid: index for instance_objects
:return: nothing
"""
try:
collectd.debug("%s %s Add Lock ..." % (PLUGIN, self.plugin))
with PluginObject.lock:
self.instance_objects[eid] = obj
except:
collectd.error("%s failed to add instance to %s object list" %
(PLUGIN, self.plugin))
##########################################################################
#
# Name : _copy_instance_object
#
# Purpose : Copy select members of self object to target object.
#
##########################################################################
def _copy_instance_object(self, object):
"""Copy select members of self object to target object"""
object.resource_name = self.resource_name
object.instance_name = self.instance_name
object.reading_type = self.reading_type
object.reason_warning = self.reason_warning
object.reason_failure = self.reason_failure
object.repair = self.repair
object.alarm_type = self.alarm_type
object.cause = self.cause
object.suppression = self.suppression
object.service_affecting = self.service_affecting
##########################################################################
#
# Name : _create_instance_object
#
# Purpose : Create a new instance object and tack it on the supplied base
# object's instance object dictionary.
#
##########################################################################
def _create_instance_object(self, instance):
try:
# create a new plugin object
inst_obj = PluginObject(self.id, self.plugin)
self._copy_instance_object(inst_obj)
# initialize the object with instance specific data
inst_obj.instance_name = instance
inst_obj.entity_id = _build_entity_id(self.plugin,
instance)
self._add_instance_object(inst_obj, inst_obj.entity_id)
collectd.debug("%s created %s instance (%s) object %s" %
(PLUGIN, inst_obj.resource_name,
inst_obj.entity_id, inst_obj))
collectd.info("%s monitoring %s %s %s" %
(PLUGIN,
inst_obj.resource_name,
inst_obj.instance_name,
inst_obj.reading_type))
return inst_obj
except:
collectd.error("%s %s:%s inst object create failed" %
(PLUGIN, inst_obj.resource_name, instance))
return None
##########################################################################
#
# Name : _create_instance_objects
#
# Purpose : Create a list of instance objects for 'self' type plugin and
# add those objects to the parent's instance_objects dictionary.
#
# Note : This is currently only used for the DF (filesystem) plugin.
# All other instance creations/allocations are done on-demand.
#
##########################################################################
def _create_instance_objects(self):
"""Create, initialize and add an instance object to this/self plugin"""
# Create the File System subordinate instance objects.
if self.id == ALARM_ID__DF:
# read the df.conf file and return/get a list of mount points
conf_file = PLUGIN_PATH + 'df.conf'
if not os.path.exists(conf_file):
collectd.error("%s cannot create filesystem "
"instance objects ; missing : %s" %
(PLUGIN, conf_file))
return FAIL
mountpoints = []
with open(conf_file, 'r') as infile:
for line in infile:
if 'MountPoint ' in line:
# get the mountpoint path from the line
try:
mountpoint = line.split('MountPoint ')[1][1:-2]
mountpoints.append(mountpoint)
except:
collectd.error("%s skipping invalid '%s' "
"mountpoint line: %s" %
(PLUGIN, conf_file, line))
collectd.debug("%s MountPoints: %s" % (PLUGIN, mountpoints))
# loop over the mount points
for mp in mountpoints:
# create a new plugin object
inst_obj = PluginObject(ALARM_ID__DF, PLUGIN__DF)
# initialize the object with instance specific data
inst_obj.resource_name = self.resource_name
inst_obj.instance_name = mp
inst_obj.instance = mp
# build the plugin instance name from the mount point
if mp == '/':
inst_obj.plugin_instance = 'root'
else:
inst_obj.plugin_instance = mp[1:].replace('/', '-')
inst_obj.entity_id = _build_entity_id(PLUGIN__DF,
inst_obj.plugin_instance)
# add this subordinate object to the parent's
# instance object list
self._add_instance_object(inst_obj, inst_obj.entity_id)
collectd.info("%s monitoring %s usage" %
(PLUGIN, inst_obj.instance))
PluginObject.host = os.uname()[1]
# ADD_NEW_PLUGIN: add plugin to this table
# This instantiates the plugin objects
PLUGINS = {
PLUGIN__CPU: PluginObject(ALARM_ID__CPU, PLUGIN__CPU),
PLUGIN__MEM: PluginObject(ALARM_ID__MEM, PLUGIN__MEM),
PLUGIN__DF: PluginObject(ALARM_ID__DF, PLUGIN__DF),
PLUGIN__VSWITCH_CPU: PluginObject(ALARM_ID__VSWITCH_CPU,
PLUGIN__VSWITCH_CPU),
PLUGIN__VSWITCH_MEM: PluginObject(ALARM_ID__VSWITCH_MEM,
PLUGIN__VSWITCH_MEM),
PLUGIN__VSWITCH_PORT: PluginObject(ALARM_ID__VSWITCH_PORT,
PLUGIN__VSWITCH_PORT),
PLUGIN__VSWITCH_IFACE: PluginObject(ALARM_ID__VSWITCH_IFACE,
PLUGIN__VSWITCH_IFACE),
PLUGIN__EXAMPLE: PluginObject(ALARM_ID__EXAMPLE, PLUGIN__EXAMPLE)}
#####################################################################
#
# Name : clear_alarm
#
# Description: Clear the specified alarm with the specified entity ID.
#
# Returns : True if operation succeeded
# False if there was an error exception.
#
# Assumptions: Caller can decide to retry based on return status.
#
#####################################################################
def clear_alarm(alarm_id, eid):
"""Clear the specified alarm:eid"""
try:
if api.clear_fault(alarm_id, eid) is True:
collectd.info("%s %s:%s alarm cleared" %
(PLUGIN, alarm_id, eid))
else:
collectd.info("%s %s:%s alarm already cleared" %
(PLUGIN, alarm_id, eid))
return True
except Exception as ex:
collectd.error("%s 'clear_fault' exception ; %s:%s ; %s" %
(PLUGIN, alarm_id, eid, ex))
return False
def _get_base_object(alarm_id):
"""Get the alarm object for the specified alarm id"""
for plugin in PLUGIN_NAME_LIST:
if PLUGINS[plugin].id == alarm_id:
return PLUGINS[plugin]
return None
def _get_object(alarm_id, eid):
"""Get the plugin object for the specified alarm id and eid"""
base_obj = _get_base_object(alarm_id)
if len(base_obj.instance_objects):
try:
return(base_obj.instance_objects[eid])
except:
collectd.debug("%s %s has no instance objects" %
(PLUGIN, base_obj.plugin))
return base_obj
def _build_entity_id(plugin, plugin_instance):
"""Builds an entity id string based on the collectd notification object"""
inst_error = False
entity_id = 'host='
entity_id += PluginObject.host
if plugin == PLUGIN__MEM:
if plugin_instance != 'platform':
entity_id += '.numa=' + plugin_instance
elif plugin == PLUGIN__VSWITCH_MEM:
# host=<hostname>.processor=<socket-id>
if plugin_instance:
entity_id += '.processor=' + plugin_instance
else:
inst_error = True
elif plugin == PLUGIN__VSWITCH_IFACE:
# host=<hostname>.interface=<if-uuid>
if plugin_instance:
entity_id += '.interface=' + plugin_instance
else:
inst_error = True
elif plugin == PLUGIN__VSWITCH_PORT:
# host=<hostname>.port=<port-uuid>
if plugin_instance:
entity_id += '.port=' + plugin_instance
else:
inst_error = True
elif plugin == PLUGIN__DF:
# host=<hostname>.filesystem=<mountpoint>
if plugin_instance:
instance = plugin_instance
# build the entity_id for this plugin
entity_id += '.filesystem=/'
# collectd replaces the instance '/' with the word 'root'
# So skip over "root" as '/' is already part of the
# entity_id
if instance != 'root':
# Look for other instances that are in the mangled list
if instance in mangled_list:
instance = instance.replace('-', '/')
entity_id += instance
if inst_error is True:
collectd.error("%s eid build failed ; missing instance" % plugin)
return None
return entity_id
def _get_df_mountpoints():
conf_file = PLUGIN_PATH + 'df.conf'
if not os.path.exists(conf_file):
collectd.error("%s cannot create filesystem "
"instance objects ; missing : %s" %
(PLUGIN, conf_file))
return FAIL
mountpoints = []
with open(conf_file, 'r') as infile:
for line in infile:
if 'MountPoint ' in line:
# get the mountpoint path from the line
try:
mountpoint = line.split('MountPoint ')[1][1:-2]
mountpoints.append(mountpoint)
except:
collectd.error("%s skipping invalid '%s' "
"mountpoint line: %s" %
(PLUGIN, conf_file, line))
return(mountpoints)
def _print_obj(obj):
"""Print a single object"""
base_object = False
for plugin in PLUGIN_NAME_LIST:
if PLUGINS[plugin] == obj:
base_object = True
break
num = len(obj.instance_objects)
if num > 0 or base_object is True:
prefix = "PLUGIN "
if num:
prefix += str(num)
else:
prefix += " "
else:
prefix = "INSTANCE"
if obj.plugin_instance:
resource = obj.plugin + ":" + obj.plugin_instance
else:
resource = obj.plugin
collectd.info("%s %s res: %s name: %s\n" %
(PLUGIN, prefix, resource, obj.resource_name))
collectd.info("%s eid : %s\n" % (PLUGIN, obj.entity_id))
collectd.info("%s inst: %s name: %s\n" %
(PLUGIN, obj.instance, obj.instance_name))
collectd.info("%s value:%2.1f thld:%2.1f cause:%s (%d) type:%s" %
(PLUGIN,
obj.value,
obj.threshold,
obj.cause,
obj.count,
obj.reading_type))
collectd.info("%s warn:%s fail:%s" %
(PLUGIN, obj.warnings, obj.failures))
collectd.info("%s repair:t: %s" %
(PLUGIN, obj.repair))
if obj.cause != fm_constants.ALARM_PROBABLE_CAUSE_50:
collectd.info("%s reason:w: %s\n"
"%s reason:f: %s\n" %
(PLUGIN, obj.reason_warning,
PLUGIN, obj.reason_failure))
# collectd.info(" ")
def _print_state(obj=None):
"""Print the current object state"""
try:
objs = []
if obj is None:
for plugin in PLUGIN_NAME_LIST:
objs.append(PLUGINS[plugin])
else:
objs.append(obj)
collectd.debug("%s _print_state Lock ..." % PLUGIN)
with PluginObject.lock:
for o in objs:
_print_obj(o)
if len(o.instance_objects):
for inst_obj in o.instance_objects:
_print_obj(o.instance_objects[inst_obj])
except Exception as ex:
collectd.error("%s _print_state exception ; %s" %
(PLUGIN, ex))
def _database_setup(database):
"""Setup the influx database for collectd resource samples"""
collectd.info("%s setting up influxdb:%s database" %
(PLUGIN, database))
error_str = ""
# http://influxdb-python.readthedocs.io/en/latest/examples.html
# http://influxdb-python.readthedocs.io/en/latest/api-documentation.html
PluginObject.dbObj = InfluxDBClient('127.0.0.1', '8086', database)
if PluginObject.dbObj:
try:
PluginObject.dbObj.create_database('collectd')
############################################################
#
# TODO: Read current retention period from service parameter
# Make it a puppet implementation.
#
# Create a 1 month samples retention policy
# -----------------------------------------
# name = 'collectd samples'
# duration = set retention period in time
# xm - minutes
# xh - hours
# xd - days
# xw - weeks
# xy - years
# database = 'collectd'
# default = True ; make it the default
#
############################################################
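# Signature (from the influxdb python client, for reference):
#   create_retention_policy(name, duration, replication, database, default)
# i.e. the call below keeps 4 weeks of samples with a replication
# factor of 1 and makes this the default policy for the database.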
PluginObject.dbObj.create_retention_policy(
DATABASE_NAME, '4w', 1, database, True)
except Exception as ex:
if str(ex) == 'database already exists':
try:
collectd.info("%s influxdb:collectd %s" %
(PLUGIN, str(ex)))
PluginObject.dbObj.create_retention_policy(
DATABASE_NAME, '4w', 1, database, True)
except Exception as ex:
if str(ex) == 'retention policy already exists':
collectd.info("%s influxdb:collectd %s" %
(PLUGIN, str(ex)))
else:
error_str = "failure from influxdb ; "
error_str += str(ex)
else:
error_str = "failed to create influxdb:" + database
else:
error_str = "failed to connect to influxdb:" + database
if not error_str:
found = False
retention = \
PluginObject.dbObj.get_list_retention_policies(database)
for r in range(len(retention)):
if retention[r]["name"] == DATABASE_NAME:
collectd.info("%s influxdb:%s samples retention "
"policy: %s" %
(PLUGIN, database, retention[r]))
found = True
if found is True:
collectd.info("%s influxdb:%s is setup" % (PLUGIN, database))
PluginObject.database_setup = True
else:
collectd.error("%s influxdb:%s retention policy NOT setup" %
(PLUGIN, database))
def _clear_alarm_for_missing_filesystems():
"""Clear alarmed file systems that are no longer mounted or present"""
# get the DF (filesystem plugin) base object.
df_base_obj = PLUGINS[PLUGIN__DF]
# create a single alarm list from both the warnings and failures lists
# to avoid having to duplicate the code below for each.
# At this point we don't care about severity, we just need to
# determine if an alarmed filesystem, of any severity, no longer exists
# so we can clean up by clearing its alarm.
# Note: the 2 lists should always contain unique data between them
alarm_list = df_base_obj.warnings + df_base_obj.failures
if len(alarm_list):
for eid in alarm_list:
# search for any of them that might be alarmed.
obj = df_base_obj._get_instance_object(eid)
# only care about df (file system plugins)
if obj is not None and \
obj.plugin == PLUGIN__DF and \
obj.entity_id == eid and \
obj.plugin_instance != 'root':
# For all others replace all '-' with '/'
path = '/' + obj.plugin_instance.replace('-', '/')
if os.path.ismount(path) is False:
if clear_alarm(df_base_obj.id, obj.entity_id) is True:
collectd.info("%s cleared alarm for missing %s" %
(PLUGIN, path))
df_base_obj._manage_alarm(obj.entity_id, "okay")
else:
collectd.debug("%s maintaining alarm for %s" %
(PLUGIN, path))
# Collectd calls this function on startup.
# Initialize each plugin object with plugin specific data.
# Existing FM alarms are queried by notifier_func once FM connectivity
# is verified ; the severity lists then start from that state.
def init_func():
"""Collectd FM Notifier Initialization Function"""
PluginObject.lock = Lock()
PluginObject.host = os.uname()[1]
collectd.info("%s %s:%s init function" %
(PLUGIN, tsc.nodetype, PluginObject.host))
# Constant CPU Plugin Object Settings
obj = PLUGINS[PLUGIN__CPU]
obj.resource_name = "Platform CPU"
obj.instance_name = PLUGIN__CPU
obj.repair = "Monitor and if condition persists, "
obj.repair += "contact next level of support."
collectd.info("%s monitoring %s usage" % (PLUGIN, obj.resource_name))
###########################################################################
# Constant Memory Plugin Object settings
obj = PLUGINS[PLUGIN__MEM]
obj.resource_name = "Platform Memory"
obj.instance_name = PLUGIN__MEM
obj.repair = "Monitor and if condition persists, "
obj.repair += "contact next level of support; "
obj.repair += "may require additional memory on Host."
collectd.info("%s monitoring %s usage" % (PLUGIN, obj.resource_name))
###########################################################################
# Constant FileSystem Plugin Object settings
obj = PLUGINS[PLUGIN__DF]
obj.resource_name = "File System"
obj.instance_name = PLUGIN__DF
obj.repair = "Monitor and if condition persists, "
obj.repair += "contact next level of support."
# The FileSystem (DF) plugin has multiple instances
# One instance per file system mount point being monitored.
# Create one DF instance object per mount point
obj._create_instance_objects()
# vSwitch monitoring is for worker nodes only and is currently gated
# by want_vswitch
if want_vswitch is False:
collectd.debug("%s vSwitch monitoring disabled" % PLUGIN)
elif tsc.nodetype == 'worker' or 'worker' in tsc.subfunctions:
#######################################################################
# Constant vSwitch CPU Usage Plugin Object settings
obj = PLUGINS[PLUGIN__VSWITCH_CPU]
obj.resource_name = "vSwitch CPU"
obj.instance_name = PLUGIN__VSWITCH_CPU
obj.repair = "Monitor and if condition persists, "
obj.repair += "contact next level of support."
collectd.info("%s monitoring %s usage" % (PLUGIN, obj.resource_name))
#######################################################################
# Constant vSwitch Memory Usage Plugin Object settings
obj = PLUGINS[PLUGIN__VSWITCH_MEM]
obj.resource_name = "vSwitch Memory"
obj.instance_name = PLUGIN__VSWITCH_MEM
obj.repair = "Monitor and if condition persists, "
obj.repair += "contact next level of support."
collectd.info("%s monitoring %s usage" % (PLUGIN, obj.resource_name))
#######################################################################
# Constant vSwitch Port State Monitor Plugin Object settings
obj = PLUGINS[PLUGIN__VSWITCH_PORT]
obj.resource_name = "vSwitch Port"
obj.instance_name = PLUGIN__VSWITCH_PORT
obj.reading_type = "state"
obj.reason_failure = "'Data' Port failed."
obj.reason_warning = "'Data' Port failed."
obj.repair = "Check cabling and far-end port configuration and "
obj.repair += "status on adjacent equipment."
obj.alarm_type = fm_constants.FM_ALARM_TYPE_4 # EQUIPMENT
obj.cause = fm_constants.ALARM_PROBABLE_CAUSE_29 # LOSS_OF_SIGNAL
obj.service_affecting = True
collectd.info("%s monitoring %s state" % (PLUGIN, obj.resource_name))
#######################################################################
# Constant vSwitch Interface State Monitor Plugin Object settings
obj = PLUGINS[PLUGIN__VSWITCH_IFACE]
obj.resource_name = "vSwitch Interface"
obj.instance_name = PLUGIN__VSWITCH_IFACE
obj.reading_type = "state"
obj.reason_failure = "'Data' Interface failed."
obj.reason_warning = "'Data' Interface degraded."
obj.repair = "Check cabling and far-end port configuration and "
obj.repair += "status on adjacent equipment."
obj.alarm_type = fm_constants.FM_ALARM_TYPE_4 # EQUIPMENT
obj.cause = fm_constants.ALARM_PROBABLE_CAUSE_29 # LOSS_OF_SIGNAL
obj.service_affecting = True
collectd.info("%s monitoring %s state" % (PLUGIN, obj.resource_name))
###########################################################################
obj = PLUGINS[PLUGIN__EXAMPLE]
obj.resource_name = "Example"
obj.instance_name = PLUGIN__EXAMPLE
obj.repair = "Not Applicable"
collectd.info("%s monitoring %s usage" % (PLUGIN, obj.resource_name))
# ...
# ADD_NEW_PLUGIN: Add new plugin object initialization here ...
# ...
if tsc.nodetype == 'controller':
PluginObject.database_setup_in_progress = True
_database_setup('collectd')
PluginObject.database_setup_in_progress = False
# The notifier function inspects the collectd notification and determines if
# the representative alarm needs to be asserted, severity changed, or cleared.
def notifier_func(nObject):
if PluginObject.fm_connectivity is False:
# handle multi threading startup
with PluginObject.lock:
if PluginObject.fm_connectivity is True:
return 0
##################################################################
#
# With plugin objects initialized ...
# Query FM for any resource alarms that may already be raised.
# Load the queried severity state into the appropriate
# severity list for each alarm that is found.
for alarm_id in ALARM_ID_LIST:
collectd.debug("%s searching for all '%s' alarms " %
(PLUGIN, alarm_id))
try:
alarms = api.get_faults_by_id(alarm_id)
except Exception as ex:
collectd.error("%s 'get_faults_by_id' exception ; %s" %
(PLUGIN, ex))
return 0
if alarms:
for alarm in alarms:
want_alarm_clear = False
eid = alarm.entity_instance_id
# ignore alarms not for this host
if PluginObject.host not in eid:
continue
base_obj = _get_base_object(alarm_id)
if base_obj is None:
# might be a plugin instance - clear it
want_alarm_clear = True
collectd.info('%s found %s %s alarm [%s]' %
(PLUGIN,
alarm.severity,
alarm_id,
eid))
if want_alarm_clear is True:
if clear_alarm(alarm_id, eid) is False:
collectd.error("%s %s:%s clear failed" %
(PLUGIN,
alarm_id,
eid))
else:
collectd.info("%s clear %s %s alarm %s" %
(PLUGIN,
alarm.severity,
alarm_id,
eid))
continue
if alarm.severity == "critical":
sev = "failure"
elif alarm.severity == "major":
sev = "warning"
else:
sev = "okay"
continue
# Load the alarm severity by plugin/instance lookup.
if base_obj is not None:
base_obj._manage_alarm(eid, sev)
PluginObject.fm_connectivity = True
collectd.info("%s initialization complete" % PLUGIN)
collectd.debug('%s notification: %s %s:%s - %s %s %s [%s]' % (
PLUGIN,
nObject.host,
nObject.plugin,
nObject.plugin_instance,
nObject.type,
nObject.type_instance,
nObject.severity,
nObject.message))
# Load up severity variables and alarm actions based on
# this notification's severity level.
if nObject.severity == NOTIF_OKAY:
severity_str = "okay"
_severity_num = fm_constants.FM_ALARM_SEVERITY_CLEAR
_alarm_state = fm_constants.FM_ALARM_STATE_CLEAR
elif nObject.severity == NOTIF_FAILURE:
severity_str = "failure"
_severity_num = fm_constants.FM_ALARM_SEVERITY_CRITICAL
_alarm_state = fm_constants.FM_ALARM_STATE_SET
elif nObject.severity == NOTIF_WARNING:
severity_str = "warning"
_severity_num = fm_constants.FM_ALARM_SEVERITY_MAJOR
_alarm_state = fm_constants.FM_ALARM_STATE_SET
else:
collectd.debug('%s with unsupported severity %d' %
(PLUGIN, nObject.severity))
return 0
if tsc.nodetype == 'controller':
if PluginObject.database_setup is False:
if PluginObject.database_setup_in_progress is False:
PluginObject.database_setup_in_progress = True
_database_setup('collectd')
PluginObject.database_setup_in_progress = False
# get plugin object
if nObject.plugin in PLUGINS:
base_obj = obj = PLUGINS[nObject.plugin]
# if this notification is for a plugin instance then get that
# instance's object instead.
# If that object does not yet exist then create it.
eid = ''
# DF instances are statically allocated
if nObject.plugin == PLUGIN__DF:
eid = _build_entity_id(nObject.plugin, nObject.plugin_instance)
# get this instances object
obj = base_obj._get_instance_object(eid)
if obj is None:
# path should never be hit since all DF instances
# are statically allocated.
return 0
elif nObject.plugin_instance:
need_instance_object_create = False
# Build the entity_id from the parent object if needed
eid = _build_entity_id(nObject.plugin, nObject.plugin_instance)
try:
# Need lock when reading/writing any obj.instance_objects list
with PluginObject.lock:
# an exception is raised if this object is not
# in the list. The exception handling code below will
# create and add this object so the success path is
# taken the next time around.
inst_obj = base_obj.instance_objects[eid]
collectd.debug("%s %s instance %s already exists %s" %
(PLUGIN, nObject.plugin, eid, inst_obj))
# _print_state(inst_obj)
except:
need_instance_object_create = True
if need_instance_object_create is True:
base_obj._create_instance_object(nObject.plugin_instance)
inst_obj = base_obj._get_instance_object(eid)
if inst_obj:
collectd.debug("%s %s:%s inst object created" %
(PLUGIN,
inst_obj.plugin,
inst_obj.instance))
else:
collectd.error("%s %s:%s inst object create failed" %
(PLUGIN,
nObject.plugin,
nObject.plugin_instance))
return 0
# re-assign the object
obj = inst_obj
else:
if not len(base_obj.entity_id):
# Build the entity_id from the parent object if needed
eid = _build_entity_id(nObject.plugin, nObject.plugin_instance)
# update the object with the eid if its not already set.
if not len(obj.entity_id):
obj.entity_id = eid
else:
collectd.debug("%s notification for unknown plugin: %s %s" %
(PLUGIN, nObject.plugin, nObject.plugin_instance))
return 0
# if obj.warnings or obj.failures:
# _print_state(obj)
# If want_state_audit is True then run the audit.
# Primarily used for debug
# default state is False
# TODO: comment out for production code.
if want_state_audit:
obj.audit_threshold += 1
if obj.audit_threshold == DEBUG_AUDIT:
obj.audit_threshold = 0
obj._state_audit("audit")
# manage reading value change ; store last value and log if the change is greater than LOG_STEP
action = obj._manage_change(nObject)
if action == "done":
return 0
# increment just before any possible return for a valid sample
obj.count += 1
# audit file system presence every time we get the
# notification for the root file system ; which will
# always be there.
if obj.instance == '/':
_clear_alarm_for_missing_filesystems()
# exit early if there is no alarm update to be made
if base_obj._update_alarm(obj.entity_id,
severity_str,
obj.value,
obj.last_value) is False:
return 0
obj.last_value = round(obj.value, 2)
if _alarm_state == fm_constants.FM_ALARM_STATE_CLEAR:
if clear_alarm(obj.id, obj.entity_id) is False:
return 0
else:
# manage addition of the failure reason text
if obj.cause == fm_constants.ALARM_PROBABLE_CAUSE_50:
# if this is a threshold alarm then build the reason text that
# includes the threshold and the reading that caused the assertion.
reason = obj.resource_name
reason += " threshold exceeded ;"
if obj.threshold != INVALID_THRESHOLD:
reason += " threshold {:2.2f}".format(obj.threshold) + "%,"
if obj.value:
reason += " actual {:2.2f}".format(obj.value) + "%"
elif _severity_num == fm_constants.FM_ALARM_SEVERITY_CRITICAL:
reason = obj.reason_failure
else:
reason = obj.reason_warning
# build the alarm object
fault = fm_api.Fault(
alarm_id=obj.id,
alarm_state=_alarm_state,
entity_type_id=fm_constants.FM_ENTITY_TYPE_HOST,
entity_instance_id=obj.entity_id,
severity=_severity_num,
reason_text=reason,
alarm_type=base_obj.alarm_type,
probable_cause=base_obj.cause,
proposed_repair_action=base_obj.repair,
service_affecting=base_obj.service_affecting,
suppression=base_obj.suppression)
try:
alarm_uuid = api.set_fault(fault)
if pc.is_uuid_like(alarm_uuid) is False:
collectd.error("%s 'set_fault' failed ; %s:%s ; %s" %
(PLUGIN,
base_obj.id,
obj.entity_id,
alarm_uuid))
return 0
except Exception as ex:
collectd.error("%s 'set_fault' exception ; %s:%s:%s ; %s" %
(PLUGIN,
obj.id,
obj.entity_id,
_severity_num,
ex))
return 0
# update the severity lists now that the FM operation has succeeded
base_obj._manage_alarm(obj.entity_id, severity_str)
collectd.info("%s %s alarm %s:%s %s:%s value:%2.2f" % (
PLUGIN,
_alarm_state,
base_obj.id,
severity_str,
obj.instance,
obj.entity_id,
obj.value))
# Debug only: comment out for production code.
# obj._state_audit("change")
return 0
collectd.register_init(init_func)
collectd.register_notification(notifier_func)