Salman Rana 7d44c38c90 Introduce dccertmon service
This commit introduces dccertmon, a new managed service for Distributed
Cloud (DC) certificate auditing and management.

Currently, platform cert management, DC cert management, and subcloud
cert auditing are coupled into a single platform service (certmon). To
meet the requirements of DC scalability and portability, DC-specific
functionality must be decoupled. These changes lay the groundwork
for the new service by:
- Creating the necessary service files.
- Introducing configs for the service.
- Declaring high-level methods (skeleton: lifecycle and manager).

DC-specific functionality will be migrated to this dccertmon service and
optimized in subsequent changes. Non-DC cert management will continue to
be handled by certmon.

Overall, this commit introduces:
- The OCF file necessary for high availability management of the
  dccertmon service by SM.
- Package configurations to build the service (Package: distributedcloud-dccertmon).
- Lifecycle manager for a running DC cert monitor service.
- Skeleton/base service application logic - CertificateMonitorManager.
- RPC notification handlers for subcloud online/managed.
- Configuration for the log folders and log rotation. The logs
  will be available in /var/log/dccertmon/dccertmon.log.
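
As a rough illustration of the watcher/executor split listed above, the
following is a minimal sketch built only on the standard library. The class
name, the `audit_interval` parameter, and the single `stop()` method are
assumptions made for this example; it is not the CertificateMonitorManager
added by this commit, which exposes separate stop methods.

```python
import threading


class SketchCertMonitorManager:
    """Hypothetical stand-in for a cert monitor manager (illustration only)."""

    def __init__(self, audit_interval=0.01):
        self.audit_interval = audit_interval  # assumed: seconds between audits
        self.audit_runs = 0
        self.watcher_started = threading.Event()
        self.first_audit_done = threading.Event()
        self._stop_event = threading.Event()
        self._threads = []

    def start_cert_watcher(self):
        # Placeholder for a thread that would watch certificate resources.
        self._spawn(self._watch)

    def start_task_executor(self):
        # Placeholder for the periodic audit loop.
        self._spawn(self._audit_loop)

    def stop(self):
        # Signal both loops to exit, then wait for them to finish.
        self._stop_event.set()
        for thread in self._threads:
            thread.join()
        self._threads.clear()

    def _spawn(self, target):
        thread = threading.Thread(target=target, daemon=True)
        thread.start()
        self._threads.append(thread)

    def _watch(self):
        self.watcher_started.set()
        self._stop_event.wait()

    def _audit_loop(self):
        # Event.wait() returns False on timeout, so each timeout is one tick.
        while not self._stop_event.wait(self.audit_interval):
            self.audit_runs += 1
            self.first_audit_done.set()
```

Using one shared stop event lets both loops exit promptly instead of
sleeping through a full interval before noticing the shutdown request.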

These changes are part of a set of commits to introduce the dccertmon service:
  [1] https://review.opendev.org/c/starlingx/ha/+/941205
  [2] https://review.opendev.org/c/starlingx/stx-puppet/+/941208

Test Plan:
  - PASS: Build dccertmon package
  - PASS: Install and bootstrap system with custom ISO containing the
          newly created dccertmon package
  - PASS: Verify that the dccertmon.service is loaded
  - PASS: Verify that dccertmon logs are written to the correct
          folder.
  - PASS: Check logged messages and verify execution of:
           - Cert Watcher thread
           - Task Executor (Audit thread)
           - Periodic tasks running at expected intervals
  - PASS: Configure and provision the service using SM and verify
          it has correctly started and can be restarted with
          'sm-restart'.
  - PASS: Tox checks running on dccertmon

  Note: This commit has been tested alongside the related changes and
        their respective test plans. [1][2]

Story: 2011311
Task: 51663

Change-Id: Ic23d8d13e4b292cf0508d23eaae99b8e07f36d31
Signed-off-by: Salman Rana <salman.rana@windriver.com>
2025-03-14 15:48:19 -04:00

#
# Copyright (c) 2025 Wind River Systems, Inc.
#
# SPDX-License-Identifier: Apache-2.0
#

from oslo_config import cfg
from oslo_log import log as logging
import oslo_messaging
from oslo_service import service

from dccertmon.common.certificate_monitor_manager import CertificateMonitorManager
from dccertmon.common import utils
from dcmanager.common import consts
from dcmanager.common import messaging as rpc_messaging

CONF = cfg.CONF
LOG = logging.getLogger(__name__)


class CertificateMonitorService(service.Service):
    """Lifecycle manager for a running DC cert monitor service."""

    def __init__(self):
        super(CertificateMonitorService, self).__init__()
        self.rpc_api_version = consts.RPC_API_VERSION
        self.topic = consts.TOPIC_DC_NOTIFICATION
        # TODO(srana): Refactor DC role usage due to deprecation.
        self.dc_role = utils.DC_ROLE_UNDETECTED
        self.manager = CertificateMonitorManager()
        self._rpc_server = None
        self.target = None

    def start(self):
        LOG.info("Starting %s", self.__class__.__name__)
        super(CertificateMonitorService, self).start()
        # Keep the returned role; the RPC server checks in start() and
        # stop() depend on it.
        self.dc_role = self._get_dc_role()

        self.manager.start_cert_watcher()
        self.manager.start_task_executor()

        # Only the system controller listens for subcloud notifications.
        if self.dc_role == utils.DC_ROLE_SYSTEMCONTROLLER:
            self.target = oslo_messaging.Target(
                version=self.rpc_api_version, server=CONF.host, topic=self.topic
            )
            self._rpc_server = rpc_messaging.get_rpc_server(self.target, self)
            self._rpc_server.start()

    def stop(self):
        LOG.info("Stopping %s", self.__class__.__name__)

        if self.dc_role == utils.DC_ROLE_SYSTEMCONTROLLER:
            self._stop_rpc_server()

        self.manager.stop_cert_watcher()
        self.manager.stop_task_executor()
        super(CertificateMonitorService, self).stop()

    def _stop_rpc_server(self):
        if self._rpc_server:
            try:
                self._rpc_server.stop()
                self._rpc_server.wait()
                LOG.info("Engine service stopped successfully")
            except Exception as ex:
                LOG.error("Failed to stop engine service: %s", ex)
                LOG.exception(ex)

    def _get_dc_role(self):
        # TODO(srana): Update after migrating from certmon
        return utils.DC_ROLE_SYSTEMCONTROLLER

    def subcloud_online(self, context, subcloud_name=None):
        """TODO(srana): Trigger a subcloud online audit"""
        LOG.info("%s is online.", subcloud_name)

    def subcloud_managed(self, context, subcloud_name=None):
        """TODO(srana): Trigger a subcloud audit"""
        LOG.info("%s is managed.", subcloud_name)

    def subcloud_sysinv_endpoint_update(self, ctxt, subcloud_name, endpoint):
        """TODO(srana): Update sysinv endpoint of dc token cache"""
        LOG.info("Update subcloud: %s sysinv endpoint", subcloud_name)
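
The start/stop ordering in the service above can be exercised without oslo
by swapping in stubs. The sketch below is illustrative only: `FakeManager`,
`FakeRpcServer`, `SketchService`, `run_lifecycle`, and the two role strings
are hypothetical names invented for this example, not part of dccertmon or
dcmanager. It assumes the detected role is stored on the service before the
RPC check runs.

```python
# Stand-ins for utils.DC_ROLE_UNDETECTED / utils.DC_ROLE_SYSTEMCONTROLLER;
# the real constant values are not shown in this file and are assumed here.
ROLE_UNDETECTED = "undetected"
ROLE_SYSTEMCONTROLLER = "systemcontroller"


class FakeManager:
    """Records calls instead of starting real watcher/executor threads."""

    def __init__(self, events):
        self.events = events

    def start_cert_watcher(self):
        self.events.append("watcher started")

    def start_task_executor(self):
        self.events.append("executor started")

    def stop_cert_watcher(self):
        self.events.append("watcher stopped")

    def stop_task_executor(self):
        self.events.append("executor stopped")


class FakeRpcServer:
    """Records calls instead of opening a messaging connection."""

    def __init__(self, events):
        self.events = events

    def start(self):
        self.events.append("rpc started")

    def stop(self):
        self.events.append("rpc stopped")

    def wait(self):
        self.events.append("rpc waited")


class SketchService:
    """Mirrors the ordering of start() and stop() with stubbed collaborators."""

    def __init__(self, role, events):
        self.role = role
        self.events = events
        self.manager = FakeManager(events)
        self.rpc_server = None

    def start(self):
        self.manager.start_cert_watcher()
        self.manager.start_task_executor()
        # The RPC server is created only on the system controller.
        if self.role == ROLE_SYSTEMCONTROLLER:
            self.rpc_server = FakeRpcServer(self.events)
            self.rpc_server.start()

    def stop(self):
        # Stop accepting notifications before tearing down the workers.
        if self.rpc_server is not None:
            self.rpc_server.stop()
            self.rpc_server.wait()
        self.manager.stop_cert_watcher()
        self.manager.stop_task_executor()


def run_lifecycle(role):
    """Run one start/stop cycle and return the recorded call order."""
    events = []
    svc = SketchService(role, events)
    svc.start()
    svc.stop()
    return events
```

Calling `run_lifecycle(ROLE_SYSTEMCONTROLLER)` records the RPC server
starting after the manager threads and stopping before them, while
`run_lifecycle(ROLE_UNDETECTED)` records no RPC calls.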