diff --git a/doc/source/admin/support-compute.rst b/doc/source/admin/support-compute.rst
index 579d0d36eaf5..f5d571bf5694 100644
--- a/doc/source/admin/support-compute.rst
+++ b/doc/source/admin/support-compute.rst
@@ -15,6 +15,7 @@ you how to troubleshoot Compute.
    :maxdepth: 1
 
    troubleshooting/orphaned-allocations.rst
+   troubleshooting/rebuild-placement-db.rst
 
 
 Compute service logging
diff --git a/doc/source/admin/troubleshooting/rebuild-placement-db.rst b/doc/source/admin/troubleshooting/rebuild-placement-db.rst
new file mode 100644
index 000000000000..cf877fe9aa4b
--- /dev/null
+++ b/doc/source/admin/troubleshooting/rebuild-placement-db.rst
@@ -0,0 +1,56 @@
+Rebuild placement DB
+====================
+
+Problem
+-------
+
+You have somehow changed a nova cell database and the ``compute_nodes`` table
+entries are now reporting different uuids to the placement service but
+placement already has ``resource_providers`` table entries with the same
+names as those computes so the resource providers in placement and the
+compute nodes in the nova database are not synchronized. Maybe this happens
+as a result of restoring the nova cell database from a backup where the compute
+hosts have not changed but they are using different uuids.
+
+Nova reports compute node inventory to placement using the
+``hypervisor_hostname`` and uuid of the ``compute_nodes`` table to the
+placement ``resource_providers`` table, which has a unique constraint on the
+name (hostname in this case) and uuid. Trying to create a new resource provider
+with a new uuid but the same name as an existing provider results in a 409
+error from placement, such as in `bug 1817833`_.
+
+.. _bug 1817833: https://bugs.launchpad.net/nova/+bug/1817833
+
+Solution
+--------
+
+.. warning:: This is likely a last resort when *all* computes and resource
+             providers are not synchronized and it is simpler to just rebuild
+             the placement database from the current state of nova. This may,
+             however, not work when using placement for more advanced features
+             such as :neutron-doc:`ports with minimum bandwidth guarantees `
+             or `accelerators `_.
+             Obviously testing first in a pre-production environment is ideal.
+
+These are the steps at a high level:
+
+#. Make a backup of the existing placement database in case these steps fail
+   and you need to start over.
+
+#. Recreate the placement database and run the schema migrations to
+   initialize the placement database.
+
+#. Either restart or wait for the
+   :oslo.config:option:`update_resources_interval` on the ``nova-compute``
+   services to report resource providers and their inventory to placement.
+
+#. Run the :ref:`nova-manage placement heal_allocations `
+   command to report allocations to placement for the existing instances in
+   nova.
+
+#. Run the :ref:`nova-manage placement sync_aggregates `
+   command to synchronize nova host aggregates to placement resource provider
+   aggregates.
+
+Once complete, test your deployment as usual, e.g. running Tempest integration
+and/or Rally tests, creating, migrating and deleting a server, etc.
diff --git a/doc/source/cli/nova-manage.rst b/doc/source/cli/nova-manage.rst
index 68b9eb51139a..08e1440e92cd 100644
--- a/doc/source/cli/nova-manage.rst
+++ b/doc/source/cli/nova-manage.rst
@@ -643,6 +643,8 @@ Placement
     * - 255
       - An unexpected error occurred.
 
+.. _sync_aggregates_cli:
+
 ``nova-manage placement sync_aggregates [--verbose]``
     Mirrors compute host aggregates to resource provider aggregates in the
     Placement service. Requires the :oslo.config:group:`api_database`
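
To make the mismatch described under "Problem" in the new document concrete, here is a minimal Python sketch. It is not part of the patch and does not call nova or placement. It assumes you have already exported two hypothetical mappings of name to uuid, one from the ``compute_nodes`` table and one from the ``resource_providers`` table. The function name ``find_uuid_mismatches`` and the sample data are illustrative only. Each name it returns is one where creating a provider with the nova uuid would presumably hit the 409 conflict.

```python
from typing import Dict, List, Tuple


def find_uuid_mismatches(
    compute_nodes: Dict[str, str],
    resource_providers: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (name, nova_uuid, placement_uuid) for every name that exists
    on both sides but is registered under a different uuid."""
    mismatches = []
    for name, nova_uuid in sorted(compute_nodes.items()):
        placement_uuid = resource_providers.get(name)
        # A provider with the same name but another uuid is the case that
        # the new document describes as leading to a 409 from placement.
        if placement_uuid is not None and placement_uuid != nova_uuid:
            mismatches.append((name, nova_uuid, placement_uuid))
    return mismatches


# Hypothetical sample data: hypervisor_hostname -> uuid (nova side) and
# resource provider name -> uuid (placement side).
compute_nodes = {"compute-1": "aaaa", "compute-2": "bbbb"}
resource_providers = {"compute-1": "aaaa", "compute-2": "cccc"}

mismatches = find_uuid_mismatches(compute_nodes, resource_providers)
for name, nova_uuid, placement_uuid in mismatches:
    print(f"{name}: nova={nova_uuid} placement={placement_uuid}")
```

If every compute node shows up in the result, that matches the "all computes and resource providers are not synchronized" situation the warning refers to. A handful of mismatches may be better handled individually than by rebuilding the database.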