Add some more cellsv2 doc goodness

This adds a fresh cellsv2 overview document that talks about deployment decisions for single and multiple cell environments in an attempt to help address confusion about what the service layouts look like in a multi-cell setup.

Change-Id: I1da7c375dbb98c125aebabec548280de8d8ed381
2017-07-25 12:13:42 -07:00
commit 7c17010448
parent 7234e6e474
3 changed files with 296 additions and 0 deletions
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -175,6 +175,7 @@ these are a great place to start reading up on the current plans.
   :maxdepth: 1

   user/cells
+   user/cellsv2_layout
   user/upgrade
   contributor/api
   contributor/microversions
--- a/doc/source/user/cells.rst
+++ b/doc/source/user/cells.rst
@@ -11,6 +11,8 @@
      License for the specific language governing permissions and limitations
      under the License.

+.. _cells:
+
 =======
 Cells
 =======
--- a/doc/source/user/cellsv2_layout.rst
+++ b/doc/source/user/cellsv2_layout.rst
@@ -0,0 +1,293 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+===================
+ Cells Layout (v2)
+===================
+
+This document describes the layout of a deployment with Cells
+version 2, including deployment considerations for security and
+scale. It is focused on code present in Pike and later, and while it
+is geared towards people who want to have multiple cells for whatever
+reason, the nature of the cellsv2 support in Nova means that it
+applies in some way to all deployments.
+
+.. note:: The concepts laid out in this document do not in any way
+          relate to CellsV1, which includes the ``nova-cells``
+          service, and the ``[cells]`` section of the configuration
+          file. For more information on the differences, see the main
+          :ref:`cells` page.
+
+Concepts
+========
+
+A basic Nova system consists of the following components:
+
+* The nova-api service which provides the external REST API to users.
+* The nova-scheduler and placement services which are responsible
+  for tracking resources and deciding which compute node instances
+  should be on.
+* An "API database" that is used primarily by nova-api and
+  nova-scheduler (called *API-level services* below) to track location
+  information about instances, as well as a temporary location for
+  instances being built but not yet scheduled.
+* The nova-conductor service which offloads long-running tasks for the
+  API-level services, as well as insulating compute nodes from direct
+  database access.
+* The nova-compute service which manages the virt driver and
+  hypervisor host.
+* A "cell database" which is used by API, conductor and compute
+  services, and which houses the majority of the information about
+  instances.
+* A "cell0 database" which is just like the cell database, but
+  contains only instances that failed to be scheduled.
+* A message queue which allows the services to communicate with each
+  other via RPC.
+
+All deployments have at least the above components. Small deployments
+likely have a single message queue that all services share, and a
+single database server which hosts the API database, a single cell
+database, as well as the required cell0 database. This is considered a
+"single-cell deployment" because it only has one "real" cell. The
+cell0 database mimics a regular cell, but has no compute nodes and is
+used only as a place to put instances that fail to land on a real
+compute node (and thus a real cell).
+
+The purpose of the cells functionality in nova is specifically to
+allow larger deployments to shard their many compute nodes into cells,
+each of which has a database and message queue. The API database is
+always and only global, but there can be many cell databases (where
+the bulk of the instance information lives), each with a portion of
+the instances for the entire deployment within.
+
+Every nova service uses a configuration file which will, at a
+minimum, specify a message queue endpoint
+(i.e. ``[DEFAULT]/transport_url``). Most of the services also require
+configuration of database connection information
+(i.e. ``[database]/connection``). API-level services that need access
+to the global routing and placement information will also be
+configured to reach the API database
+(i.e. ``[api_database]/connection``).
+
+.. note:: The pair of ``transport_url`` and ``[database]/connection``
+          configured for a service defines what cell a service lives
+          in.
+
+API-level services need to be able to contact other services in all of
+the cells. Since they only have one configured ``transport_url`` and
+``[database]/connection``, they look up the information for the other
+cells in the API database, with records called *cell mappings*.
+
+.. note:: The API database must have cell mapping records that match
+          the ``transport_url`` and ``[database]/connection``
+          configuration elements of the lower-level services. See the
+          ``nova-manage`` :ref:`man-page-cells-v2` commands for more
+          information about how to create and examine these records.
+
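As an illustration of the two notes above, the following minimal Python sketch models cell mappings as plain records and resolves which cell a service lives in from its configured pair. The ``CellMapping`` class, the ``find_cell`` helper, and all URLs are hypothetical; they are not Nova's actual objects or API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CellMapping:
    """Hypothetical stand-in for a cell mapping record in the API database."""
    name: str
    transport_url: str
    database_connection: str


def find_cell(mappings: Iterable[CellMapping], transport_url: str,
              database_connection: str) -> Optional[CellMapping]:
    """Return the mapping whose pair matches a service's configuration."""
    for mapping in mappings:
        if (mapping.transport_url == transport_url
                and mapping.database_connection == database_connection):
            return mapping
    return None


# Made-up example values; real deployments will differ.
MAPPINGS = [
    CellMapping("cell1", "rabbit://mq1.example//",
                "mysql+pymysql://db1.example/nova"),
    CellMapping("cell2", "rabbit://mq2.example//",
                "mysql+pymysql://db2.example/nova"),
]
```

In this sketch, a pair that matches no record (for example, the queue of one cell combined with the database of another) resolves to ``None``, which is why the mapping records need to match the configuration of the lower-level services.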
+Service Layout
+==============
+
+The services generally have a well-defined communication pattern that
+dictates their layout in a deployment. In a small/simple scenario, the
+rules do not have much of an impact as all the services can
+communicate with each other on a single message bus and in a single
+cell database. However, as the deployment grows, scaling and security
+concerns may drive separation and isolation of the services.
+
+Simple
+------
+
+This is a diagram of the basic services that a simple (single-cell)
+deployment would have, as well as the relationships
+(i.e. communication paths) between them:
+
+.. graphviz::
+
+  digraph services {
+    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
+    node [fontsize=10 fontname="Monospace"];
+    edge [arrowhead="normal", arrowsize="0.8"];
+    labelloc=bottom;
+    labeljust=left;
+
+    { rank=same
+      api [label="nova-api"]
+      apidb [label="API Database" shape="box"]
+      scheduler [label="nova-scheduler"]
+    }
+    { rank=same
+      mq [label="MQ" shape="diamond"]
+      conductor [label="nova-conductor"]
+    }
+    { rank=same
+      cell0db [label="Cell0 Database" shape="box"]
+      celldb [label="Cell Database" shape="box"]
+      compute [label="nova-compute"]
+    }
+
+    api -> mq -> compute
+    conductor -> mq -> scheduler
+
+    api -> apidb
+    api -> cell0db
+    api -> celldb
+
+    conductor -> apidb
+    conductor -> cell0db
+    conductor -> celldb
+  }
+
+All of the services are configured to talk to each other over the same
+message bus, and there is only one cell database where live instance
+data resides. The cell0 database is present (and required) but as no
+compute nodes are connected to it, this is still a "single cell"
+deployment.
+
+Multiple Cells
+--------------
+
+In order to shard the services into multiple cells, a number of things
+must happen. First, the message bus must be split into pieces along
+the same lines as the cell database. Second, a dedicated conductor
+must be run for the API-level services, with access to the API
+database and a dedicated message queue. We call this the *super conductor*
+to distinguish its place and purpose from the per-cell conductor nodes.
+
+.. graphviz::
+
+  digraph services2 {
+    graph [pad="0.35", ranksep="0.65", nodesep="0.55", concentrate=true];
+    node [fontsize=10 fontname="Monospace"];
+    edge [arrowhead="normal", arrowsize="0.8"];
+    labelloc=bottom;
+    labeljust=left;
+
+    subgraph api {
+      api [label="nova-api"]
+      scheduler [label="nova-scheduler"]
+      conductor [label="super conductor"]
+      { rank=same
+        apimq [label="API MQ" shape="diamond"]
+        apidb [label="API Database" shape="box"]
+      }
+
+      api -> apimq -> conductor
+      api -> apidb
+      conductor -> apimq -> scheduler
+      conductor -> apidb
+    }
+
+    subgraph clustercell0 {
+      label="Cell 0"
+      color=green
+      cell0db [label="Cell Database" shape="box"]
+    }
+
+    subgraph clustercell1 {
+      label="Cell 1"
+      color=blue
+      mq1 [label="Cell MQ" shape="diamond"]
+      cell1db [label="Cell Database" shape="box"]
+      conductor1 [label="nova-conductor"]
+      compute1 [label="nova-compute"]
+
+      conductor1 -> mq1 -> compute1
+      conductor1 -> cell1db
+
+    }
+
+    subgraph clustercell2 {
+      label="Cell 2"
+      color=red
+      mq2 [label="Cell MQ" shape="diamond"]
+      cell2db [label="Cell Database" shape="box"]
+      conductor2 [label="nova-conductor"]
+      compute2 [label="nova-compute"]
+
+      conductor2 -> mq2 -> compute2
+      conductor2 -> cell2db
+    }
+
+    api -> mq1 -> conductor1
+    api -> mq2 -> conductor2
+    api -> cell0db
+    api -> cell1db
+    api -> cell2db
+
+    conductor -> cell0db
+    conductor -> cell1db
+    conductor -> mq1
+    conductor -> cell2db
+    conductor -> mq2
+  }
+
+It is important to note that services in the lower cell boxes do not
+have the ability to call back to the API-layer services via RPC, nor
+do they have access to the API database for global visibility of
+resources across the cloud. This is intentional and provides security
+and failure domain isolation benefits, but also has impacts on some
+things that would otherwise require this any-to-any communication
+style. Check the release notes for the version of Nova you are using
+for the most up-to-date information about any caveats that may be
+present due to this limitation.
+
+Caveats of a Multi-Cell deployment
+----------------------------------
+
+.. note:: This information is correct as of the Pike release.
+
+Cross-cell instance migrations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Currently it is not possible to migrate an instance from a host in one
+cell to a host in another cell. This may be possible in the future,
+but it is currently unsupported. This impacts cold migration,
+resize, live migration, evacuate, and unshelve operations.
+
+Quota-related quirks
+~~~~~~~~~~~~~~~~~~~~
+
+Quotas are now calculated live at the point at which an operation
+would consume more resources, instead of being kept statically in the
+database. This means that a multi-cell environment may incorrectly
+calculate the usage of a tenant if one of the cells is unreachable, as
+those resources cannot be counted. In this case, the tenant may be
+able to consume more resources from one of the available cells, putting
+them far over quota when the unreachable cell returns. In the future,
+placement will provide us with a consistent way to calculate usage
+independent of whether the actual cell is reachable.
+
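The under-counting described above can be sketched in a few lines of Python. This is a simplified model and not Nova's quota code: the per-cell usage dictionaries, the ``count_usage`` and ``may_consume`` helpers, and the numbers in the usage note are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: each cell maps a project ID to its instances in use,
# or is None when the cell cannot be reached.
CellUsage = Optional[Dict[str, int]]


def count_usage(cells: Dict[str, CellUsage],
                project_id: str) -> Tuple[int, List[str]]:
    """Sum usage over reachable cells and report which cells were skipped."""
    total = 0
    skipped = []
    for name, usage in cells.items():
        if usage is None:
            # Unreachable cell: its resources cannot be counted.
            skipped.append(name)
            continue
        total += usage.get(project_id, 0)
    return total, skipped


def may_consume(cells: Dict[str, CellUsage], project_id: str,
                requested: int, limit: int) -> bool:
    """Check a request against the limit using only the countable usage."""
    counted, _skipped = count_usage(cells, project_id)
    return counted + requested <= limit
```

With a limit of 10, six instances counted in a reachable cell, and four hidden in an unreachable one, a request for four more passes the check in this model, leaving the project at 14 once the second cell returns.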
+Performance of listing instances
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+With multiple cells, the instance list operation may not sort and
+paginate results properly when crossing multiple cell
+boundaries. Further, a sorted list operation will be considerably
+slower than with a single cell.
+
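To show why a sorted listing across cells costs more than a single-cell query, here is a generic Python sketch that merges per-cell results which are already sorted. It is not how Nova lists instances in Pike; the ``list_instances`` function and the record layout (plain dictionaries) are assumptions for illustration only.

```python
import heapq
from itertools import islice
from typing import Dict, List


def list_instances(cells: Dict[str, List[dict]], sort_key: str,
                   limit: int) -> List[dict]:
    """Merge per-cell records into one globally sorted, limited list."""
    # Each cell sorts its own records, as a per-cell database query would.
    per_cell = [sorted(records, key=lambda r: r[sort_key])
                for records in cells.values()]
    # Merge the sorted streams lazily and stop after `limit` results.
    merged = heapq.merge(*per_cell, key=lambda r: r[sort_key])
    return list(islice(merged, limit))
```

Even in this simplified form, every cell has to be queried and its results compared against those of every other cell before the first page can be returned, which is extra work that a single-cell deployment does not have to do.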
+Notifications
+~~~~~~~~~~~~~
+
+In a multi-cell environment with multiple message queues, it is
+likely that operators will want to configure a separate connection to
+a unified queue for notifications. This can be done in the
+configuration file of all nodes. See the `oslo.messaging configuration
+<https://docs.openstack.org/oslo.messaging/latest/configuration/opts.html#oslo_messaging_notifications.transport_url>`_
+documentation for more details.
+
+Neutron Metadata API proxy
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The Neutron metadata API proxy should be global across all cells, and
+thus be configured as an API-level service with access to the
+``[api_database]/connection`` information.