Merge "docs: describe migration and other movement concepts"

2015-11-24 18:09:39 +00:00 · 2015-11-24 18:09:39 +00:00 · 8970fc9e4f
commit 8970fc9e4f
parent f8e1be2217 192e6b1d03
1 changed files with 133 additions and 0 deletions
--- a/api-guide/source/server_concepts.rst
+++ b/api-guide/source/server_concepts.rst
@@ -390,3 +390,136 @@ assigned at creation time.
          "accessIPv6":"::babe:67.23.10.132"
       }
    }
+
+Moving servers
+~~~~~~~~~~~~~~
+
+There are several actions that may result in a server moving from one
+compute host to another, including shelve, resize, migration and
+evacuate. The following use cases demonstrate the intention of the
+actions and the consequences for operational procedures.
+
+**Shelving**
+
+Sometimes a user does not require a server to be active for a while,
+perhaps over a weekend or at certain times of day. This gives
+the cloud operator an opportunity to make better use of resources by
+freeing resources and rebalancing workloads across the infrastructure.
+
+When the user shelves a server the operator can choose to remove it
+from the compute hosts. When it is unshelved it is scheduled to a new
+host according to the operator's policies for distributing workloads
+across the compute hosts, including taking disabled hosts into account.
+This will contribute to increased overall capacity, freeing hosts that
+are earmarked for maintenance and providing contiguous blocks
+of resources on single hosts due to moving out old servers.
+
+Shelving a server is not normally a choice that is available to
+the cloud operator because it affects the availability of the server
+being provided to the user.
+
+**Resize**
+
+Sometimes a user may want to change the flavor of a server, e.g. change
+the quantity of CPUs, disk, memory or any other resource. This is done
+by rebuilding the server with a new flavor. As the server is being
+rebuilt it is normal to reschedule the server to another host
+(although resize to the same host is an option for the operator).
+
+As with shelving, resize provides the cloud operator with an
+opportunity to redistribute workloads across the cloud according
+to the operator's scheduling policy, providing the same benefits as
+described above for shelving.
+
+Resizing a server is not normally a choice that is available to
+the cloud operator because it changes the nature of the server
+being provided to the user.
+
+**Migration (including cold and live migration)**
+
+Sometimes a cloud operator may need to redistribute workloads for
+operational purposes. For example, the operator may need to remove
+a compute host for maintenance or deploy a kernel security patch that
+requires the host to be rebooted.
+
+The operator has two actions available for deliberately moving
+workloads: cold migration (moving a server that is not active)
+and live migration (moving a server that is active).
+
+Cold migration moves a server from one host to another by copying its
+state, local storage and network configuration to new resources
+allocated on a new host selected by scheduling policies or as
+an explicit decision. The operation is relatively quick as the
+server is not changing its state during the copy process. The user
+does not have access to the server during the operation.
+
+Live migration moves a server from one host to another while it
+is active, so it is constantly changing its state during the action.
+As a result it can take considerably longer than cold migration.
+During the action the server is online and accessible, but only
+a limited set of management actions is available to the user.
+
+The following are two common patterns for employing migrations in
+a cloud:
+
+-  **Host maintenance**
+
+   If a compute host is to be removed from the cloud, all its servers
+   will need to be moved to other hosts. In this case it is normal for
+   the rest of the cloud to absorb the workload, redistributing
+   the servers by rescheduling them.
+
+   To prepare the host it will be disabled so it does not receive
+   any further servers. Then each server will be migrated to a new
+   host by cold or live migration, depending on the state of the
+   server. When complete, the host is free to be removed.
+
+-  **Rolling updates**
+
+   Often it is necessary to perform an update on all compute hosts
+   that requires them to be rebooted. In this case it is not
+   strictly necessary to move inactive servers because they
+   will be available after the reboot. However, active servers would
+   be impacted by the reboot. Live migration will allow them to
+   continue operation.
+
+   In this case a rolling approach can be taken by starting with an
+   empty compute host that has been updated and rebooted. Another host
+   that has not yet been updated is disabled and all its servers are
+   migrated to the new host. When the migrations are complete the
+   new host continues normal operation. The old host will be empty
+   and can be updated and rebooted. It then becomes the new target for
+   another round of migrations.
+
+   This process can be repeated until the whole cloud has been updated,
+   usually using a pool of empty hosts instead of just one.
+
+Migrating a server is not normally a choice that is available to
+the cloud user because the user is not normally aware of compute
+hosts. Management of the cloud and how servers are provisioned
+in it is the sole responsibility of the cloud operator.
+
+**Evacuate**
+
+Sometimes a compute host may fail. This is a rare occurrence, but when
+it happens during normal operation the servers running on the host may
+be lost. In this case the operator may recreate the servers on the
+remaining compute hosts using the evacuate action.
+
+Reliable failure detection is provably impossible in computer systems
+with asynchronous communication, so a host can only ever be suspected
+of failure. Usually when a host is considered to have failed it should be
+excluded from the cloud and any virtual networking or storage associated
+with servers on the failed host should be isolated from it. These steps
+are called fencing the host. Initiating these actions is outside the scope
+of Nova.
+
+Once the host has been fenced its servers can be recreated on other
+hosts without the risk of the old incarnations reappearing and trying to
+access shared resources. It is usual to redistribute the servers
+from a failed host by rescheduling them.
+
+Evacuating a server is solely in the domain of the cloud operator because
+it must be performed in coordination with other operational procedures to
+be safe. A user is not normally aware of compute hosts but is adversely
+affected by their failure.