Test plan of provisioning systems.
Add a test plan which describes how to measure the performance of provisioning systems.

* Add table titles
* Delete configuration management template
* Delete author info
* Fix abstract and conventions sections
* Improve the layout of the hardware info table
* Fix the reference to the script

Change-Id: I8cb3524dbd12bcd67502e3c6fd9c003b495f18bb
This commit is contained in:
parent
8d0ce71e72
commit
a101c13368
@ -11,4 +11,3 @@ Performance Documentation
.. raw:: pdf

   PageBreak oneColumn

@ -10,3 +10,4 @@ Test Plans
   :maxdepth: 2

   mq/index
   provisioning/main
345
doc/source/test_plans/provisioning/main.rst
Normal file
@ -0,0 +1,345 @@
.. _Measuring_performance_of_provisioning_systems:

=============================================
Measuring performance of provisioning systems
=============================================

:status: draft is in progress
:version: 0

:Abstract:

  This document describes a test plan for quantifying the performance of
  provisioning systems as a function of the number of nodes to be provisioned. The
  plan includes the collection of several resource utilization metrics, which will
  be used to analyze and understand the overall performance of each system. In
  particular, resource bottlenecks will either be fixed, or best practices
  developed for system configuration and hardware requirements.

:Conventions:

  - **Provisioning:** is the entire process of installing and configuring an
    operating system.

  - **Provisioning system:** is a service or a set of services which enables the
    installation of an operating system and performs basic operations such as
    configuring network interfaces and partitioning disks. A preliminary
    `list of provisioning systems`_ can be found below in `Applications`_.
    The provisioning system
    can include configuration management systems like Puppet or Chef, but
    this feature will not be considered in this document. The test plan for
    configuration management systems is described in the
    "Measuring_performance_of_configuration_management_systems" document.

  - **Performance of a provisioning system:** is a set of metrics which
    describes how many nodes can be provisioned at the same time and the
    hardware resources required to do so.

  - **Nodes:** are servers which will be provisioned.

List of performance metrics
---------------------------
The table below shows the list of test metrics to be collected. The priority
is the relative ranking of the importance of each metric in evaluating the
performance of the system.

.. table:: List of performance metrics

  +--------+------------------------+------------------------------------------+
  |Priority| Parameter              | Description                              |
  +========+========================+==========================================+
  |        |                        || The elapsed time to provision all       |
  | 1      |PROVISIONING_TIME(NODES)|| nodes, as a function of the number of   |
  |        |                        || nodes.                                  |
  +--------+------------------------+------------------------------------------+
  |        |                        || Incoming network bandwidth usage as a   |
  | 2      |INGRESS_NET(NODES)      || function of the number of nodes.        |
  |        |                        || Average during provisioning on the host |
  |        |                        || where the provisioning system is        |
  |        |                        || installed.                              |
  +--------+------------------------+------------------------------------------+
  |        |                        || Outgoing network bandwidth usage as a   |
  | 2      |EGRESS_NET(NODES)       || function of the number of nodes.        |
  |        |                        || Average during provisioning on the host |
  |        |                        || where the provisioning system is        |
  |        |                        || installed.                              |
  +--------+------------------------+------------------------------------------+
  |        |                        || CPU utilization as a function of the    |
  | 3      |CPU(NODES)              || number of nodes. Average during         |
  |        |                        || provisioning on the host where the      |
  |        |                        || provisioning system is installed.       |
  +--------+------------------------+------------------------------------------+
  |        |                        || Active memory usage as a function of    |
  | 3      |RAM(NODES)              || the number of nodes. Average during     |
  |        |                        || provisioning on the host where the      |
  |        |                        || provisioning system is installed.       |
  +--------+------------------------+------------------------------------------+
  |        |                        || Storage write IO bandwidth as a         |
  | 3      |WRITE_IO(NODES)         || function of the number of nodes.        |
  |        |                        || Average during provisioning on the host |
  |        |                        || where the provisioning system is        |
  |        |                        || installed.                              |
  +--------+------------------------+------------------------------------------+
  |        |                        || Storage read IO bandwidth as a          |
  | 3      |READ_IO(NODES)          || function of the number of nodes.        |
  |        |                        || Average during provisioning on the host |
  |        |                        || where the provisioning system is        |
  |        |                        || installed.                              |
  +--------+------------------------+------------------------------------------+

Test Plan
---------

The above performance metrics will be measured for various numbers
of provisioned nodes. The result will be a table that shows the
dependence of these metrics on the number of nodes.

Environment description
^^^^^^^^^^^^^^^^^^^^^^^

Test results MUST include a description of the environment used. The following items
should be included:

- **Hardware configuration of each server.** If virtual machines are used then both
  physical and virtual hardware should be fully documented.
  An example format is given below:

  .. table:: Description of server hardware

    +-------+----------------+-------+-------+
    |server |name            |       |       |
    |       +----------------+-------+-------+
    |       |role            |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |operating_system|       |       |
    +-------+----------------+-------+-------+
    |CPU    |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |processor_count |       |       |
    |       +----------------+-------+-------+
    |       |core_count      |       |       |
    |       +----------------+-------+-------+
    |       |frequency_MHz   |       |       |
    +-------+----------------+-------+-------+
    |RAM    |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |amount_MB       |       |       |
    +-------+----------------+-------+-------+
    |NETWORK|interface_name  |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |bandwidth       |       |       |
    +-------+----------------+-------+-------+
    |STORAGE|dev_name        |       |       |
    |       +----------------+-------+-------+
    |       |vendor,model    |       |       |
    |       +----------------+-------+-------+
    |       |SSD/HDD         |       |       |
    |       +----------------+-------+-------+
    |       |size            |       |       |
    +-------+----------------+-------+-------+

- **Configuration of hardware network switches.** The configuration file from the
  switch can be downloaded and attached.

- **Configuration of virtual machines and virtual networks (if they are used).**
  The configuration files can be attached, along with the mapping of virtual
  machines to host machines.

- **Network scheme.** The plan should show how all hardware is connected and
  how the components communicate. All Ethernet/Fibre Channel and VLAN channels
  should be included. Each interface of every hardware component should be
  matched with the corresponding L2 channel and IP address.

- **Software configuration of the provisioning system.** `sysctl.conf` and any
  other kernel file that is changed from the default should be attached.
  A list of installed packages should be attached. Specifications of the
  operating system, network interface configuration, and disk partitioning
  configuration should be included. If distributed provisioning systems are
  to be tested then the parts that are distributed need to be described.

- **Desired software configuration of the provisioned nodes.**
  The operating system, disk partitioning scheme, network interface
  configuration, installed packages and other components of the nodes
  affect the amount of work to be performed by the provisioning system
  and thus its performance.

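Some rows of the hardware table can be filled in from the server itself. The
following is a minimal sketch, assuming a Linux server that exposes
``/proc/meminfo`` and ``/proc/cpuinfo``. The function names ``ram_amount_mb``
and ``logical_cpu_count`` are hypothetical and are not part of the test plan.
The CPU helper counts logical CPUs, which may differ from the product of
``processor_count`` and ``core_count`` when hyper-threading is enabled.

```shell
# Hypothetical helpers for filling in the hardware table (Linux only).
# Each takes an optional file path so it can be pointed at a saved copy.

# Print total RAM in MB (the amount_MB row) from a meminfo-style file.
ram_amount_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Print the number of logical CPUs from a cpuinfo-style file.
logical_cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example usage on the server being described:
#   echo "amount_MB: $(ram_amount_mb)"
#   echo "logical CPUs: $(logical_cpu_count)"
```

Vendor and model fields are not covered by this sketch and still need to be
taken from the hardware inventory.
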
Preparation
^^^^^^^^^^^

1. The following package needs to be installed on the provisioning system
   servers to collect performance metrics.

.. table:: Software to be installed

  +--------------+---------+-----------------------------------+
  | package name | version | source                            |
  +==============+=========+===================================+
  | `dstat`_     | 0.7.2   | Ubuntu trusty universe repository |
  +--------------+---------+-----------------------------------+

Measuring performance values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The script `Full script for collecting performance metrics`_
can be used for the first five of the following steps.

.. note::
  If a distributed provisioning system is used, the values need to be
  measured on each provisioning system instance.

1. Start the collection of CPU, memory, network, and storage metrics during the
   provisioning process. Use the dstat program, which can collect all of these
   metrics in CSV format into a log file.
2. Start the provisioning process for the first node and record the wall time.
3. Wait until the provisioning process has finished (when all nodes are reachable
   via ssh) and record the wall time.
4. Stop the dstat program.
5. Prepare the collected data for analysis. dstat provides a large amount of
   information, which can be pruned by saving only the following:

   * "system"[time]. Save as given.

   * 100-"total cpu usage"[idl]. dstat provides only the idle CPU value. CPU
     utilization is calculated by subtracting the idle value from 100%.

   * "memory usage"[used]. dstat provides this value in bytes.
     It is converted to megabytes by dividing by 1024*1024=1048576.

   * "net/eth0"[recv]. The receive bandwidth on the NIC. It is converted to
     megabits per second by dividing by 1024*1024/8=131072.

   * "net/eth0"[send]. The send bandwidth on the NIC. It is converted to
     megabits per second by dividing by 1024*1024/8=131072.

   * "net/eth0"[recv]+"net/eth0"[send]. The total receive and transmit bandwidth
     on the NIC. dstat provides these values in bytes per second. They are
     converted to megabits per second by dividing by 1024*1024/8=131072.

   * "io/total"[read]. The storage read IO bandwidth.

   * "io/total"[writ]. The storage write IO bandwidth.

   * "io/total"[read]+"io/total"[writ]. The total read and write storage IO
     bandwidth.

   These values will be graphed and maximum values reported.

6. Repeat steps 1-5 while provisioning the following numbers of nodes
   simultaneously:

   * 10 nodes
   * 20 nodes
   * 40 nodes
   * 80 nodes
   * 160 nodes
   * 320 nodes
   * 640 nodes
   * 1280 nodes
   * 2000 nodes

   Additional tests will be performed if some anomalous behaviour is found.
   These may require the collection of additional performance metrics.

7. The results of this part of the test will be:

   * to provide the following graphs, one for each number of provisioned nodes:

     #) Three dependencies on one graph.

        * INGRESS_NET(TIME) Dependence on time of incoming network bandwidth usage.
        * EGRESS_NET(TIME) Dependence on time of outgoing network bandwidth usage.
        * ALL_NET(TIME) Dependence on time of total network bandwidth usage.

     #) One dependence on one graph.

        * CPU(TIME) Dependence on time of CPU utilization.

     #) One dependence on one graph.

        * RAM(TIME) Dependence on time of active memory usage.

     #) Three dependencies on one graph.

        * WRITE_IO(TIME) Dependence on time of storage write IO bandwidth.
        * READ_IO(TIME) Dependence on time of storage read IO bandwidth.
        * ALL_IO(TIME) Dependence on time of total storage IO bandwidth.

     .. note::
       If a distributed provisioning system is used, the above graphs should be
       provided for each provisioning system instance.

   * to fill in the following table for maximum values:

     The resource metrics are obtained from the maxima of the corresponding graphs
     above. The provisioning time is the elapsed time for all nodes to be
     provisioned. One set of metrics will be given for each number of provisioned
     nodes.

     .. table:: Maximum values of performance metrics

       +-------+--------------+---------+---------+---------+---------+
       || nodes|| provisioning|| maximum|| maximum|| maximum|| maximum|
       || count|| time        || CPU    || RAM    || NET    || IO     |
       |       |              || usage  || usage  || usage  || usage  |
       +=======+==============+=========+=========+=========+=========+
       | 10    |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 20    |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 40    |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 80    |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 160   |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 320   |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 640   |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 1280  |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+
       | 2000  |              |         |         |         |         |
       +-------+--------------+---------+---------+---------+---------+

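The maxima for the table above can be read from a pruned CSV file rather than
from the graphs. The following is a minimal sketch, assuming a CSV file whose
first row holds column names such as ``cpu_usage`` or ``net_all`` (the names
written by the measure.sh script). The function name ``csv_column_max`` is
hypothetical.

```shell
# Print the maximum of a named column in a CSV file with a header row.
# Usage: csv_column_max FILE COLUMN_NAME
csv_column_max() {
    awk -F ',' -v name="$2" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            if (!col) exit 1
            next
        }
        # Track the largest numeric value seen so far.
        !seen || $col + 0 > max { max = $col + 0; seen = 1 }
        END { if (seen) print max }
    ' "$1"
}

# Example usage for the 10-node run:
#   csv_column_max /var/log/10_nodes.csv cpu_usage
#   csv_column_max /var/log/10_nodes.csv net_all
```

Looking columns up by name rather than by position keeps the sketch usable if
the set of saved columns changes.
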
Applications
------------

List of provisioning systems
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. table:: List of provisioning systems

  +-----------------------------+---------+
  | Name of provisioning system | Version |
  +=============================+=========+
  | `Cobbler`_                  | 2.4     |
  +-----------------------------+---------+
  | `Razor`_                    | 0.13    |
  +-----------------------------+---------+
  | Image based provisioning    |         |
  | via downloading images with | -       |
  | BitTorrent protocol         |         |
  +-----------------------------+---------+

Full script for collecting performance metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. literalinclude:: measure.sh
    :language: bash

.. references:

.. _dstat: http://dag.wiee.rs/home-made/dstat/
.. _Cobbler: http://cobbler.github.io/
.. _Razor: https://github.com/puppetlabs/razor-server
86
doc/source/test_plans/provisioning/measure.sh
Normal file
@ -0,0 +1,86 @@
#!/bin/bash

# Install the required package on the provisioning system servers if it is
# not installed yet:
if ! dpkg -l | grep dstat | grep -q ^ii
then
  apt-get -y install dstat
fi

# Start collecting CPU, RAM, NET and IO load values once per second on the
# provisioning system server. Set the INTERFACE variable to the interface
# which is connected to the nodes and used to communicate with them during
# the provisioning process. As a result of this command, dstat runs in the
# background and collects the needed parameters in CSV format into the
# /var/log/dstat.csv file:
INTERFACE=eth0
OUTPUT_FILE=/var/log/dstat.csv
dstat --nocolor --time --cpu --mem --net -N ${INTERFACE} --io --output ${OUTPUT_FILE} > /dev/null &

# Start the provisioning process and record the time when provisioning
# started and when it ended (when all nodes are reachable via ssh). The
# results collected during this time window will be analyzed. To get the
# start time, run the "date" command before the API call or CLI command and
# redirect its output to a log file. Here is an example for Cobbler:
ENV_NAME=env-1
start_time=`date +%s.%N`
echo "Provisioning started at "`date` > /var/log/provisioning.log
for SYSTEM in `cobbler system find --comment=${ENV_NAME}`
do
  # Use the loop variable SYSTEM (the original referenced an undefined $i)
  cobbler system reboot --name=${SYSTEM} &
done

# To get the end time, use the loop below. It tries to reach the nodes via
# ssh and writes "Provisioning finished at <date/time>" into the
# /var/log/provisioning.log file. Provide the IP addresses of the nodes
# (in the nodes_ips.list file, one IP per line) and the credentials
# (SSH_PASSWORD and SSH_USER variables):
SSH_OPTIONS="StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null"
SSH_PASSWORD="r00tme"
SSH_USER="root"
NODE_IPS=(`cat nodes_ips.list`)
TIMER=0
TIMEOUT=20
while (("${TIMER}" < "${TIMEOUT}"))
do
  # Collect the nodes that are still unreachable in a new array, so that
  # removing one IP never alters another IP that contains it as a substring.
  REMAINING_IPS=()
  for NODE_IP in ${NODE_IPS[@]}
  do
    SSH_CMD="sshpass -p ${SSH_PASSWORD} ssh -o ${SSH_OPTIONS} ${SSH_USER}@${NODE_IP}"
    if ${SSH_CMD} "hostname"
    then
      echo "Node with ip "${NODE_IP}" is reachable via ssh"
    else
      echo "Node with ip "${NODE_IP}" is still unreachable via ssh"
      REMAINING_IPS+=(${NODE_IP})
    fi
  done
  NODE_IPS=(${REMAINING_IPS[@]})
  TIMER=$((${TIMER} + 1))
  # Finish as soon as every node has answered, even on the last attempt
  if ((${#NODE_IPS[@]} == 0))
  then
    break
  fi
  if (("${TIMER}" == "${TIMEOUT}"))
  then
    echo "The following "${#NODE_IPS[@]}" nodes are unreachable"
    echo ${NODE_IPS[@]}
    exit 1
  fi
  # Check that the nodes are reachable once per second
  sleep 1
done
# Append, so that the "Provisioning started" line is not overwritten
echo "Provisioning finished at "`date` >> /var/log/provisioning.log

end_time=`date +%s.%N`
elapsed_time=$(echo "$end_time - $start_time" | bc -l)
echo "Total elapsed time for provisioning: $elapsed_time seconds" >> /var/log/provisioning.log

# Stop the dstat command
killall dstat

# Delete the excess values and convert the rest to the needed metrics. The
# result has the following CSV format:
# time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all
awk -F "," 'BEGIN {getline;getline;getline;getline;getline;getline;getline;
            print "time,cpu_usage,ram_usage,net_recv,net_send,net_all,dsk_io_read,dsk_io_writ,dsk_all"}
            {print $1","100-$4","$8/1048576","$12/131072","$13/131072","($12+$13)/131072","$14","$15","$14+$15}' \
$OUTPUT_FILE > /var/log/10_nodes.csv
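The divisors used in the awk command above can be checked in isolation. The
following is a minimal sketch of the two conversions; the function names
``bytes_to_mb`` and ``bytes_per_sec_to_mbit`` are hypothetical and are not part
of measure.sh. Both follow the test plan's convention of 1024-based units.

```shell
# Convert a byte count to megabytes (1 MB = 1024*1024 bytes, as in step 5).
bytes_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1048576 }'
}

# Convert bytes per second to megabits per second.
# Dividing by 1024*1024/8 = 131072 is the same as multiplying by 8
# (bytes to bits) and then dividing by 1024*1024.
bytes_per_sec_to_mbit() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 131072 }'
}

# Example usage:
#   bytes_to_mb 536870912            # 512 MB of used memory
#   bytes_per_sec_to_mbit 13107200   # 100 Mbit/s on the NIC
```
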