
Monday, August 13, 2012

Rackspace Cloud server API performance analysis for cloud server bursting

In the cloud world, the public cloud API allows us to create cloud servers of different sizes. The current list of available flavors (combinations of RAM size and disk space) for the FirstGen Rackspace cloud is:

$ /usr/bin/cloudservers --username $FG_OS_USERNAME --apikey $FG_OS_PASSWORD  flavor-list
+----+---------------+-------+------+
| ID |      Name     |  RAM  | Disk |
+----+---------------+-------+------+
| 1  |   256 server  |  256  |  10  |
| 2  |   512 server  |  512  |  20  |
| 3  |   1GB server  |  1024 |  40  |
| 4  |   2GB server  |  2048 |  80  |
| 5  |   4GB server  |  4096 | 160  |
| 6  |   8GB server  |  8192 | 320  |
| 7  | 15.5GB server | 15872 | 620  |
| 8  |  30GB server  | 30720 | 1200 |
+----+---------------+-------+------+

Problem

Performance can be measured in many different ways. While working with the API recently, I started asking myself the questions below. As I couldn't find a definitive answer, I decided to run some tests to get a better feeling for what you can expect.
  1. How long does it take to create XYZ identical cloud servers and get access to them (XYZ can vary from 4 to 100)?
  2. Should we expect any performance degradation when we create a high number of cloud servers on demand in a short amount of time?
  3. Does the cloud server size affect the time needed to create a single server?
  4. Does the cloud server size have to be taken into consideration when we perform cloud bursting?
  5. What is the cloud server build failure rate when performing cloud bursting?

Analysis

I ran a number of tests to find out more about performance during cloud bursting. The tests were relatively simple, but they still give a good overview.

Single test case description
 - Create XYZ cloud servers as quickly as possible (for big numbers we need to watch for the API limits and introduce artificial delays).
 - Using API polling, verify whether each build is complete and measure how long it takes.
 - Save logs for offline review and generate build statistics.
 - Measure how long the overall test takes.
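
The steps above can be sketched roughly as follows. This is only an illustration, not the script used for the tests: `create_throttled` and `wait_until_active` are hypothetical helpers, and `create_server` / `get_status` are placeholder callables standing in for whatever API client is used. The `"ACTIVE"` status string, the 600 s timeout and the 10 s poll interval are assumptions as well.

```python
import time


def create_throttled(create_server, count, min_interval_s, sleep=time.sleep):
    """Issue `count` create requests with an artificial delay between them.

    `create_server` is a placeholder callable (hypothetical) that takes the
    request index and returns a server id.
    """
    server_ids = []
    for index in range(count):
        if index > 0:
            sleep(min_interval_s)  # stay under the API rate limit
        server_ids.append(create_server(index))
    return server_ids


def wait_until_active(get_status, server_id, timeout_s=600, poll_interval_s=10,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `get_status(server_id)` until it returns "ACTIVE" or we time out.

    Returns (ok, elapsed_seconds). A timeout is reported as ok=False, which
    is how a build would be counted as an error in this sketch.
    """
    started = clock()
    while True:
        if get_status(server_id) == "ACTIVE":
            return True, clock() - started
        if clock() - started >= timeout_s:
            return False, clock() - started
        sleep(poll_interval_s)
```

Passing `clock` and `sleep` as parameters keeps the sketch easy to try out with a fake backend, without waiting for real builds.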

For each cloud flavor I simulated a different cloud bursting scenario. All the tests were tailored to use at most 150GB of RAM. The tests were run sequentially, one after another, with a small gap of about 2-10 min between them (for reviewing logs and collecting results).

 simulate test#1 -s 100 -f 1  # build 100 cloud servers, using 256MB instance
 simulate test#2 -s 50  -f 2  
 simulate test#3 -s 50  -f 3  
 simulate test#4 -s 50  -f 4
 simulate test#5 -s 50  -f 5
 simulate test#6 -s 18  -f 6 
 simulate test#7 -s 9   -f 7
 simulate test#8 -s 4   -f 8
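
As a quick sanity check, the RAM each scenario asks for can be computed from the server counts above and the flavor list. The sketch below assumes the account limit is exactly 150 GB (the post only says "about 150GB"); `ram_demand` is a helper name made up for this illustration.

```python
RAM_LIMIT_MB = 150 * 1024  # assumption: "about 150GB" taken as exactly 150 GB

# (test number, servers requested, flavor RAM in MB), from the scenarios
# and the flavor list in this post
SCENARIOS = [
    (1, 100, 256), (2, 50, 512), (3, 50, 1024), (4, 50, 2048),
    (5, 50, 4096), (6, 18, 8192), (7, 9, 15872), (8, 4, 30720),
]


def ram_demand(scenarios, limit_mb=RAM_LIMIT_MB):
    """Return (test number, total RAM in GB, fits within limit) per scenario."""
    rows = []
    for test, servers, ram_mb in scenarios:
        total_mb = servers * ram_mb
        rows.append((test, total_mb / 1024, total_mb <= limit_mb))
    return rows


for test, total_gb, fits in ram_demand(SCENARIOS):
    note = "within limit" if fits else "over limit"
    print("test#%d: %6.1f GB requested, %s" % (test, total_gb, note))
```

Under that assumption only test#5 (50 servers with 4GB each, 200 GB in total) asks for more RAM than the account allows.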


Results



Each test has a different color. A single dot represents the total time needed to start a build for a single cloud server and then check its status until it is ready to use.

The table below shows the total time needed to run each test.

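+------------+-----------------+-----------+-------------+-------------+------------------+--------+
| Burst size | Flavor RAM [MB] | Flavor ID | Test start  |  Test end   | Duration [mm:ss] | Errors |
+------------+-----------------+-----------+-------------+-------------+------------------+--------+
|    100     |       256       |     1     | 03:28:27 PM | 04:11:35 PM |     43:08.10     |   0    |
|     50     |       512       |     2     | 08:33:23 PM | 08:56:02 PM |     22:39.12     |   1    |
|     50     |      1024       |     3     | 09:09:55 PM | 09:32:04 PM |     22:09.25     |   1    |
|     50     |      2048       |     4     | 09:46:14 PM | 10:02:41 PM |     16:26.50     |   1    |
|     50     |      4096       |     5     | 10:22:21 PM | 10:35:05 PM |     12:43.30     |   12   |
|     18     |      8192       |     6     | 11:06:48 PM | 11:14:28 PM |     07:39.87     |   0    |
|     9      |      15872      |     7     | 11:16:23 PM | 11:23:38 PM |     07:15.35     |   0    |
|     4      |      30720      |     8     | 11:35:33 PM | 11:46:27 PM |     10:53.72     |   1    |
+------------+-----------------+-----------+-------------+-------------+------------------+--------+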
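
To make the table easier to compare across flavors, the duration in seconds and the share of servers that ended in an error can be derived from the rows above. This is a small sketch: the values are copied from the table, while `elapsed_seconds` and `summarize` are helper names made up for this illustration.

```python
from datetime import datetime, timedelta

# (servers, flavor RAM in MB, test start, test end, errors), copied from the table
RESULTS = [
    (100, 256, "03:28:27 PM", "04:11:35 PM", 0),
    (50, 512, "08:33:23 PM", "08:56:02 PM", 1),
    (50, 1024, "09:09:55 PM", "09:32:04 PM", 1),
    (50, 2048, "09:46:14 PM", "10:02:41 PM", 1),
    (50, 4096, "10:22:21 PM", "10:35:05 PM", 12),
    (18, 8192, "11:06:48 PM", "11:14:28 PM", 0),
    (9, 15872, "11:16:23 PM", "11:23:38 PM", 0),
    (4, 30720, "11:35:33 PM", "11:46:27 PM", 1),
]


def elapsed_seconds(start, end):
    """Seconds between two 12-hour clock times, allowing a midnight rollover."""
    fmt = "%I:%M:%S %p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return int(delta.total_seconds())


def summarize(results):
    """Return (flavor RAM in MB, duration in s, error rate in percent) per test."""
    return [
        (ram_mb, elapsed_seconds(start, end), 100.0 * errors / servers)
        for servers, ram_mb, start, end, errors in results
    ]


for ram_mb, seconds, error_pct in summarize(RESULTS):
    print("%5d MB: %4d s, %4.1f%% errors" % (ram_mb, seconds, error_pct))
```

The percentages for the small bursts should be read with care: a single error among 4 servers already shows up as 25%.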


Based on the results above I can say that:

1. On average it takes from 300 to 500 seconds to create a cloud server. The only exception is the biggest flavor, with 30GB of RAM, which can take 600 s or more.

2. The API limits for creating servers directly influence the results. The burst test for 256MB instances took 43 minutes to complete. The create time for a single cloud server ranged from a minimum of 250 to almost 500 seconds.

3. Based on the tests, we don't see any significant pattern of build performance degradation. This is somewhat expected, because cloud servers are built on different hypervisors in different zones (huddles). This means that as long as there are enough free resources in the data center, the builds should be fine.

4. On average the build time for different flavors is similar. Every test had its own minimum and maximum times. The more cloud servers we build during the burst, the more the distribution varies. We need to keep in mind that the number of cloud servers built was not the same in every test.

5. Although all the build times are between a minimum of 250 and a maximum of 500 seconds (the exception is the 30GB cloud server), it is visible that bigger instances require more time to build. Interestingly, within a single test the graph shows that as we create more and more cloud servers, the build time decreases first and then spikes up later. This pattern can be clearly observed for the 256MB instances.

6. The results for the 4GB instances are not fully correct. It has to be noted that the errors in the table were a result of the hard API limits we ran into. The test cloud account had a limit of about 150GB of RAM that we could consume.

7. With this in mind, the build failure rates were minimal or zero for all tests. It is important to note that the errors are actually timeouts only. In every test we waited a maximum of about 10 min for a cloud server to be up and running. Only 3 tests produced cloud servers that were not accessible within 10 min. I believe that if we had waited longer, these servers would have been built successfully.

References

1. http://searchcloudcomputing.techtarget.com/definition/cloud-bursting
