
Wednesday, November 7, 2012

What IP addresses are initially assigned on a new cloud server

I have spun up a standard tiny/small cloud server using the Amazon Cloud Console, the Rackspace Cloud Control Panel and the HP Management Console [1]. We can see that for each vendor the provisioned cloud servers have different network settings.

Problem

What is the standard IP and routing configuration on a freshly built cloud server?

Solution
  1. Rackspace

    Example network config for Rackspace NextGen cloud server

    root@manage2:~# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether bc:76:4e:08:07:f4 brd ff:ff:ff:ff:ff:ff
        inet 5.79.xx.yy/24 brd 5.79.xx.yy scope global eth0
        inet6 2a00:1a48:7805:111:8cfc:xxx:yyy:zzz/64 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::be76:aaa:bbb:ccc/64 scope link
           valid_lft forever preferred_lft forever
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether bc:76:4e:08:04:39 brd ff:ff:ff:ff:ff:ff
        inet 10.178.195.rrr/18 brd 10.178.255.255 scope global eth1
        inet6 fe80::be76:4eff:rrrr:sss/64 scope link
           valid_lft forever preferred_lft forever
    
    root@manage2:~# ip route
    default via 5.79.21.1 dev eth0  metric 100
    5.79.21.0/24 dev eth0  proto kernel  scope link  src 5.79.aa.bbb
    10.176.0.0/12 via 10.178.192.1 dev eth1
    10.178.192.0/18 dev eth1  proto kernel  scope link  src 10.178.195.191
    


  2. Amazon

    Example network config for Amazon EC2 cloud server

    ubuntu@ip-10-203-43-168:~$ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
        link/ether 12:31:3b:09:28:5a brd ff:ff:ff:ff:ff:ff
        inet 10.203.43.168/23 brd 10.203.43.255 scope global eth0
        inet6 fe80::1031:3bff:fe09:285a/64 scope link
           valid_lft forever preferred_lft forever
    
    ubuntu@ip-10-203-43-168:~$ ip route
    default via 10.203.42.1 dev eth0  metric 100
    10.203.42.0/23 dev eth0  proto kernel  scope link  src 10.203.43.168
    


  3. HP Cloud

    Example network config for HP OpenStack cloud server

    [root@server-1352331143-az-3-region-a-geo-1 ~]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 02:16:3e:70:b7:9e brd ff:ff:ff:ff:ff:ff
        inet 10.2.0.219/15 brd 10.3.255.255 scope global eth0
        inet6 fe80::16:3eff:aaaa:bbbb/64 scope link
           valid_lft forever preferred_lft forever
    3: sit0: <NOARP> mtu 1480 qdisc noop
        link/sit 0.0.0.0 brd 0.0.0.0

    [root@server-1352331143-az-3-region-a-geo-1 ~]# ip route
    169.254.0.0/16 dev eth0  scope link
    10.2.0.0/15 dev eth0  proto kernel  scope link  src 10.2.0.219
    default via 10.2.0.1 dev eth0
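
A pattern is visible in the outputs above: the Rackspace server comes with two interfaces, a public one (eth0) and a private ServiceNet one (eth1) with an extra route for the internal 10.176.0.0/12 network, while the Amazon and HP servers get a single interface with a private RFC 1918 address. To capture the same information from any freshly built Linux cloud server, a snippet like the one below is enough (a minimal sketch; it assumes only the iproute2 tools, which all three images above provide):

# show links and addresses, the routing table and the DNS resolver settings
ip a
ip route
cat /etc/resolv.conf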



References
  1. http://rtomaszewski.blogspot.co.uk/2012/08/links-to-cloud-provider-web-management.html
  2. http://aws.amazon.com/free/

Ease of use and learning curve for Rackspace or Amazon Cloud

Even though Rackspace and Amazon provide similar tools and products, they are built on different principles and have different agendas and missions.

As expected, this is going to shape each company's image and its product and service offerings. As there are countless metrics and ways to compare them, I would like to take a look at the user experience and evaluate ease of use only.

You can learn and experience this yourself by setting up a simple test infrastructure build of an Apache web server, a MySQL database and a Tomcat application server (a one-line install sketch follows the links below). Alternatively, we can learn from others; these 2 blog posts provide a very interesting summary:

Amazon and Rackspace: A Comparison, Part 2
Amazon and Rackspace: A Comparison, Part 1
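
If you want to run the first-hand experiment, the software side of such a test stack can be installed on a freshly built Ubuntu 12.04 server with a single command (a sketch; package names as in the Ubuntu 12.04 archive). The cloud-specific part of the exercise, where the two vendors differ, is everything around it: creating the servers, networks and access rules.

# install Apache, MySQL and Tomcat from the distribution repositories
apt-get install apache2 mysql-server tomcat7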

A short summary:
The Rackspace cloud is easier to use when interacting with a cloud for the first time. The emphasis is on simplicity rather than elasticity and richness of features.

Amazon is much more mature and feature-rich. Unfortunately, this leads to a steeper learning curve and requires more knowledge when building your own infrastructure.

Friday, November 2, 2012

After server virtualization, it is time for the network to be virtualized

Virtualization has become a de facto standard in almost every company today. What was a revolution in computing about 10 years ago has become a mature product for everyone. But when we look at the history of how it evolved, we can see that one component has remained unmodified: the network, one of the very few unvirtualized technology bastions.

But today the market is changing. To understand the changes and what all this means, I recommend reading at least these 2 blog posts:

VMWARE BUYS NICIRA: A HYPERVISOR VENDOR WOKE UP
VMware’s Acquisition of Nicira – VMware confirming the hypervisor is dead

The changes the blogs describe are already happening. A practical example is the Rackspace Cloud Networks product, a hybrid network created to leverage the potential of software defined networking for data center providers.


Further reading & references
  1. http://www.chriscolotti.us/vmware/nicira-nvp/nicira-nvp-virtualized-networking-primer/
  2. http://www.chriscolotti.us/vmware/nicira-nvp/the-nicira-nvp-component-architecture/
  3. http://nicira.com/en/frequently-asked-questions 
  4. http://www.rackspace.com/cloud/

Thursday, November 1, 2012

Code review process in OpenStack uses Zuul

For a modern software development process to be effective as well as developer friendly, it requires the use of automation systems like:

  • Continuous integration
  • Build automation
  • Automated unit and integration testing
  • Review process
  • (distributed) Version control system
In OpenStack the process has been implemented around the Gerrit review system, with Zuul gating the merges; a rough sketch of the developer-side flow is shown below.
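
From the developer's point of view the flow can be sketched like this (an illustration only; it assumes the git-review helper is installed and uses a hypothetical branch and commit):

# clone the project and create a local topic branch
git clone https://github.com/openstack/nova.git && cd nova
git checkout -b bug/123456

# commit the change locally and push it to Gerrit for review
git commit -a -m 'Fix bug 123456'
git review

# once reviewers approve the change, Zuul queues it, runs the
# automated test jobs and merges it only when they all pass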


For further info about the components and how they work together, you can also watch this video [1].


References
  1. http://www.youtube.com/watch?v=e2H4dfJTx68&feature=em-subs_digest-newavtr
  2. https://github.com/openstack-ci/zuul/
  3. http://www.slideshare.net/lzyeval/assign-commit-and-review

How much is my system overloaded when I create many new processes

It is an engineering art to design and write a good application that performs well under stress and doesn't overuse and kill your operating system.

Problem

How much load is put on an operating system when your application creates and destroys a large number of processes?

Simulation

To simulate a condition where an application creates a massive number of processes, I'm going to use the parallel tool [3]. With its help we will run a test that repeatedly creates and destroys processes.

Basic pre-configuration

# we don't want any IO to the disk [1]
# mkdir -p /tmp/tmpfs.d    # create the mount point first
# mount -t tmpfs -o size=10m tmpfs /tmp/tmpfs.d/

The command below simulates a condition where 100 processes are started simultaneously. Every process lives only for a short time before it finishes and dies. Once a process is terminated, a new one is started to replace it.
 
# cd /tmp/tmpfs.d/
# time seq 1 10000 | parallel --joblog /tmp/tmpfs.d/joblog.txt -j100 echo '{} $(date)' ">" log.{}.txt
real    4m33.859s
user    0m28.870s
sys     1m39.590s

We can monitor the resource utilization on the system with the help of these 2 simple commands.

Number of processes
 
# while true; do ps -AHf | wc -l ; sleep .1; done | uniq 
178
176
177
176
...

CPU utilization
 
# top | grep Cpu
...
# the average results were 
Cpu(s): 23.4%us, 76.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
...
# the stats during the last seconds 
Cpu(s): 21.5%us, 78.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu(s): 26.2%us, 73.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 24.8%us, 75.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 25.1%us, 74.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 23.1%us, 76.9%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 22.3%us, 77.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 22.8%us, 77.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s): 22.4%us, 77.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu(s): 12.3%us, 42.9%sy,  0.0%ni, 44.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.3%st
Cpu(s):  0.3%us,  2.3%sy,  0.0%ni, 97.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s):  1.0%us,  2.3%sy,  0.0%ni, 96.3%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s):  0.3%us,  2.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu(s):  0.3%us,  2.0%sy,  0.0%ni, 97.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st


Analysis

The above stats show that the CPU was primarily busy executing operating-system-level code. The user and sys times confirm that the processor spent only about 28 s executing user-level code. The majority of the time the Linux kernel was performing all the tasks necessary to create and destroy processes.
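
The split can be cross-checked from the time output above: with user 28.87 s and sys 99.59 s, the kernel's share of the consumed CPU time is

# percentage of CPU time spent in the kernel during the run
echo 'scale=1; 100 * 99.59 / (28.87 + 99.59)' | bc
77.5

which agrees with the roughly 75% sy figures reported by top.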

Looking at the system calls, we can see that the most expensive calls were 'clone' and 'close'. Interestingly, 'close' took longer than 'clone', but this may be because the process we were starting over and over was only 'echo $(date)', or because of some synchronization within the parallel tool itself.
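
As a quick cross-check of the overhead theory, the raw cost of starting short-lived processes can be measured with a plain shell loop that forks and execs the same number of trivial commands without any job management (a simple baseline sketch):

# start 10000 short-lived processes one after another, no parallel involved
time for i in $(seq 1 10000); do /bin/true; done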

Collecting the system calls
 
# strace -r -o strace.r.txt parallel --ungroup -j1 echo '{} $(date)' ">" log.{}.txt  ::: 1 2

Syscall relative timestamps
 
# cat strace.r.txt | egrep -i -A1 '0\.00[2-9]'
--
     0.004394 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000042 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fdaa838d4c0}, NULL, 8) = 0
--
     0.003015 close(7)                  = 0
     0.000076 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fdaa838d4c0}, {SIG_DFL, [], SA_RESTORER, 0x7fdaa838d4c0}, 8) = 0
--
     0.005035 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000043 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fdaa838d4c0}, NULL, 8) = 0
--
     0.002960 close(7)                  = 0
     0.000074 rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fdaa838d4c0}, {SIG_DFL, [], SA_RESTORER, 0x7fdaa838d4c0}, 8) = 0
--
     0.003542 --- SIGCHLD (Child exited) @ 0 (0) ---
     0.000042 rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fdaa838d4c0}, NULL, 8) = 0

     # truncated


# cat strace.r.txt | egrep -i -A1 'clone'
     0.000102 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdaa8ca89d0) = 25086
     0.003318 close(7)                  = 0
--
     0.000085 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdaa8ca89d0) = 25087
     0.002620 close(7)                  = 0
--
     0.000085 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdaa8ca89d0) = 25088
     0.002617 close(7)                  = 0
--
     0.000097 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdaa8ca89d0) = 25089
     0.003037 close(7)                  = 0

     # truncated


# strace -c parallel --ungroup -j1 echo '{} $(date)' ">" log.{}.txt  ::: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 72.01    0.108059        2573        42           clone    ### on average 65%
 26.97    0.040474          33      1211         3 wait4
  0.92    0.001379           1      1152         1 select
  0.02    0.000033           1        33           chmod
  0.02    0.000029           0       268        12 open
  0.02    0.000025           1        48           gettimeofday
  0.02    0.000024           0       498        73 lseek
  0.02    0.000024           0       233         1 rt_sigaction
  0.02    0.000024           0       294           fcntl
  0.00    0.000000           0       213           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0       401           close
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.150071                  5978       975 total


References
  1. http://www.thegeekstuff.com/2008/11/overview-of-ramfs-and-tmpfs-on-linux/
  2. http://www.thegeekstuff.com/2011/11/strace-examples/
  3. http://rtomaszewski.blogspot.co.uk/2012/10/how-to-install-parallel-linux-tool-on.html

Tuesday, October 30, 2012

How to install parallel Linux tool on Ubuntu or Debian

Parallel is a quite new tool created under the GNU project [1]. As its name says, it helps to execute jobs in parallel on one or many computers.

Problem

How to install the parallel tool on Ubuntu 12.04 Precise [2]?

Solution

As the package hasn't been debianised in Ubuntu yet, we have to install it using the old-school method.

Follow link [3] and download the appropriate package. Once downloaded, install it using dpkg.

wget http://download.opensuse.org/repositories/home:/tange/xUbuntu_12.04/all/parallel_20110422-1_all.deb
dpkg -i parallel_20110422-1_all.deb
type -a parallel
parallel is /usr/bin/parallel
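
Once installed, a one-line smoke test confirms that parallel runs the jobs given on its ::: argument list (-k keeps the output in job order):

# run three trivial jobs in parallel, keeping the output ordered
parallel -k echo ::: A B C
A
B
C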

References
  1. http://www.gnu.org/software/parallel/
  2. https://wiki.ubuntu.com/DevelopmentCodeNames
  3. https://build.opensuse.org/package/show?package=parallel&project=home%3Atange

Thursday, October 25, 2012

How to extract the duration of a TCP session from a tcpdump file


I took a tcpdump to capture all my application connections to a database server. I can filter the tcpdump data and extract the sessions that are relevant by using standard tcpdump filters.

Problem

How to find the duration of a TCP session without manually checking packets and calculating the elapsed time?

Solution

There are many tools that can read and understand a tcpdump capture file. One of them is tcptrace. An example of how to use it to find the time is demonstrated below.
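
For completeness, a capture file like the one analysed below can be produced with tcpdump first (the interface, host address and port here are made-up placeholders; adjust them to your database server):

# capture all packets to or from the database server into a file
tcpdump -i eth0 -w db.pcap host 10.0.0.5 and port 3306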

root@db1:~# tcptrace -n -l -o1 google.pcap
1 arg remaining, starting with 'google.pcap'
Ostermann's tcptrace -- version 6.6.7 -- Thu Nov  4, 2004

12 packets seen, 12 TCP packets traced
elapsed wallclock time: 0:00:00.001738, 6904 pkts/sec analyzed
trace file elapsed time: 0:00:07.092266
TCP connection info:
1 TCP connection traced:
TCP connection 1:
        host a:        2a00:1a48:7805:0111:8cfc:cf10:ff08:0a2f:55939
        host b:        2a00:1450:400c:0c05::0063:80
        complete conn: yes
        first packet:  Wed Oct 24 22:49:59.166611 2012
        last packet:   Wed Oct 24 22:50:06.258878 2012
        elapsed time:  0:00:07.092266
        total packets: 12
        filename:      google.pcap
   a->b:                              b->a:
     total packets:             6           total packets:             6
     ack pkts sent:             5           ack pkts sent:             6
     ...
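
If tcptrace is not at hand, a rough duration can also be read straight from the capture file: tcpdump -tt prints absolute epoch timestamps, so subtracting the first packet's timestamp from the last one gives the elapsed time (assuming the file holds only the single session of interest):

# duration = last packet timestamp - first packet timestamp
tcpdump -tt -r google.pcap 2>/dev/null | awk 'NR==1 {first=$1} {last=$1} END {print last - first}'
7.09227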

References
  1. http://www.tcptrace.org/manual/node11_tf.html
  2. http://docstore.mik.ua/orelly/networking_2ndEd/tshoot/ch05_05.htm
  3. http://www.noah.org/wiki/Packet_sniffing
  4. http://www.darknet.org.uk/2007/11/tcpflow-tcp-flow-recorder-for-protocol-analysis-and-debugging/
  5. http://danielmiessler.com/study/tcpdump/