The Systems Engineer organized chaos: 2013

Tuesday, December 31, 2013

OpenvSwitch SDN simulator with mininet

Mininet is a network simulator for Open vSwitch and OpenFlow. It allows you to build network topologies that emulate hosts, switches and routers. It is very easy to set up by following the installation guide. All you have to do is download the VM file, import the downloaded image and create your Mininet VM. After that you can follow this other guide and start playing with it.

If you are interested in how it is built, you can take a look here:

https://github.com/mininet/mininet/wiki/Introduction-to-Mininet
http://mininet.org/overview/

The successful architecture for a top-of-rack switch for data center

TOR switch architecture

We wrote before about Arista switches and about the Arista EOS architecture (network OS). A company named Pica8 is another example that follows a very similar technological model. They take the best of Linux and top it up with some hardware-dependent (ASIC) software to achieve maximal performance.

What is interesting about Pica8 is that they take a very liberal approach to the hardware itself. They say they could take any plain switching chip or motherboard blade and turn it into a fully operational switch. The secret is once again the well-designed network OS that Pica8 built.

In comparison to the Arista CLI (which is very similar to the Cisco one), Pica8 uses a rather different syntax: http://pica8.org/blogs/?p=399. At first glance it has some similarities to what you type on Juniper boxes :).

More info about them and their products can be found here:
http://www.networkworld.com/news/2010/102810-pica8-opensource-switching.html?page=1
http://www.pica8.com/open-switching/1-gbe-10gbe-open-switches.php

Monday, December 30, 2013

How to use GridFS to store big files in MongoDB

Important note: This article is in relation to online MongoDB course. For more information about the course and other posts describing its content please check my main page here: M101P: MongoDB for Developers course.

MongoDB supports documents that are up to 16 MB in size. To store bigger files you need to use the GridFS feature.

The file we are going to insert.

server# ls -lah mongodb-linux-x86_64-2.4.8.tgz
-rw-r--r-- 1 root root 91M Oct 31 22:25 mongodb-linux-x86_64-2.4.8.tgz

server:~/mongo-course/M101P/week2# ls -lah $HOME/my-big-files.img
lrwxrwxrwx 1 root root 30 Dec 30 17:55 /root/my-big-files.img -> mongodb-linux-x86_64-2.4.8.tgz

mongo shell

server# mongofiles -d grid-cli-examle put mongodb-linux-x86_64-2.4.8.tgz
connected to: 127.0.0.1
added file: { _id: ObjectId('52c1b104232188a316e51d61'), filename: "mongodb-linux-x86_64-2.4.8.tgz", chunkSize: 262144, uploadDate: new Date(1388425483792), md5: "4954765464dc4d97870ddc5de147e05d", length: 95015187 }
done!

server# mongo grid-cli-examle
MongoDB shell version: 2.4.8
connecting to: grid-cli-examle
> show collections
fs.chunks
fs.files
system.indexes

> db.fs.files.find()
{ "_id" : ObjectId("52c1b104232188a316e51d61"), "filename" : "mongodb-linux-x86_64-2.4.8.tgz", "chunkSize" : 262144, "uploadDate" : ISODate("2013-12-30T17:44:43.792Z"), "md5" : "4954765464dc4d97870ddc5de147e05d", "length" : 95015187 }

> db.fs.chunks.find().count()
363

Python

This little program reads the file from disk and inserts it into a GridFS collection in Mongo.

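import os

import gridfs
import pymongo

# MongoClient replaces the deprecated pymongo.Connection; writes are acknowledged by default
connection = pymongo.MongoClient("mongodb://localhost")
db = connection.grid_python_example
c = db.bigfiles

# GridFS keeps the data in the myfile.files and myfile.chunks collections
grid = gridfs.GridFS(db, "myfile")
# open in binary mode because the file is a compressed archive
with open(os.environ['HOME'] + "/my-big-files.img", "rb") as f:
    _id = grid.put(f)

# keep a reference to the GridFS file in a regular collection
c.insert_one({'grid_id': _id, "filename": "my-big-files.img"})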

Back in the mongo shell, we can confirm that the file was saved.

server# mongo grid_python_example
MongoDB shell version: 2.4.8
connecting to: grid_python_example

> show collections
bigfiles
myfile.chunks
myfile.files
system.indexes

> db.bigfiles.find()
{ "_id" : ObjectId("52c1b9d55f4cb27cabe6e650"), "filename" : "my-big-files.img", "grid_id" : ObjectId("52c1b97e5f4cb27cabe6e4e4") }
> db.myfile.chunks.find().count()
363
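The chunk count of 363 reported in both sessions follows directly from the length and chunkSize values stored in the fs.files document. Below is a minimal Python sketch of that arithmetic; the function name expected_chunks is made up for this illustration.

```python
import math


def expected_chunks(length, chunk_size):
    """Number of GridFS chunks needed for a file of `length` bytes."""
    # every chunk except possibly the last one holds exactly chunk_size bytes
    return math.ceil(length / chunk_size)


# length and chunkSize copied from the fs.files document shown earlier
print(expected_chunks(95015187, 262144))  # → 363
```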

References

http://docs.mongodb.org/manual/core/gridfs/
http://docs.mongodb.org/manual/reference/gridfs/
http://docs.mongodb.org/manual/reference/program/mongofiles/

Multi key indexes

> db.students.insert( { name : "rado" , teachers: [0,1]} )
> db.students.insert( { name : "adam" , teachers: [0,1,3]} )
> db.students.find()
{ "_id" : ObjectId("52c186421fdaf7a8e42b43b8"), "name" : "rado", "teachers" : [  0,  1 ] }
{ "_id" : ObjectId("52c186531fdaf7a8e42b43b9"), "name" : "adam", "teachers" : [  0,  1,  3 ] }

This is how you create an index on the array attribute.

> db.students.ensureIndex( { teachers:1 }  )
> db.system.indexes.find()
{ "v" : 1, "key" : { "teachers" : 1 }, "ns" : "school.students", "name" : "teachers_1" }

The usage of the index should be transparent when using the find method, but you can always see the execution plan.

> db.students.find()
{ "_id" : ObjectId("52c186421fdaf7a8e42b43b8"), "name" : "rado", "teachers" : [  0,  1 ] }
{ "_id" : ObjectId("52c186531fdaf7a8e42b43b9"), "name" : "adam", "teachers" : [  0,  1,  3 ] }
{ "_id" : ObjectId("52c187621fdaf7a8e42b43bb"), "name" : "jon", "teachers" : [  2,  3 ] }
>
> db.students.find( { teachers : { $all : [0,1]} } )
{ "_id" : ObjectId("52c186421fdaf7a8e42b43b8"), "name" : "rado", "teachers" : [  0,  1 ] }
{ "_id" : ObjectId("52c186531fdaf7a8e42b43b9"), "name" : "adam", "teachers" : [  0,  1,  3 ] }


> db.students.find( { teachers : { $all : [0,1]} } ).explain()
{
        "cursor" : "BtreeCursor teachers_1",   ------ this proves that we use the index 
        "isMultiKey" : true,
        "n" : 2,
        "nscannedObjects" : 2,
        "nscanned" : 2,
        "nscannedObjectsAllPlans" : 2,
        "nscannedAllPlans" : 2,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 11,
        "indexBounds" : {
                "teachers" : [
                        [
                                0,
                                0
                        ]
                ]
        },
        "server" : "mongo2:27017"
}
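To get an intuition for what a multikey index does, here is a small conceptual model in plain Python. It is only a sketch: the helper names build_multikey_index and find_all are invented, and a dictionary stands in for MongoDB's B-tree. It does mirror the explain output above, where only the first value of the $all list (the [0, 0] index bounds) is looked up in the index and the remaining values are checked on the fetched documents.

```python
from collections import defaultdict

students = [
    {"name": "rado", "teachers": [0, 1]},
    {"name": "adam", "teachers": [0, 1, 3]},
    {"name": "jon", "teachers": [2, 3]},
]


def build_multikey_index(docs, field):
    """Map every array element to the positions of the documents containing it."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        for value in set(doc[field]):
            index[value].append(pos)
    return index


def find_all(docs, index, field, wanted):
    """Emulate { field: { $all: wanted } }: use the index for the first value
    only, then check the remaining values on the candidate documents."""
    candidates = index.get(wanted[0], [])
    return [docs[p] for p in candidates if all(v in docs[p][field] for v in wanted)]


teachers_index = build_multikey_index(students, "teachers")
matches = find_all(students, teachers_index, "teachers", [0, 1])
print([doc["name"] for doc in matches])  # → ['rado', 'adam']
```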

Physical network diagram generated from dot config language

Is there a good and easy solution to automatically generate logical or physical network diagrams? Trying to answer this question, I've been playing today with the dot configuration language to test it in the field.

Automatically generated diagram

The DOT config file

graph physical_net_topology {
 
legend [ label="{ legend | {red | public} |{blue| v1002 inside} | {green | v105 dmz} }", shape=record];


subgraph cluster_sp {
    label="pub switch";

    graph [ fillcolor="burlywood", style="filled"]
    node [shape=record,fillcolor="white", style="filled"]
    edge[style=invis];
    
    node [label="3"] p3 ;
    node [label="2"] p2 ;
    node [label="1"] p1 ;

    { rank=same; p1; p2; p3}
    p1 -- p2 -- p3;
}

subgraph cluster_si {
    label="Internal switch";

    edge[style=invis];
    node [shape=record ];

    node [label="1"] e1 ;
    node [label="2"] e2 ;
    node [label="3"] e3 ;
    node [label="4"] e4 ;
    node [label="5"] e5 ;
    node [label="6"] e6 ;

    { rank=same; e1; e2; e3; e4; e5; e6;}
    e1 -- e2 -- e3 -- e4 -- e5 -- e6;
}

subgraph cluster_fw1 {
    label="FW1 ports";

    graph [ style = rounded ]
    edge[style=invis]
    
    node [shape=record ];
    node [label="1"] f1 ;
    node [label="2"] f2 ;
    node [label="3"] f3 ;
    node [label="4"] f4 ;

  { rank=same; f1; f2; f3; f4; }

  f1 -- f2 -- f3 -- f4;
}

p2 -- f1 [color="red"]
f2 -- e1 [color="blue"]
f3 -- e2 [color="green"]

e3 -- server1 [color="green"]
e4 -- server2 [color="green"]
e6 -- server3 [color="blue"]

}

Results discussion

It took me a good couple of hours to write this config. Even after that I still have only a very basic understanding of how flexible the dot language is. The coding was very laborious. It required a lot of trial and error to check whether the newly generated graph was what I was looking for.

Often the options I tried didn't have any effect.

The documentation I found didn't explain all the details, so trial and error and intuition were often my only friends.
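Much of the laborious part is the repetitive port clusters. As a rough sketch, a small Python helper could generate them. The function port_cluster and its parameters are hypothetical, and it only covers the layout trick used in the config above (ports kept on one row and chained with invisible edges).

```python
def port_cluster(name, label, prefix, ports):
    """Return the DOT text of one cluster with `ports` numbered port nodes."""
    ids = [f"{prefix}{n}" for n in range(1, ports + 1)]
    lines = [
        f"subgraph cluster_{name} {{",
        f'    label="{label}";',
        "    edge [style=invis];",
        "    node [shape=record];",
    ]
    # one node per port, labelled with the port number
    lines += [f'    {node_id} [label="{n}"];' for n, node_id in enumerate(ids, 1)]
    # keep the ports on one row, chained by invisible edges to fix their order
    lines.append("    { rank=same; " + "; ".join(ids) + "; }")
    lines.append("    " + " -- ".join(ids) + ";")
    lines.append("}")
    return "\n".join(lines)


print(port_cluster("fw1", "FW1 ports", "f", 4))
```

The printed text can be pasted inside the graph block in place of a hand-written cluster; the coloured links between ports still have to be listed by hand.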

References

http://sandbox.kidstrythisathome.com/erdos/
http://graphviz-dev.appspot.com/
http://www.graphviz.org/

http://rtomaszewski.blogspot.co.uk/search/label/diagram

Sunday, December 29, 2013

Cisco ASA connection table state description and examples

The ASA connection table lists the protocol sessions (TCP, UDP, ICMP and others) together with the state each session (a TCP connection, for example) was in when the command was run.

The table contains all sessions currently managed by the ASA. From this output you can also see which IPs your clients are coming from and which services they connect to.

Session statuses

 
fw-asa# sh conn  

Flags: A - awaiting inside ACK to SYN, a - awaiting outside ACK to SYN,
       B - initial SYN from outside, C - CTIQBE media, D - DNS, d - dump,
       E - outside back connection, F - outside FIN, f - inside FIN,
       G - group, g - MGCP, H - H.323, h - H.225.0, I - inbound data,
       i - incomplete, J - GTP, j - GTP data, K - GTP t3-response
       k - Skinny media, M - SMTP data, m - SIP media, n - GUP
       O - outbound data, P - inside back connection, p - Phone-proxy TFTP connection,
       q - SQL*Net data, R - outside acknowledged FIN,
       R - UDP SUNRPC, r - inside acknowledged FIN, S - awaiting inside SYN,
       s - awaiting outside SYN, T - SIP, t - SIP transient, U - up,
       V - VPN orphan, W - WAAS,
       X - inspected by service module

Example flag combinations from the session entries and their meaning

 
UB
U - up,
B - initial SYN from outside,

UO
U - up,
O - outbound data,

UIB
U - up,
I - inbound data,
B - initial SYN from outside,

UIOB
U - up,
I - inbound data,
O - outbound data,
B - initial SYN from outside,

UfIB
U - up,
f - inside FIN,
I - inbound data,
B - initial SYN from outside,

UfrO
U - up,
f - inside FIN,
r - inside acknowledged FIN,
O - outbound data,

UfIOB 
U - up,
f - inside FIN,
I - inbound data,
O - outbound data,
B - initial SYN from outside,

UfFIOB
the same as UfIOB, plus
F - outside FIN,

UfFRIOB
the same as UfFIOB, plus
R - outside acknowledged FIN,

UfrIOB
U - up,
f - inside FIN,
r - inside acknowledged FIN
I - inbound data,
O - outbound data,
B - initial SYN from outside,

SaAB
S - awaiting inside SYN,
a - awaiting outside ACK to SYN,
A - awaiting inside ACK to SYN, 
B - initial SYN from outside,

aB
a - awaiting outside ACK to SYN,
B - initial SYN from outside,
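Decoding a flag string by hand quickly gets tedious, so the lookups above can be sketched in a few lines of Python. The table below is copied from the legend printed by 'sh conn', limited to the flags that show up on TCP sessions; the name decode_flags is made up, and reading 'R' as "outside acknowledged FIN" for TCP is an assumption of this sketch because the legend lists 'R' twice.

```python
# Subset of the 'sh conn' legend, limited to flags seen on TCP sessions.
# 'R' appears twice in the legend; for TCP it is read here as
# "outside acknowledged FIN" (an assumption of this sketch).
TCP_FLAGS = {
    "A": "awaiting inside ACK to SYN",
    "a": "awaiting outside ACK to SYN",
    "B": "initial SYN from outside",
    "F": "outside FIN",
    "f": "inside FIN",
    "I": "inbound data",
    "O": "outbound data",
    "R": "outside acknowledged FIN",
    "r": "inside acknowledged FIN",
    "S": "awaiting inside SYN",
    "s": "awaiting outside SYN",
    "U": "up",
}


def decode_flags(flags):
    """Expand a flag string such as 'UfrO' into one description per flag."""
    return [f"{flag} - {TCP_FLAGS.get(flag, 'unknown')}" for flag in flags]


for line in decode_flags("UfrO"):
    print(line)
```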

Example flow you can find in the ASA firewall connection table

Usually there are a lot of entries in these states.

 
fw-asa# sh conn detail long

flags UfIOB TCP outside:1.165.177.125/1965 (1.165.177.125/1965) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 52m38s, uptime 54m21s, timeout 1h0m, bytes 3063
flags UfIOB TCP outside:1.172.130.64/1485 (1.172.130.64/1485) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 41m38s, uptime 43m12s, timeout 1h0m, bytes 3063

flags UB TCP outside:1.189.22.195/16208 (1.189.22.195/16208) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UB, idle 45m6s, uptime 48m17s, timeout 1h0m, bytes 0
flags UB TCP outside:1.56.45.22/24654 (1.56.45.22/24654) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UB, idle 45m54s, uptime 49m4s, timeout 1h0m, bytes 0

Common but less frequent states

 
flags UfFIOB TCP outside:1.55.216.14/14104 (1.55.216.14/14104) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfFIOB, idle 41m51s, uptime 43m24s, timeout 1h0m, bytes 3002
flags UfFIOB TCP outside:110.81.84.50/20230 (110.81.84.50/20230) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfFIOB, idle 52m55s, uptime 54m28s, timeout 1h0m, bytes 3063

flags UfFRIOB TCP outside:109.109.38.148/4760 (109.109.38.148/4760) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfFRIOB, idle 3s, uptime 15s, timeout 5m0s, bytes 2261
flags UfFRIOB TCP outside:112.12.221.155/3753 (112.12.221.155/3753) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfFRIOB, idle 0s, uptime 0s, timeout 5m0s, bytes 1008

flags UfIB TCP outside:121.35.47.128/1481 (121.35.47.128/1481) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIB, idle 23m54s, uptime 26m28s, timeout 1h0m, bytes 1106
flags UfIB TCP outside:183.11.2.56/4589 (183.11.2.56/4589) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIB, idle 47m15s, uptime 49m48s, timeout 1h0m, bytes 1106

flags SaAB TCP outside:112.72.135.224/7494 (112.72.135.224/7494) inside:192.168.55.172/4567 (1.2.157.172/4567), flags SaAB, idle 0s, uptime 0s, timeout 1m0s, bytes 0
flags SaAB TCP outside:113.170.107.218/4472 (113.170.107.218/4472) inside:192.168.55.172/4567 (1.2.157.172/4567), flags SaAB, idle 0s, uptime 0s, timeout 1m0s, bytes 0

flags UfrO TCP outside:202.168.215.226/80 (202.168.215.226/80) inside:192.168.55.172/3845 (1.2.157.172/3845), flags UfrO, idle 6s, uptime 8s, timeout 10m0s, bytes 1182

flags UIOB TCP outside:61.187.244.179/9571 (61.187.244.179/9571) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 38m13s, uptime 39m46s, timeout 1h0m, bytes 2897
flags UIOB TCP outside:67.47.251.34/14921 (67.47.251.34/14921) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 48m14s, uptime 49m50s, timeout 1h0m, bytes 3348

TCP outside:1.2.27.69/49856 (1.2.27.69/49856) FW-INSIDE:192.168.100.112/80 (11.22.192.112/80), flags UIB, idle 0s, uptime 0s, timeout 1h0m, bytes 581

flags UO TCP outside:202.168.215.226/80 (202.168.215.226/80) inside:192.168.55.172/3848 (1.2.157.172/3848), flags UO, idle 7s, uptime 7s, timeout 1h0m, bytes 1182

TCP outside:220.135.240.219/61139 (220.135.240.219/61139) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0
TCP outside:220.135.240.219/61138 (220.135.240.219/61138) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0

# without the 'long' parameter
TCP outside 94.5.94.11:59458 FW-DMZ-LB 192.168.67.79:80, idle 0:04:31, bytes 19424, flags UfrIOB
TCP outside 94.5.94.11:59463 FW-DMZ-LB 192.168.67.72:80, idle 0:04:05, bytes 7181, flags UfrIOB
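The 'long' format lines shown above are regular enough to be parsed with a script, for example to count how many sessions hit each inside service. The sketch below is an assumption-laden illustration: the regular expression is derived only from the sample lines in this post, and the names LINE_RE and parse_conn are invented.

```python
import re
from collections import Counter

# Pattern derived from the sample 'sh conn detail long' lines in this post.
LINE_RE = re.compile(
    r"(?P<proto>TCP|UDP) "
    r"(?P<src_if>[\w-]+):(?P<src_ip>[\d.]+)/(?P<src_port>\d+) \([^)]*\) "
    r"(?P<dst_if>[\w-]+):(?P<dst_ip>[\d.]+)/(?P<dst_port>\d+) \([^)]*\), "
    r"flags (?P<flags>\w+), idle (?P<idle>[^,]+), uptime (?P<uptime>[^,]+), "
    r"timeout (?P<timeout>[^,]+), bytes (?P<bytes>\d+)"
)


def parse_conn(line):
    """Return the fields of one connection line as a dict, or None if it does not match."""
    match = LINE_RE.search(line)
    return match.groupdict() if match else None


sample = [
    "TCP outside:1.2.27.69/49856 (1.2.27.69/49856) FW-INSIDE:192.168.100.112/80 (11.22.192.112/80), flags UIB, idle 0s, uptime 0s, timeout 1h0m, bytes 581",
    "TCP outside:220.135.240.219/61139 (220.135.240.219/61139) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0",
    "TCP outside:220.135.240.219/61138 (220.135.240.219/61138) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0",
]

parsed = [conn for conn in map(parse_conn, sample) if conn]
services = Counter((conn["dst_ip"], conn["dst_port"]) for conn in parsed)
for (ip, port), count in services.most_common():
    print(f"{ip}:{port} {count}")
```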

You can specify additional parameters to filter the output for connection entries in a specific state.

 
fw-asa# sh conn detail  long state tcp_embryonic all

TCP outside:220.135.240.219/61139 (220.135.240.219/61139) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0
TCP outside:220.135.240.219/61138 (220.135.240.219/61138) inside:192.168.55.172/4567 (1.2.157.172/4567), flags aB, idle 0s, uptime 0s, timeout 1m0s, bytes 0

fw-asa# sh conn long state data_out

TCP outside:112.65.211.244/6680 (112.65.211.244/6680) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 0s, uptime 3m48s, timeout 1h0m, bytes 72509
TCP outside:113.247.3.129/3253 (113.247.3.129/3253) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 1s, uptime 6m12s, timeout 1h0m, bytes 139249
TCP outside:2.176.137.197/1950 (2.176.137.197/1950) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 5m37s, uptime 7m14s, timeout 1h0m, bytes 3002
TCP outside:171.118.104.53/64054 (171.118.104.53/64054) inside:192.168.55.172/80 (1.2.157.172/80), flags UIOB, idle 8s, uptime 7m27s, timeout 1h0m, bytes 98878
TCP outside:219.139.32.90/4141 (219.139.32.90/4141) inside:192.168.55.172/80 (1.2.157.172/80), flags UIOB, idle 7s, uptime 7m32s, timeout 1h0m, bytes 94113

fw-asa# sh conn long state data_in

TCP outside:112.65.211.244/6680 (112.65.211.244/6680) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 4s, uptime 3m37s, timeout 1h0m, bytes 44907
TCP outside:113.247.3.129/3253 (113.247.3.129/3253) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 1s, uptime 6m1s, timeout 1h0m, bytes 137801

fw-asa# sh conn long state finin

TCP outside:138.91.170.208/1264 (138.91.170.208/1264) inside:192.168.55.172/80 (1.2.157.172/80), flags UfFRIOB, idle 0s, uptime 0s, timeout 5m0s, bytes 5052
TCP outside:2.176.137.197/1950 (2.176.137.197/1950) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 4m45s, uptime 6m21s, timeout 1h0m, bytes 3002
TCP outside:2.176.137.197/1653 (2.176.137.197/1653) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 5m6s, uptime 6m43s, timeout 1h0m, bytes 3002

fw-asa# sh conn long state up

TCP outside:112.65.211.244/6680 (112.65.211.244/6680) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 0s, uptime 2m50s, timeout 1h0m, bytes 37914
TCP outside:113.247.3.129/3253 (113.247.3.129/3253) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UIOB, idle 4s, uptime 5m14s, timeout 1h0m, bytes 78789
TCP outside:2.176.137.197/1950 (2.176.137.197/1950) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 4m39s, uptime 6m16s, timeout 1h0m, bytes 3002
TCP outside:171.118.104.53/64054 (171.118.104.53/64054) inside:192.168.55.172/80 (1.2.157.172/80), flags UIOB, idle 0s, uptime 6m29s, timeout 1h0m, bytes 89118
TCP outside:219.139.32.90/4141 (219.139.32.90/4141) inside:192.168.55.172/80 (1.2.157.172/80), flags UIOB, idle 9s, uptime 6m35s, timeout 1h0m, bytes 82689
TCP outside:2.176.137.197/1653 (2.176.137.197/1653) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 5m1s, uptime 6m38s, timeout 1h0m, bytes 3002
TCP outside:2.176.137.197/1589 (2.176.137.197/1589) inside:192.168.55.172/4567 (1.2.157.172/4567), flags UfIOB, idle 5m13s, uptime 6m49s, timeout 1h0m, bytes 3002

Status and maturity of the Neutron in Openstack Havana release

Randy Bias from Cloudscaling organized a meetup and brought together 4 engineers who work on cloud network and Openstack Neutron:

Juniper, Rudra Rugee
Midokura, Ryu Ishimoto
VMware, Aaron Rosen
PLUMgrid, Edgar Magana

More info about the meetup can be found here:
http://www.meetup.com/openstack/events/152128692/
http://cloudscaling.com/blog/cloud-computing/neutron-in-production-work-in-progress-or-ready-for-prime-time/
http://www.slideshare.net/randybias/sfbay-openstack-meetup-neutron-and-sdn-in-production-20131203

Below are some of the notes I took while watching the video.

Key SDN components

Network programmability (API)
Network virtualization (responsible for creating overlay networks, multi-tenancy, managing flows, gateways, virtual routers ...)

What cloud network is and what Openstack Neutron is trying to solve

Neutron is an abstraction layer. It separates your physical network from the direct "tenant network"/virtual network topology. It allows you to programmatically (through a rich API and CLI) create virtual routers, load balancers and firewalls. That way it allows you to emulate/virtualize physical devices in the cloud.

The main idea is to keep the physical layer as simple and minimal as possible but rich enough that you can create more complex configurations on top of it. That way you can think about Neutron in terms of configuration management, orchestration or network virtualization management.

Neutron

It defines a clear public API for interacting with Openstack to create network objects.
It provides Network as a Service (NaaS) to tenants to enable more advanced deployments and configuration options.
Initially the goal was to remove Nova Network from Openstack and create its own project to allow more flexibility and expanded functionality (Nova Network supports only a limited number of topologies: flat, flat DHCP and VLANs).
It is the platform for future network development in Openstack.
Neutron promises a scalable and highly available infrastructure for tenants.
There are some single points of failure (SPOF) in the current vanilla Havana release. This is one of the differences between the open source vanilla Neutron and vendor-specific plugins.
It provides a choice of technology and no vendor lock-in for networking in Openstack. One way of looking at it is that the customer has a choice of which technology/vendor to use to build the network infrastructure. The other benefit is access to a public open API that is independent of the actual backend technology. But it is important to understand that various backend drivers may provide different and richer functionality than the Openstack one. In this case you can always get access to it through the API extensions that Neutron exposes as well. In this sense you gain an additional way of interacting with and using your new technology and still remain open and interoperable with future versions.
It allows vendor-specific extensions for more advanced networking features that may not be present in the current Neutron API.
It is the platform to integrate all network functionality in the cloud, although its main focus is on the most common cases and what users demand.
Openstack wants to be the reference model for the next-generation cloud data center and Neutron aims to provide support for the networking part. By providing a common and widely accepted platform it opens the door to multi-vendor network integration.
Every vendor can have its own differentiating factors. In the current state of the networking industry it is impossible to have a single API in Openstack that addresses the needs of every use case. It looks like there will always be space for vendor extensions in Neutron for those infrastructure providers or customers who demand more specific and unique features that are not present yet.
It exposes a network abstraction to developers, operations and devops teams, who don't need to worry about the implementation details.

Would you run vanilla Openstack Neutron in production

This question was asked at about 41:50. There were different answers.

If you want to run the vanilla Neutron configuration, it is not recommended to run it in production without good knowledge of how all Openstack components work and interact together.
You can run Neutron with a vendor plugin in production. By choosing a vendor plugin instead of the vanilla open source solution you get an additional level of confidence and support.
Maybe for a private cloud, depending on feature requirements and scalability, but not for public cloud and enterprise network deployments.

The choice between vanilla Openstack Neutron and a vendor plugin always needs to be analyzed individually per customer.

What level of (technical) support is required before and after implementation?
How much expertise do you have in your company?
What scale are we talking about?
What applications do you want to run?
What features do you require?

Other issues

There aren't many good troubleshooting tools.
It is difficult to see and track a specific VM to VM traffic.

Thursday, December 26, 2013

Pymongo code example to manipulate data

Important note: This article is in relation to online MongoDB course. For more information about the course and other posts describing its content please check my main page here: M101P: MongoDB for Developers course.

In the various files in the week2 directory you will find extensive code examples showing how to use the pymongo module. Below is a short summary of the most important functions. I hope all the code is self-explanatory.

query = {'type':'exam', 'score':{'$gt':50, '$lt':70}}
iter = scores.find(query)

query = {'type':'exam'}
selector = {'student_id':1, '_id':0}
iter = scores.find(query, selector)

query = {'student_id':10}
doc = scores.find_one(query)

counter = counters.find_and_modify(query={'type':name},
update={'$inc':{'value':1}},
upsert=True, new=True)

cursor = scores.find(query).limit(10).skip(30)

query = {'media.oembed.type':'video'}
projection = {'media.oembed.url':1, '_id':0}
iter = scores.find(query, projection)

doc = {"name":"Andrew Erlichson", "company":"10gen","interests":['running', 'cycling', 'photography']}
people.insert(doc)

things.update({'thing':'apple'}, {'$set':{'color':'red'}}, upsert=True)
things.update({'thing':'pear'}, {'color':'green'}, upsert=True)

scores.update({},{'$unset':{'review_date':1}},multi=True)

score = scores.find_one({'student_id':1, 'type':'homework'})
score['review_date'] = datetime.datetime.utcnow()
scores.save(score)

scores.update({'student_id':1, 'type':'homework'},
{'$set':{'review_date':datetime.datetime.utcnow()}})

cursor = cursor.sort([('student_id',pymongo.ASCENDING),('score',pymongo.DESCENDING)])
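Note the difference between the two things.update calls above: with an operator such as $set only the named fields change, while a plain document as the second argument replaces the whole matched document (the _id is kept). The function apply_update below is a made-up, pure-Python model of that rule, not pymongo code, and it only models $set.

```python
def apply_update(doc, update):
    """Simplified model of how an update document is applied to one matched document."""
    if any(key.startswith("$") for key in update):
        # operator update: only $set is modelled in this sketch
        changed = dict(doc)
        changed.update(update.get("$set", {}))
        return changed
    # replacement update: everything except _id is replaced
    return {"_id": doc["_id"], **update}


apple = {"_id": 1, "thing": "apple", "weight": 100}
pear = {"_id": 2, "thing": "pear", "weight": 120}

print(apply_update(apple, {"$set": {"color": "red"}}))
# → {'_id': 1, 'thing': 'apple', 'weight': 100, 'color': 'red'}
print(apply_update(pear, {"color": "green"}))
# → {'_id': 2, 'color': 'green'}
```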

Wednesday, December 25, 2013

How to pause bash loop execution and wait for any key

Problem

How to pause the execution of a bash loop and wait for a key press (ENTER, for example) from the user while the script is running.

 
root@mongo2:~/tmp.loop# find -name '.bash*' | while read myfile; do echo "my variable is $myfile"; done
my variable is ./.bashrc
my variable is ./.bashrc_rado
my variable is ./.bash_history
my variable is ./.bash_tmp

Solution description and demonstration

We can use the standard 'read' bash built-in (man bash). The problem is that by default it reads from standard input (stdin, file descriptor 0), which inside the loop is the pipe carrying our file names.

We can work around this by telling read to use a different descriptor with the -u option. In the examples below we use descriptor 1 (stdout), which is still attached to the terminal.

 
$ echo -n 'ello' | ( read a; read -u1 b ; echo "1st read : - $a -"; echo "2th read : = $b =" )
test
1st read : - ello -
2th read : = test =

Unsuccessful version 1 showing the problem (read consumes our file names):

 
root@mongo2:~/tmp.loop# find -name '.bash*' | while read myfile; do echo "my variable is $myfile"; read ; done
my variable is ./.bashrc
my variable is ./.bash_history

Final solution (we press ENTER every time it pauses):

 
root@mongo2:~/tmp.loop# find -name '.bash*' | while read myfile; do echo "my variable is $myfile"; read -u1 ; done
my variable is ./.bashrc

my variable is ./.bashrc_rado

my variable is ./.bash_history

my variable is ./.bash_tmp
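The same idea, keeping the data stream and the key-press stream separate, can be sketched in Python. The function name loop_with_pause is hypothetical. In a real script the two streams would typically be sys.stdin and a file object opened on /dev/tty; io.StringIO objects are used here only to make the example self-contained.

```python
import io


def loop_with_pause(data_stream, key_stream, out):
    """Print every line of data_stream, waiting for a line on key_stream after each."""
    for line in data_stream:
        out.write(f"my variable is {line.rstrip()}\n")
        key_stream.readline()  # blocks until ENTER when key_stream is a terminal


names = "./.bashrc\n./.bashrc_rado\n./.bash_history\n./.bash_tmp\n"

# pausing on the data stream itself swallows every second file name
shared = io.StringIO(names)
broken = io.StringIO()
loop_with_pause(shared, shared, broken)
print(broken.getvalue(), end="")

# pausing on a separate stream leaves the file names untouched
fixed = io.StringIO()
loop_with_pause(io.StringIO(names), io.StringIO("\n" * 4), fixed)
print(fixed.getvalue(), end="")
```

The first call reproduces the unsuccessful version (only .bashrc and .bash_history are printed), the second one prints all four names.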

References

http://www.catonmat.net/blog/bash-one-liners-explained-part-three/
http://www.tldp.org/LDP/abs/html/index.html
http://www.catonmat.net/download/bash-redirections-cheat-sheet.pdf

Print file in a beautiful way with its syntax highlighted on the Linux bash console

The Linux 'cat' program lacks the syntax recognition and highlighting features. If you want to print a file in a beautiful way with its syntax highlighted you can use our wrapper: scat.

It is a wrapper around the source-highlight tool that automatically recognizes the syntax and prints the file on stdout. The source code is taken from the github repo of Japh Woldrich: https://github.com/trapd00r/utils

Installation

 
cd $HOME
mkdir $HOME/tools
cd $HOME/tools
wget https://raw.github.com/trapd00r/utils/master/scat
chmod a+x scat

export PATH=$PATH:~/tools

Demonstration

How to find out how many documents were affected by the last instruction

Important note: This article is in relation to online MongoDB course. For more information about the course and other posts describing its content please check my main page here: M101P: MongoDB for Developers course.

getLastError attribute

In the mongo shell, data modification operations do not always generate readable output to show what happened. The getLastError command, run right after the operation, can be used to:

Find the status of the last operation (success or failure).
Find the number of documents affected or changed (the 'n' parameter).
Find out if a new document was inserted during an upsert.
Find out how many documents were updated.

 > db.people.insert( { _id : "top", var : 2} )
E11000 duplicate key error index: students.people.$_id_  dup key: { : "top" }
> db.runCommand( {getLastError : 1 } )
{
        "err" : "E11000 duplicate key error index: students.people.$_id_  dup key: { : \"top\" }",
        "code" : 11000,
        "n" : 0,
        "connectionId" : 1,
        "ok" : 1
}

---

> db.people.insert( { _id : "mark", var : 3} )
> db.runCommand( {getLastError : 1 } )
{ "n" : 0, "connectionId" : 1, "err" : null, "ok" : 1 }

---

> db.people.update( {var : { $lt:3 } } , { $set : { var2: 0 } }, {multi: true} )
> db.runCommand( {getLastError : 1 } )
{
        "updatedExisting" : true,
        "n" : 2,
        "connectionId" : 2,
        "err" : null,
        "ok" : 1
}
> db.people.find()
{ "_id" : "mark", "var" : 3 }
{ "_id" : "rado", "var" : 1, "var2" : 0 }
{ "_id" : "top", "var" : 2, "var2" : 0 }

---

# bad example: find() runs between the update and getLastError, so the update result is lost!

> db.people.update( {var : 4} , { $set : { var2: 0 } }, { upsert: true } )
> db.people.find()
{ "_id" : "mark", "var" : 3 }
{ "_id" : "rado", "var" : 1, "var2" : 0 }
{ "_id" : "top", "var" : 2, "var2" : 0 }
{ "_id" : ObjectId("52ba26f9c1207a4a1fdeb3ae"), "var" : 4, "var2" : 0 }
> db.runCommand( {getLastError : 1 } )
{ "n" : 0, "connectionId" : 2, "err" : null, "ok" : 1 }

---

# the same operation but with the correct order of calls

> db.people.update( {var : 4} , { $set : { var2: 0 } }, { upsert: true } )
> db.runCommand( {getLastError : 1 } )
{
        "updatedExisting" : false,
        "upserted" : ObjectId("52ba2916c1207a4a1fdeb3b0"),
        "n" : 1,
        "connectionId" : 2,
        "err" : null,
        "ok" : 1
}
> db.people.find()
{ "_id" : "mark", "var" : 3 }
{ "_id" : "rado", "var" : 1, "var2" : 0 }
{ "_id" : "top", "var" : 2, "var2" : 0 }
{ "_id" : ObjectId("52ba2916c1207a4a1fdeb3b0"), "var" : 4, "var2" : 0 }

---

> db.people.update( {} , { $set : { var2: 1 } }, { multi: true } )
> db.runCommand( {getLastError : 1 } )
{
        "updatedExisting" : true,
        "n" : 4,
        "connectionId" : 2,
        "err" : null,
        "ok" : 1
}
> db.people.find()
{ "_id" : "rado", "var" : 1, "var2" : 1 }
{ "_id" : "top", "var" : 2, "var2" : 1 }
{ "_id" : ObjectId("52ba26f9c1207a4a1fdeb3ae"), "var" : 4, "var2" : 1 }
{ "_id" : "mark", "var" : 3, "var2" : 1 }

Updating _ID document attribute in MongoDB

Important note: This article is in relation to online MongoDB course. For more information about the course and other posts describing its content please check my main page here: M101P: MongoDB for Developers course.

The _id in MongoDB has a special meaning and is always present in every document. It is used as the primary key for the collection.

It doesn't have to be generated automatically. You can specify its value manually if you want.

 
> db.people.insert( { _id : "rado", var : 1} )
> db.people.find()
{ "_id" : "rado", "var" : 1 }

> db.people.insert( { _id : "top", var : 2} )
> db.people.find()
{ "_id" : "rado", "var" : 1 }
{ "_id" : "top", "var" : 2 }

> db.people.insert( { _id : "top", var : 2} )
E11000 duplicate key error index: students.people.$_id_  dup key: { : "top" }

> db.people.find()
{ "_id" : "rado", "var" : 1 }
{ "_id" : "top", "var" : 2 }

Removing data

 
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 2, "tab" : [  11,  22,  "hello" ] }
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 3, "tab" : { "2" : "london" } }
{ "_id" : ObjectId("52ba1e99c1207a4a1fdeb3ad"), "a" : 4, "mystr" : "poland" }
>
> db.arrays.remove( { a: 2})
> db.arrays.find()
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 3, "tab" : { "2" : "london" } }
{ "_id" : ObjectId("52ba1e99c1207a4a1fdeb3ad"), "a" : 4, "mystr" : "poland" }

Calling remove() without arguments removes all the documents in the collection. The drop method is recommended instead because it drops the whole collection at once instead of removing document by document (it is faster).

 
> db.arrays.remove()
> db.arrays.drop()
> db.arrays.find()

IMPORTANT: multi-document operations are not atomic. After removing a few documents, the db engine thread can yield execution to another thread before finishing its operation.

Update on multiple records inside collection

Important note: This article is in relation to online MongoDB course. For more information about the course and other posts describing its content please check my main page here: M101P: MongoDB for Developers course.

By default an update call with 2 arguments updates a SINGLE (unspecified) document. To update multiple documents we need to provide the third argument with the multi option.

The {} as first argument means match every document.

 
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 2, "tab" : { "2" : "london" } }
{ "_id" : ObjectId("52ba1e99c1207a4a1fdeb3ad"), "a" : 3, "mystr" : "poland" }

> db.arrays.update( { }, { $inc : { "a" : 1 } }, { multi: true} )
>
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 2, "tab" : [  11,  22,  "hello" ] }
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 3, "tab" : { "2" : "london" } }
{ "_id" : ObjectId("52ba1e99c1207a4a1fdeb3ad"), "a" : 4, "mystr" : "poland" }

IMPORTANT: MongoDB doesn't offer isolated transactions across multiple documents. It guarantees only that a single document update is atomic: no other concurrent update or write can modify the same document at the same time.

For the test above this means there is no guarantee that all the updates will be executed in one go.

It is possible that the db engine thread doing the update will be stopped, de-scheduled, or will yield its execution to another db engine thread that modifies the same collection (or a subset of it). In such a case the updates on the documents may happen in a different order and one can overwrite the other. Only a single document update is atomic.

Insert and update operation

 
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }

> db.arrays.update( { a: 2}, { $set : { "tab.2" : "london" } } )
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }


> db.arrays.update( { a: 2}, { $set : { "tab.2" : "london" } }, { upsert: true} )
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 2, "tab" : { "2" : "london" } }

> db.arrays.update( { a: 3}, { $set : { "mystr" : "poland" } }, { upsert: true} )
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }
{ "_id" : ObjectId("52ba1e4cc1207a4a1fdeb3ac"), "a" : 2, "tab" : { "2" : "london" } }
{ "_id" : ObjectId("52ba1e99c1207a4a1fdeb3ad"), "a" : 3, "mystr" : "poland" }
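
The upsert behaviour in the transcript above can be sketched in plain Python. This is only a toy model working on a list of dicts, not pymongo and not the MongoDB engine. The function name toy_update and the sample data are made up for illustration, and unlike the real server the model does not generate an _id.

```python
def toy_update(collection, query, changes, upsert=False):
    """Toy model of update(query, {"$set": changes}, {upsert: ...})."""
    for doc in collection:
        # A document matches when every queried field has the given value.
        if all(doc.get(key) == value for key, value in query.items()):
            doc.update(changes)          # modify the first match only
            return "updated"
    if upsert:
        # No match: create a document from the query fields plus the changes.
        collection.append({**query, **changes})
        return "inserted"
    return "no match"

arrays = [{"a": 1, "tab": [11, 22, "hello"]}]

# Without upsert nothing happens because no document has a == 3.
print(toy_update(arrays, {"a": 3}, {"mystr": "poland"}))
# With upsert the missing document is created.
print(toy_update(arrays, {"a": 3}, {"mystr": "poland"}, upsert=True))
print(arrays)
```

Running the same upsert a second time would find the new document and update it instead of inserting another one.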

Modifying documents in the data base

Document updates

The update syntax is similar to how the find function works. The first argument is a search document, the second is the NEW DOCUMENT that REPLACES the existing one (only the _id is kept).

 
> db.grades.find( { "student_id" : 1})
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "student_id" : 1, "type" : "quiz", "score" : 96.76851542258362 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57d"), "student_id" : 1, "type" : "homework", "score" : 21.33260810416115 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57e"), "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }
>
> db.grades.update( { "student_id" : 1}, { myvar : "new doc2"} )
>

> db.grades.find( { $or : [ { "student_id" : 1 }, { myvar : "new doc2"} ] } )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2" }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57d"), "student_id" : 1, "type" : "homework", "score" : 21.33260810416115 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57e"), "student_id" : 1, "type" : "homework", "score" : 44.31667452616328 }

Adding or modifying new attribute in the document

 
> db.grades.find( { myvar : "new doc2"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2" }
>
> db.grades.update( { myvar : "new doc2"}, { $set : { newvar : 1} } )
>
> db.grades.find( { myvar : "new doc2"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2", "newvar" : 1 }

With $inc you can increment an existing attribute. If the attribute doesn't exist yet, it is added with the given value.

 
> db.grades.update( { myvar : "new doc2"}, { $inc : { newvar : 100} } )
>
> db.grades.find( { myvar : "new doc2"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2", "newvar" : 101 }
>
> db.grades.update( { myvar : "new doc2"}, { $inc : { newnumber : 2} } )
>
> db.grades.find( { myvar : "new doc2"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2", "newnumber" : 2, "newvar" : 101 }

As MongoDB doesn't have a strict db schema, we can dynamically remove an attribute as well.

 
> db.grades.update( { myvar : "new doc2"}, { $unset : { newnumber : 2} } )
> db.grades.find( { myvar : "new doc2"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "myvar" : "new doc2", "newvar" : 101 }
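
The difference between a full replacement and the $set, $inc and $unset operators shown above can be modelled in a few lines of plain Python. This is a sketch for intuition only: the helper names are invented, the documents are ordinary dicts, and nothing here talks to MongoDB.

```python
def replace_document(doc, new_doc):
    """update(query, new_doc): every attribute except _id is replaced."""
    return {"_id": doc["_id"], **new_doc}

def apply_set(doc, fields):
    """$set: add the listed attributes or overwrite their values."""
    return {**doc, **fields}

def apply_inc(doc, increments):
    """$inc: add to an attribute, creating it if it is missing."""
    updated = dict(doc)
    for name, amount in increments.items():
        updated[name] = updated.get(name, 0) + amount
    return updated

def apply_unset(doc, fields):
    """$unset: remove the listed attributes; the given values are ignored."""
    return {k: v for k, v in doc.items() if k not in fields}

doc = {"_id": 1, "student_id": 1, "type": "quiz"}      # made-up sample document
doc = replace_document(doc, {"myvar": "new doc2"})     # student_id and type are gone
doc = apply_set(doc, {"newvar": 1})
doc = apply_inc(doc, {"newvar": 100})
doc = apply_inc(doc, {"newnumber": 2})
doc = apply_unset(doc, {"newnumber": 2})
print(doc)
```

The sequence of calls mirrors the shell session above, one operator per step.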

If a value is an array, you can update an individual element of the array in the document.

 
> db.arrays.insert( { a : 1, tab : [ 11,22,33] } )
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  33 ] }
>

> db.arrays.update( { a: 1}, { $set : { "tab.2" : "hello" } } )
> db.arrays.find()
{ "_id" : ObjectId("52ba1c78a83c1ee5e6c903a1"), "a" : 1, "tab" : [  11,  22,  "hello" ] }

There are other methods you can use. A complete list of them can be found at http://docs.mongodb.org/manual/reference/operator/update-array

Quick summary of RDMS and NoSQL MongoDB

Regardless of whether it is an RDBMS or a NoSQL database engine, the very basic purpose of these systems is to allow data collection and modification. This can be visualized:

References

http://en.wikipedia.org/wiki/NoSQL

How to count documents in collection

 
> db.grades.count()
809

> c = db.grades.find(); c.limit(3)
{ "_id" : ObjectId("50906d7fa3c412bb040eb577"), "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb578"), "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb579"), "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }

> c.count()
809

Tuesday, December 24, 2013

Cursors in MongoDB and mongo shell JavaScript API

The mongo shell supports cursors. A cursor object offers many functions.

 
> mycursor = db.grades.find(); null

> mycursor.
mycursor.addOption(             mycursor.forEach(               mycursor.max(                   mycursor.showDiskLoc(
mycursor.arrayAccess(           mycursor.hasNext(               mycursor.min(                   mycursor.size(
mycursor.batchSize(             mycursor.hasOwnProperty(        mycursor.next(                  mycursor.skip(
mycursor.clone(                 mycursor.help(                  mycursor.objsLeftInBatch(       mycursor.snapshot(
mycursor.comment(               mycursor.hint(                  mycursor.pretty(                mycursor.sort(
mycursor.constructor            mycursor.itcount(               mycursor.propertyIsEnumerable(  mycursor.toArray(
mycursor.count(                 mycursor.length(                mycursor.readOnly(              mycursor.toLocaleString(
mycursor.countReturn(           mycursor.limit(                 mycursor.readPref(              mycursor.toString(
mycursor.explain(               mycursor.map(                   mycursor.shellPrint(            mycursor.valueOf(

> mycursor.hasNext()
true

Thanks to the flexibility of the mongo shell you can run JavaScript code against a cursor. The simple program below retrieves and prints a few documents from the collection as a demonstration:

 
> for ( var i =1 ; i<5 && mycursor.hasNext(); i++ ) { print("for iteration # " + i); printjson(mycursor.next()) }
for iteration # 1
{
        "_id" : ObjectId("50906d7fa3c412bb040eb5b4"),
        "student_id" : 15,
        "type" : "quiz",
        "score" : 33.87245622400884
}
for iteration # 2
{
        "_id" : ObjectId("50906d7fa3c412bb040eb5b5"),
        "student_id" : 15,
        "type" : "homework",
        "score" : 18.41724574382455
}
for iteration # 3
{
        "_id" : ObjectId("50906d7fa3c412bb040eb5b6"),
        "student_id" : 15,
        "type" : "homework",
        "score" : 7.475648374118382
}
for iteration # 4
{
        "_id" : ObjectId("50906d7fa3c412bb040eb5b7"),
        "student_id" : 16,
        "type" : "exam",
        "score" : 40.92812784954744
}

There is a way to limit the number of documents to read as well.

 
> mycursor = db.grades.find(); null
null
> mycursor.limit(5); null
null
> while ( mycursor.hasNext()) { printjson(mycursor.next()) }

We can apply a sort function to the cursor as well. The code below shows how to list documents in reverse order.

 
> mycursor = db.grades.find(); null
null
> mycursor.limit(5); null
null
> mycursor.sort( { student_id : -1 } )
{ "_id" : ObjectId("50906d7fa3c412bb040eb893"), "student_id" : 199, "type" : "exam", "score" : 67.33828604577803 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb894"), "student_id" : 199, "type" : "quiz", "score" : 48.15737364405101 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb895"), "student_id" : 199, "type" : "homework", "score" : 49.34223066136407 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb896"), "student_id" : 199, "type" : "homework", "score" : 58.09608083191365 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb88f"), "student_id" : 198, "type" : "exam", "score" : 49.65504121659061 }

In the code above the following happens:

Create a cursor - no data has been received from the db at this point; we have merely established a db connection to the collection.
We say we want to see only 5 documents.
When we execute the sort method, the shell fetches data from the db, iterates over the cursor and prints 5 documents (notice the missing null at the end).

Going even further, we can chain the two functions together on a single line.

 
> mycursor.sort( { student_id : -1 } ).limit(5)
{ "_id" : ObjectId("50906d7fa3c412bb040eb893"), "student_id" : 199, "type" : "exam", "score" : 67.33828604577803 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb894"), "student_id" : 199, "type" : "quiz", "score" : 48.15737364405101 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb895"), "student_id" : 199, "type" : "homework", "score" : 49.34223066136407 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb896"), "student_id" : 199, "type" : "homework", "score" : 58.09608083191365 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb88f"), "student_id" : 198, "type" : "exam", "score" : 49.65504121659061 }

The takeaway from these examples is that the functions applied to the cursor are not processed on the client side by the mongo shell. Instead, they are instructions that are sent to the MongoDB engine and processed by the engine once the cursor requests data to be returned.

The cursor is an interface to send the parameters to the db engine and to read the returned data.
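
A rough way to picture this lazy behaviour is the plain-Python model below. The LazyCursor class is hypothetical and written only for this post; it is not the pymongo or mongo shell cursor. It records sort and limit as options and does the work only when the cursor is iterated, applying the sort before the limit regardless of the order in which the two were called.

```python
class LazyCursor:
    """Toy model of a cursor: options are recorded, work happens on iteration."""

    def __init__(self, docs):
        self._docs = docs
        self._sort_key = None
        self._reverse = False
        self._limit = None
        self.fetched = False          # becomes True once data is actually read

    def sort(self, key, direction=1):
        self._sort_key, self._reverse = key, direction < 0
        return self                   # returning self allows chaining

    def limit(self, n):
        self._limit = n
        return self

    def __iter__(self):
        self.fetched = True
        docs = list(self._docs)
        if self._sort_key is not None:    # the sort is always applied first
            docs.sort(key=lambda d: d[self._sort_key], reverse=self._reverse)
        if self._limit is not None:       # then the limit
            docs = docs[:self._limit]
        return iter(docs)

grades = [{"student_id": i} for i in range(10)]   # made-up sample data
cursor = LazyCursor(grades)
cursor.limit(3)
cursor.sort("student_id", -1)
print(cursor.fetched)     # nothing has been read yet
top = list(cursor)        # iteration triggers the actual work
print(top)
```

Even though limit was called before sort, the model returns the three highest student_id values, which matches what the shell session above shows.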

Nested documents and the dot notation in find

Nested documents

We can have an attribute whose value is another document with its own separate attributes and values.

 
> db.embeded.insert( {rado1:1, nestedDoc : { var1 : 1, str : "test"  } } )

> db.embeded.find()
{ "_id" : ObjectId("52b8e2aa8a25e81047ccb8c9"), "rado1" : 1, "nestedDoc" : { "var1" : 1, "str" : "test" } }
> db.embeded.find().pretty()
{
        "_id" : ObjectId("52b8e2aa8a25e81047ccb8c9"),
        "rado1" : 1,
        "nestedDoc" : {
                "var1" : 1,
                "str" : "test"
        }
}

Searching for embedded variables

This is the find criteria for an attribute of the outermost document, written in the standard way we have used so far.

 
> db.embeded.find( {rado1 : 1} )
{ "_id" : ObjectId("52b8e2aa8a25e81047ccb8c9"), "rado1" : 1, "nestedDoc" : { "var1" : 1, "str" : "test" } }

By analogy we might try the search below, but unfortunately it doesn't produce the expected result.

 
> db.embeded.find( {nestedDoc : { var1 :1 } } )
>

If the argument you pass to the find method is an embedded document, you need to list exactly the attributes it is built of. Pay attention to the order as well: if you mix it up, the query will not match again. This is because MongoDB looks for an embedded document that is exactly equal to the one specified, including the order of its attributes.

 
> db.embeded.find( {nestedDoc : { var1 :1, str: "test" } } )
{ "_id" : ObjectId("52b8e2aa8a25e81047ccb8c9"), "rado1" : 1, "nestedDoc" : { "var1" : 1, "str" : "test" } }
> db.embeded.find( {nestedDoc : { str: "test", var1 :1 } } )

Dot notation

To search for the embedded values we can use the "dot" notation.

 
> db.embeded.find( { "nestedDoc.var1" : 1 } )
{ "_id" : ObjectId("52b8e2aa8a25e81047ccb8c9"), "rado1" : 1, "nestedDoc" : { "var1" : 1, "str" : "test" } }
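
The two matching styles from this post, exact embedded-document match and dot notation, can be imitated with plain Python dicts. The sketch below is only a model to build intuition; the helper names exact_match and dot_match are made up and no MongoDB driver is involved.

```python
def exact_match(stored, query):
    """Exact embedded-document match: same attributes, same values, same order."""
    return list(stored.items()) == list(query.items())

def dot_match(doc, path, wanted):
    """Dot notation: walk the path one attribute at a time, then compare."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False
        current = current[part]
    return current == wanted

doc = {"rado1": 1, "nestedDoc": {"var1": 1, "str": "test"}}

print(exact_match(doc["nestedDoc"], {"var1": 1}))                 # attribute missing
print(exact_match(doc["nestedDoc"], {"var1": 1, "str": "test"}))  # identical
print(exact_match(doc["nestedDoc"], {"str": "test", "var1": 1}))  # different order
print(dot_match(doc, "nestedDoc.var1", 1))                        # dot notation
```

Note that a plain Python `==` on two dicts ignores key order, which is why the model compares the ordered item lists instead.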

Monday, December 23, 2013

Document attribute that is an array of strings

Arrays

When a document attribute is an array, MongoDB searches through the array values when you use the find() function. This can be surprising if you are not aware of the behavior. It is also important to note that there is no support for recursive searches on array values: only the top level of the array is examined.

If the attribute value is an array, the query below matches against the array elements as well.

 
> db.grades.find( { "test" : 2,  "test var" : "london" } )
{ "_id" : ObjectId("52b78121e7a1580719615d4f"), "test" : 2, "test var" : [  "munich",  "london" ] }
{ "_id" : ObjectId("52b78125e7a1580719615d50"), "test" : 2, "test var" : "london" }

For more advanced searches you can use the $all and $in operators with arrays.
The $all operator means that the attribute array needs to contain all the specified values.

 
> db.grades.find( { "test" : 2,  "test var" : { $all : [ "london" , "munich"]  } } )
{ "_id" : ObjectId("52b78121e7a1580719615d4f"), "test" : 2, "test var" : [  "munich",  "london" ] }

The $in operator, in contrast, means that the document attribute needs to contain at least one of the specified values.

 
> db.grades.find( { "test" : 2,  "test var" : { $in : [ "london" , "munich"]  } } )
{ "_id" : ObjectId("52b78121e7a1580719615d4f"), "test" : 2, "test var" : [  "munich",  "london" ] }
{ "_id" : ObjectId("52b78125e7a1580719615d50"), "test" : 2, "test var" : "london" }
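
The matching rules above can be written down as a small plain-Python model. The function names are invented for this sketch and it only covers the cases shown in this post (a scalar or a flat array of values); it is not how the MongoDB engine is implemented.

```python
def as_list(value):
    """Treat a scalar as a one-element list so both cases share one code path."""
    return value if isinstance(value, list) else [value]

def matches_eq(value, wanted):
    # { field : wanted } matches the scalar itself or any top-level array element
    return wanted in as_list(value)

def matches_all(value, wanted):
    # { field : { $all : wanted } } requires every listed value to be present
    return all(w in as_list(value) for w in wanted)

def matches_in(value, wanted):
    # { field : { $in : wanted } } requires at least one listed value
    return any(w in as_list(value) for w in wanted)

array_doc = ["munich", "london"]   # value of "test var" in the first document
scalar_doc = "london"              # value of "test var" in the second document

print(matches_eq(array_doc, "london"), matches_eq(scalar_doc, "london"))
print(matches_all(array_doc, ["london", "munich"]), matches_all(scalar_doc, ["london", "munich"]))
print(matches_in(array_doc, ["london", "munich"]), matches_in(scalar_doc, ["london", "munich"]))
print(matches_eq([["london"]], "london"))   # nested array: no recursive search
```

The last call illustrates the missing recursion: a value buried one level deeper in a nested array is not found.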

The same document attribute can have different type in a single collection

It is possible for multiple documents in a single collection to have the same attribute with different types. In the example below the "test var" attribute has the number 1 as its value in the first document and a string value in the second.

 
> db.grades.insert( {"test" : 1, "test var" : 1 } )
> db.grades.insert( {"test" : 1, "test var" : "string one" } )
>
> db.grades.find( { "test" : 1 })
{ "_id" : ObjectId("52b77870e7a1580719615d4a"), "test" : 1, "test var" : 1 }
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }

Pretty printing

The default formatting can be changed with the pretty() function to improve the readability of some documents.

 
> db.grades.find( { _id: ObjectId("50906d7fa3c412bb040eb577" ) } )
{ "_id" : ObjectId("50906d7fa3c412bb040eb577"), "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
>
> db.grades.find( { _id: ObjectId("50906d7fa3c412bb040eb577" ) } ).pretty()
{
        "_id" : ObjectId("50906d7fa3c412bb040eb577"),
        "student_id" : 0,
        "type" : "exam",
        "score" : 54.6535436362647
}

More advanced search queries

 
> db.grades.find( {"student_id" : { $gt: 0, $lt:3 }, "type" : "quiz"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "student_id" : 1, "type" : "quiz", "score" : 96.76851542258362 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb580"), "student_id" : 2, "type" : "quiz", "score" : 1.528220212203968 }

We use the $exists operator when we want to retrieve documents with a specific attribute regardless of its value or type. The only condition is that the attribute needs to exist in the document.

 
> db.grades.insert( {"test" : 1, "test var" : "string one", "other var" : 2 } )
> db.grades.find( { "other var" : {$exists : true}  })
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

> db.grades.find( { "test var" : {$exists : true} })
{ "_id" : ObjectId("52b77870e7a1580719615d4a"), "test" : 1, "test var" : 1 }
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

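Finding documents with an attribute of a particular type. The $type operator takes a BSON type number: 1 is a double (the shell stores plain numbers as doubles) and 2 is a string.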

 
> db.grades.find( { "test var" : {$type : 1} })
{ "_id" : ObjectId("52b77870e7a1580719615d4a"), "test" : 1, "test var" : 1 }

> db.grades.find( { "test var" : {$type : 2} })
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

You can use regular expressions as well when matching document attributes.

 
> db.grades.find( { "test var" : {$regex : "^s.*one$" }}  )
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

If you want to provide 2 separate search criteria, you can use the $or operator.

 
> db.grades.find( { $or : [{ "test var" : {$type :1 }}, {"test var" : {$regex : "^s.*one$" }}] } )
{ "_id" : ObjectId("52b77870e7a1580719615d4a"), "test" : 1, "test var" : 1 }
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

In a similar way, the $and operator helps you write queries where a single attribute needs to meet multiple criteria.

 
> db.grades.find( { $and : [ {"test var" : {$regex : "^s" }}, { "test var" : { $regex : "one$"} }] } )
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

The $and queries can be simplified further and rewritten with a shorter syntax.

 
> db.grades.find( { "test var" : {$regex : "^s", $type : 1} } )
>
> db.grades.find( { "test var" : {$regex : "^s", $type : 2} } )
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }

It is important how you write the search criteria above. If you write them as separate entries for the same attribute, you create a different query: the query document is a JavaScript object, so a repeated key overwrites the earlier one and only the last criterion is used.

 
> db.grades.find( { "test var" : {$regex : "^s" }, "test var" : { $type : 1} } )
{ "_id" : ObjectId("52b77870e7a1580719615d4a"), "test" : 1, "test var" : 1 }

> db.grades.find( { "test var" : {$regex : "^rrrrrr" }, "test var" : { $type : 2} } )
{ "_id" : ObjectId("52b77877e7a1580719615d4b"), "test" : 1, "test var" : "string one" }
{ "_id" : ObjectId("52b7791fe7a1580719615d4c"), "test" : 1, "test var" : "string one", "other var" : 2 }
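
The same trap exists when the query is built in Python, because a dict literal with a repeated key also keeps only the last value. The short sketch below needs no database; the variable names are made up and the query documents are the ones from the shell session above.

```python
# Two criteria written as separate keys: the second silently replaces the first.
overwritten = {"test var": {"$regex": "^s"}, "test var": {"$type": 2}}
print(overwritten)            # only the $type criterion is left

# Both criteria under one key survive and are applied together.
combined = {"test var": {"$regex": "^s", "$type": 2}}
print(combined)

# Alternatively, keep them as separate clauses with $and.
explicit = {"$and": [{"test var": {"$regex": "^s"}}, {"test var": {"$type": 2}}]}
print(explicit)
```

Printing the query document before sending it is a cheap way to notice that a criterion has gone missing.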

How to search for documents in collection

In an RDBMS we would use SELECT to look at what data is stored in a table. In MongoDB we use the find method instead.

Search method

The find function takes 2 documents as arguments.
The 1st argument describes what you are looking for.
The 2nd argument says which attributes you want to retrieve.

 
> db.funnynumbers.insert( { rado: 1, a:100, b:200 , c:300} )
>
> db.funnynumbers.find( {a:100} )
{ "_id" : ObjectId("52b252ff95aed72de4e6ed39"), "rado" : 1, "a" : 100, "b" : 200, "c" : 300 }
>
> db.funnynumbers.find( {a:100}, {rado:true} )
{ "_id" : ObjectId("52b252ff95aed72de4e6ed39"), "rado" : 1 }
> db.funnynumbers.find( {a:100}, {rado:true, _id : false} )
{ "rado" : 1 }

Another example.

 
> db.grades.find()
{ "_id" : ObjectId("50906d7fa3c412bb040eb577"), "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb578"), "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb579"), "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57a"), "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57b"), "student_id" : 1, "type" : "exam", "score" : 74.20010837299897 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57c"), "student_id" : 1, "type" : "quiz", "score" : 96.76851542258362 }

You can specify multiple search criteria by listing multiple document attributes.

 
> db.grades.find( {"student_id" : 0} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb577"), "student_id" : 0, "type" : "exam", "score" : 54.6535436362647 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb578"), "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb579"), "student_id" : 0, "type" : "homework", "score" : 14.8504576811645 }
{ "_id" : ObjectId("50906d7fa3c412bb040eb57a"), "student_id" : 0, "type" : "homework", "score" : 63.98402553675503 }

> db.grades.find( {"student_id" : 0, "type" : "quiz"} )
{ "_id" : ObjectId("50906d7fa3c412bb040eb578"), "student_id" : 0, "type" : "quiz", "score" : 31.95004496742112 }

Mongo shell

The interactive shell is called mongo. It is a JavaScript interpreter: everything you type into it is interpreted as JavaScript code.

Default variables

 
$ mongo
MongoDB shell version: 2.4.8
connecting to: test
> db
test
> show dbs
local   0.078125GB
m101    0.203125GB
test    (empty)
> use m101
switched to db m101
> db
m101

You can even run some JavaScript code to implement more sophisticated behavior when interacting with the database.

 
> for ( var i =1 ; i<5; i++ ) { print(i) }
1
2
3
4

_ID document attribute

Every retrieved document from database has an attribute '_id'.

 
> db.funnynumbers.findOne()
{ "_id" : ObjectId("50778ce69331a280cf4bcf7d"), "value" : 87 }

It is used internally by the MongoDB engine, for example as the PRIMARY KEY of the collection.

The _id is immutable. To change it you have to remove the document and insert it into the db again.

When you insert a document without an _id, the value is generated automatically. To guarantee that there are no collisions, the value is based, among other things, on the current time and the process id.

 
> db.funnynumbers.findOne()
{ "_id" : ObjectId("50778ce69331a280cf4bcf7d"), "value" : 87 }


> db.funnynumbers.find( {'rado' : 1} )
{ "_id" : ObjectId("52b251f995aed72de4e6ed38"), "rado" : 1 }
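
The time component can be read back from the _id itself: the first 4 bytes of an ObjectId (the first 8 hex digits) hold the creation time as seconds since the Unix epoch. The sketch below decodes it in plain Python without any driver. The helper name objectid_timestamp is made up for this example, and the hex string is the _id from the output above.

```python
from datetime import datetime, timezone

def objectid_timestamp(hex_id):
    """Return the creation time encoded in the first 4 bytes of an ObjectId."""
    seconds = int(hex_id[:8], 16)      # first 8 hex digits = 4-byte Unix time
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

created = objectid_timestamp("52b251f995aed72de4e6ed38")
print(created.isoformat())
```

The remaining bytes of the ObjectId are not interpreted here; the sketch only looks at the timestamp prefix.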

JavaScript script to populate the data base

In week2 we can find JavaScript code that uses the power and flexibility of the mongo shell to create the collection and insert documents:

https://github.com/rtomaszewski/mongo-course/blob/master/M101P/week2/create_student_collection.d59b66847ae9.js

 
~/mongo-course/M101P/week2# mongo < create_student_collection.d59b66847ae9.js
MongoDB shell version: 2.4.8
connecting to: test
switched to db school
bye

References

http://docs.mongodb.org/manual/reference/mongo-shell/

MongoDB supported features

It is a document-oriented database.
It has a dynamic schema (the db schema is created dynamically when you run queries, inserts or updates on documents in a collection).
Document access is guaranteed to be atomic: atomic operations are supported on a single document. There is no support for transactions on multiple documents or on documents across multiple collections.
It supports indexes and secondary indexes.
It doesn't use the SQL language known from RDBMS. Instead it uses an API-like language for data manipulation and queries.
Interestingly, the mongo shell is an interactive JavaScript interpreter.
It uses JSON as the document format when interacting with the external world.
On disk, however, the engine stores documents and database data in the binary BSON format.
Data normalization is not a primary goal for documents stored in MongoDB. Instead of trying to normalize all data stored in a collection, in Mongo you are advised to use embedding for 1-1, 1-many, many-1 and many-many relationships. Of course, if this is not aligned with your application's data access path, you can still normalize data and use the _id as a reference across collections. But it is then up to the client side to enforce this and keep the data consistent.
It supports single-key and multi-key indexes.
A single document can be up to 16MB in size. To store larger files you can use GridFS.

What is not supported in MongoDB

Doesn't support joins.
Doesn't support the SQL language known from RDBMS.
Doesn't support transactions across multiple collections.
Doesn't support transactions across multiple documents.
Doesn't natively support constraints (for example foreign keys in RDBMS tables). If you need a mechanism like this you have to enforce it on the client side. Often you can reduce the need for this feature by using a smart document schema with embedded documents.

Thursday, December 19, 2013

Exercise 2 and 3 - the order of functions

Importance and order of functions in MongoDB when retrieving data

When running the week 1 homework scripts we get the following results:

The code is simple. There is only one line of code that is interesting.

iter = collection.find({},limit=1, skip=n).sort('value', direction=1)

Two methods are executed one after another: first find and then sort. At first it looks like the sort method is only applied to the results returned by find, but this is not true.

What is actually happening is:

The find method selects all documents from the database (their order is determined by the database and should be treated as random; this is the equivalent of a select * from table statement in SQL).
The list is then sorted based on value, which here is one of the document attributes.
After the returned documents (i.e. the rows returned from a table in an RDBMS) are sorted, we skip the first 'n' of them and return a single element.
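
The three steps above can be reproduced with a plain Python list, which makes the order easy to check without a database. This is only a model of the behaviour; find_sorted and the sample values are made up for illustration.

```python
def find_sorted(docs, sort_key, skip=0, limit=0):
    """Mimic find().sort() with skip and limit: sort first, then skip, then limit."""
    ordered = sorted(docs, key=lambda d: d[sort_key])   # 1. sort all matches
    remaining = ordered[skip:]                          # 2. skip the first n
    return remaining[:limit] if limit else remaining    # 3. keep at most `limit`

docs = [{"value": v} for v in (87, 3, 42, 15, 60)]      # made-up sample values

# Same shape as: collection.find({}, limit=1, skip=2).sort('value', direction=1)
third_smallest = find_sorted(docs, "value", skip=2, limit=1)
print(third_smallest)
```

If skip and limit were applied before the sort, the result would depend on the storage order of the documents, which is exactly the misreading the text above warns about.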

My M101P: MongoDB for Developers course

Below are some of my notes from the MongoDB course I took. This list will grow as the course progresses.

Online MongoDb Course

The course video can be found here:
https://education.mongodb.com/courses

Source code and further example code
https://github.com/rtomaszewski/mongo-course

A quick function summary of the mongo shell and API
http://www.hodgin.ca/downloads/Mongo%20Shell%20Cheat%20Sheet.pdf

Shell documentation
http://docs.mongodb.org/manual/reference/method/
http://docs.mongodb.org/manual/reference/operator/

Pymongo docs
http://api.mongodb.org/python/current/
https://pypi.python.org/pypi/pymongo/

Week #1

Quick summary of RDMS and NoSQL MongoDB
You can start experimenting with the database by installing it first: Simple MongoDB tutorial. For the course, however, you will need to install the latest version of the db.
Connect pymongo to bottle framework
Exercise 2 and 3 - the order of functions

Week #2

Week #3

Connect pymongo to bottle framework

All course files can be downloaded from https://github.com/rtomaszewski/mongo-course
Create a new cloud server
Install the software

Testing

References

http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find

Tuesday, December 17, 2013

How to estimate hardware requirements for your private Openstack deployment

I've found this little tool on the Mirantis web page: http://www.mirantis.com/openstack-services/bom-calculator/. You can play with it to estimate how much hardware (and money as well :)) you need to build your own private Openstack cloud infrastructure. It also gives an overview of potential vendors and their hardware.

Simple example

Top 10 data center network vendor list

In a previous post we saw how handy the Gartner magic quadrants can be when learning about and researching a particular technology or industry trend:

Below is another comparison chart, this time targeting vendors of network equipment for data centers.

By the way, I was investigating this because of the network vendor selection in this cloud hardware deployment calculator (How to estimate hardware requirements for your private Openstack deployment): Dell, Cisco, HP, Arista, Juniper, Brocade.

Sunday, December 15, 2013

ASCII diagram tells more than a full page of text

Before anyone can help you, they need to understand you first. Below is an example of why a single picture is worth more than a full page of text.

Problem

How long would it take to explain to somebody that:

VLAN1 is used ONLY for inter VM communication
VMs in VLAN2 and 3 are in isolated network segments that have internet access
VMs in VLAN2 and 3 can talk to each other (using the FW as its default gateway to route traffic)

Solution

ASCII diagram:
                                               +---+------------+
                                               |   |            |
          ^                                    | v |------------|
          | internet                           | m |  vm2       |
          |                         vlan1      | 1 |------------|
      +---+---+         +-------+   vlan2      |   |  vm3       |
      | FW    | <-------|switch1| ------------ |----------------|
      +-------+   vlan3 +-------+   vlan3      |virt1 |virt2 |  |
                  vlan2     ^                  |switch|switch|  |
                            |                  |------+------+--|
                            |                  | Hypervisor1    |
                            | trunk            +----------------+
                            | vlan1
                            |
                            |
                            +
                       +-----------+  vlan1            
                       |aggregation|  vlan2    +-------------+
                       |-----------+<--------- | vm1 vm2     |
                       | switch    |  trunk    |-------------|
                       +-----------+           | Hypervisor2 |
                                               +-------------+

References

http://www.asciiflow.com/#Draw
http://www.streamwave.com/web-development/ascii-diagram-tools/
http://nedbatchelder.com/blog/200911/ditaa_diagrams_through_ascii_art.html

Saturday, December 14, 2013

What does Devops mean and how to use it

We wrote before about the challenges a company will likely face when implementing devops. We provided a number of Devops good practices together with anti-patterns showing how to do and how not to do things. We looked at the language people use when talking about Devops here, and saw a practical example based on Rackspace Openstack of the challenges and problems an implementation can lead to here. Below is a nice video that helps to understand it even better. Happy devops watching ;).

DevOps movie

Interesting slides from the movie

References

Direct link to the video on YouTube http://www.youtube.com/watch?v=_I94-tJlovg&feature=em-uploademail
More Rackspace videos http://www.youtube.com/user/RackspaceHosting?feature=watch

Rackspace resources
http://developer.rackspace.com/blog/
http://www.rackspace.com/devops/
http://www.rackspace.com/blog/tag/devops/
http://www.rackspace.com/blog/accelerate-devops-adoption-for-faster-innovation/

Internet and technology trends on the world

The world is changing. The economy of every country is different. Below is an interesting presentation that tries to look at these phenomena from a high level without going too much into detail.

The full presentation

KPCB Internet Trends 2013 from Kleiner Perkins Caufield & Byers

Interesting slides