Showing posts from December, 2011

Install a Hama cluster using Whirr

Apache Whirr provides a cloud-neutral way to run properly configured distributed systems quickly, through libraries, a common service API, smart defaults, and a command-line tool. It currently supports various services, e.g., Hadoop, HBase, Hama, Cassandra, and ZooKeeper. Let's see how simple it is to install a Hama cluster using Whirr.

The following commands install Whirr and start a 5-node Hama cluster on Amazon EC2 in 5 minutes or less.

% curl -O
% tar zxf whirr-0.7.0.tar.gz; cd whirr-0.7.0
% export AWS_ACCESS_KEY_ID=YOUR_ID
% export AWS_SECRET_ACCESS_KEY=YOUR_SECKEY
% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
% bin/whirr launch-cluster --config recipes/ --private-key-file ~/.ssh/id_rsa_whirr
Upon success you should see imok echoed to the console, indicating that Hama is running.

Oh... finished. :)
Now you can run the BSP examples as below:
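Hama's examples are written against its Java BSP API, where each task computes locally, exchanges messages, and then hits a barrier before the next superstep. As a rough illustration of that model only (plain Java, not the actual Hama API; the peer values here are made up), a single superstep of message passing can be simulated like this:

```java
import java.util.ArrayList;
import java.util.List;

public class BspMaxDemo {
    public static void main(String[] args) {
        int[] localValues = {7, 42, 13, 5};        // one value per simulated peer
        int peers = localValues.length;

        // inbox[p] holds messages delivered to peer p at the next barrier
        List<List<Integer>> inbox = new ArrayList<>();
        for (int p = 0; p < peers; p++) inbox.add(new ArrayList<>());

        // Superstep 0: every peer sends its local value to peer 0
        for (int p = 0; p < peers; p++) inbox.get(0).add(localValues[p]);
        // ---- barrier synchronization (implicit in this single-threaded sketch) ----

        // Superstep 1: peer 0 drains its inbox and computes the global maximum
        int max = Integer.MIN_VALUE;
        for (int msg : inbox.get(0)) max = Math.max(max, msg);
        System.out.println("global max = " + max);  // global max = 42
    }
}
```

In real Hama the send and the barrier are done by the framework on a cluster; the point here is only the superstep/message/barrier structure that all the BSP examples share.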


SSSP (Single Source Shortest Path) problem with Apache Hama

Since yesterday I've been testing the Apache Hama SSSP (Single Source Shortest Path) example with a random graph of ~100 million vertices and ~1 billion edges as input on my small cluster. More specifically:
Experimental environment:
- One rack (16 nodes, 256 cores) cluster
- Hadoop 0.20.2
- Hama TRUNK r1213634
- 10G network

Task and data partitioning:
- Based on hashing of the vertexID in the graph and the size of the input data

SSSP algorithm:
- The algorithm described in the Pregel paper

And here are the rough results for you:
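The Pregel-style SSSP loop can be sketched in a few lines: every vertex starts at infinite distance except the source, active vertices send (distance + edge weight) to their neighbors each superstep, vertices keep the minimum of incoming messages, and the job halts when no vertex improves. Below is a minimal single-process sketch in plain Java (the toy graph and all names are made up for illustration; the real Hama example distributes this across tasks):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SsspSketch {
    public static void main(String[] args) {
        // Tiny hypothetical graph: edges[u] = {v, weight, v, weight, ...}
        int[][] edges = {
            {1, 4, 2, 1},   // 0 -> 1 (w=4), 0 -> 2 (w=1)
            {3, 1},         // 1 -> 3 (w=1)
            {1, 2, 3, 5},   // 2 -> 1 (w=2), 2 -> 3 (w=5)
            {}              // 3 has no outgoing edges
        };
        int n = edges.length, source = 0;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[source] = 0;
        boolean[] active = new boolean[n];
        active[source] = true;

        int supersteps = 0;
        boolean anyActive = true;
        while (anyActive) {
            supersteps++;
            // Message phase: active vertices propose distances to neighbors, then vote to halt
            List<int[]> messages = new ArrayList<>(); // each message: {target, proposedDist}
            for (int u = 0; u < n; u++) {
                if (!active[u]) continue;
                for (int i = 0; i < edges[u].length; i += 2)
                    messages.add(new int[]{edges[u][i], dist[u] + edges[u][i + 1]});
                active[u] = false;
            }
            // Barrier, then compute phase: vertices apply the minimum of incoming messages
            anyActive = false;
            for (int[] m : messages) {
                if (m[1] < dist[m[0]]) {
                    dist[m[0]] = m[1];
                    active[m[0]] = true;   // improved, so reactivate for the next superstep
                    anyActive = true;
                }
            }
        }
        System.out.println("dist = " + Arrays.toString(dist) + " after " + supersteps + " supersteps");
    }
}
```

Note how the superstep count depends on the graph structure (how many times distances keep improving), which is why it varies so much between the runs in the table below.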

Vertices (x10 edges) | Tasks | Supersteps | Job Execution Time
10 million           | 6     | 5423       | 656.393 seconds
20 million           | 12    | 2231       | 449.542 seconds
30 million           | 18    | 4398       | 886.845 seconds
40 million           | 24    | 5432       | 1112.912 seconds
50 million           | 30    | 10747      | 2079.262 seconds
60 million           | 36    | 8158       | 1754.935 seconds
70 million           | 42    | 20634      | 4325.141 seconds
80 million           | 48    | 14356      | 3236.194 seconds
90 million           | 54    | 11480      | 2785.996 seconds
100 million          | 60    | 7679       | 2169.528 seconds
What do you think of this chart? I'm quite satisfied considering that the job execution time contains the data partitioning and loading t…