Showing posts from December, 2011

Install a Hama cluster using Whirr

Apache Whirr provides a cloud-neutral way to run a properly-configured system quickly, through libraries, a common service API, smart defaults, and a command-line tool. It currently supports various cloud services, e.g., Hadoop, HBase, Hama, Cassandra, and ZooKeeper. Let's see how simple it is to install a Hama cluster using Whirr. The following commands install Whirr and start a 5-node Hama cluster on Amazon EC2 in 5 minutes or less:

% curl -O
% tar zxf whirr-0.7.0.tar.gz; cd whirr-0.7.0
% export AWS_ACCESS_KEY_ID=YOUR_ID
% export AWS_SECRET_ACCESS_KEY=YOUR_SECKEY
% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
% bin/whirr launch-cluster --config recipes/ --private-key-file ~/.ssh/id_rsa_whirr

Upon success you should see "imok" echoed to the console, indicating that Hama is running. Oh... finished. :) Now you can run a BSP example as below:

edward@domU-12-31-39-0C-7D-41:/usr/local/hama-0
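The launch-cluster command above is driven by a properties recipe (the path after --config). As a rough sketch of what such a Hama recipe might contain — the property keys follow Whirr's conventions, but the exact role names and values here are my assumptions, not the shipped recipe file:

```
# Hypothetical Whirr recipe sketch -- property keys follow Whirr's
# conventions; role names and values are assumptions, not the shipped file.
whirr.cluster-name=hama-cluster
# 1 master node (co-located with ZooKeeper) + 4 groom servers = 5 nodes
whirr.instance-templates=1 zookeeper+hama-master,4 hama-groomserver
whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
whirr.public-key-file=${whirr.private-key-file}.pub
```

With a recipe like this, the AWS credentials exported earlier are picked up via the ${env:...} substitutions, so they never have to be written into the file itself.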

SSSP (Single Source Shortest Path) problem with Apache Hama

Since yesterday I've been testing the Apache Hama SSSP (Single Source Shortest Path) example with a random graph of ~100 million vertices and ~1 billion edges as input on my small cluster. More specifically:

Experimental environment
- One rack (16 nodes, 256 cores) cluster
- Hadoop 0.20.2
- Hama TRUNK r1213634
- 10G network

Task and data partitioning
- Based on hashing of the vertexID in the graph and the size of the input data

SSSP algorithm
- The algorithm described in the Pregel paper

And here are the rough results for you:

Vertices (x10 edges)   Tasks   Supersteps   Job Execution Time
10 million             6       5423         656.393 seconds
20 million             12      2231         449.542 seconds
30 million             18      4398         886.845 seconds
40 million             24      5432         1112.912 seconds
50 million             30      10747        2079.262 seconds
60 million             36      8158         1754.935 seconds
70 million             42      20634        4325.141 seconds
80 million             48      14356        3236.194 seconds
90 million             54      11480        2785.996 seconds
100 million            60      7679         2169.528 seconds

What do you think of these results? I'm quite satisfied consi
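For readers unfamiliar with how the Pregel-style SSSP works, here is a minimal, self-contained sketch of the superstep loop in plain Python. This is only a single-process simulation of the BSP message-passing pattern, not Hama code; the graph, function name, and representation are illustrative assumptions:

```python
# Plain-Python simulation of Pregel-style SSSP supersteps -- illustrative
# only, NOT the Hama API. Each superstep: vertices read incoming messages,
# update their distance if a shorter path arrived, and send updated
# distances to neighbors; the loop halts when no messages are in flight.
import math
from collections import defaultdict

def pregel_sssp(adj, source):
    """adj: {vertex: [(neighbor, edge_weight), ...]}. Returns (dist, supersteps)."""
    dist = {v: math.inf for v in adj}
    inbox = {source: [0]}          # superstep 0: source receives distance 0
    supersteps = 0
    while inbox:                   # run until all vertices are inactive
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist[v]:     # a shorter path to v was found
                dist[v] = best
                for u, w in adj[v]:            # propagate along out-edges
                    outbox[u].append(best + w)
        inbox = outbox             # barrier: messages arrive next superstep
        supersteps += 1
    return dist, supersteps

# Tiny example graph
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 3)],
    "d": [],
}
dist, steps = pregel_sssp(graph, "a")
print(dist)   # {'a': 0, 'b': 1, 'c': 3, 'd': 6}
print(steps)  # 4
```

Note that the superstep count depends on the graph's structure, not just its size, which is one plausible reason the superstep numbers in the table above do not grow monotonically with the vertex count.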