Edward J. Yoon's Blog: 12/2011

Apache Whirr provides a cloud-neutral way to quickly run a properly configured system through libraries, a common service API, smart defaults, and a command-line tool. It currently supports various services, e.g., Hadoop, HBase, Hama, Cassandra, and ZooKeeper. Let's see how simple it is to install a Hama cluster using Whirr.

The following commands install Whirr and start a 5-node Hama cluster on Amazon EC2 in 5 minutes or less.

% curl -O http://apache.tt.co.kr//whirr/whirr-0.7.0/whirr-0.7.0.tar.gz
% tar zxf whirr-0.7.0.tar.gz; cd whirr-0.7.0

% export AWS_ACCESS_KEY_ID=YOUR_ID
% export AWS_SECRET_ACCESS_KEY=YOUR_SECKEY
% ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr

% bin/whirr launch-cluster --config recipes/hama-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr

Upon success you should see imok echoed to the console, indicating that Hama is running.

Oh... finished. :)
Now you can run the BSP examples as below:

edward@domU-12-31-39-0C-7D-41:/usr/local/hama-0.3.0-incubating$ bin/hama jar hama-examples-0.3.0-incubating.jar 
An example program must be given as the first argument.
Valid program names are:
  bench: Random Communication Benchmark
  pagerank: PageRank
  pi: Pi Estimator
  sssp: Single Source Shortest Path
  test: Serialize Printing Test
edward@domU-12-31-39-0C-7D-41:/usr/local/hama-0.3.0-incubating$ bin/hama jar hama-examples-0.3.0-incubating.jar pi
11/12/25 11:48:11 INFO bsp.BSPJobClient: Running job: job_201112251143_0001
11/12/25 11:48:14 INFO bsp.BSPJobClient: Current supersteps number: 0
11/12/25 11:48:17 INFO bsp.BSPJobClient: Current supersteps number: 1
11/12/25 11:48:20 INFO bsp.BSPJobClient: The total number of supersteps: 1
Estimated value of PI is 3.147866666666667
Job Finished in 9.635 seconds
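
For readers curious about what the pi example computes, here is a minimal single-process sketch of a Monte Carlo pi estimate. It is not Hama's code: the function names, the sample counts, and the idea that each task produces a local estimate which a master then averages are my assumptions for illustration.

```python
import random


def estimate_pi_task(samples, rng):
    """One simulated task: sample points in the unit square and
    count how many fall inside the quarter circle of radius 1."""
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle covers pi/4 of the square.
    return 4.0 * hits / samples


def estimate_pi(num_tasks=5, samples_per_task=10_000, seed=42):
    """Run several simulated tasks and average their local estimates
    (hypothetical stand-in for tasks reporting to a master)."""
    estimates = [
        estimate_pi_task(samples_per_task, random.Random(seed + task_id))
        for task_id in range(num_tasks)
    ]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    print("Estimated value of PI is", estimate_pi())
```

More samples per task give a tighter estimate, which is why the value printed by a short run is only close to 3.14159.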

Since yesterday I've been testing the Apache Hama SSSP (Single Source Shortest Path) example with a random graph of ~100 million vertices and ~1 billion edges as input on my small cluster. More specifically:

Experimental environments

One-rack cluster (16 nodes, 256 cores)
Hadoop 0.20.2
Hama TRUNK r1213634.
10G network

Task and data partitioning

Based on hashing of the vertexID in the graph and the size of the input data.
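
Hash partitioning by vertex ID can be sketched roughly as follows. This is only an illustration, not Hama's partitioner: the CRC32 hash, the function names, and the adjacency-dict layout are assumptions I picked to keep the example short and repeatable.

```python
import zlib


def partition_for(vertex_id, num_tasks):
    """Map a vertex ID to a task index. CRC32 is used because it is
    stable across runs, unlike Python's built-in hash() for strings."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % num_tasks


def partition_graph(adjacency, num_tasks):
    """Split {vertex: edges} into one dict per task."""
    parts = [dict() for _ in range(num_tasks)]
    for vertex, edges in adjacency.items():
        parts[partition_for(vertex, num_tasks)][vertex] = edges
    return parts


if __name__ == "__main__":
    graph = {v: [(v + 1) % 20] for v in range(20)}
    for task_id, part in enumerate(partition_graph(graph, 6)):
        print(task_id, sorted(part))
```

Every task can compute the owner of any vertex locally, so a message for a vertex can be routed without a lookup table.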

SSSP algorithm

The algorithm described in the Pregel paper
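
As a rough idea of how a Pregel-style SSSP proceeds in supersteps, here is a small single-process simulation. It is a sketch under my own assumptions (the function name, the in-memory inbox/outbox dicts, and the tiny test graph are hypothetical), not the code of the Hama example.

```python
import math
from collections import defaultdict


def sssp_supersteps(graph, source):
    """graph: {vertex: [(neighbor, weight), ...]}.
    Returns (distances of reached vertices, number of supersteps)."""
    dist = {}
    inbox = {source: [0]}  # the source starts with distance 0
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            candidate = min(messages)
            # Only a vertex whose distance improved sends messages on.
            if candidate < dist.get(vertex, math.inf):
                dist[vertex] = candidate
                for neighbor, weight in graph.get(vertex, ()):
                    outbox[neighbor].append(candidate + weight)
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1
    return dist, supersteps


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)],
        "d": [],
    }
    print(sssp_supersteps(graph, "a"))
```

The loop ends when a superstep produces no messages, so the number of supersteps depends on the shape of the graph rather than only on its size.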

And here are some rough results for you:

Vertices (edges = 10 x vertices)	Tasks	Supersteps	Job Execution Time
10 million	6	5423	656.393 seconds
20 million	12	2231	449.542 seconds
30 million	18	4398	886.845 seconds
40 million	24	5432	1112.912 seconds
50 million	30	10747	2079.262 seconds
60 million	36	8158	1754.935 seconds
70 million	42	20634	4325.141 seconds
80 million	48	14356	3236.194 seconds
90 million	54	11480	2785.996 seconds
100 million	60	7679	2169.528 seconds

What do you think of these results? I'm quite satisfied, considering that the job execution time includes the data partitioning and loading time (100 ~ 500 seconds), although there is still much room for improvement. This shows scalable performance: as the number of tasks grows along with the graph, the SSSP processing time does not increase linearly with the number of vertices.
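
One quick way to read the table above is to divide each job execution time by its superstep count. The numbers below are copied from the table; the helper name is hypothetical. Since the job time also includes partitioning and loading, the result is only an upper bound on the real cost of a superstep.

```python
# (vertices in millions, tasks, supersteps, job seconds), from the table above
RESULTS = [
    (10, 6, 5423, 656.393),
    (20, 12, 2231, 449.542),
    (30, 18, 4398, 886.845),
    (40, 24, 5432, 1112.912),
    (50, 30, 10747, 2079.262),
    (60, 36, 8158, 1754.935),
    (70, 42, 20634, 4325.141),
    (80, 48, 14356, 3236.194),
    (90, 54, 11480, 2785.996),
    (100, 60, 7679, 2169.528),
]


def seconds_per_superstep(rows):
    """Return {vertices in millions: job seconds / supersteps}."""
    return {
        vertices: seconds / supersteps
        for vertices, _tasks, supersteps, seconds in rows
    }


if __name__ == "__main__":
    for vertices, value in seconds_per_superstep(RESULTS).items():
        print(f"{vertices:>4}M vertices: {value:.3f} s/superstep")
```

By this measure every run stays between roughly 0.12 and 0.28 seconds per superstep, so most of the variation in job time comes from how many supersteps the random graph happens to need.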
