July 26, 2012

Some Benchmarks of Hadoop and Hama on Oracle's BDA

The I/O performance of HDFS, measured with TestDFSIO (10 files of 1000 MB each):
% hadoop jar hadoop-test-0.20.2-cdh3u3b.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

----- TestDFSIO ----- : write
           Date & time: Thu Jul 26 18:50:11 PDT 2012
       Number of files: 10
Total MBytes processed: 10000.0
     Throughput mb/sec: 163.4360801490537
Average IO rate mb/sec: 167.77435302734375
 IO rate std deviation: 25.658150459575825
    Test exec time sec: 19.329

% hadoop jar hadoop-test-0.20.2-cdh3u3b.jar TestDFSIO -read -nrFiles 10 -fileSize 1000

----- TestDFSIO ----- : read
           Date & time: Thu Jul 26 19:22:14 PDT 2012
       Number of files: 10
Total MBytes processed: 10000.0
     Throughput mb/sec: 374.6721618583739
Average IO rate mb/sec: 375.14581298828125
 IO rate std deviation: 13.625353109608241
    Test exec time sec: 17.311
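
The "Throughput" line is, as far as I understand TestDFSIO, computed from the summed I/O time of the individual map tasks, so it reads as a per-task figure rather than a cluster-wide one. A rough cluster-wide number can be derived by dividing the total data volume by the wall-clock execution time. The following is a minimal Python sketch of that arithmetic; the function names `parse_testdfsio` and `wall_clock_mb_per_sec` are made up for illustration, and the result includes MapReduce job start-up time, so it understates the raw disk rate.

```python
# Sketch: derive a wall-clock aggregate rate from a TestDFSIO summary.
# parse_testdfsio and wall_clock_mb_per_sec are hypothetical helper names.

WRITE_REPORT = """\
----- TestDFSIO ----- : write
           Date & time: Thu Jul 26 18:50:11 PDT 2012
       Number of files: 10
Total MBytes processed: 10000.0
     Throughput mb/sec: 163.4360801490537
Average IO rate mb/sec: 167.77435302734375
 IO rate std deviation: 25.658150459575825
    Test exec time sec: 19.329
"""


def parse_testdfsio(report):
    """Return the numeric fields of a TestDFSIO summary as a dict of floats."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            fields[key.strip()] = float(value.strip())
        except ValueError:
            # Header and date lines are not numeric; skip them.
            pass
    return fields


def wall_clock_mb_per_sec(fields):
    """Total MB divided by job wall-clock time (includes job start-up)."""
    return fields["Total MBytes processed"] / fields["Test exec time sec"]


fields = parse_testdfsio(WRITE_REPORT)
print("wall-clock MB/s: %.1f" % wall_clock_mb_per_sec(fields))
```

With the numbers above this gives roughly 517 MB/s for the write test and roughly 578 MB/s for the read test (10000 MB in 19.329 s and 17.311 s respectively).
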
The communication performance of Apache Hama, measured with its bench tool:
% hama jar hama-examples-0.5.0.jar bench 16 100000 32
...

12/07/26 21:12:16 INFO bsp.BSPJobClient: Current supersteps number: 30
12/07/26 21:12:19 INFO bsp.BSPJobClient: Current supersteps number: 31
12/07/26 21:12:22 INFO bsp.BSPJobClient: Current supersteps number: 32
12/07/26 21:12:25 INFO bsp.BSPJobClient: The total number of supersteps: 32
12/07/26 21:12:25 INFO bsp.BSPJobClient: Counters: 8
12/07/26 21:12:25 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
12/07/26 21:12:25 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=162
12/07/26 21:12:25 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
12/07/26 21:12:25 INFO bsp.BSPJobClient:     SUPERSTEPS=32
12/07/26 21:12:25 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=5184
12/07/26 21:12:25 INFO bsp.BSPJobClient:     MESSAGE_BYTES_TRANSFERED=10404951552
12/07/26 21:12:25 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=10300386
12/07/26 21:12:25 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=1036800000
12/07/26 21:12:25 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=518400000
12/07/26 21:12:25 INFO bsp.BSPJobClient:     MESSAGE_BYTES_RECEIVED=10404951552
Job Finished in 93.33 seconds
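
A few derived figures make the counters easier to compare between runs. The sketch below, in Python, parses the counter lines and computes the number of tasks, the average bytes per message, and the message volume per second of job time. The helper names `parse_counters` and `summarize` are hypothetical, and treating SUPERSTEP_SUM / SUPERSTEPS as the task count is an observation from this log (it equals LAUNCHED_TASKS here), not something taken from the Hama documentation. The rate uses the total job time, so it includes job start-up and synchronization.

```python
# Sketch: derive per-run figures from Hama bench counters.
# parse_counters and summarize are hypothetical helper names.
import re

LOG = """\
12/07/26 21:12:25 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=162
12/07/26 21:12:25 INFO bsp.BSPJobClient:     SUPERSTEPS=32
12/07/26 21:12:25 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=5184
12/07/26 21:12:25 INFO bsp.BSPJobClient:     MESSAGE_BYTES_TRANSFERED=10404951552
12/07/26 21:12:25 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=1036800000
"""

# Matches a trailing NAME=integer pair at the end of a log line.
COUNTER_RE = re.compile(r"\b([A-Z_]+)=(\d+)\s*$")


def parse_counters(log):
    """Collect NAME=value counters from BSPJobClient log lines."""
    counters = {}
    for line in log.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def summarize(counters, job_seconds):
    """Compute task count, bytes per message, and MB per second of job time."""
    sent_bytes = counters["MESSAGE_BYTES_TRANSFERED"]
    return {
        # Assumption: every task runs every superstep.
        "tasks": counters["SUPERSTEP_SUM"] // counters["SUPERSTEPS"],
        "bytes_per_message": sent_bytes / counters["TOTAL_MESSAGES_SENT"],
        "mb_per_sec": sent_bytes / job_seconds / 1e6,
    }


summary = summarize(parse_counters(LOG), job_seconds=93.33)
for name, value in summary.items():
    print(name, value)
```

For this run the sketch reports 162 tasks, about 10 bytes per message, and roughly 111 MB of message data per second of job time.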

July 5, 2012

Running Hama on Oracle's Big Data Appliance

This post describes how to set up a Hama cluster on Oracle's Big Data Appliance. Apache Hama is a "Bulk Synchronous Parallel" computing framework on top of Hadoop's HDFS.


Cloudera Manager is installed on Oracle Big Data Appliance to help you operate Cloudera's Distribution including Apache Hadoop (CDH). Once the Hadoop installation is finished, you can check the Hadoop version as follows:
[root@bda01 ~]# hadoop version
Hadoop 0.20.2-cdh3u3b
Subversion file:///data/1/tmp/topdir/BUILD/hadoop-0.20.2-cdh3u3b -r 0560e235f226fcd7a0b8a011d4a1b78afad032e0
Compiled by root on Fri Mar 16 07:36:05 PDT 2012
From source with checksum 9257f5bf2f59f5a294e9b69f3f59283b
Now download the latest Hama release, 0.5.0, and unpack it:
[root@bda01 ~]# wget https://dist.apache.org/repos/dist/release/hama/0.5.0/hama-0.5.0.tar.gz
[root@bda01 ~]# tar xvfz hama-0.5.0.tar.gz
[root@bda01 ~]# cd hama-0.5.0
Hama 0.5.0 ships with Hadoop 1.0, so you have to replace the Hadoop and Guava jar files in the lib folder with those from the CDH installation:
[root@bda01 hama-0.5.0]# rm -rf ./lib/hadoop-*.jar
[root@bda01 hama-0.5.0]# cp /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3b.jar ./lib
[root@bda01 hama-0.5.0]# cp /usr/lib/hadoop/hadoop-test-0.20.2-cdh3u3b.jar ./lib
[root@bda01 hama-0.5.0]# cp /usr/lib/hadoop/lib/guava-r09-jarjar.jar ./lib
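
Before starting the daemons, it can be worth confirming that no Hadoop jar of a different version is left in lib, since mixed versions on the classpath are a common source of startup errors. The following Python sketch lists any hadoop-*.jar whose file name does not contain the expected CDH version string. The function name `stale_hadoop_jars` is hypothetical, and the check only looks at file names, not jar contents.

```python
# Sketch: find hadoop-*.jar files in lib that do not match the CDH version.
# stale_hadoop_jars is a hypothetical helper; it inspects file names only.
from pathlib import Path


def stale_hadoop_jars(lib_dir, expected_version):
    """Return sorted names of hadoop-*.jar files lacking expected_version."""
    return sorted(
        jar.name
        for jar in Path(lib_dir).glob("hadoop-*.jar")
        if expected_version not in jar.name
    )


if __name__ == "__main__":
    # Run from the hama-0.5.0 directory after copying the CDH jars.
    leftovers = stale_hadoop_jars("./lib", "0.20.2-cdh3u3b")
    if leftovers:
        print("Unexpected Hadoop jars:", ", ".join(leftovers))
    else:
        print("Only CDH Hadoop jars found in ./lib")
```

An empty result means that every Hadoop jar in lib carries the 0.20.2-cdh3u3b version string in its name.
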
Then configure your cluster according to the "Distributed Mode" scenario described in Getting Started with Hama, start the daemons, and run the bench example:
[root@bda01 hama-0.5.0]# bin/start-bspd.sh
...

[root@bda01 hama-0.5.0]# bin/hama jar hama-examples-0.5.0.jar bench 16 10000 32
12/07/05 18:34:20 INFO bsp.BSPJobClient: Running job: job_201207051757_0004
12/07/05 18:34:23 INFO bsp.BSPJobClient: Current supersteps number: 0
12/07/05 18:34:29 INFO bsp.BSPJobClient: Current supersteps number: 6
12/07/05 18:34:32 INFO bsp.BSPJobClient: Current supersteps number: 22
12/07/05 18:34:35 INFO bsp.BSPJobClient: Current supersteps number: 31
12/07/05 18:34:38 INFO bsp.BSPJobClient: Current supersteps number: 32
12/07/05 18:34:38 INFO bsp.BSPJobClient: The total number of supersteps: 32
12/07/05 18:34:38 DEBUG bsp.Counters: Adding SUPERSTEPS
12/07/05 18:34:38 INFO bsp.BSPJobClient: Counters: 8
12/07/05 18:34:38 INFO bsp.BSPJobClient:   org.apache.hama.bsp.JobInProgress$JobCounter
12/07/05 18:34:38 INFO bsp.BSPJobClient:     LAUNCHED_TASKS=90
12/07/05 18:34:38 INFO bsp.BSPJobClient:   org.apache.hama.bsp.BSPPeerImpl$PeerCounter
12/07/05 18:34:38 INFO bsp.BSPJobClient:     SUPERSTEPS=32
12/07/05 18:34:38 INFO bsp.BSPJobClient:     SUPERSTEP_SUM=2880
12/07/05 18:34:38 INFO bsp.BSPJobClient:     MESSAGE_BYTES_TRANSFERED=587404800
12/07/05 18:34:38 INFO bsp.BSPJobClient:     TIME_IN_SYNC_MS=906604
12/07/05 18:34:38 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_SENT=57600000
12/07/05 18:34:38 INFO bsp.BSPJobClient:     TOTAL_MESSAGES_RECEIVED=28800000
12/07/05 18:34:38 INFO bsp.BSPJobClient:     MESSAGE_BYTES_RECEIVED=587404800
Job Finished in 18.27 seconds