December 13, 2011

SSSP (Single Source Shortest Path) problem with Apache Hama

From yesterday I'm testing Apache Hama SSSP (Single Source Shortest Path) example with random graph of ~ 100 million vertices and ~ 1 billion edges as a input on my small cluster. More specifically:
  • Experimental environments
    • One rack (16 nodes 256 cores) cluster 
    • Hadoop 0.20.2
    • Hama TRUNK r1213634.
    • 10G network
  • Task and data partitioning
    • Based on hashing of vertextID in graph and size of input data.
  • SSSP algorithm
    • Algorithm described in Pregel paper
And here's rough results for you:

Vertices (x10 edges)TasksSuperstepsJob Execution Time
10 million65423656.393 seconds
20 million122231449.542 seconds
30 million184398886.845 seconds
40 million2454321112.912 seconds
50 million30107472079.262 seconds
60 million3681581754.935 seconds
70 million42206344325.141 seconds
80 million48143563236.194 seconds
90 million54114802785.996 seconds
100 million6076792169.528 seconds

What do you think on this chart? I'm quite satisfied considering that the job execution time contains the data partitioning and loading time (100 ~ 500 seconds) and there is still much to be desired. This surely shows scalable performance, the SSSP processing time will not increase linearly with the number of vertices.

No comments:

Post a Comment