Edward J. Yoon's Blog: SSSP (Single Source Shortest Path) problem with Apache Hama

Since yesterday I've been testing the Apache Hama SSSP (Single Source Shortest Path) example on my small cluster, using random graphs of up to ~100 million vertices and ~1 billion edges as input. More specifically:

Experimental environments

One-rack cluster (16 nodes, 256 cores)
Hadoop 0.20.2
Hama TRUNK r1213634
10G network

Task and data partitioning

Based on a hash of each vertex ID in the graph and on the size of the input data.
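
To make the hashing idea concrete, here is a minimal sketch in Python. It is not Hama's actual partitioner: the function name `partition_for`, the use of CRC32 as the hash, and the synthetic vertex IDs are all assumptions made for illustration.

```python
import zlib

def partition_for(vertex_id: str, num_tasks: int) -> int:
    """Return the index of the task that owns vertex_id (hypothetical helper)."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    # CRC32 is stable across runs, unlike Python's built-in hash() for str,
    # so the same vertex always lands on the same task.
    return zlib.crc32(vertex_id.encode("utf-8")) % num_tasks

# Spread 60,000 synthetic vertex IDs over 60 tasks and count vertices per task.
counts = [0] * 60
for i in range(60_000):
    counts[partition_for(str(i), 60)] += 1
print(min(counts), max(counts))
```

Because the owner of a vertex is computed from its ID alone, any task can work out where to send a message without a lookup table.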

SSSP algorithm

The algorithm described in the Pregel paper.
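
For readers who have not seen it, the vertex-centric idea can be sketched on a single machine as follows. This is a simplified simulation, not the Hama example code: the function name `pregel_sssp`, the adjacency-list format, and the tiny test graph are assumptions. In each superstep, every vertex that received messages takes the minimum candidate distance; if that improves its current value, it forwards the new distance plus the edge weight to its neighbors. The run ends when no messages are in flight.

```python
import math
from collections import defaultdict

def pregel_sssp(graph, source):
    """Simulate superstep-based SSSP.

    graph: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns (distances of reached vertices, number of supersteps run).
    """
    dist = {}
    inbox = {source: [0]}  # the source starts with distance 0
    supersteps = 0
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            candidate = min(messages)
            if candidate < dist.get(vertex, math.inf):
                dist[vertex] = candidate
                # Only vertices whose distance improved send messages.
                for neighbor, weight in graph.get(vertex, []):
                    outbox[neighbor].append(candidate + weight)
        inbox = outbox  # messages are delivered in the next superstep
        supersteps += 1
    return dist, supersteps

example = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
distances, steps = pregel_sssp(example, "a")
print(distances, steps)  # {'a': 0, 'b': 1, 'c': 3} 3
```

Note that vertex c is first reached with distance 4 and corrected to 3 one superstep later, which is why the superstep count depends on the shape of the graph and not only on its size.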

And here are some rough results:

Vertices (edges = 10 x vertices)	Tasks	Supersteps	Job Execution Time
10 million	6	5423	656.393 seconds
20 million	12	2231	449.542 seconds
30 million	18	4398	886.845 seconds
40 million	24	5432	1112.912 seconds
50 million	30	10747	2079.262 seconds
60 million	36	8158	1754.935 seconds
70 million	42	20634	4325.141 seconds
80 million	48	14356	3236.194 seconds
90 million	54	11480	2785.996 seconds
100 million	60	7679	2169.528 seconds

What do you think of this chart? I'm quite satisfied, considering that the job execution time includes the data partitioning and loading time (100 ~ 500 seconds), although there is still much room for improvement. This shows scalable performance: the SSSP processing time does not increase linearly with the number of vertices.
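
One rough way to read the table is to divide each job time by its superstep count. The sketch below does only that arithmetic on the numbers above; the variable names are my own, and since the job time also includes partitioning and loading, the result is an upper bound on the cost of one superstep, not a precise measurement.

```python
# (vertices in millions, tasks, supersteps, job time in seconds), from the table
rows = [
    (10, 6, 5423, 656.393),
    (20, 12, 2231, 449.542),
    (30, 18, 4398, 886.845),
    (40, 24, 5432, 1112.912),
    (50, 30, 10747, 2079.262),
    (60, 36, 8158, 1754.935),
    (70, 42, 20634, 4325.141),
    (80, 48, 14356, 3236.194),
    (90, 54, 11480, 2785.996),
    (100, 60, 7679, 2169.528),
]

# Seconds per superstep for each run, keyed by graph size in millions of vertices.
per_superstep = {v: seconds / steps for v, _tasks, steps, seconds in rows}

for v, cost in per_superstep.items():
    print(f"{v:>4} million vertices: {cost:.3f} s per superstep")
```

The number of tasks grows together with the number of vertices in these runs, so this ratio describes the whole configuration, not a fixed cluster size.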
