- Experimental environments
  - One-rack cluster (16 nodes, 256 cores)
  - Hadoop 0.20.2
  - Hama TRUNK r1213634
  - 10G network
- Task and data partitioning
  - Based on hashing of the vertex ID in the graph and on the size of the input data
- SSSP algorithm
  - The algorithm described in the Pregel paper
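To make the setup above concrete, here is a minimal single-process sketch of Pregel-style SSSP with hash-based vertex partitioning. It is not Hama's implementation: the names `partition` and `pregel_sssp` are hypothetical, the tasks run one after another instead of in parallel, and the source vertex is started with an initial message of 0, which is a simplification of the vertex program in the Pregel paper.

```python
from collections import defaultdict

INF = float("inf")


def partition(vertex_id, num_tasks):
    """Assign a vertex to a task by hashing its ID (hypothetical helper)."""
    return hash(vertex_id) % num_tasks


def pregel_sssp(graph, source, num_tasks):
    """Simulate Pregel-style SSSP.

    graph maps each vertex to a list of (neighbor, edge_weight) pairs.
    Returns (distances, number_of_supersteps).
    """
    # Each task owns the vertices whose hashed ID maps to it.
    tasks = defaultdict(list)
    for v in graph:
        tasks[partition(v, num_tasks)].append(v)

    dist = {v: INF for v in graph}
    inbox = {source: [0]}  # messages to be read in the next superstep
    supersteps = 0

    while inbox:  # stop when no messages are in flight
        outbox = defaultdict(list)
        for task_id in sorted(tasks):  # on a cluster these would run in parallel
            for v in tasks[task_id]:
                messages = inbox.get(v)
                if not messages:
                    continue  # no messages: the vertex stays halted
                best = min(messages)
                if best < dist[v]:
                    # Found a shorter path: update and notify the neighbors.
                    dist[v] = best
                    for neighbor, weight in graph[v]:
                        outbox[neighbor].append(best + weight)
        inbox = outbox  # barrier: messages become visible in the next superstep
        supersteps += 1

    return dist, supersteps


if __name__ == "__main__":
    example = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
    distances, steps = pregel_sssp(example, source=0, num_tasks=2)
    print(distances, steps)
```

Because messages sent in one superstep are only read in the next, the result does not depend on how vertices are spread across tasks; the partitioning only decides which task does the work.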
| Vertices (x10 edges) | Tasks | Supersteps | Job Execution Time (seconds) |
|---|---|---|---|
| 10 million | 6 | 5423 | 656.393 |
| 20 million | 12 | 2231 | 449.542 |
| 30 million | 18 | 4398 | 886.845 |
| 40 million | 24 | 5432 | 1112.912 |
| 50 million | 30 | 10747 | 2079.262 |
| 60 million | 36 | 8158 | 1754.935 |
| 70 million | 42 | 20634 | 4325.141 |
| 80 million | 48 | 14356 | 3236.194 |
| 90 million | 54 | 11480 | 2785.996 |
| 100 million | 60 | 7679 | 2169.528 |
What do you think of this chart? I'm quite satisfied, considering that the job execution time includes the data partitioning and loading time (100 to 500 seconds), although there is still plenty of room for improvement. The results show scalable performance: the SSSP processing time does not increase linearly with the number of vertices.
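As a rough check on the scaling claim, the figures from the table can be put through some simple arithmetic. The sketch below uses only the numbers reported above; the variable names are my own, and since the job time includes partitioning and loading, the per-superstep values are upper bounds rather than exact costs.

```python
# (vertices in millions, tasks, supersteps, job time in seconds), copied from the table
runs = [
    (10, 6, 5423, 656.393),
    (20, 12, 2231, 449.542),
    (30, 18, 4398, 886.845),
    (40, 24, 5432, 1112.912),
    (50, 30, 10747, 2079.262),
    (60, 36, 8158, 1754.935),
    (70, 42, 20634, 4325.141),
    (80, 48, 14356, 3236.194),
    (90, 54, 11480, 2785.996),
    (100, 60, 7679, 2169.528),
]


def seconds_per_superstep(run):
    """Average wall-clock time per superstep, including load/partition overhead."""
    _, _, supersteps, seconds = run
    return seconds / supersteps


for run in runs:
    vertices, tasks, _, _ = run
    ms = seconds_per_superstep(run) * 1000
    print(f"{vertices:>4}M vertices, {tasks:>2} tasks: {ms:6.1f} ms per superstep")

# Compare the smallest and largest runs.
first, last = runs[0], runs[-1]
input_growth = last[0] / first[0]
time_growth = last[3] / first[3]
print(f"input grew {input_growth:.0f}x, job time grew {time_growth:.2f}x")
```

Note that the number of tasks grows in step with the input (the vertices-per-task ratio is the same in every row), so this is a comparison at constant load per task. The superstep count also varies a lot between runs, which explains much of the non-monotonic job time.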