July 20, 2009

Hamburg, a graph computing framework on Hadoop

As mentioned before, I've been putting together the Hamburg project with Hyunsik Choi. Let's look in more detail at Hamburg's computing method, which is based on the BSP (Bulk Synchronous Parallel) model, as shown in the diagram below.




Each worker processes the data fragments stored locally. Then the workers perform a bulk synchronization using the communication data collected during that computation. The 'Computation' and 'Bulk synchronization' phases can be repeated iteratively, and the data exchanged for synchronization can be compressed to reduce network usage.
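To make the compute/synchronize cycle concrete, here is a minimal single-process Python sketch of a BSP superstep loop. It is not Hamburg's code: the names `run_bsp`, `compress_messages`, and `max_compute`, the use of `json` plus `zlib` to compress the synchronization data, and the dictionary-of-workers setup are all assumptions made for illustration. In each superstep every worker computes on its local state and inbox, then the outgoing messages are compressed, exchanged at the barrier, and delivered; the loop stops after a superstep in which nobody sends anything.

```python
import json
import zlib


def compress_messages(messages):
    """Serialize and compress one batch of outgoing messages (assumed JSON-serializable)."""
    return zlib.compress(json.dumps(messages).encode("utf-8"))


def decompress_messages(payload):
    """Inverse of compress_messages: decompress and deserialize a batch."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


def run_bsp(states, compute, max_supersteps=100):
    """Run BSP supersteps until no worker sends a message.

    states:  dict of worker id -> that worker's local data fragment (mutable)
    compute: function(worker_id, state, inbox) -> dict of destination id -> list of messages
    Returns the number of supersteps executed.
    """
    inboxes = {w: [] for w in states}
    for step in range(1, max_supersteps + 1):
        # Computation phase: each worker only touches its own state and inbox.
        outgoing = {w: compute(w, states[w], inboxes[w]) for w in states}

        # Bulk synchronization phase: ship compressed batches, then deliver them.
        inboxes = {w: [] for w in states}
        sent_any = False
        for by_dest in outgoing.values():
            for dest, msgs in by_dest.items():
                if msgs:
                    sent_any = True
                    payload = compress_messages(msgs)  # what would cross the network
                    inboxes[dest].extend(decompress_messages(payload))
        if not sent_any:
            return step
    return max_supersteps


# Toy usage (hypothetical): three workers in a ring propagate the maximum local value.
def max_compute(worker_id, state, inbox):
    best = max([state["value"]] + inbox)
    changed = best != state["value"] or not state.get("started", False)
    state["value"] = best
    state["started"] = True
    return {(worker_id + 1) % 3: [best]} if changed else {}


ring_states = {0: {"value": 5}, 1: {"value": 9}, 2: {"value": 1}}
steps = run_bsp(ring_states, max_compute)
print(steps, [s["value"] for s in ring_states.values()])
```

In this toy run, every worker ends up holding 9 after four supersteps, and the last superstep sends nothing, which is what ends the loop.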

Put simply, Hamburg aims to improve the performance of traversal operations in graph computing. For example, exploring all the neighboring nodes from the root node with Map/Reduce (FYI, Breadth-First Search (BFS) & MapReduce) requires many iterations, because each iteration only advances the search by one hop.
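To see why the iteration count grows with the number of hops, here is a small single-process Python simulation of the iterative MapReduce approach to BFS. It is a sketch under stated assumptions, not Hadoop code: each loop iteration stands in for one full MapReduce job (map emits candidate distances, reduce keeps the minimum), the driver stops when a job changes nothing, and both the graph and the name `bfs_mapreduce` are hypothetical.

```python
from collections import defaultdict

INF = float("inf")


def bfs_mapreduce(adjacency, root):
    """Simulate BFS as a chain of MapReduce jobs, one job per round.

    adjacency: vertex -> list of neighbor vertices (every vertex must be a key).
    Returns (distances, number_of_jobs_run).
    """
    dist = {v: (0 if v == root else INF) for v in adjacency}
    jobs = 0
    while True:
        jobs += 1
        # Map: re-emit each vertex's current distance, and distance + 1 to its neighbors.
        emitted = defaultdict(list)
        for v, neighbors in adjacency.items():
            emitted[v].append(dist[v])
            if dist[v] != INF:
                for n in neighbors:
                    emitted[n].append(dist[v] + 1)
        # Reduce: the new distance of a vertex is the minimum value emitted for it.
        new_dist = {v: min(values) for v, values in emitted.items()}
        # Driver: stop once a whole job changes nothing.
        if new_dist == dist:
            return dist, jobs
        dist = new_dist


# Hypothetical graph: vertex 5 is three hops away from the root, vertex 1.
graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
distances, jobs = bfs_mapreduce(graph, 1)
print(distances, jobs)
```

For this graph the simulation runs 4 jobs: one per hop, plus a final job that only confirms nothing changed. On a real cluster each of those jobs typically also pays its own start-up cost and rereads the graph input.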

If we run BFS with Hamburg under the same conditions, the cost of these iterations is lower. Let's assume the graph looks like the one presented below:




Suppose the graph is stored in HBase across a distributed system as shown above, with vertex 1 as the root. With Hamburg, we then need only one 'Bulk synchronization', between server2 and server3; the rest is computed locally on each machine.
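The following Python sketch illustrates this style of BFS on a hypothetical partitioned graph. It is not the graph in the figure, and `bsp_bfs`, `owner`, and the partitioning are assumptions for illustration. In each superstep, every server expands the search as far as it can through its own vertices; only edges that cross to another server produce messages, which are delivered at the next bulk synchronization. For simplicity one shared `dist` dictionary stands in for per-server storage, but each server only writes the distances of the vertices it owns.

```python
from collections import deque

INF = float("inf")


def bsp_bfs(adjacency, owner, root):
    """BFS over a partitioned graph, BSP style.

    adjacency: vertex -> list of neighbor vertices
    owner:     vertex -> id of the server that stores it (hypothetical partitioning)
    Returns (distances, number of bulk synchronizations that carried messages).
    """
    dist = {v: INF for v in adjacency}  # shared for simplicity; servers write only their own vertices
    servers = sorted(set(owner.values()))
    inbox = {s: [] for s in servers}
    inbox[owner[root]].append((root, 0))
    syncs = 0
    while True:
        outbox = {s: [] for s in servers}
        for s in servers:
            # Computation: apply incoming candidate distances to local vertices.
            queue = deque()
            for v, d in inbox[s]:
                if d < dist[v]:
                    dist[v] = d
                    queue.append(v)
            # Expand through local edges without any communication.
            while queue:
                v = queue.popleft()
                for n in adjacency[v]:
                    if owner[n] == s:
                        if dist[v] + 1 < dist[n]:
                            dist[n] = dist[v] + 1
                            queue.append(n)
                    else:
                        # Remote vertex: buffer a message for the next synchronization.
                        outbox[owner[n]].append((n, dist[v] + 1))
        if not any(outbox.values()):
            return dist, syncs
        # Bulk synchronization: deliver every buffered message at once.
        syncs += 1
        inbox = outbox


# Hypothetical graph spread over three servers; vertex 1 is the root.
graph = {1: [2, 3], 2: [4], 3: [5], 4: [6], 5: [6], 6: [7], 7: []}
owner = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 3}
distances, syncs = bsp_bfs(graph, owner, 1)
print(distances, syncs)
```

Here the farthest vertex is four hops from the root, yet only two bulk synchronizations carry messages (server 1 to server 2, then server 2 to server 3), because hops that stay within one server are resolved locally. A per-hop iterative MapReduce BFS would instead need a separate job for every hop.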




Most graph algorithms are similar to this case.

Updated - 26 August, 2009: See also, Inference anatomy of the Google Pregel

8 comments:

  1. Server2 needs to tell Server3 that vertex 6's distance is 2 so that Server3 can assign distance 3 to vertex 7.

    Since Server2 can only see a partial view of vertex 2's neighbors, I think Server2 also has to tell Server1 about the distance of vertex 2 (which is 2). Then Server1 decides the final distance of vertex 2 (which is 1).

    Or the other way around: Server1 has already decided that the distance of vertex 2 is 1 and needs to tell Server2 about it (since from Server2's point of view, the distance of vertex 2 is 2).

    Either way, Server1 and Server2 have to communicate, don't they?

  2. The start node is assumed to be vertex 1. Hence, Server1's processing is done locally without any communication. There is one communication, between Server2 and Server3.
