July 20, 2009

Hamburg, a graph computing framework on Hadoop

As mentioned ago, I've been forming up the Hamburg project with Hyunsik Choi. Let's see more detail in the diagram of computing method of Hamburg based on BSP model.




Each worker will process the data fragments stored locally. And then, We can do bulk synchronization using collected communication data. The 'Computation' and 'Bulk synchronization' can be performed iteratively, Data for synchronization can be compressed to reduce network usage.

Plainly, It aims to improve the performance of traverse operations in Graph computing. For example, to explores all the neighboring nodes from the root node using Map/Reduce (FYI, Breadth-First Search (BFS) & MapReduce), We need a lot of iterations to get next vertex per-hop time.

If (same condition as before) do BFS using Hamburg, It will cause a lowering the cost of iterations. Let's assume the graph looks like presented below:




The graph was stored in Hbase on distributed system as above. The root is 1. Then, we need only one 'Bulk synchronization' between server2 and server3 with Hamburg. Rests will be calculated on local machine.




Almost graph algorithms are similar with this case.

Updated - 26 August, 2009 : See also, Inference anatomy of the Google Pregel

2 comments:

  1. Server2 needs to tell Server3 that the vertex 6 distance is 2 so that Server3 can assign distance 3 for vertex 7.

    Since Server2 can only see partial view for vertex 2 neighbors, I think Server2 also has to tell Server1 about the distance of vertex 2 (which is 2). Then Server1 decides the final distance of vertex 2 (which is 1).

    Or the other way around: Server1 already decides the distance of vertex 2 is 1 and needs to tell Server2 about it (since from Server2 point of view, the disntance of vertex 2 is 2).

    Eitherway, Server1 and Server2 has to communicate, isn't it?

    ReplyDelete
  2. The start node assumed as a vertext 1. Hence, Server1's processing will be locally done w/o any communicate. There is one communication between Server2 and Server3.

    ReplyDelete