Summary of the Google Pregel Paper

Google has published the Pregel paper. Here's my summary of Pregel:

  • Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms.
  • Map/Reduce is one kind of distributed computing infrastructure; Pregel is another.
    • Why did they build Pregel?
      • Building a custom distributed infrastructure typically requires a substantial implementation effort that must be repeated for each new algorithm or graph representation.
      • The M/R framework isn't ideal for graph algorithms because it does not support communication among nodes.
      • There was no existing system for large-scale graph computing.
  • It's inspired by BSP (Bulk Synchronous Parallel).
  • The user-defined Compute() function (here, for single-source shortest paths) looks like this:

void Compute(MessageIterator* msgs) {  // msgs holds the messages sent to this vertex in the previous superstep
  int mindist = IsSource(vertex_id()) ? 0 : INF;
  for (; !msgs->Done(); msgs->Next())
    mindist = min(mindist, msgs->Value());

  if (mindist < GetValue()) {
    *MutableValue() = mindist;
    OutEdgeIterator iter = GetOutEdgeIterator();
    for (; !iter.Done(); iter.Next())
      SendMessageTo(iter.Target(), mindist + iter.GetValue()); // Send updated distance to each out-neighbor
  }
  VoteToHalt(); // Deactivate this vertex; it is reactivated if a new message arrives
}
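
To make the superstep model concrete, here is a minimal single-machine sketch (my own illustration, not Pregel's code or API) that drives the same shortest-path logic as a BSP-style loop: each superstep delivers the previous superstep's messages, runs the per-vertex update, and the loop stops once no messages are in flight, i.e. every vertex has effectively voted to halt.

// Single-machine sketch of the BSP superstep loop for SSSP (illustration only).
#include <algorithm>
#include <cstdio>
#include <limits>
#include <map>
#include <vector>

const int INF = std::numeric_limits<int>::max();

struct Edge { int target; int weight; };

int main() {
  // Tiny example graph; vertex 0 is the source.
  std::map<int, std::vector<Edge>> out_edges = {
      {0, {{1, 4}, {2, 1}}},
      {1, {{3, 1}}},
      {2, {{1, 2}, {3, 5}}},
      {3, {}}};
  std::map<int, int> value;               // per-vertex shortest distance found so far
  for (const auto& v : out_edges) value[v.first] = INF;

  std::map<int, std::vector<int>> inbox;  // messages delivered in the current superstep
  inbox[0].push_back(0);                  // seed the source vertex

  int superstep = 0;
  while (!inbox.empty()) {
    std::map<int, std::vector<int>> outbox;
    for (const auto& [vertex, msgs] : inbox) {
      // Body of Compute(): take the minimum incoming distance...
      int mindist = INF;
      for (int m : msgs) mindist = std::min(mindist, m);
      // ...and if it improves this vertex's value, propagate to out-neighbors.
      if (mindist < value[vertex]) {
        value[vertex] = mindist;
        for (const Edge& e : out_edges[vertex])
          outbox[e.target].push_back(mindist + e.weight);
      }
      // Implicit VoteToHalt(): a vertex stays idle unless it receives a message.
    }
    inbox.swap(outbox);  // barrier: outgoing messages become visible next superstep
    ++superstep;
  }

  std::printf("finished after %d supersteps\n", superstep);
  for (const auto& [vertex, dist] : value)
    std::printf("vertex %d: distance %d\n", vertex, dist);
  return 0;
}

On this tiny graph the loop finishes in a handful of supersteps; as noted in the comments below, for this kind of BFS-style propagation the number of supersteps grows with the graph's diameter.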

  • The Pregel system also uses the master/worker model.
    • The master coordinates the workers, recovers from worker failures, and provides a web UI for monitoring job progress.
    • Each worker processes its own task and communicates with the other workers.
  • Used for shortest paths, PageRank, and other graph algorithms.

And I was interested in this phrase:
"Assigning vertices to machines to minimize inter-machine communication is a challenge. Partitioning of the input graph based on topology may suffice if the topology corresponds to the message traffic, but it may not."

4 comments:

  1. The authors of Pregel weren't aware of a MapReduce technique that speeds up graph algorithms by up to 3x:

    http://developer.yahoo.com/events/hadoopsummit2010/agenda.html#43

    Link to the paper:

    http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Schatz_MLG2010.pdf

    Basically, the technique is to keep the graph structure out of the shuffling phase. Doing this gives around a 3x improvement.

    It's not clear from the Pregel paper what the performance improvement is versus MapReduce-based SSSP.

    I'm also interested in the number of supersteps needed to complete SSSP. If the diameter of the graph is large, the number of supersteps will also be large (for simple BFS). Since Pregel graph algorithms work in supersteps, it would be wiser to show the number of supersteps plus the runtime of each superstep. IMHO, graph algorithms on Pregel or MapReduce should be measured by the number of supersteps needed (not the runtime).

    I think it's quite OK to say that the runtime of each superstep is more or less constant, because we need to read the whole graph anyway (since it is not stored entirely in memory). Providing random access to the graph is not an option here; it's better to read it all sequentially (and skip some vertices).

  2. Hi,

    I'm reading that paper and seeing a few design patterns.

    I agree with you that we should measure the number of iterations. And, as you said, there is still I/O overhead from reading and writing materialized data every time, even if the shuffle and sort of the reduce phase is avoided.

    IMO, BSP will communicate only the vertices that can't be resolved locally, and I'm sure the number of iterations will be less than or equal to the (M/R-based) Schimmy approach.

    Moreover, I hope we can compare them using Hama BSP soon.

  3. Anonymous, 4/9/11 13:53

    Hi,

    It has been said that the "M/R framework isn't ideal for graph algorithms because **it does not support communication among nodes**", but isn't it possible to use M/R to model communication like this:

    - Start with (srcNode, (destNode)*) tuples.
    - On mapping srcNode, emit (destNode, (srcNode, msg)) to send msg from srcNode to destNode.
    - On reducing, destNode receives all emitted (srcNode, msg) tuples.
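
    For illustration, here's a minimal single-machine sketch of that map/shuffle/reduce message-passing pattern in plain C++ (the node ids and messages are made up, and this is of course not actual Hadoop code):

    // Simulates map (emit per-destination messages), shuffle (group by key),
    // and reduce (each destNode sees all messages addressed to it).
    #include <cstdio>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using Message = std::pair<int, std::string>;  // (srcNode, msg)

    int main() {
      // Input tuples: (srcNode, (destNode)*).
      std::map<int, std::vector<int>> graph = {{1, {2, 3}}, {2, {3}}, {3, {1}}};

      // "Map" phase: for each srcNode, emit (destNode, (srcNode, msg)).
      std::vector<std::pair<int, Message>> emitted;
      for (const auto& [src, dests] : graph)
        for (int dest : dests)
          emitted.push_back({dest, {src, "msg from " + std::to_string(src)}});

      // "Shuffle" phase: the framework groups emitted pairs by destNode.
      std::map<int, std::vector<Message>> grouped;
      for (const auto& [dest, msg] : emitted) grouped[dest].push_back(msg);

      // "Reduce" phase: each destNode receives all (srcNode, msg) tuples sent to it.
      for (const auto& [dest, msgs] : grouped) {
        std::printf("node %d received:\n", dest);
        for (const auto& [src, text] : msgs)
          std::printf("  from node %d: %s\n", src, text.c_str());
      }
      return 0;
    }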

    Any idea?

    - Ali

  4. Anonymous, 4/9/11 14:48

    I found the answer to my question: http://horicky.blogspot.com/2010/07/graph-processing-in-map-reduce.html

    - Ali
