May 12, 2010

PageRank Implementation Using the BSP

In this post, I'm showing how to implement the PageRank using BSP.
P.S. Apache Hama's BSP framework is not ready yet.

And P.S. again, The pseudo code is developed, based on Java multi-threaded programming. As I introduced before (Hama BSP), the BSP programming is very similar to multi-threaded programming (See BSP serialize printing example). So, the BSP brings a familiar programming model to developers for implementing distributed applications. :-)

Anyway, let's assume that the web-graph G is stored in row sparse format as below:

Vertex: 1 2 3 4 5 6
Index: 0 1 3 4 6 8 10
IncomingEdgeList: 3 1 3 1 5 6 3 4 4 5

– Vertices V are web pages.
– Vertex index[] points to list of incoming edges's vertex.
 
The PageRank Algorithm is as below:

1) If vi links to vk

– User equally likely to follow any link on page.
– Probability of moving from vi to vk = 1/out_degree(vi).

2) If vi has no outlinks…

– User equally likely to jump to any state.
– Probability = 1 / |V|

3) Weighted moves from each page

– Percentage of moves that follow a link == α.
– Percentage of moves that are jumps to another page == (1- α).

Then, the algorithm can be implemented as below:

BSP main program:

BSPPageRank pagerank[] = new BSPPageRank[vertices.length];
    // Loop over all vertices
    for (i = 0; i < vertices.length; i++) {
      pagerank[i] = new BSPPageRank(i); 
    }

    for (i = 0; i < vertices.length; i++) {
      pagerank[i].join();
    }

Parallel part:

public void run() {
    double total = 0.0;
    int begin = index[ind];
    int end = index[ind + 1];
    //System.out.println("incoming edges: " + begin +" ~ "+ end);
    
    // Loop over edges pointing to vertex i.
    for (int j = begin; j < end; j++) {
      int src = j;
      double r = rank[src];
      double incr = r / (double) degree[src];
      total += incr;
    }
    accumulate_rank[ind] = total;

    // Call sync() method at here, if this is run on the BSP cluster
  }

References : MultiThreaded Graph Library (MTGL)

5 comments:

  1. I find it a very promising approach!
    Can you estimate when Apache Hama's BSP framework will be ready?

    ReplyDelete
  2. We have a release plan with BSP (as soon as possible, and at least before the coming autumn).

    ReplyDelete
  3. Great news!
    Thank you.

    ReplyDelete
  4. Hi...

    I would be interessted how the follwoing Google Pregel snippet would be implemented using Hama. Simply extending the BSP class as within some of the BSP examples is imo not enough as BSP just models a graph partition and not a single vertex. Is there a more vertex-specifc class anywhere?


    // (vertex/edge/message)-type
    class MinDistanceVertex
    : public Vertex {

    // Receive current messages
    void Compute(MessageIterator* msgs)
    {

    // Start of the algorithm
    int _MinDist = IsSource(vertex_id()) ? 0 : INF;

    // Find smallest value within messages
    for (; !msgs->Done(); msgs->Next())
    _MinDist = min(_MinDist, msgs->Value());

    if (_MinDist < GetValue())
    {

    // Set internal value to new minimal distance
    *MutableValue() = _MinDist;

    // Send new minimal distance to neighboring nodes
    SendMessageToAllNeighbors(_MinDist + 1);

    }

    // Superstep Synchronization
    VoteToHalt();

    }

    }

    ReplyDelete
  5. The exhibited code does not handle the case of a sink

    ReplyDelete