### PageRank Implementation Using the BSP

In this post, I'm showing how to implement the PageRank using BSP.

And P.S. again, The pseudo code is developed, based on

Anyway, let's assume that the web-graph G is stored in row sparse format as below:

The

1) If v

2) If v

3) Weighted moves from each page

Then, the algorithm can be implemented as below:

References : MultiThreaded Graph Library (MTGL)

**P.S. Apache Hama's BSP framework is not ready yet.**And P.S. again, The pseudo code is developed, based on

*Java*multi-threaded programming. As I introduced before (Hama BSP), the BSP programming is very similar to multi-threaded programming (See BSP serialize printing example). So, the BSP brings a familiar programming model to developers for implementing distributed applications. :-)Anyway, let's assume that the web-graph G is stored in row sparse format as below:

Vertex: 1 2 3 4 5 6 Index: 0 1 3 4 6 8 10 IncomingEdgeList: 3 1 3 1 5 6 3 4 4 5 – Vertices V are web pages. – Vertex index[] points to list of incoming edges's vertex. |

**PageRank Algorithm**is as below:1) If v

_{i}links to v_{k}…– User equally likely to follow any link on page. – Probability of moving from v_{i}to v_{k}= 1/out_degree(v_{i}).

2) If v

_{i}has no outlinks…– User equally likely to jump to any state. – Probability = 1 / |V|

3) Weighted moves from each page

– Percentage of moves that follow a link == Ξ±. – Percentage of moves that are jumps to another page == (1- Ξ±).

Then, the algorithm can be implemented as below:

**BSP main program:**BSPPageRank pagerank[] = new BSPPageRank[vertices.length]; // Loop over all vertices for (i = 0; i < vertices.length; i++) { pagerank[i] = new BSPPageRank(i); } for (i = 0; i < vertices.length; i++) { pagerank[i].join(); }

**Parallel part:**public void run() { double total = 0.0; int begin = index[ind]; int end = index[ind + 1]; //System.out.println("incoming edges: " + begin +" ~ "+ end); // Loop over edges pointing to vertex i. for (int j = begin; j < end; j++) { int src = j; double r = rank[src]; double incr = r / (double) degree[src]; total += incr; } accumulate_rank[ind] = total; // Call sync() method at here, if this is run on the BSP cluster }

References : MultiThreaded Graph Library (MTGL)

I find it a very promising approach!

ReplyDeleteCan you estimate when Apache Hama's BSP framework will be ready?

We have a release plan with BSP (as soon as possible, and at least before the coming autumn).

ReplyDeleteGreat news!

ReplyDeleteThank you.

Hi...

ReplyDeleteI would be interessted how the follwoing Google Pregel snippet would be implemented using Hama. Simply extending the BSP class as within some of the BSP examples is imo not enough as BSP just models a graph partition and not a single vertex. Is there a more vertex-specifc class anywhere?

// (vertex/edge/message)-type

class MinDistanceVertex

: public Vertex {

// Receive current messages

void Compute(MessageIterator* msgs)

{

// Start of the algorithm

int _MinDist = IsSource(vertex_id()) ? 0 : INF;

// Find smallest value within messages

for (; !msgs->Done(); msgs->Next())

_MinDist = min(_MinDist, msgs->Value());

if (_MinDist < GetValue())

{

// Set internal value to new minimal distance

*MutableValue() = _MinDist;

// Send new minimal distance to neighboring nodes

SendMessageToAllNeighbors(_MinDist + 1);

}

// Superstep Synchronization

VoteToHalt();

}

}

The exhibited code does not handle the case of a sink

ReplyDelete