December 29, 2014

Google's BigTable, Pregel, and Streaming PageRank

As already you might know, Google stores the webpages in BigTable. Considering the time-series dimension, the web graph is stored in 3D space like below:


And then Pregel is believed to be used for calculating PageRank. How do it work? When the new version is added to the "webtable" while crawling web pages periodically, each processor of Pregel scans latest version of anchors in its partition, and updates the graph structure. The several vertices received message from newly-created or updated vertex will be reactivated and begun to recompute PageRank incrementally.

Therefore, the Pregel paper describes as a batch system that processes iterative graph algorithms efficiently but I think the vertex-centric model and its spontaneous reactivation mechanism is especially worth noting and very fit for streaming graph.

Although Graph module (Pregel clone) of Apache Hama supports dynamic graph operations, I never tried yet. I'll post more details if it works fine.

No comments:

Post a Comment