Google's BigTable, Pregel, and Streaming PageRank

As already you might know, Google stores the webpages in BigTable. Considering the time-series dimension, the web graph is stored in 3D space like below:


And then Pregel is believed to be used for calculating PageRank. How do it work? When the new version is added to the "webtable" while crawling web pages periodically, each processor of Pregel scans latest version of anchors in its partition, and updates the graph structure. The several vertices received message from newly-created or updated vertex will be reactivated and begun to recompute PageRank incrementally.

Therefore, the Pregel paper describes as a batch system that processes iterative graph algorithms efficiently but I think the vertex-centric model and its spontaneous reactivation mechanism is especially worth noting and very fit for streaming graph.

Although Graph module (Pregel clone) of Apache Hama supports dynamic graph operations, I never tried yet. I'll post more details if it works fine.

No comments:

Post a Comment

무한의 세계

무한 집합의 크기 Cardinality , 즉 원소의 개수를 수학에서는 '농도'라고 말한다. 유한 집합의 크기는 그대로 원소의 개수 이지만, 무한 집합의 경우는 원소의 개수를 낱낱이 셈하는 것은 불가능하기 때문에 '농도'라...