Google's BigTable, Pregel, and Streaming PageRank

As you may already know, Google stores web pages in BigTable. Because every cell can hold multiple timestamped versions, the web graph is effectively stored in a three-dimensional space (row key, column key, timestamp), as shown below:
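Here is a minimal sketch of that three-dimensional layout. It assumes only the data model described in the BigTable paper: the table is a sparse map from (row key, column key, timestamp) to an uninterpreted string. Following the paper's webtable example, row keys are reversed URLs, the `contents:` column holds page snapshots, and each `anchor:<referring site>` column holds the anchor text of a link pointing to the page. A row's anchors are therefore its in-links. The `WebTable` class and its methods are hypothetical names for illustration, not BigTable's client API, and small integers stand in for real timestamps.

```python
class WebTable:
    """Toy in-memory model of BigTable's data model: a sparse map indexed by
    (row key, column key, timestamp). Illustration only, not the real BigTable API."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self.rows = {}

    def put(self, row, column, timestamp, value):
        self.rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None), else None."""
        versions = self.rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def anchors(self, row, timestamp=None):
        """In-links of `row` as {referring site: anchor text}, read from the anchor: family."""
        links = {}
        for column in self.rows.get(row, {}):
            if column.startswith("anchor:"):
                text = self.get(row, column, timestamp)
                if text is not None:
                    links[column[len("anchor:"):]] = text
        return links


# Row keys are reversed URLs, as in the paper's webtable example.
table = WebTable()
table.put("com.cnn.www", "contents:", 1, "<html>...v1")
table.put("com.cnn.www", "contents:", 2, "<html>...v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 2, "CNN")
print(table.get("com.cnn.www", "contents:"))      # <html>...v2
print(table.get("com.cnn.www", "contents:", 1))   # <html>...v1
print(table.anchors("com.cnn.www"))               # {'cnnsi.com': 'CNN'}
print(table.anchors("com.cnn.www", timestamp=1))  # {}
```

Reading the anchors "as of" a timestamp is what lets a crawler-side job compare the latest version of the link structure against an older one.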

Pregel, in turn, is believed to be used for computing PageRank. How might that work? When a new version is added to the "webtable" during periodic crawling, each Pregel worker scans the latest version of the anchors in its partition and updates the graph structure. The vertices that receive messages from a newly created or updated vertex are then reactivated and begin recomputing PageRank incrementally.
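To make the reactivation idea concrete, here is a minimal single-process sketch. It assumes a delta-propagating ("accumulative") variant of PageRank rather than the fixed-number-of-supersteps version given in the Pregel paper, and it is not Google's implementation. The class name `IncrementalPageRank`, the `tol` threshold, and the edge-correction step in `add_edge` are hypothetical choices of mine. Messages are combined by summation, and a vertex with no incoming message stays halted. Inserting an edge (u, v) retracts u's previously sent share from its old neighbors and resends it over the new out-degree, which wakes up only the affected vertices.

```python
from collections import defaultdict


class IncrementalPageRank:
    """Single-process simulation of a Pregel-style, vertex-centric PageRank that
    propagates rank *deltas*. Computes the unnormalized form
        r(v) = (1 - d) + d * sum(r(u) / outdeg(u) for u -> v),
    where dangling vertices (no out-links) pass nothing on."""

    def __init__(self, damping=0.85, tol=1e-10):
        self.d = damping
        self.tol = tol                   # pending mass at or below this is held back
        self.out = {}                    # vertex -> set of out-neighbors
        self.rank = {}                   # vertex -> accumulated rank
        self.pending = {}                # vertex -> rank received but not yet forwarded
        self.inbox = defaultdict(float)  # next superstep's messages, combined by sum

    def add_vertex(self, v):
        if v not in self.out:
            self.out[v] = set()
            self.rank[v] = 0.0
            self.pending[v] = 0.0
            self.inbox[v] += 1.0 - self.d  # seed the teleport term as a self-message

    def add_edge(self, u, v):
        """Graph mutation, e.g. a newly crawled anchor u -> v."""
        self.add_vertex(u)
        self.add_vertex(v)
        old = self.out[u]
        if v in old:
            return
        new = old | {v}
        sent = self.rank[u] - self.pending[u]  # mass u has already pushed to `old`
        # Retract u's old per-neighbor share and resend it over the new out-degree;
        # these correction messages reactivate exactly the affected neighbors.
        for w in old:
            self.inbox[w] -= self.d * sent / len(old)
        for w in new:
            self.inbox[w] += self.d * sent / len(new)
        self.out[u] = new

    def run(self, max_supersteps=10_000):
        """Run supersteps until no messages are in flight; return how many ran."""
        steps = 0
        while self.inbox and steps < max_supersteps:
            inbox, self.inbox = self.inbox, defaultdict(float)
            # Only vertices that received a message compute; the rest stay halted.
            for v, delta in inbox.items():
                self.rank[v] += delta
                self.pending[v] += delta
                targets = self.out[v]
                if targets and abs(self.pending[v]) > self.tol:
                    share = self.d * self.pending[v] / len(targets)
                    for w in targets:
                        self.inbox[w] += share
                    self.pending[v] = 0.0
                # Otherwise v effectively votes to halt until a new message arrives.
            steps += 1
        return steps


pr = IncrementalPageRank()
for u, v in [("a", "b"), ("b", "a"), ("x", "y"), ("y", "x")]:
    pr.add_edge(u, v)
pr.run()
print(round(pr.rank["a"], 6), round(pr.rank["x"], 6))  # 1.0 1.0
x_before = pr.rank["x"]
pr.add_edge("a", "c")            # new anchor found: page a now also links to c
pr.run()                         # only a, b, c get messages; x and y stay halted
print(round(pr.rank["a"], 6))    # 0.434442  (exact fixed point: 222/511)
print(pr.rank["x"] == x_before)  # True
```

The delta formulation is what makes halting safe here. A vertex only needs to hear about *changes* in its in-neighbors' ranks, so each superstep's work scales with the number of vertices whose rank is still moving, not with the size of the graph. The trade-off is that results are only accurate to roughly `tol`. The sketch also ignores edge deletions and partitioning across workers, both of which a real system would have to handle.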

The Pregel paper presents it as a batch system that processes iterative graph algorithms efficiently, but I think its vertex-centric model and its spontaneous reactivation mechanism (a vertex that has voted to halt wakes up again when it receives a message) are especially noteworthy and well suited to streaming graphs.

The Graph module (a Pregel clone) of Apache Hama supports dynamic graph operations, but I haven't tried it yet. I'll post more details if it works well.
