In order to achieve that, we have created scalable infrastructure,
named Pregel, to mine a wide range of graphs. In Pregel, programs
are expressed as a sequence of iterations. In each iteration,
a vertex can, independently of other vertices, receive messages
sent to it in the previous iteration, send messages to other vertices,
modify its own and its outgoing edges' states, and mutate the
graph's topology (experts in parallel processing will recognize that
the Bulk Synchronous Parallel Model inspired Pregel).
Maybe the most important things are the locality of adjacent vertices and the dynamic programming. I talked with Hyunsik, a memeber of Heart project about this, We thought it's other distributed programming model instead of map/reduce, but same the shared-nothing architecture for the better performance. I guess it's not much different from the map/reduce framework.
- Distributing a database for parallelism
Updated - 26 August, 2009 : See also, Inference anatomy of the Google Pregel