June 20, 2010

Critical bug on Google Apps??

OMG, my buddy bought some domain, there were existing mail boxes on Google Apps.

Do you use the Google Apps? Then, be careful when expiring your domain. Because, new owner of domain can access your old data including mail box of users, sites, ..., etc.

June 14, 2010

Summary of the Google Pregel

The paper of Google Pregel has been published. Here's my summary of the Pregel:

  • Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms.
  • Map/Reduce is one of distributed computing infrastructure, and Pregel is another one.
    • Why did they make Pregel!? 
      • Building a custom distributed infrastructure typically requires a substantial implementation effort which must be repeated for each new algorithm or a graph representation.
      • M/R framework isn't ideal for graph algorithms because it does not support communications among nodes.
      • There is no such system for large scale graph computing.
  • It's inspired by BSP (Bulk Synchronouse Parallel).
  • User-defined function compute() is as below:

void Compute(MessageIterator* msgs) {  // Receive current messages
  int mindist = IsSource(vertex_id()) ? 0 : INF;
  for (; !msgs->;Done(); msgs->;Next())
    mindist = min(mindist, msgs->;Value());

  if (mindist < GetValue()) {
    *MutableValue() = mindist;
    OutEdgeIterator iter = GetOutEdgeIterator();
    for (; !iter.Done(); iter.Next())
      SendMessageTo(iter.Target(), mindist + iter.GetValue()); // Send data to neighbor node
  VoteToHalt(); //  Superstep synchronization

  • Pregel system also uses the master/worker (slave) model.
    • A master maintains worker, recovers faults of workers, and provides Web-UI monitoring tool of job progress.
    • A worker processes its task and communicates with the other workers.
  • Used for Shortest Path, PageRank, ..., etc.

And, I was interested in this phrase:
"Assigning vertices to machines to minimize inter-machine communication is a challenge. Partitioning of the input graph based on topology may suffice if the topology corresponds to the message traffic, but it may not."

June 8, 2010

A distributed caching mechanism to avoid Twitter's API request limit

Recently i made a twitter application which allows to find school friends. Development was simple, but API call limit and Slow speed were problematic. To solve these problems i added a caching layer which gathers&stores API result data from each clients using javascript and server-side scripts, and it is damn fast now!