Showing posts from June, 2009

Pregel, Google's large scale graph computing framework

According to google research blog, They made the Pregel for performing large-scale garph computing and uses it for PageRank calculations, shortest path, ..., etc.

In order to achieve that, we have created scalable infrastructure,
named Pregel, to mine a wide range of graphs. In Pregel, programs
are expressed as a sequence of iterations. In each iteration,
a vertex can, independently of other vertices, receive messages
sent to it in the previous iteration, send messages to other vertices,
modify its own and its outgoing edges' states, and mutate the
graph's topology (experts in parallel processing will recognize that
the Bulk Synchronous Parallel Model inspired Pregel).

Maybe the most important things are the locality of adjacent vertices and the dynamic programming. I talked with Hyunsik, a memeber of Heart project about this, We thought it's other distributed programming model instead of map/reduce, but same the shared-nothing architecture for the better performance. I gues…

Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra where a matrix, X, is factorized into (usually) two matrices, W and H.

A( M by N ) = W(M by K) X H( K by N )

W,H : Initialize W and H to random positive matrices.
W = ||W*H-A||
H = ||H(T)W(T)-A(T)||

Iterates W*H until convergence.

See also,
- Non-Negative Matrix Factorization with sparseness constraints
- Non-Negative Matrix Factorization Techniques and Optimizations
- Document Clustering Based On Non-negative Matrix Factorization

Jacobi eigenvalue algorithm

The Jacobi eigenvalue algorithm is a numerical procedure for the calculation of all eigenvalues and eigenvectors of a real symmetric matrix.

Each Jacobi rotation can be done in n steps when the pivot element p is known. However the search for p requires inspection of all N ≈ ½ n² off-diag elements. We can reduce this to n steps too if we introduce an additional index array with the property that mi is the index of the largest element in row i, (i = 1, … , n - 1) of the current S. Then (k, l) must be one of the pairs (i,mi) . Since only columns k and l change, only mk and ml must be updated, which again can be done in n steps. Thus each rotation has O(n) cost and one sweep has O(n³) cost which is equivalent to one matrix multiplication. Additionally the mi must be initialized before the process starts, this can be done in n² steps.
Typically the Jacobi method converges within numerical precision after a small number of sweeps. Note that multiple eigenvalues reduce the number of iterati…

NAVER Japan, closed beta service has been started again

The Search engine, NAVER Japan closed beta service, will run from the 15th of this month for the limited number of people.

Naver Japan had actually launched in the Japanese market earlier (2005), but the service didn't make any splashes. Instead, According to wikipedia Japan (, NAVER japan seems disgraced by their behavior, specially, an bots who impersonated a GoogleBot was ran. NAVER Japan will fail again unless the service delivers a significantly better performance and good image.

See also, Japanese Search Engine Market

P.S. The interview model : Julia Oki.