June 22, 2009

Pregel, Google's large scale graph computing framework

According to the Google Research blog, Google built Pregel for large-scale graph computing and uses it for computations such as PageRank and shortest paths.

In order to achieve that, we have created scalable infrastructure,
named Pregel, to mine a wide range of graphs. In Pregel, programs
are expressed as a sequence of iterations. In each iteration,
a vertex can, independently of other vertices, receive messages
sent to it in the previous iteration, send messages to other vertices,
modify its own and its outgoing edges' states, and mutate the
graph's topology (experts in parallel processing will recognize that
the Bulk Synchronous Parallel Model inspired Pregel).

Perhaps the most important points are the locality of adjacent vertices and the dynamic-programming style of iteration. I talked about this with Hyunsik, a member of the Heart project. We see it as another distributed programming model, an alternative to map/reduce, but one that keeps the same shared-nothing architecture for better performance. In that sense, I guess it's not so different from the map/reduce framework.
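To make the superstep idea from the quote concrete, here is a toy, single-machine sketch of the vertex-centric style in Java. The class and method names are hypothetical and this is nothing like Google's actual API; it only illustrates the iteration pattern: each vertex reads the messages sent in the previous superstep, updates its state, and sends messages that become visible after the next barrier. The example propagates the largest value each vertex has seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy, single-machine sketch of the BSP/vertex-centric iteration style
 * (hypothetical names; not Google's actual Pregel API).
 */
public class ToyPregel {

  /** Runs supersteps until no messages remain; returns the final vertex values. */
  static Map<Integer, Integer> propagateMax(Map<Integer, List<Integer>> edges) {
    Map<Integer, Integer> value = new TreeMap<>();
    for (int v : edges.keySet()) value.put(v, v); // each vertex starts with its own id

    // Superstep 0: every vertex sends its value to its out-neighbors.
    Map<Integer, List<Integer>> inbox = new TreeMap<>();
    for (var e : edges.entrySet())
      for (int dst : e.getValue())
        inbox.computeIfAbsent(dst, k -> new ArrayList<>()).add(value.get(e.getKey()));

    while (!inbox.isEmpty()) { // one loop iteration = one superstep (barrier)
      Map<Integer, List<Integer>> outbox = new TreeMap<>();
      for (var entry : inbox.entrySet()) {
        int v = entry.getKey();
        int best = value.get(v);
        for (int m : entry.getValue()) best = Math.max(best, m);
        if (best > value.get(v)) { // value changed: update and notify neighbors
          value.put(v, best);
          for (int dst : edges.get(v))
            outbox.computeIfAbsent(dst, k -> new ArrayList<>()).add(best);
        }
      }
      inbox = outbox; // sent messages become visible only in the next superstep
    }
    return value;
  }

  public static void main(String[] args) {
    Map<Integer, List<Integer>> edges = Map.of(
        1, List.of(2), 2, List.of(3), 3, List.of(1), 4, List.of(3));
    System.out.println(propagateMax(edges)); // prints {1=4, 2=4, 3=4, 4=4}
  }
}
```

Note the shared-nothing flavor even in this toy: vertices communicate only through messages, never by reading each other's state within a superstep.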

See also:
- Distributing a database for parallelism

Updated August 26, 2009: See also Inference anatomy of the Google Pregel

June 15, 2009

Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra where a matrix A is factorized into (usually) two matrices, W and H.

A (M by N) ≈ W (M by K) × H (K by N)

Initialize W and H to random positive matrices, then minimize ||A - W*H|| by iterating the multiplicative update rules (applied elementwise) until convergence:

H ← H .* (W^T * A) ./ (W^T * W * H)
W ← W .* (A * H^T) ./ (W * H * H^T)
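To make the update rules concrete, here is a minimal, self-contained sketch of NMF with the Lee-Seung multiplicative updates in Java. The class name, the toy matrix, and the helper methods are my own for illustration, not from any library:

```java
import java.util.Random;

/** Minimal NMF sketch via Lee-Seung multiplicative updates: A (MxN) ~ W (MxK) * H (KxN). */
public class Nmf {

  /** Factorizes A into non-negative W (M by K) and H (K by N); returns {W, H}. */
  public static double[][][] factorize(double[][] A, int K, int iters, long seed) {
    int M = A.length, N = A[0].length;
    double eps = 1e-9; // guards against division by zero
    Random rnd = new Random(seed);
    double[][] W = randomPositive(M, K, rnd);
    double[][] H = randomPositive(K, N, rnd);
    for (int it = 0; it < iters; it++) {
      // H <- H .* (W^T A) ./ (W^T W H), elementwise
      double[][] Wt = transpose(W);
      double[][] WtA = multiply(Wt, A);
      double[][] WtWH = multiply(multiply(Wt, W), H);
      for (int i = 0; i < K; i++)
        for (int j = 0; j < N; j++)
          H[i][j] *= WtA[i][j] / (WtWH[i][j] + eps);
      // W <- W .* (A H^T) ./ (W H H^T), elementwise
      double[][] Ht = transpose(H);
      double[][] AHt = multiply(A, Ht);
      double[][] WHHt = multiply(multiply(W, H), Ht);
      for (int i = 0; i < M; i++)
        for (int j = 0; j < K; j++)
          W[i][j] *= AHt[i][j] / (WHHt[i][j] + eps);
    }
    return new double[][][] { W, H };
  }

  static double[][] randomPositive(int r, int c, Random rnd) {
    double[][] X = new double[r][c];
    for (int i = 0; i < r; i++)
      for (int j = 0; j < c; j++)
        X[i][j] = rnd.nextDouble() + 0.1;
    return X;
  }

  static double[][] transpose(double[][] X) {
    double[][] T = new double[X[0].length][X.length];
    for (int i = 0; i < X.length; i++)
      for (int j = 0; j < X[0].length; j++)
        T[j][i] = X[i][j];
    return T;
  }

  static double[][] multiply(double[][] X, double[][] Y) {
    double[][] Z = new double[X.length][Y[0].length];
    for (int i = 0; i < X.length; i++)
      for (int k = 0; k < Y.length; k++)
        for (int j = 0; j < Y[0].length; j++)
          Z[i][j] += X[i][k] * Y[k][j];
    return Z;
  }

  /** Frobenius norm of A - W*H, i.e. the reconstruction error. */
  static double error(double[][] A, double[][] W, double[][] H) {
    double[][] WH = multiply(W, H);
    double sum = 0;
    for (int i = 0; i < A.length; i++)
      for (int j = 0; j < A[0].length; j++)
        sum += (A[i][j] - WH[i][j]) * (A[i][j] - WH[i][j]);
    return Math.sqrt(sum);
  }

  public static void main(String[] args) {
    // rank-2 toy matrix, so K = 2 can reconstruct it closely
    double[][] A = { { 1, 2, 3 }, { 2, 4, 6 }, { 3, 6, 9 }, { 1, 1, 1 } };
    double[][][] WH = factorize(A, 2, 500, 42);
    System.out.println("||A - W*H|| = " + error(A, WH[0], WH[1]));
  }
}
```

Because every factor in the updates is non-negative, W and H stay non-negative throughout, and per Lee and Seung each update never increases the reconstruction error.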

See also,
- http://en.wikipedia.org/wiki/Non-negative_matrix_factorization, WIKIPEDIA
- Non-Negative Matrix Factorization with sparseness constraints
- Non-Negative Matrix Factorization Techniques and Optimizations
- Document Clustering Based On Non-negative Matrix Factorization

June 9, 2009

Jacobi eigenvalue algorithm

The Jacobi eigenvalue algorithm is a numerical procedure for the calculation of all eigenvalues and eigenvectors of a real symmetric matrix.

Each Jacobi rotation can be done in n steps when the pivot element p is known. However the search for p requires inspection of all N ≈ ½ n² off-diagonal elements. We can reduce this to n steps too if we introduce an additional index array with the property that m_i is the index of the largest element in row i (i = 1, …, n − 1) of the current S. Then (k, l) must be one of the pairs (i, m_i). Since only columns k and l change, only m_k and m_l must be updated, which again can be done in n steps. Thus each rotation has O(n) cost and one sweep has O(n³) cost, which is equivalent to one matrix multiplication. Additionally the m_i must be initialized before the process starts, which can be done in n² steps.

Typically the Jacobi method converges within numerical precision after a small number of sweeps. Note that multiple eigenvalues reduce the number of iterations since N_S < N. -- WIKIPEDIA

See also:
- Jacobi eigenvalue algorithm, Wikipedia
- A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix

Here's my test code.

/**
 * Jacobi eigenvalue algorithm: repeatedly rotates away the largest
 * off-diagonal element of a real symmetric matrix until D is (nearly)
 * diagonal. The diagonal of D then holds the eigenvalues, and the
 * columns of V the corresponding eigenvectors.
 */
public class Jacobi {
  public static void main(String[] args) {
    double[][] A = new double[][] { { 4, 3, 2, 1, 0, 0 }, { 3, 4, 3, 2, 0, 0 },
        { 2, 3, 4, 3, 0, 0 }, { 1, 2, 3, 4, 0, 0 }, { 0, 0, 0, 0, 1, 0 },
        { 0, 0, 0, 0, 0, 1 } };

    double t, c, s;
    int p, q, icount, state, size = 6;
    double tol = 1.e-5; // the tolerance level for convergence
    int icmax = 100; // the maximum number of iterations

    int[] colRowOfElMax = new int[size], rowOfElMax = new int[1];
    double[][] temp = new double[size][size], D = new double[size][size];
    double[][] V, diagD;

    double[] maxElColRow = new double[size], maxElRow = new double[1];
    double[][] dMinusDiagD = new double[size][size], absDminusDiagD = new double[size][size];
    double[][] rot = new double[2][2], rotT = new double[2][2];

    // make V the identity matrix
    V = new double[size][size];
    for (int i = 0; i < size; i++) {
      V[i][i] = 1.0;
    }

    // copy A into D so the input matrix is left untouched
    for (int i = 0; i < size; i++) {
      D[i] = A[i].clone();
    }

    // locate the largest off-diagonal element (p, q)
    diagD = diag(D, size); // diagD = diagonal of D
    dMinusDiagD = minus(D, diagD, size); // D - diagD
    abs(dMinusDiagD, absDminusDiagD, size); // abs(D - diagD)
    maxMatrix(absDminusDiagD, size, colRowOfElMax, maxElColRow);
    maxVector(maxElColRow, size, rowOfElMax, maxElRow);
    q = rowOfElMax[0];
    p = colRowOfElMax[q];
    icount = 0;
    state = 1;

    // iterations
    while (state == 1 && icount < icmax) {
      icount = icount + 1;
      if (D[q][q] == D[p][p]) { // prevent t from diverging
        D[q][q] = D[p][p] + 1.e-10;
      }
      // compute the rotation (c, s) for the (p, q) plane
      t = D[p][q] / (D[q][q] - D[p][p]);
      c = 1 / Math.sqrt(t * t + 1);
      s = c * t;
      rot[0][0] = c;
      rot[0][1] = s;
      rot[1][0] = -s;
      rot[1][1] = c;
      transpose(rot, rotT, 2); // rotT = transpose(rot)

      // D = rotT * D, touching only rows p and q
      for (int i = 0; i < size; i++) {
        temp[p][i] = rotT[0][0] * D[p][i] + rotT[0][1] * D[q][i];
        temp[q][i] = rotT[1][0] * D[p][i] + rotT[1][1] * D[q][i];
        D[p][i] = temp[p][i];
        D[q][i] = temp[q][i];
      }
      // D = D * rot, touching only columns p and q
      for (int i = 0; i < size; i++) {
        temp[i][p] = D[i][p] * rot[0][0] + D[i][q] * rot[1][0];
        temp[i][q] = D[i][p] * rot[0][1] + D[i][q] * rot[1][1];
        D[i][p] = temp[i][p];
        D[i][q] = temp[i][q];
      }
      // accumulate the rotation into the eigenvector matrix V
      for (int i = 0; i < size; i++) {
        temp[i][p] = V[i][p] * rot[0][0] + V[i][q] * rot[1][0];
        temp[i][q] = V[i][p] * rot[0][1] + V[i][q] * rot[1][1];
        V[i][p] = temp[i][p];
        V[i][q] = temp[i][q];
      }

      // find the next pivot (p, q) among the off-diagonal elements
      diagD = diag(D, size); // diagD = diagonal of D
      dMinusDiagD = minus(D, diagD, size); // D - diagD
      abs(dMinusDiagD, absDminusDiagD, size); // abs(D - diagD)
      maxMatrix(absDminusDiagD, size, colRowOfElMax, maxElColRow);
      maxVector(maxElColRow, size, rowOfElMax, maxElRow);
      q = rowOfElMax[0];
      p = colRowOfElMax[q];
      if (Math.abs(D[p][q]) < tol * Math.sqrt(sumDiagElSq(diagD, size)) / size) {
        state = 0; // converged
      }
    }

    // the diagonal of diagD holds the eigenvalues; the columns of V
    // are the corresponding eigenvectors
    System.out.println("Jacobi eigenvalues");
    for (int i = 0; i < size; i++) {
      for (int j = 0; j < size; j++) {
        System.out.print(diagD[i][j] + "\t");
      }
      System.out.println(" ");
    }
  }

  /**
   * Returns an n-by-n matrix holding only the diagonal elements of A.
   */
  public static double[][] diag(double A[][], int n) {
    double[][] B = new double[n][n];
    for (int i = 0; i < n; i++) {
      B[i][i] = A[i][i];
    }
    return B;
  }

  /**
   * Returns C = A - B.
   */
  public static double[][] minus(double A[][], double B[][], int n) {
    double[][] C = new double[n][n];
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        C[i][j] = A[i][j] - B[i][j];
      }
    }
    return C;
  }

  /**
   * Puts the elementwise absolute value of A into B.
   */
  public static void abs(double A[][], double B[][], int n) {
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        B[i][j] = Math.abs(A[i][j]);
      }
    }
  }

  /**
   * Finds the maximum element of each column of A; returns the maxima in
   * Max and their row positions in Row.
   */
  public static void maxMatrix(double A[][], int n, int Row[], double Max[]) {
    for (int i = 0; i < n; i++) {
      Max[i] = A[0][i];
      Row[i] = 0;
      for (int j = 0; j < n; j++) {
        if (A[j][i] > Max[i]) {
          Max[i] = A[j][i];
          Row[i] = j;
        }
      }
    }
  }

  /**
   * Finds the maximum element of the vector A; returns it in Max[0] and
   * its position in Row[0].
   */
  public static void maxVector(double A[], int n, int Row[], double Max[]) {
    Max[0] = A[0];
    Row[0] = 0;
    for (int i = 0; i < n; i++) {
      if (A[i] > Max[0]) {
        Max[0] = A[i];
        Row[0] = i;
      }
    }
  }

  /**
   * Puts the transpose of A into B.
   */
  public static void transpose(double A[][], double B[][], int n) {
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        B[i][j] = A[j][i];
      }
    }
  }

  /**
   * Returns the sum of the squares of the diagonal elements of A.
   */
  public static double sumDiagElSq(double A[][], int n) {
    double sum = 0;
    for (int i = 0; i < n; i++) {
      sum += A[i][i] * A[i][i];
    }
    return sum;
  }
}

June 7, 2009

NAVER Japan, closed beta service has been started again

The closed beta of the search engine NAVER Japan will run from the 15th of this month for a limited number of people.

NAVER Japan had actually entered the Japanese market earlier, in 2005, but the service didn't make a splash. Worse, according to Japanese Wikipedia (http://ja.wikipedia.org/wiki/NaverBot), NAVER Japan damaged its reputation there; in particular, it ran a crawler that impersonated GoogleBot. NAVER Japan will fail again unless the service delivers significantly better performance and a better image.

See also, Japanese Search Engine Market

P.S. The model in the interview: Julia Oki.