Edward J. Yoon's opensource life with the
My name is Edward J. Yoon (윤진석), a worker for the NHN, cop. I'm also a founder/developer of the Apache Hama project which is Distributed Matrix Computational Package, and Apache Hadoop.

June 26, 2009

The bulk synchronous parallel computing model

It seems really interesting to me. According to my understanding, the 'Collective Communication' is the core of BSP model. It'll reduces network overhead. Just like that, it's a BULK sync!!

I guess that pregel could solve this BFS problem (http://blog.udanax.org/2009/02/breadth-first-search-mapreduce.html) at once w/o MR iterations.

June 24, 2009

오픈소스 전문 팀 블로그, OpenLamp

오픈소스 전문 팀 블로그, OpenLamp 라는곳에 기고를 시작했습니다. 기자 분들이 시작한 단체인것 같은데, 알고 지내던 상배형(yundream)도 계시니 반갑기 그지없더군요. 그나저나, 한글 포스팅이 드디어 출현했습니다. 영문에도 자신이 없는데 영어만 사용하려고 하다보니, 막상 한글문장 역시 개념이 없어지는것 같아, 제 블로그에도 가끔은 한글 포스팅을 하려고 합니다. ;)

OpenLamp 식구들이 29일 월요일 두번째 모임을 한다고 하네요. 요즘 시간이 너무 없어서 참여를 할 수 있을지는 모르겠지만, 가능하면 꼭 참석해보려고 합니다.

Cafe village in Seo-Jong

Last week, I went to cafe village in Seo-Jong. It was forested areas.

Here's me (with car) and my girl friend.



June 23, 2009

Google Wave, Pregel and Social Network

The Wave. These days, We're sail the ocean of information, called World Wide Web. And the information surges through network cables. Who makes information on the Web?



And the Pregel. Actually, I don't much know about facts of the pregel. But, I noticed that the river called Pregel has a graph shape as below. And I thought, Google Wave could collects the real social networks in humans. Haha, FUN?



Anyway, Wave & Pregel seems blend together.

June 22, 2009

Pregel, Google's large scale graph computing framework

According to google research blog, They made the Pregel for performing large-scale garph computing and uses it for PageRank calculations, shortest path, ..., etc.

In order to achieve that, we have created scalable infrastructure,
named Pregel, to mine a wide range of graphs. In Pregel, programs
are expressed as a sequence of iterations. In each iteration,
a vertex can, independently of other vertices, receive messages
sent to it in the previous iteration, send messages to other vertices,
modify its own and its outgoing edges' states, and mutate the
graph's topology (experts in parallel processing will recognize that
the Bulk Synchronous Parallel Model inspired Pregel).

Maybe the most important things are the locality of adjacent vertices and the dynamic programming. I talked with Hyunsik, a memeber of Heart project about this, We thought it's other distributed programming model instead of map/reduce, but same the shared-nothing architecture for the better performance. I guess it's not much different from the map/reduce framework.

See also:
- Distributing a database for parallelism

June 15, 2009

Non-negative Matrix Factorization

Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra where a matrix, X, is factorized into (usually) two matrices, W and H.

A( M by N ) = W(M by K) X H( K by N )

W,H : Initialize W and H to random positive matrices.
W = ||W*H-A||
H = ||H(T)W(T)-A(T)||

Iterates W*H until convergence.

See also,
- http://en.wikipedia.org/wiki/Non-negative_matrix_factorization, WIKIPEDIA
- Non-Negative Matrix Factorization with sparseness constraints
- Non-Negative Matrix Factorization Techniques and Optimizations
- Document Clustering Based On Non-negative Matrix Factorization

June 9, 2009

Jacobi eigenvalue algorithm

The Jacobi eigenvalue algorithm is a numerical procedure for the calculation of all eigenvalues and eigenvectors of a real symmetric matrix.

Each Jacobi rotation can be done in n steps when the pivot element p is known. However the search for p requires inspection of all N ≈ ½ n² off-diag elements. We can reduce this to n steps too if we introduce an additional index array with the property that mi is the index of the largest element in row i, (i = 1, … , n - 1) of the current S. Then (k, l) must be one of the pairs (i,mi) . Since only columns k and l change, only mk and ml must be updated, which again can be done in n steps. Thus each rotation has O(n) cost and one sweep has O(n³) cost which is equivalent to one matrix multiplication. Additionally the mi must be initialized before the process starts, this can be done in n² steps.
Typically the Jacobi method converges within numerical precision after a small number of sweeps. Note that multiple eigenvalues reduce the number of iterations since NS < N. -- WIKIPEDIA

See also:
- Jacobi eigenvalue algorithm, Wikipedia
- A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix

Here's my test code.

/**
* Jacobi eigenvalue algorithm
*/
public class Jacobi {
public static void main(String[] args) {
double[][] A = new double[][] { { 4, 3, 2, 1, 0, 0 }, { 3, 4, 3, 2, 0, 0 },
{ 2, 3, 4, 3, 0, 0 }, { 1, 2, 3, 4, 0, 0 }, { 0, 0, 0, 0, 1, 0 },
{ 0, 0, 0, 0, 0, 1 } };

double t, c, s;
int p, q, icount, state, size = 6;
double tol = 1.e-5; // the tolerance level of convergence
int icmax = 100; // the maximum iterations number

int[] colRowOfElMax = new int[size], rowOfElMax = new int[1];
double[][] temp = new double[size][size], D = new double[size][size];
double[][] V, diagD;

double[] maxElColRow = new double[size], maxElRow = new double[1];
double[][] dMinusDiagD = new double[size][size], absDminusDiagD = new double[size][size];
double[][] rot = new double[2][2], rotT = new double[2][2];

// makes V into a unit matrix
V = new double[size][size];
for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {
V[i][j] = 0;
}
V[i][i] = 1.0;
}

D = A; // copies A to D
diagD = diag(D, size);// outputs DiagD=diagonal of D
dMinusDiagD = minus(D, diagD, size); // does D-DiagD
abs(dMinusDiagD, absDminusDiagD, size);// does abs(D-DiagD)
maxMatrix(absDminusDiagD, size, colRowOfElMax, maxElColRow);
maxVector(maxElColRow, size, rowOfElMax, maxElRow);
q = rowOfElMax[0];
p = colRowOfElMax[q];
icount = 0;
state = 1;

// Iterations
while (state == 1 && icount < icmax) {
icount = icount + 1;
if (D[q][q] == D[p][p]) { // check to prevent t from diverging
D[q][q] = D[p][p] + 1.e-10;
}
t = D[p][q] / (D[q][q] - D[p][p]);
c = 1 / Math.sqrt(t * t + 1);
s = c * t;
rot[0][0] = c;
rot[0][1] = s;
rot[1][0] = -s;
rot[1][1] = c;
transpose(rot, rotT, 2);// rotT=transpose(Rot)

for (int i = 0; i < size; i++) {
temp[p][i] = rotT[0][0] * D[p][i] + rotT[0][1] * D[q][i];
temp[q][i] = rotT[1][0] * D[p][i] + rotT[1][1] * D[q][i];
D[p][i] = temp[p][i];
D[q][i] = temp[q][i];
}
for (int i = 0; i < size; i++) {
temp[i][p] = D[i][p] * rot[0][0] + D[i][q] * rot[1][0];
temp[i][q] = D[i][p] * rot[0][1] + D[i][q] * rot[1][1];
D[i][p] = temp[i][p];
D[i][q] = temp[i][q];
}
for (int i = 0; i < size; i++) {
temp[i][p] = V[i][p] * rot[0][0] + V[i][q] * rot[1][0];
temp[i][q] = V[i][p] * rot[0][1] + V[i][q] * rot[1][1];
V[i][p] = temp[i][p];
V[i][q] = temp[i][q];
}

// find the new q, p element array values that need to be changed
diagD = diag(D, size); // outputs diagD=diagonal of D
dMinusDiagD = minus(D, diagD, size); // does D-DiagD
abs(dMinusDiagD, absDminusDiagD, size); // does abs(D-DiagD)
maxMatrix(absDminusDiagD, size, colRowOfElMax, maxElColRow);
maxVector(maxElColRow, size, rowOfElMax, maxElRow);
q = rowOfElMax[0];
p = colRowOfElMax[q];
if (Math.abs(D[p][q]) < tol * Math.sqrt(sumDiagElSq(diagD, size)) / size) {
state = 0;
}
}

// V is the eigen vectors
System.out.println("Jacobi Eigenvalues");
for (int i = 0; i < size; i++) {
for (int j = 0; j < size; j++) {
System.out.print(diagD[i][j] + "\t");
}
System.out.println(" ");
}
}

/**
* finds the diagonal elements of A and puts them into B
*
* @param A
* @param n
* @return
*/
public static double[][] diag(double A[][], int n) {
double[][] B = new double[n][n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
B[i][j] = 0;
}
B[i][i] = A[i][i];
}

return B;
}

/**
* C = A - B
*
* @param A
* @param B
* @param n
* @return C
*/
public static double[][] minus(double A[][], double B[][], int n) {
double[][] C = new double[n][n];
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
C[i][j] = A[i][j] - B[i][j];
}
}
return C;
}

/**
* Finds the absolute value of a matrix
*
* @param A
* @param B
* @param n
*/
public static void abs(double A[][], double B[][], int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
B[i][j] = Math.abs(A[i][j]);
}
}
}

/**
* finds the maximum elements of each column; returns the maximums in Max and
* their array positions in Row
*
* @param A
* @param n
* @param Row
* @param Max
*/
public static void maxMatrix(double A[][], int n, int Row[], double Max[]) {
for (int i = 0; i < n; i++) {
int k = 0;
Max[i] = A[k][i];
Row[i] = k;
for (int j = 0; j < n; j++) {
if (A[j][i] > Max[i]) {
Max[i] = A[j][i];
Row[i] = j;
}
}
k = k + 1;
}
}

/**
* finds the maximum elements of a column of A; returns the maximum of a
* column as Max and its array position as Row
*
* @param A
* @param n
* @param Row
* @param Max
*/
public static void maxVector(double A[], int n, int Row[], double Max[]) {
Max[0] = A[0];
Row[0] = 0;
for (int i = 0; i < n; i++) {
if (A[i] > Max[0]) {
Max[0] = A[i];
Row[0] = i;
}
}
}

/**
* finds the transpose of A and puts it into B
*
* @param A
* @param B
* @param n
*/
public static void transpose(double A[][], double B[][], int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
B[i][j] = A[j][i];
}
}
}

/**
* finds the sums of the squared of the diagonal elements of A
*
* @param A
* @param n
* @return
*/
public static double sumDiagElSq(double A[][], int n) {
double sum = 0;
for (int i = 0; i < n; i++) {
sum = A[i][i] * A[i][i] + sum;
}
return sum;
}
}

June 7, 2009

NAVER Japan, closed beta service has been started again

The Search engine, NAVER Japan closed beta service, will run from the 15th of this month for the limited number of people.



Naver Japan had actually launched in the Japanese market earlier (2005), but the service didn't make any splashes. Instead, According to wikipedia Japan (http://ja.wikipedia.org/wiki/NaverBot), NAVER japan seems disgraced by their behavior, specially, an bots who impersonated a GoogleBot was ran. NAVER Japan will fail again unless the service delivers a significantly better performance and good image.

See also, Japanese Search Engine Market

P.S. The interview model : Julia Oki.

May 27, 2009

IMAP protocol and server

Recently, I'm involved in IMAP server project for my company mail service. I realized that IMAP is a really complex protocol with problems like support for concurrent command processing, aside from the extensive list of commands. So, command parser, session manager and request handler are necessary. For example,

LSUB "#news." "comp.mail.*"
A654 FETCH 2:4 (FLAGS BODY[HEADER.FIELDS (DATE FROM)])

Fortunately , There are some open source projects related with IMAP.

- Apache James, fully featured mail server
- Green Mail, mail server, supports SMTP, POP3, IMAP with SSL
- JIMAP, java based implementation of the IMAP protocol

According to Robert Burrell Donkin which is a member of ASF, "As far as James IMAP goes, the protocol handlers are fine. The Mailbox implementations need more work (but that's on my list for this week).".

May 19, 2009

The secrets of fast mental arithmetic

The 'Britain´s Got Talent' of south korea, 'Star King' introduced the expert of mental arithmetic. She did quick mental arithmetic to calculate large integer numbers in only a few seconds. For example, 258,785,412,354 + 125,465,871,235 + 954,125,487,965 - 845,123,658,792 * 542,136,587,212 ... = ?

I was very impressed. According to She, "I did calculate them with an imaginary abacus".



OK, The abacus-based mental calculation is the recipe. Generally, we learned to calculate multi-digit numbers from right to left, one digit at a time. It's a carry/serial calculation. But Her (abacus) algorithm is little different from us. It's the shift-up/parallel calculation.

May 14, 2009

Frobenius norm with Hama

Recently, The Frobenius norm was nicely implemented as below by Samuel, which is the one of Hama committers. The Frobenius norm is submultiplicative and is very useful for linear algebra. This norm is often easier to compute than induced norms.


/** Frobenius Norm Mapper */
public static class MatrixFrobeniusNormMapper extends MapReduceBase implements
MatrixNormMapper {
@Override
public void map(IntWritable key, MapWritable value,
OutputCollector output, Reporter reporter)
throws IOException {
double rowSqrtSum = 0;
for (Map.Entry e : value.entrySet()) {
double cellValue = ((DoubleEntry) e.getValue()).getValue();
rowSqrtSum += (cellValue * cellValue);
}

nValue.set(rowSqrtSum);
output.collect(nKey, nValue);
}
}

/** Frobenius Norm Combiner */
public static class MatrixFrobeniusNormCombiner extends MapReduceBase
implements MatrixNormReducer {
private double sqrtSum = 0;

@Override
public void reduce(IntWritable key, Iterator values,
OutputCollector output, Reporter reporter)
throws IOException {
while (values.hasNext()) {
sqrtSum += values.next().get();
}
// Note: Tricky here. As we known, we collect each row's sum with key(-1).
// the reduce will just iterate through one key (-1)
// so we collect the max sum-value here
nValue.set(sqrtSum);
output.collect(nKey, nValue);
}
}


You can see this performance on this page

Google support RDFa and Microformats

http://ebiquity.umbc.edu/blogger/2009/05/12/google-support-rdfa-and-microformats/

May 13, 2009

Meeting with the CTO of Sun Japan

I was in a meeting with the CTO of Sun Japan today. He visited me and wanted to know the possibility of Hama deployment into scientific/HPC computing area with Hadoop.

Since it's too early stage, there was not much talk/plan. but I felt sure that it'll really valuable if implemented. It was a delight to talk with him.

May 11, 2009

Psychologists Are Better Stockmarket Speculators Than Economists

Below report shows that the stock is not so much a science as a gamble. ;)

----
From: http://www.medicalnewstoday.com/articles/35953.php

Shareholders seem to be swayed by the buying pattern of other shareholders much less than has hitherto been assumed. This at least is the conclusion arrived at by economists of the Bank of England and the universities of Heidelberg and Bonn. Together with the corporate consultants McKinsey they scrutinised the share-buying behaviour of about 6,500 persons in an Internet experiment. They found no signs of ‘herd instinct’ during the experiment - on the contrary, some of the test subjects decided against buying those specific shares which had just been bought by so many other players. Psychologists, particularly, mistrusted those shares which they regarded as overvalued. This strategy benefited them enormously: on average they were markedly more successful in their speculation than physicists or mathematicians - or even economists.

On average the psychologists earned three times as much as economists and physicists in the stock exchange game. ‘They tended to decide against buying shares precisely when a lot of other players had bought them,’ Dr. Andreas Roider of the University of Bonn’s Economics Department explains. Many discussions up to now have assumed the opposite: investors, it was thought, behave like lemmings. They always buy those shares which are most in demand at a particular time, thereby pushing the share prices too high (or too low). Hardly any explanation of the turbulences on the financial markets are without some reference to the marked predisposition to the herd instinct which allegedly investors show. Yet it might also be the case that each investor has decided in favour of buying independently of the behaviour of other investors - for example, because information has become available about a particular share which argues in favour of buying. Whether shareholders really are influenced by the ‘herd instinct’ is therefore hard to determine in practice.

An economic experiment under controlled conditions was meant to help clarify this issue. Together with corporate consultants McKinsey Professor Jörg Oechssler of the University of Heidelberg with his co-authors Dr. Andreas Roider and Dr. Matthias Drehmann of the Bank of England carried out a financial markets game via the Internet. In it just under 6,500 participants were able to deal in different stocks and shares. Prizes amounting to over 11,000 euros ensured that the game was taken seriously.

‘The players were able to decide in favour of two fictitious shares A and B,’ Dr. Roider explains. ‘Only one share turned up a profit at the end of the game, the other being a dud.’ Before making a decision to buy each participant was given a tip by their investment banker, e.g. ‘Share A is a winner.’ However, these tips were only true in two out of three cases - even investment bankers can make mistakes. As soon as a player had decided in favour of a specific share, the share rose in price. ‘The players could now buy their shares consecutively,’ Dr. Roider says. ‘The first one was only able to rely on the investment banker’s tip. The subsequent buyers, by contrast, were also able to see from the share index what shares their predecessors had chosen.

One million flies can’t be wrong

In a situation like this even a ‘bad’ share can soar dramatically in price - simply because a lot of people choose it. Assuming the first two players get the tip to buy share A from their investment banker, even if player 3 then gets a tip to choose share B, that player may decide in favour of A - after all, his predecessors have apparently been tipped off to choose this share. Player 4, taking as a motto ‘one million flies can’t be wrong’, will find it very difficult to buck the trend and choose share B. The result is that the herd instinct sends the value of share A up to dizzying - and ultimately irrational - heights.

In the experiment, however, those involved by no means followed blindly the behaviour of previous investors. - quite the contrary. As a rule the participants mainly allowed themselves to be influenced in their choice of share by their adviser’s tip. In fact, many investors made a conscious decision to buck the trend, thereby contributing to a stabilisation of the share prices. ‘This can of course be perfectly sensible if you think that the share is currently overpriced and that this is partly due to the irrational behaviour of earlier investors,’ says Dr. Roider. The psychologists, particularly, seemed to ascribe share prices to these sorts of ‘psychological’ effects. Their intuition about the possibly irrational behaviour of other investors meant that they made bigger profits. By contrast, players who had studied Physics seemed to rely on the cool rationality of other participants - and thus fared worse.

Dr. Andreas Roider
roider@uni-bonn.de
University of Bonn
http://www.uni-bonn.de

May 10, 2009

The spammer is a potential advertiser?

I have once thought that the spammer is a potential advertiser, fighting with spammers. Basically, they want to advertise on mail infra-. Should we only block them? Is there a way to convert them to advertiser?

May 7, 2009

Be distressed by excessive work

Commonly I make up my mind easily. I'm not on the horns of a dilemma very often. But, I'm not right now. Lately, I've been considering of going back to school (Or other nice company which could provide me some time for my research) since the lack of time for my open source works / researches and my free time distresses me deeply.

By far, workers in South Korea have the longest work hours among OECD countries. The average South Korean works 2,390 hours each year, according to the OECD. This is over 400 hours longer than the next longest-working country and 34% more hours than the average in the United States.

NHN, corp. is good company, but do you know that NHN, corp. is in Korea? ;(

May 3, 2009

Movie: The Happening 2008

Today, I watched this movie and personally I think it's not bad.



The film opens in New York. People start to get confused in Central Park, repeating their words, standing still and sometimes walking backwards. We hear a few screams. A woman reading on a bench takes her silver chopstick-style hair pin out of her hair and stabs herself in the neck with it.

The plant's chemical defenses were the culprit. That insight seems based on 'New Study Sheds Light on Plants’ Nighttime Defense'.

April 21, 2009

Information streaming

Recent trends that to expanding stream of information based on 'stream' and 'social network' have focused primarily on real world environmental and social models. One day, I thought whether Internet service is better to be intelligent or sexy.

Now, I think "Sexy is better".

April 20, 2009

Compute the transpose of matrix using Hama

The transpose of a matrix is another matrix in which the rows and columns have been reversed. It will be used for SVD (Singular Value Decomposition).

+ + + +
| a11 a12 a13 | | a11 a21 a31 |
| a21 a22 a23 | => | a12 a22 a32 |
| a31 a32 a33 | | a13 a23 a33 |
+ + + +

- A map task receives a row n as a key, and vector of each row as its value
- emit (Reversed index, the entry with the given index)
- Reduce task sets the reversed values

The transpose of 5,000 * 5,000 dense matrix took 12 mins using Hadoop/Hama (10 nodes). Why need to store the result? If we store the result, the locality will be provided for next steps, such as multiplication.

April 10, 2009

Naver Opencast

I was pointed out about the changes of naver main page at 'Naver, give google korea a sporting chance'. The service, called 'Opencast', is finally opened to all users.

"Naver Opencast, in a nutshell, is the feature that allows user to customize a certain section of Naver's main homepage by populating it with content feeds ("casts"), which themselves are generated and shared by other users. It's really a two way system. On the "casting" or "publishing" side, you can generate your own feeds or "casts" by hand-picking good content off the open web; On the "consuming" side, you can subscribe to the "casts" that have been published by other users and have it shown on the generic Naver homepage." -- Web 2.0 Asia.

Here's my cast.


I guessed that it probably similar to some social applications, such as RSS feed and google gadget but it's little different in technical aspects. It requires an artificial editing, contrary to expectation.