## Posts

Showing posts from January, 2009

### Daum opens Korean street view service

Daum, the operator of Korea’s second most popular internet portal, www.daum.net, is about to release a new service built around a detailed photographic map of Korea, which covers virtually the entire country including Seoul and the neighboring Gyeonggi Province areas, the six metropolitan cities and Jeju Island. The Street View service provides more than 4 million 360° views of streets across the nation.

### Distributed matrix multiplication with HAMA

To multiply two matrices A and B, we first collect the blocks into 'collectionTable' using map/reduce.

Rows are named c(i, j) and numbered sequentially as (N^2 * i) + (j * N) + k to avoid duplicate records. Each row holds two sub-matrices, a(i, k) and b(k, j), which minimizes data movement and network cost. Finally, we multiply the blocks and sum the products sequentially.

Don't look down on this unsophisticated method. There are a lot of duplicated blocks, but they are distributed at the hadoop/hbase level. As the number of nodes increases, the number of IO channels increases linearly, so we can expect an approximately linear speedup as nodes are added.
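The block scheme described above can be sketched in plain Python. This is only a single-machine illustration, not Hama's actual implementation: the function names (`row_key`, `collect_blocks`, `blocked_multiply`) are hypothetical, a dictionary stands in for 'collectionTable', and N is assumed to be the number of blocks per dimension and to divide the matrix size evenly.

```python
def row_key(i, j, k, n):
    """Sequential row number for the pair (a(i, k), b(k, j)).

    Unique for 0 <= i, j, k < n, so no two rows collide.
    """
    return (n * n * i) + (j * n) + k


def get_block(m, r, c, size):
    """Return the size x size sub-matrix of m at block position (r, c)."""
    return [row[c * size:(c + 1) * size] for row in m[r * size:(r + 1) * size]]


def collect_blocks(a, b, n):
    """Build the collection table: one row per (i, j, k) holding both blocks."""
    size = len(a) // n
    table = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                table[row_key(i, j, k, n)] = (
                    i, j, get_block(a, i, k, size), get_block(b, k, j, size))
    return table


def multiply_blocks(x, y):
    """Plain dense product of two square blocks."""
    size = len(x)
    return [[sum(x[r][t] * y[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]


def blocked_multiply(a, b, n):
    """Multiply square matrices a and b using an n x n grid of blocks."""
    size = len(a) // n
    result = [[0] * len(a) for _ in range(len(a))]
    for i, j, a_blk, b_blk in collect_blocks(a, b, n).values():
        partial = multiply_blocks(a_blk, b_blk)
        # Sum the partial product into block c(i, j) of the result.
        for r in range(size):
            for c in range(size):
                result[i * size + r][j * size + c] += partial[r][c]
    return result
```

Each table row is self-contained, so in a distributed setting every row could be multiplied on whichever node stores it, at the cost of storing each block N times.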

Here is my test result with Hama:

* 4 Intel(R) Xeon(R) CPU 2.33GHz, SATA hard disk, Physical Memory 16,626,844 KB
* Replication factor : 3
* All other settings are left at their defaults

= Matrix-Matrix Multiply of 5,000 by 5,000 dense matrix =

Unfortunately, hbase does not seem ready to store large columns yet, so I tested with 5,000 by 5,000.

* 3 node : 602 seconds, 400 …

### Google: statistical machine translation

Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. It seems really interesting to me.

The sense of a word (e.g. singular, plural, synonym, polyseme, etc.) is not always clear, and idioms usually cannot be translated literally into another language. This means it is important to grasp the meaning from the context. Hence, "statistical machine translation".

Generally, alignment methods are used for machine translation. We can then think about statistical alignment and machine translation in a mapreduce model as described below:

- Compute p(f|e) by summing the probabilities of all alignments.

where:

- e: the English sentence
- l: the length of e in words
- f: the French sentence
- m: the length of f in words
- f_j: word j in f
- a_j: the position in e that f_j is aligned with
- e_{a_j}: the word in e that f_j is aligned with
- p(w_f|w_e): the translation probability
- Z: the normalization constant
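As a rough illustration of "summing the probabilities of all alignments", here is a minimal brute-force sketch in Python. It assumes the probability of one alignment is the product of the word translation probabilities p(f_j|e_{a_j}), and that Z defaults to the number of possible alignments, l^m. Both are assumptions for this example rather than something stated above, and the function name and the toy probability table are hypothetical.

```python
from itertools import product
from math import prod


def sentence_probability(f_words, e_words, t, z=None):
    """Compute p(f|e) by summing over every alignment of f to e.

    t maps (french_word, english_word) to a translation probability.
    z is the normalization constant; defaulting it to l ** m (the number
    of alignments) is an assumption made for this sketch.
    """
    l, m = len(e_words), len(f_words)
    if z is None:
        z = l ** m
    total = 0.0
    # An alignment assigns each French position j a position a_j in e.
    for alignment in product(range(l), repeat=m):
        total += prod(
            t.get((f_words[j], e_words[alignment[j]]), 0.0) for j in range(m))
    return total / z


# Toy translation table (made-up numbers, for illustration only).
t = {
    ("la", "the"): 0.7, ("la", "house"): 0.1,
    ("maison", "the"): 0.2, ("maison", "house"): 0.8,
}
p = sentence_probability(["la", "maison"], ["the", "house"], t)
```

Enumerating all l^m alignments is only feasible for very short sentences. Under the product assumption, the same sum factors into a product over j of the sum over i of p(f_j|e_i), which is much cheaper to compute.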

You may also want to see: Bayesian spam filtering using Map/Reduc…

### Naver, give google korea a sporting chance

The naver.com main page has been changed. The main idea is to provide users with a personalized homepage where they can check news, e-mail, blog postings, entertainment features and other content all in one place.

I'm not surprised. This is somewhat like the google personalized home page (weather forecast, Gmail preview, RSS feeds, and several other modules), and there are already many sites on the web that allow users to set up a personal, customized homepage.

But the circumstances of most Korean netizens are altogether different from mine. They have been trained by "Stay tuned to this web-site for the latest news and information". I think that was a key to its popularity in Korea.

And now, they will either be re-trained or leave for somewhere else.

Anyway, Korean netizens are starting a new life on a far wider world wide web.