## January 18, 2009

### Daum opens Korean street view service

Daum, the operator of Korea’s second most popular Internet portal, www.daum.net, is about to release a new service built around a detailed photographic map of Korea. The map covers virtually the entire country, including Seoul and the neighboring Gyeonggi Province, the six metropolitan cities, and Jeju Island. The Street View service provides more than 4 million 360° views of streets across the nation.

## January 15, 2009

### Distributed matrix multiplication with HAMA

To multiply two matrices A and B, we first use map/reduce to collect the blocks into 'collectionTable'.

Each row is named c(i, j) and given the sequential number (N^2 * i) + (j * N) + k, where N is the number of blocks along each dimension, so that no records are duplicated. Each row holds the two submatrices a(i, k) and b(k, j), which minimizes data movement and network cost. Finally, we multiply each pair and sum the products sequentially.
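A minimal single-process sketch of the steps above, assuming square matrices whose size divides evenly into N × N blocks. The function names (`split_blocks`, `collect_blocks`, `multiply_and_sum`, `assemble`) are hypothetical and are not HAMA's API; a Python dict stands in for 'collectionTable', and plain loops stand in for the map/reduce jobs that would run across the cluster.

```python
# Hypothetical single-process sketch of the blocked multiply; not HAMA's API.

def split_blocks(m, n):
    """Cut a square matrix into an n x n grid of square blocks (size must divide evenly)."""
    size = len(m) // n
    return {
        (bi, bj): [row[bj * size:(bj + 1) * size] for row in m[bi * size:(bi + 1) * size]]
        for bi in range(n)
        for bj in range(n)
    }

def matmul(x, y):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]

def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def collect_blocks(a_blocks, b_blocks, n):
    """Collection step: one record per (i, j, k) holding a(i, k) and b(k, j).

    The key N^2*i + j*N + k is unique for 0 <= i, j, k < N, so no record is overwritten.
    """
    records = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                records[n * n * i + j * n + k] = (i, j, a_blocks[(i, k)], b_blocks[(k, j)])
    return records

def multiply_and_sum(records):
    """Multiply each block pair and accumulate the products into c(i, j)."""
    c_blocks = {}
    for seq in sorted(records):
        i, j, a_ik, b_kj = records[seq]
        partial = matmul(a_ik, b_kj)
        c_blocks[(i, j)] = add(c_blocks[(i, j)], partial) if (i, j) in c_blocks else partial
    return c_blocks

def assemble(c_blocks, n):
    """Stitch the n x n grid of result blocks back into one matrix."""
    size = len(c_blocks[(0, 0)])
    return [
        [c_blocks[(bi, bj)][r][c] for bj in range(n) for c in range(size)]
        for bi in range(n)
        for r in range(size)
    ]

a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 0], [0, 1, 0, 3], [1, 1, 0, 0], [0, 2, 1, 1]]
n = 2
records = collect_blocks(split_blocks(a, n), split_blocks(b, n), n)
c = assemble(multiply_and_sum(records), n)
print(len(records))       # 8 records = N^3
print(c == matmul(a, b))  # True
```

The collection step deliberately copies each block N times (once per partner block), trading storage for the fact that every record can then be multiplied without fetching anything else.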

Don't look down on this naive method. It creates many duplicated blocks, but they are distributed at the Hadoop/HBase level. And as the number of nodes increases, the number of I/O channels increases linearly, so we can expect an approximately linear speedup as nodes are added.

Here are my test results with Hama:

* 4 Intel(R) Xeon(R) CPUs at 2.33 GHz, SATA hard disk, 16,626,844 KB of physical memory
* Replication factor: 3
* All other settings are defaults

#### Matrix-matrix multiplication of a 5,000 by 5,000 dense matrix

Unfortunately, HBase does not yet seem ready to store very large columns, so I only tested a 5,000 by 5,000 matrix.

* 3 nodes: 602 seconds, 400 blocks
* 6 nodes: 270 seconds, 400 blocks
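A quick check of the linear-scaling claim against these two measurements. Speedup is taken relative to the 3-node run, and efficiency compares it to the ideal (perfectly linear) speedup; the `scaling` helper is hypothetical, and with only two data points this is a rough check rather than a scaling curve.

```python
def scaling(runtimes):
    """Return {nodes: (speedup, efficiency)} relative to the smallest cluster."""
    base_nodes = min(runtimes)
    base_time = runtimes[base_nodes]
    result = {}
    for nodes, seconds in sorted(runtimes.items()):
        speedup = base_time / seconds   # how much faster than the base run
        ideal = nodes / base_nodes      # speedup under perfectly linear scaling
        result[nodes] = (speedup, speedup / ideal)  # efficiency 1.0 == exactly linear
    return result

runtimes = {3: 602, 6: 270}  # nodes -> seconds, from the 5,000 x 5,000 test
for nodes, (speedup, efficiency) in scaling(runtimes).items():
    print(f"{nodes} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
# 3 nodes: speedup 1.00x, efficiency 1.00
# 6 nodes: speedup 2.23x, efficiency 1.11
```

Doubling the nodes gave about a 2.23x speedup here, so the measured scaling is at least linear for this one pair of runs.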

If you have a better idea, please share it with me! :)

## January 6, 2009

### Google: statistical machine translation

Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. I find it really interesting.

The sense of a word (e.g. singular vs. plural, synonyms, polysemy) is often unclear, and idioms usually cannot be translated literally into another language. This means it is important to grasp the meaning from context. Hence "statistical machine translation".

Alignment is the methodology generally used for machine translation. So we can simply think about statistical alignment and machine translation in terms of a MapReduce model, as described below:

- Compute p(f|e) by summing the probabilities of all alignments.

Notation:

* e: English sentence
* l: the length of e in words
* f: French sentence
* m: the length of f
* f_j: word j in f
* a_j: the position in e that f_j is aligned with
* e_{a_j}: the word in e that f_j is aligned with
* p(w_f | w_e): translation probability
* Z: normalization constant
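A minimal sketch of that computation, assuming an IBM Model 1 style model (my assumption; the post does not name a specific model): every alignment is equally likely, there is no NULL word, so Z = l^m and p(f|e) = (1/Z) Σ_a Π_j p(f_j | e_{a_j}). The translation table `t` and the function names are hypothetical toy values. Each sentence pair is scored independently, so this per-pair function is the kind of work a map task could do, with a reducer summing log-probabilities over a corpus.

```python
from itertools import product
from math import prod

def p_f_given_e(f, e, t):
    """Brute force: sum p(f, a | e) over every alignment a.

    a[j] is the position in e that f[j] is aligned with, so e[a[j]] is e_{a_j}.
    """
    l, m = len(e), len(f)
    z = l ** m  # normalization constant Z (assumed: uniform alignments, no NULL word)
    total = 0.0
    for a in product(range(l), repeat=m):
        total += prod(t.get((f[j], e[a[j]]), 0.0) for j in range(m))
    return total / z

def p_f_given_e_factored(f, e, t):
    """Same value in O(l * m): under these assumptions the sum factorizes word by word."""
    l, m = len(e), len(f)
    return prod(sum(t.get((fj, ei), 0.0) for ei in e) for fj in f) / l ** m

# Hypothetical toy translation table: t[(french_word, english_word)] = p(w_f | w_e)
t = {
    ("la", "the"): 0.7, ("la", "house"): 0.1,
    ("maison", "the"): 0.1, ("maison", "house"): 0.8,
}
f = ["la", "maison"]
e = ["the", "house"]

print(round(p_f_given_e(f, e, t), 4))           # 0.18
print(round(p_f_given_e_factored(f, e, t), 4))  # 0.18
```

Enumerating all l^m alignments is exponential in sentence length, which is why the factorized form matters in practice; the brute-force version is kept only to show that it really is a sum over alignments.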

You may also want to see: Bayesian spam filtering using Map/Reduce

Anyway, we may soon see multilingual, semantic search engines.

References

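* Google's promise: "We will tear down the language barrier" (in Korean)
* Machine-translated search is the beginning of "dismantling the Web's language barrier" (in Korean)
* Wikipedia: Example-based Machine Translation
* Google Translation Center, a New Human Translations Service in the Making
* Google Translation Center: The World’s Largest Translation Memory
* More Languages in Google Translate
* MS translation service "Windows Live Translator" (in Korean)
* Google: challenging the Internet's Tower of Babel? (in Korean)
* On CAT (Computer Aided Translation) tools (in Korean)
* Statistical Machine Translation Org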

## January 1, 2009

### Naver, give Google Korea a sporting chance

The naver.com main page has changed. The main idea is to give users a personalized homepage where they can check news, e-mail, blog postings, entertainment features, and other content all in one place.

I'm not surprised. It is somewhat like the Google personalized homepage (weather forecast, Gmail preview, RSS feeds, and several other modules), and there are already many sites on the web that let users set up a personal, customized homepage.

But the circumstances of most Korean netizens are altogether different from mine. They have been trained by the message "Stay tuned to this website for the latest news and information." I think that was a key to Naver's popularity in Korea.

And now they will either be re-trained or go somewhere else.

Anyway, Korean netizens are starting a new life in a far wider world wide web.