Daum, the operator of Korea’s second most popular internet portal, www.daum.net, is about to release a new service built around a detailed photographic map of Korea, which covers virtually the entire country including Seoul and the neighboring Gyeonggi Province areas, the six metropolitan cities and Jeju Island. The Street View service provides more than 4 million 360° views of streets across the nation.
See this page.
Distributed matrix multiplication with HAMA
To multiply two matrices A and B, we first use map/reduce to collect the blocks into 'collectionTable'.
Each row is named after its target block c(i, j) and given the sequential number (N^2 * i) + (j * N) + k, so that no two records collide. Each row holds the two sub-matrices a(i, k) and b(k, j), which minimizes data movement and network cost. Finally, we multiply each pair of sub-matrices and sum the products sequentially.
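A minimal in-memory sketch of the row-key scheme, assuming square matrices whose size is divisible by the number of blocks per side N. Plain dicts stand in for the HBase 'collectionTable' and plain loops stand in for the Hama/MapReduce jobs; the helper names (`split_blocks`, `build_collection_table`, `multiply_and_sum`, `join_blocks`) are hypothetical and are not Hama APIs.

```python
# Hypothetical in-memory sketch: dicts stand in for HBase tables ('collectionTable'),
# loops stand in for the distributed Hama/MapReduce jobs.

def split_blocks(m, n):
    """Split a square matrix (list of lists) into an n x n grid of sub-matrices."""
    size = len(m)
    assert size % n == 0, "this sketch assumes n divides the matrix size"
    b = size // n
    return {
        (i, j): [row[j * b:(j + 1) * b] for row in m[i * b:(i + 1) * b]]
        for i in range(n)
        for j in range(n)
    }


def mat_mul(x, y):
    """Plain dense multiplication of two list-of-lists matrices."""
    return [[sum(x[r][t] * y[t][c] for t in range(len(y))) for c in range(len(y[0]))]
            for r in range(len(x))]


def mat_add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def build_collection_table(a_blocks, b_blocks, n):
    """Collect step: one row per (i, j, k), keyed by N^2*i + N*j + k,
    holding the pair of sub-matrices a(i, k) and b(k, j)."""
    table = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                table[n * n * i + n * j + k] = (a_blocks[(i, k)], b_blocks[(k, j)])
    return table


def multiply_and_sum(table, n):
    """Multiply each stored pair and accumulate c(i, j) = sum over k of a(i, k) * b(k, j)."""
    c_blocks = {}
    for key, (a_ik, b_kj) in table.items():
        i, j = key // (n * n), (key // n) % n  # decode the target block c(i, j) from the row key
        partial = mat_mul(a_ik, b_kj)
        c_blocks[(i, j)] = mat_add(c_blocks[(i, j)], partial) if (i, j) in c_blocks else partial
    return c_blocks


def join_blocks(c_blocks, n):
    """Reassemble the n x n grid of result blocks into one matrix."""
    return [
        [value for j in range(n) for value in c_blocks[(i, j)][r]]
        for i in range(n)
        for r in range(len(c_blocks[(i, 0)]))
    ]


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
    n = 2  # a 2 x 2 grid of 2 x 2 blocks
    table = build_collection_table(split_blocks(A, n), split_blocks(B, n), n)
    C = join_blocks(multiply_and_sum(table, n), n)
    assert C == mat_mul(A, B)  # blocked result matches the direct product
    print(len(table), "collection rows")
```

Each a(i, k) is copied into N rows (once for every j), and each b(k, j) into N rows (once for every i), so the table has N^3 rows in total; this is where the duplicated blocks come from. Because every row is independent, the multiplications can be spread across nodes, and only the final per-c(i, j) sums need to be brought together.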
Don't look down on this naive method. It produces a lot of duplicated blocks, but they are distributed at the Hadoop/HBase level. And as the number of nodes increases, the number of I/O channels increases linearly, so we can expect an approximately linear increase in speed as nodes are added.
Here are my test results with Hama:
* 4 Intel(R) Xeon(R) CPU 2.33GHz, SATA hard disk, Physical Memory 16,626,844 KB
* Replication factor: 3
* All other settings: default
= Matrix-Matrix Multiplication of 5,000 by 5,000 Dense Matrices =
Unfortunately, HBase does not yet seem ready to store large columns, so I tested with 5,000 by 5,000 matrices.
* 3 nodes: 602 seconds, 400 blocks
* 6 nodes: 270 seconds, 400 blocks
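To put the two timings in terms of the scaling claim, here is a small calculation of speedup and parallel efficiency (speedup divided by the ratio of node counts, so 1.0 would be exactly linear). It uses only the two measurements quoted above; the function names are illustrative.

```python
# Wall-clock times quoted in the post for 5,000 x 5,000 dense matrices.
RESULTS = {3: 602.0, 6: 270.0}  # nodes -> seconds


def speedup(t_base, t_new):
    """How many times faster the larger run was."""
    return t_base / t_new


def efficiency(nodes_base, t_base, nodes_new, t_new):
    """Speedup divided by the node ratio; 1.0 would be perfectly linear scaling."""
    return speedup(t_base, t_new) / (nodes_new / nodes_base)


s = speedup(RESULTS[3], RESULTS[6])
e = efficiency(3, RESULTS[3], 6, RESULTS[6])
print(f"speedup {s:.2f}x with 2x the nodes, efficiency {e:.2f}")
```

This prints a speedup of about 2.23x for twice the nodes, an efficiency of about 1.11, i.e. slightly better than linear on this pair of runs. Two data points are consistent with the "approximately linear" expectation but are not enough to establish a trend.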
If you have a better idea, please share it with me! :)
Island caretaker job offer: get paid $150,000 to swim, snorkel
Wish you were here? Tourism Queensland is offering a $150,000, rent-free job on Hamilton Island. Wow! Really cool. See: http://www.tq.com.au
Google: statistical machine translation
Google has been investing significant resources in a multi-year effort to develop its statistical machine translation technology. I find it really interesting.
The sense of a word is often unclear (e.g. singular vs. plural, synonyms, polynyms, etc.), and idioms usually cannot be translated literally into another language. This means it is important to grasp the meaning from context, hence "statistical" machine translation.
Alignment is the methodology generally used for machine translation, so we can think of statistical alignment and machine translation in terms of a MapReduce model, as described below:
- Compute p(f|e) by summing the probabilities p(f, a|e) over all alignments a.

* e: English sentence
* l: the length of e in words
* f: French sentence
* m: the length of f
* f_j: word j in f
* a_j: the position in e that f_j is aligned with
* e_{a_j}: the word in e that f_j is aligned with
* p(w_f | w_e): translation probability
* Z: normalization constant
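My reading is that these definitions match the IBM Model 1 formulation (the post does not name the model). Summing over all alignments gives p(f|e) = Z * Σ_a ∏_j p(f_j | e_{a_j}), and because each a_j is chosen independently, the sum factors into Z * ∏_j Σ_i p(f_j | e_i). The Python sketch below (3.8+, for `math.prod`) checks that identity by brute force and writes one EM re-estimation step for p(w_f | w_e) as a map function (emit expected counts per English word) and a reduce function (normalize them). The toy corpus, the function names, Z = 1 and the absence of a NULL word are assumptions for illustration, not the author's design.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Sketch assumptions: t maps (french_word, english_word) -> p(w_f | w_e),
# there is no NULL word, and Z defaults to 1.


def p_f_given_e_bruteforce(f, e, t, z=1.0):
    """Sum p(f, a | e) over every alignment a; there are l ** m of them."""
    total = 0.0
    for a in product(range(len(e)), repeat=len(f)):  # a[j] = position in e aligned with f[j]
        total += prod(t[(fj, e[aj])] for fj, aj in zip(f, a))
    return z * total


def p_f_given_e_factored(f, e, t, z=1.0):
    """Same value: the sum over alignments factors into a product of per-word sums."""
    return z * prod(sum(t[(fj, ei)] for ei in e) for fj in f)


def em_map(pair, t):
    """Map: for one (e, f) sentence pair, emit (english_word, (french_word, expected_count))."""
    e, f = pair
    for fj in f:
        norm = sum(t[(fj, ei)] for ei in e)
        for ei in e:
            yield ei, (fj, t[(fj, ei)] / norm)


def em_reduce(ei, values):
    """Reduce: sum the expected counts for one English word and normalize them."""
    counts = defaultdict(float)
    for fj, c in values:
        counts[fj] += c
    total = sum(counts.values())
    return {(fj, ei): c / total for fj, c in counts.items()}


def em_iteration(corpus, t):
    """One EM step; the dict of lists stands in for the MapReduce shuffle."""
    grouped = defaultdict(list)
    for pair in corpus:
        for key, value in em_map(pair, t):
            grouped[key].append(value)
    new_t = {}
    for ei, values in grouped.items():
        new_t.update(em_reduce(ei, values))
    return new_t


if __name__ == "__main__":
    corpus = [
        (["the", "house"], ["la", "maison"]),
        (["the", "blue", "house"], ["la", "maison", "bleue"]),
        (["the", "flower"], ["la", "fleur"]),
    ]
    # Constant start over co-occurring word pairs; the first E-step normalizes it anyway.
    t = {(fj, ei): 1.0 for e, f in corpus for fj in f for ei in e}
    for _ in range(10):
        t = em_iteration(corpus, t)
    e, f = corpus[0]
    print(p_f_given_e_bruteforce(f, e, t), p_f_given_e_factored(f, e, t))
    print(round(t[("maison", "house")], 3), round(t[("la", "house")], 3))
```

The brute-force sum grows as l^m, which is why the factored form matters in practice. In the map/reduce split, each sentence pair is processed independently by the mapper, and the reducer for an English word only needs the expected counts keyed by that word.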
You may also want to see: Bayesian spam filtering using Map/Reduce
Anyway, we may soon see multi-language, semantic search engines.
Naver, give Google Korea a sporting chance
The naver.com main page has been changed. The main idea is to provide users with a personalized homepage where they can check news, e-mail, blog postings, entertainment features and other content all in one place.
I'm not surprised; this is somewhat like Google's personalized homepage (weather forecast, Gmail preview, RSS feeds, and several other modules), and there are already many sites on the web that let users set up a personal, customized homepage.
But the circumstances of most Korean netizens are altogether different from mine. They have been trained by "Stay tuned to this website for the latest news and information." I think that was a key to Naver's popularity in Korea.
And now, they will either be re-trained or leave for somewhere else.
Anyway, Korean netizens are starting a new life in a far wider World Wide Web.