May 19, 2008

Hama project has been accepted into the Incubator
I'm happy to report that the Hama project has been accepted into the Apache Incubator. However, this is only the beginning.

May 14, 2008

Hama Incubation vote has started.

The domain of Hama (parallel matrix computation) is one of the largest areas of scientific research. Therefore, I will seek advice on the Hama project from the KAIST Department of Mathematical Sciences and various other laboratories.

And the Hama Incubation vote has started. It's my first time proposing a project to Apache, so I'm a bit nervous. :)

May 8, 2008

Semantic Indexing

The different forms of a term (e.g. singular, plural, synonym, polyseme, etc.) are classified as separate keywords even when they share the same meaning. So semantic indexing seems to improve results in a case like the sample below. For example (singular/plural): I heard a song on the radio that I wanted to hear again. So I tried to search for it using the phrase 'picture of you', which is part of the song's lyrics.

1) When I used the Naver search engine, I couldn't find any clue about the song, even after paging through the results. The song's title was 'Pictures of you'.

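-- For terms that take different forms (singular, plural, synonym, lexical variant, etc.) but carry the same meaning, Naver search does not appear to map 'picture', its plural 'pictures', and the Korean synonym "사진" ("photo") to the same index term 'picture'. As a result, a search for 'picture' does not seem to find documents containing 'pictures'.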

2) However, when I used the Google search engine, I could find the singular and plural forms of 'picture', synonyms, and other relevant documents in the top 10 list. My guess is that this is the effect of LSI (Latent Semantic Indexing).

-- Google, on the other hand, appears to use LSI to show singular and plural forms, as well as synonyms and polysemes, suitably distributed within the top 10 list.
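
To make the singular/plural mismatch concrete, here is a minimal sketch of an inverted index built twice: once on exact lowercase terms and once with a deliberately naive plural-stripping rule. This is plain term normalization, not LSI, and it is not how Naver or Google actually work; the function names (normalize, build_index, search) and the two tiny documents are hypothetical, made up for illustration.

```python
from collections import defaultdict


def normalize(term):
    """Lowercase a term and strip a trailing plural 's' (a deliberately naive rule)."""
    term = term.lower()
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def build_index(docs, normalizer):
    """Map each normalized term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalizer(token)].add(doc_id)
    return index


def search(index, query, normalizer):
    """Return the ids of documents that contain every query term."""
    postings = [index.get(normalizer(t), set()) for t in query.split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical two-document collection.
docs = {1: "Pictures of you", 2: "a picture of me"}

exact = build_index(docs, str.lower)    # 'picture' and 'pictures' are different keys
stemmed = build_index(docs, normalize)  # both forms share the key 'picture'

exact_hits = search(exact, "picture of you", str.lower)
stemmed_hits = search(stemmed, "picture of you", normalize)
```

With exact terms the query finds nothing, because no single document contains 'picture', 'of', and 'you' together. With the normalized index, document 1 is returned. Synonyms and polysemes need more than this kind of rule, which is where a technique like LSI would come in.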

So what effect does Latent Semantic Indexing have on Google ranking? There are many references that touch on the subject of Google's LSI.

Here are a few:
- How Does Google Use Latent Semantic Indexing (LSI)?
- Google Semantically Related Words & Latent Semantic Indexing Technology

May 6, 2008

Network Cost.

What should we do to reduce the number of remote calls and avoid the associated overhead (network cost)? Maybe we need a high-performance parallel optimizer for vague/medium-sized operations, but we lack the experience. For now I'm waiting for the large test cluster with a gigabit switch. :)
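
One common way to cut the number of remote calls is to batch many small requests into fewer large ones. The sketch below only illustrates that idea and is not Hama code: FakeRemoteStore is a hypothetical stand-in that counts round trips instead of touching a network, and multi_get and the batch size of 100 are assumptions for the example.

```python
class FakeRemoteStore:
    """Hypothetical stand-in for a remote server; counts round trips only."""

    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):
        # One round trip per key.
        self.round_trips += 1
        return self.data[key]

    def multi_get(self, keys):
        # One round trip for the whole batch of keys.
        self.round_trips += 1
        return [self.data[k] for k in keys]


def fetch_one_by_one(store, keys):
    """Fetch every key with its own remote call."""
    return [store.get(k) for k in keys]


def fetch_batched(store, keys, batch_size):
    """Fetch keys in fixed-size batches, one remote call per batch."""
    values = []
    for start in range(0, len(keys), batch_size):
        values.extend(store.multi_get(keys[start:start + batch_size]))
    return values


keys = list(range(1000))
data = {k: k * k for k in keys}

naive = FakeRemoteStore(data)
batched = FakeRemoteStore(data)

naive_values = fetch_one_by_one(naive, keys)
batched_values = fetch_batched(batched, keys, batch_size=100)
```

Both strategies return the same values, but the one-by-one version makes 1000 round trips while the batched version makes 10. On a real cluster the trade-off is larger messages and more memory per call, so the right batch size would have to be measured.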

PSVM: Parallelizing Support Vector Machines on Distributed Computers