Showing posts from 2010

Now I'm a user of HBase.

Today I switched the database of my toy project, a URL shortener service, from MySQL to Apache HBase. It seems to run well with my real-time Java web application.

Actually, this isn't my first time with HBase. I contributed HQL (HBase Query Language) a long, long time ago. If they hadn't blocked me, I could have made something like Pig or Hive. Haha, anyway.

My HBase cluster has 5 nodes. There's no big data yet, but it is already used by some Twitter clients and a number of web sites, and rows are being added at almost 30 per second.

Later, I'll use the data for some of my research work, e.g., information-flow analysis, user propensity analysis, web structure mining, and trend mining with Apache Hama. :-)

[Note] Several problems where Apache Hama can be used

- Web Graph Structure Mining
- Social Network Analysis
- Information Flow On Social Network (finding top-K influential nodes)
- Evolution Of Social Network
- High Level Machine Learning (Bioinformatics, Chemical informatics, etc.)
- ... and many others.

Serialize Printing of "Hello BSP" with Apache Hama

Serialize Printing of "Hello BSP"

Each BSP task of the Hama cluster prints the string "Hello BSP" in serial order. This example will help you understand the concepts of the BSP computing model.

- Each task gets its own hostname (hostname:port pair) and a sorted list containing the hostnames of all the other peers.
- Each task prints the LOG string "Hello BSP" only when its turn comes, at intervals of 5 seconds.
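The turn-taking described above can be roughly simulated with threads. This is only a sketch, not Hama code: it is written in Python, `serialize_printing` and the peer names are made up for illustration, `threading.Barrier` stands in for the BSP barrier synchronization, and the 5-second interval is left out.

```python
import threading


def serialize_printing(peer_names):
    """Simulate BSP peers that print "Hello BSP" one at a time, in sorted order."""
    peers = sorted(peer_names)            # every peer sees the same sorted list
    barrier = threading.Barrier(len(peers))
    printed = []                          # records the order in which peers printed

    def bsp(my_name):
        # One superstep per peer in the sorted list.
        for turn_holder in peers:
            if turn_holder == my_name:    # it is my turn: print
                printed.append(my_name)
                print(f"Hello BSP from {my_name}")
            barrier.wait()                # barrier synchronization ends the superstep

    threads = [threading.Thread(target=bsp, args=(name,)) for name in peers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return printed
```

Calling `serialize_printing(["host3:61000", "host1:61000", "host2:61000"])` prints one line per peer. Because every peer waits at the barrier after each turn, the lines come out in sorted hostname order no matter how the threads are scheduled.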
BSP implementation of Serialize Printing of "Hello BSP"

public class SerializePrinting {

  public static class HelloBSP extends BSP {
    public static final Log LOG = LogFactory.getLog(HelloBSP.class);
    private Configuration conf;
    private final static int PRINT_INTERVAL = 5000;

    @Override
    public void bsp(BSPPeer bspPeer) throws IOException, KeeperException,
        InterruptedException {
      int num = Integer.parseInt(conf.get("bsp.peers.num"));

      int i = 0;
      for (String otherPeer : bspPeer.getAllPe…

Hide the Maven Target Directory from Open Resource Shortcut

I use m2eclipse and I am a big Maven fan. However, I don't want resources from my target directory to show up when I use the Open Resource shortcut. So how do we get around this? Simply right-click the target folder, click Properties, check the Derived checkbox, and hit the OK button.

Cheetah vs. Enzo Ferrari

Enzo Ferrari               | Cheetah
3.65 seconds (0-100 km/h)  | 3.5 seconds (0-100 km/h), in off-road conditions
variable intake manifold   | 2 variable nostrils
rear wheel drive           | four-leg drive

NoHadoop? NoMapReduce?

Beyond Hadoop: Next-Generation Big Data Architectures, Oct. 23, 2010
After seeing this post, I bought the domains to build a community and share knowledge about these topics.

MapReduce is easy to program, but you soon realize that this holds only under specific conditions. In short, many practical problems require a more flexible and versatile computing system. In fact, we (the Apache Hama team) were also really skeptical about using MapReduce for linear algebra, machine learning, graph algorithms, etc., and in the end we got rid of the dependencies on MapReduce and HBase.

We got rid of dependencies of MapReduce and HBase, Jul 15, 2010
The development of technology is always inevitable, irreversible and unavoidable!

Ancient Aliens? It's only a plausible fantasy.

I watched 'Ancient Aliens' today. There are some ideas and feelings that sound plausible but, given just a wee bit of thought, can be shown to border on fantasy.

First, there is West-centered thinking, e.g., the Bible, Western technologies, etc. Did you know that there are more than 100 pyramids in Manchuria? Moreover, some of them are bigger than the Great Pyramid of Egypt and 2 thousand years older. Why don't we know about them? Only Koreans know, because the Chinese stopped and restricted the excavation work once they began finding Korean artifacts there.

Anyway, what I was trying to say is that we don't know our history well enough to talk about it. I think it is only a kind of plausible fantasy born of our lack of knowledge, like humans creating god in man's image. What do you think?

Quorum algorithm of the Zookeeper

Apache ZooKeeper is a coordination service for distributed applications, like Google's Chubby. Many projects use ZooKeeper, and we (Apache Hama) also use ZooKeeper for barrier synchronization in our Bulk Synchronous Parallel computing framework.

Today, I read more about Paxos and the dynamic quorum of the ZooKeeper project, with a view to renaming the class org.apache.hama.zookeeper.QuorumPeer. Because the documentation is thin, I didn't know what quorum meant, and the term "quorum" seemed somewhat odd to me.
But "org.apache.hama.zookeeper.QuorumPeer" is a proper name!! xD

So, what is a quorum and why do we need one?

According to Wikipedia, a quorum is the minimum number of members of a deliberative body necessary to conduct the business of that group. Ordinarily, this is a majority of the people expected to be there, although many bodies may have a lower or higher quorum.
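To make the "majority" idea concrete, here is a small arithmetic sketch. It assumes a simple majority quorum over a fixed ensemble of servers. The function names are made up for illustration and are not part of the ZooKeeper or Hama APIs.

```python
def majority_quorum(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - majority_quorum(ensemble_size)


def has_quorum(alive, ensemble_size):
    """True if the servers that are still alive form a majority."""
    return alive >= majority_quorum(ensemble_size)
```

With 5 servers the quorum is 3, so 2 failures can be tolerated. With 4 servers the quorum is still 3, so only 1 failure can be tolerated, which is why odd ensemble sizes are the usual choice.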

As you know, a Fault-Tolerant mechanism is one of the important f…

Create a Twitter ReTweet Bot

Personally, I needed a Twitter RT bot that can be used to collect various Tweets around a keyword or a hashtag.

I coded it simply, using PHP, MySQL, and abraham's twitteroauth.

The program flow is:

1) Get the search results,
2) and retweet a tweet if it is not retweeted yet.

It runs as a cronjob.

See the code below:
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $key, $secret);
$response = $connection->get('search', array('q' => '#hashtag OR "some keyword"'));

foreach ($response as $status) {
  for ($i = 0; $i < count($status); $i++) {
    $tweetid = $status[$i];
    // Retweet only if this tweet id has not been stored yet
    $result = mysql_fetch_array(mysql_query("select tid from tweets where tid = ".$tweetid.";"));
    if (empty($result['tid'])) {
      mysql_query("insert into tweets (tid) values (".$tweetid.");");
      $connection->post('statuses/retweet/'.$tweetid);
    }
  }
}

Apache Hama 0.2.0 RC1 is available for download

Apache Hama 0.2.0 RC1 is now available. Try it out and give us feedback. Submit bug reports to our JIRA bug tracker.

Learn about Hama by reading the documentation.
- Hama Architecture
- Getting Started with Hama
- Serialize Printing Of "Hello BSP"
- BSP based Pi Estimator

Apache Hama: BSP based Pi Estimator

Pi Estimator

The value of PI can be calculated in a number of ways. Consider the following method of estimating PI:

- Inscribe a circle in a square
- Randomly generate points in the square
- Determine the number of points in the square that are also in the circle
- Let r be the number of points in the circle divided by the number of points in the square
- PI ~ 4 r

Serial pseudocode for this procedure is as below:
iterations = 10000
circle_count = 0

do j = 1,iterations
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle
  then circle_count = circle_count + 1
end do

PI = 4.0*circle_count/iterations
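The pseudocode above can be turned into a runnable sketch. The Python version below is not the Hama code. It assumes the "inside circle" test means x² + y² ≤ 1 on the unit square, which covers a quarter circle and gives the same area ratio. The `seed` parameter is added only to make runs repeatable.

```python
import random


def estimate_pi(iterations=10000, seed=None):
    """Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)
    circle_count = 0
    for _ in range(iterations):
        x = rng.random()              # xcoordinate in [0, 1)
        y = rng.random()              # ycoordinate in [0, 1)
        if x * x + y * y <= 1.0:      # the point falls inside the quarter circle
            circle_count += 1
    return 4.0 * circle_count / iterations
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of iterations, so 10000 points usually give only about two correct digits.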
The BSP implementation for Pi

A distributed strategy in Hama with the BSP programming model is to break the loop into portions that can be executed by the tasks.

- Each task locally executes its portion of the loop a number of times.
- One task acts as master and collects the results through the BSP communication interface.
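The split-and-collect strategy can be sketched without a cluster. In the Python sketch below a thread pool stands in for the BSP tasks and the return values stand in for messages sent to the master. The names `count_hits` and `estimate_pi_parallel` are made up for illustration, and none of this is the Hama API.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(iterations, seed):
    """One task: run its portion of the loop and return its local hit count."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(iterations):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi_parallel(num_tasks=4, iterations_per_task=25000, base_seed=0):
    """Master: hand out portions, collect the partial counts, combine them."""
    seeds = [base_seed + i for i in range(num_tasks)]
    portions = [iterations_per_task] * num_tasks
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partial_counts = list(pool.map(count_hits, portions, seeds))
    total_iterations = num_tasks * iterations_per_task
    return 4.0 * sum(partial_counts) / total_iterations
```

Each task needs only its own random stream and a counter, so the only communication is one number per task at the end. That is why this problem fits the BSP model so easily.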
public class …

Going Wild

I ain't satisfied with my position today. Unfortunately in my career, despite a good start, I lost many things and many people around me.

Now I'll go on my way, burning with passion.

Mathematics and the Arts Are Related

The view of most people is that art and mathematics could not be more different. One is left brain, the other right brain. One is creative, the other analytical.

However, they are very closely connected. The original impetus for projective geometry came from perspective drawing, and 19th-century ideas of autonomy, freedom, and dignity led the advance into abstraction.

The picture below is an all-in-one of mathematics, art, and nature.
Is it fun? :)

FW: How will Hama BSP differ from Pregel?

Firstly, why did we use HBase?

Until last year, we were researching a distributed matrix/graph computing package based on Map/Reduce.

As you know, Hadoop consists of HDFS, which is designed for commodity servers as a shared-nothing model (also termed a data partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is a high-performance parallel data processing engine, to be sure, but it's not good for complex numerical/relational processing that requires a huge number of iterations or inter-node communication. So, we used HBase as shared storage (shared memory model).

Why BSP instead of Map/Reduce and HBase?

However, there were still problems, as below:

- OS overhead of running shared storage software (HBase)
- The limitations of HBase's facilities (especially the size of a column qualifier)
- Growth of code complexity

Therefore, we started to consider a message-passing model, and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by Pregel from G…

BBC's 50 Places to Visit Before You Die

Which Ones Have You Visited? 
1 The Grand Canyon USA
2 Great Barrier Reef Australia
3 Florida USA
4 South Island New Zealand
5 Cape Town South Africa
6 Golden Temple India
7 Las Vegas USA
8 Sydney Australia
9 New York USA
10 Taj Mahal India
11 Canadian Rockies Canada
12 Uluru Australia
13 Chichen Itza Mexico
14 Machu Picchu Peru
15 Niagara Falls Canada / USA
16 Petra Jordan
17 The Pyramids Egypt
18 Venice Italy
19 Maldives Maldives
20 Great Wall China
21 Victoria Falls Zambia / Zimbabwe
22 Hong Kong Hong Kong
23 Yosemite National Park USA
24 Hawaii USA
25 Auckland New Zealand
26 Iguassu Falls Argentina / Brazil
27 Paris France
28 Alaska USA
29 Angkor Wat Cambodia
30 Himalayas Nepal / Tibet
31 Rio de Janeiro Brazil
32 Masai Mara Kenya
33 Galapagos Islands Ecuador
34 Luxor Egypt
35 Rome Italy
36 San Francisco USA
37 Barcelona Spain
38 Dubai Arab Emirates
39 Singapore Singapore
40 La Digue Seychelles
41 Sri Lanka Sri Lanka
42 Bangkok Thailand
43 Barbados Barbados
44 Iceland Iceland
45 Te…

Vinay Deolalikar's P ≠ NP preliminary paper

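A while ago, a professor at Chonbuk National University claimed that P = NP, and then I heard that the whole research process and result turned out to be "bogus" (I don't know the details myself) ... The verification results for this paper aren't out yet, but in fact it seems many people already thought P ≠ NP. As for me, I don't care if it's bogus; just show me the new world of P = NP :/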

How to Hide the Address Bar in MobileSafari

[MEMO] bttv settings

Edit the file /etc/modprobe.d/bttv.modprobe as below:

options bttv card=0 audiomux=1,0x0f,0,0,0x0f tuner=9

Critical bug on Google Apps??

OMG, my buddy bought a domain, and there were existing mailboxes for it on Google Apps.

Do you use Google Apps? Then be careful when you let your domain expire, because the new owner of the domain can access your old data, including users' mailboxes, sites, etc.

Summary of the Google Pregel

The Google Pregel paper has been published. Here's my summary of Pregel:

- Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms.
- Map/Reduce is one kind of distributed computing infrastructure, and Pregel is another.
- Why did they make Pregel!?
- Building a custom distributed infrastructure typically requires a substantial implementation effort, which must be repeated for each new algorithm or graph representation.
- The M/R framework isn't ideal for graph algorithms because it does not support communication among nodes.
- There is no such system for large-scale graph computing.
- It's inspired by BSP (Bulk Synchronous Parallel).
- The user-defined function Compute() is as below:

void Compute(MessageIterator* msgs) {
  // Receive current messages
  int mindist = IsSource(vertex_id()) ? 0 : INF;
  for (; !msgs->Done(); msgs->Next())
    mindist = min(mindist, msgs->Value());
  if (mindist < GetValue()) …
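The truncated Compute() above looks like the single-source shortest paths example from the Pregel paper. As a rough illustration of the superstep idea, here is a single-process Python simulation. Everything after the `mindist < value` test is my assumption: a vertex that improves its distance sends `mindist + weight` to its out-neighbors, and only vertices that receive messages are active in the next superstep. The function name and the edge format are made up for this sketch.

```python
import math


def pregel_sssp(edges, source):
    """Simulate Pregel-style supersteps for single-source shortest paths.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Weights are assumed to be non-negative.
    """
    vertices = set(edges)
    for neighbors in edges.values():
        vertices.update(n for n, _ in neighbors)

    value = {v: math.inf for v in vertices}   # current best distance per vertex
    inbox = {v: [] for v in vertices}         # messages delivered this superstep
    active = set(vertices)                    # every vertex runs in superstep 0

    while active:
        outbox = {v: [] for v in vertices}
        for v in active:
            # Same logic as the visible part of Compute(): take the minimum message.
            mindist = 0 if v == source else math.inf
            for message in inbox[v]:
                mindist = min(mindist, message)
            if mindist < value[v]:
                value[v] = mindist
                for neighbor, weight in edges.get(v, []):
                    outbox[neighbor].append(mindist + weight)
            # The vertex then "votes to halt"; a new message wakes it up again.
        inbox = outbox                         # barrier: messages arrive next superstep
        active = {v for v in vertices if inbox[v]}
    return value
```

For the graph `{"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}` with source "a", vertex "c" first learns distance 4 directly from "a" and then improves it to 3 one superstep later via "b".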

A distributed caching mechanism to avoid Twitter's API request limit

Recently I made a Twitter application that helps you find school friends. Development was simple, but the API call limit and slow speed were problematic. To solve these problems, I added a caching layer that gathers and stores API result data from each client using JavaScript and server-side scripts, and it is damn fast now!
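The core of such a caching layer is small. The sketch below is a single-process, in-memory illustration in Python, not the JavaScript and server-side code the application actually uses. `TTLCache`, its parameters, and the injected `fetch` function are all hypothetical names.

```python
import time


class TTLCache:
    """Serve repeated lookups from memory until an entry is older than the TTL."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch        # function that performs the real (rate-limited) API call
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested without sleeping
        self._store = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit: no API call is spent
        value = self._fetch(key)                 # cache miss or expired: call the API once
        self._store[key] = (now + self._ttl, value)
        return value
```

Every hit saves one request against the API limit, so a longer TTL trades freshness for fewer calls.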

Using the Short URL API from PHP

The earlier example of shortening URLs with the Short URL service API used AJAX/jQuery, but it can also be used simply from PHP, as below. :)

<?php
function getShortURL($longUrl) {
  $url = "";
  $url .= urlencode($longUrl);
  $data = file_get_contents($url);
  $json = json_decode($data, true);
  return "".$json['shorturl'];
}

echo getShortUrl("");
?>

Developing a Chrome Extension - Getting Started (Hello, World!)

1. Create a folder somewhere on your computer to contain your extension's code.

(Any location on your computer will do.)

2. Inside your extension's folder, create a text file called manifest.json, and put this in it:

Create a manifest file in JSON format. The file encoding must be UTF-8, or Korean text will be garbled. :)

* Developers will have seen it often, but "manifest" is a word that even Americans rarely use in everyday life. The dictionary gives "obvious, clear, plain to see", but in Korean terms the nuance seems closer to "the sticker on a parcel that neatly lists the delivery address, contact details, and so on".

The contents are all self-explanatory, so I'll skip the explanation.

{
  "name": "Hello World Extension",
  "version": "1.0",
  "description": "The first extension that I made.",
  "browser_action": {
    "default_icon": "icon.png"
  },
  "permissions": [
    ""
  ]
}
3. Copy this icon to the same folder:

As declared in the "default_icon": "icon.png" line, create an icon.png and put it in the extension folder. This icon is used for the button that will be displayed in Chrome.

4. Load the extension…

Failed to load JavaHL Library on Ubuntu with Eclipse + Subclipse

- Setting up a development environment with Eclipse on Ubuntu Desktop (3)


$ sudo apt-get install libsvn-java

PageRank Implementation Using the BSP

In this post, I'll show how to implement PageRank using BSP.
P.S. Apache Hama's BSP framework is not ready yet.

And P.S. again: the pseudocode was developed based on Java multi-threaded programming. As I introduced before (Hama BSP), BSP programming is very similar to multi-threaded programming (see the BSP serialize printing example). So BSP brings a familiar programming model to developers implementing distributed applications. :-)

Anyway, let's assume that the web graph G is stored in row-sparse format as below:

Vertex: 1 2 3 4 5 6
Index: 0 1 3 4 6 8 10
IncomingEdgeList: 3 1 3 1 5 6 3 4 4 5

– Vertices V are web pages.
– index[] points, for each vertex, to the start of its list of incoming-edge source vertices.
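Reading this layout takes one slice per vertex. The Python sketch below copies the three arrays from the example above. The helper name `incoming_edges` is made up for illustration.

```python
VERTICES = [1, 2, 3, 4, 5, 6]
INDEX = [0, 1, 3, 4, 6, 8, 10]
INCOMING_EDGE_LIST = [3, 1, 3, 1, 5, 6, 3, 4, 4, 5]


def incoming_edges(vertex):
    """Return the vertices that link to `vertex`, using the row-sparse arrays."""
    pos = VERTICES.index(vertex)              # position of the vertex in the Vertex row
    start, end = INDEX[pos], INDEX[pos + 1]   # its slice of the incoming edge list
    return INCOMING_EDGE_LIST[start:end]
```

For example, vertex 2 owns the slice from index 1 to 3, so its incoming edges come from vertices 1 and 3. The extra last entry of Index (10) only marks the end of the final slice.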
The PageRank Algorithm is as below:

1) If vi links to vk…

– User equally likely to follow any link on page.
– Probability of moving from vi to vk = 1/out_degree(vi).
2) If vi has no outlinks…

– User equally likely to jump to any state.
– Probability = 1 / |V|
3) Weighted moves …
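Rules 1) and 2) are enough for a small serial sketch. The third rule is cut off above, so the damping mix used below (follow a link with probability 0.85, otherwise jump anywhere) is my assumption, borrowed from the usual PageRank formulation. The code is plain Python, not the BSP version, and the function name is made up.

```python
def pagerank(incoming, damping=0.85, iterations=50):
    """Power iteration over a graph given as vertex -> list of incoming sources."""
    vertices = sorted(incoming)
    n = len(vertices)

    # out_degree(vi) is recovered by counting how often vi appears as a source.
    out_degree = {v: 0 for v in vertices}
    for sources in incoming.values():
        for u in sources:
            out_degree[u] += 1

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rule 2: a vertex with no outlinks spreads its rank evenly over |V|.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        new_rank = {}
        for v in vertices:
            # Rule 1: each source u passes on rank[u] / out_degree(u).
            followed = sum(rank[u] / out_degree[u] for u in incoming[v])
            new_rank[v] = (1 - damping) / n + damping * (followed + dangling / n)
        rank = new_rank
    return rank
```

With the example graph, vertex 2 never appears in the incoming edge list, so it has no outlinks and is handled by rule 2. The ranks always sum to 1, which is a handy sanity check.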

VirtualBox Kernel Errors on Ubuntu

Honestly, I wouldn't install these things if it weren't for online banking.. Anyway, here is the error I ran into while installing:

root@edward-desktop:/home/edward# sudo /etc/init.d/vboxdrv setup
WARNING: All config files need .conf: /etc/modprobe.d/bttv.modprobe, it will be ignored in a future release.
 * Stopping VirtualBox kernel module
 * done.
 * Recompiling VirtualBox kernel module
 * Look at /var/log/vbox-install.log to find out what went wrong
root@edward-desktop:/home/edward# cat /var/log/vbox-install.log
Attempting to install using DKMS
removing old DKMS module vboxdrv version 3.1.6
------------------------------
Deleting module version: 3.1.6
completely from the DKMS tree.
------------------------------
Done.
Creating symlink /var/lib/dkms/vboxdrv/3.1.6/source -> /usr/src/vboxdrv-3.1.6
DKMS: add Completed.
Error! Your kernel source for kernel 2.6.3…

How to Install VirtualBox Guest Additions in Fedora 12

First, in the VM menu (not the Guest but the chrome around it) go to Devices > Install Guest Additions. It will mount a new disc image. Then fire up a terminal.

$ su
# yum install kernel-headers kernel-devel gcc
# export KERN_DIR=/usr/src/kernels/
# cd /media/VBOXADDITIONS_3.1.2_56127
# ./
This time the kernel modules should compile. Then restart the system.

Update for 32-bit Guests:

A few possible changes if this doesn’t work for you with a 32-bit guest. (It didn’t for me, so I had to play around/research a bit more.)

# uname -r
If you see the letters PAE, then you’ll need to follow the rest of these steps. If you don’t see PAE, you should be fine.
If so, make sure your kernel is up to date with
# yum update kernel-PAE
After this, restart.
Instead of the kernel-devel package, you’ll need to install kernel-PAE-devel. That makes the second line of the example above:
# yum install kernel-headers kernel-PAE-devel gcc
If you’d already …

The error message "An unknown error occured" of WordPress MU

It sometimes occurs when the folder permissions or group ownership do not allow the files to transfer over. However, in my case, the PHP configuration was the reason: allow_url_fopen should be ON in the PHP configuration (php.ini).

How to enable PHP JSON on Gentoo server?

These are the pitfalls I hit while trying to use the URL shortener API from PHP...

On CentOS, running
pecl install JSON
is enough, but on Gentoo it spits out a pile of errors. In that case, installing with the command
USE="json exif" emerge dev-lang/php
does the trick.

<?php
$url = "".urlencode("");
echo $url;
$response = str_replace(");", "", str_replace("null(", "", file_get_contents($url)));
$json = json_decode($response, true);
echo $json['shorturl'];
?>
Plus, if the file_get_contents() function raises an error:

allow_url_fopen should be ON in PHP configuration (php.ini). Probably, you don’t have access to edit this file and it could also be that your hosting company has restricted changing the allow_url_fopen value. In this case, you should refer to them and ask them to set the value of allow_url_fopen to ON.

P.S. The programming languages I hate most: c, c++, php, sh…

Kick Ass (2010), was funny!!

“I can’t read your mind, but I can kick your ass”

Today, I watched the movie Kick-Ass (2010). It's directed by Matthew Vaughn and based on a comic by Mark Millar, who also wrote the comic behind "Wanted". His stories always seem to satisfy the deviant desires of a salaryman like me. ★★★★★!!!

When I watched the preview of this movie, it looked too babyish. But there are many brutal scenes, and it was very impressive to me. In particular, the rescue scene of Kick-Ass and Big Daddy in the final act brought tears to my eyes. (-_-;;)

Installing SciPy

Original article: Installing SciPy

If one couldn't guess from my having built this site using Django, I happen to be a big fan of using Python in various stages of software development. My focus on graphics and image processing requires more robust array and matrix datatypes and their associated operations than what Python includes in its standard library. There are two extensions to Python that provide this functionality: numpy, for efficient, native memory arrays and matrices, and scipy for numerical tools such as solvers, optimization, and Fourier transforms.

While the default installs of these modules are significantly faster than any Python-native implementation, they are still quite slow. The code included with numpy and scipy to perform this computation is not very efficient. Optimized libraries have been written for the methods that numpy and scipy rely on, so the best of both worlds would be the ease-of-use provided by Python and the performance of tuned architecture-s…

Deleting files that have a minus symbol as file name

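Files whose names start with a minus (-) symbol, like the ones below, had appeared, and I wanted to delete them, but I just couldn't.

-rw-r--r-- 1 edwardyoon other 0 Apr 12 01:43 --no-check-certificate
-rw-r--r-- 1 edwardyoon other 0 Apr 12 01:46 -S
-bash-3.00$ rm -rf **certificate
mv: illegal option -- no-check-certificate
..
I wanted to delete them, but the names were being parsed as command options.. Silly me, though. (-_-;)
All I had to do was delete them like this: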

-bash-3.00$ rm -rf ./-S

A Multi-Threaded Pi Estimator

Hadoop has a Pi estimator implemented with Map/Reduce;
what difference in performance and code complexity would we see against M/R if we did it with Hama and BSP?
Of course, the answer is predictable without even looking: this example probably doesn't offer any big merit..

Anyway, as an exercise, I wrote a simple multi-threaded version of the Pi calculation algorithm below.

iterations = 10000
circle_count = 0

do j = 1,iterations
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle
  then circle_count = circle_count + 1
end do

PI = 4.0*circle_count/iterations
Implemented in Java, it looks like this:

public class PiEstimator {
  private double pi = 0.0;
  private final int numTasks = 10;
  private int allFinished = 0;
  private long starttime = 0;

  class PiEstimatorTask extends Thread {
    private PiEstimator Parent = null;
    private static final int iterations = 100000;

    public PiEstimatorTask(PiEstimator Parent) {
      this.Parent = Parent;
    }

    public void run() {
      int in = 0, out = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.ran…

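Lucene n-gram test code with a brief explanation

Lucene used to provide only simple noun extraction, word-segment extraction, and whitespace-level splitting for a handful of languages, so for Korean the search quality was terrible. In other words, extracting and indexing just "검색" (search) from a sentence like "한글 검색하려면 어떻게 하나요?" ("How do I search in Korean?") was impossible without some separate Korean-language library. Recent versions, however (and even this is fairly old news), make it possible to solve this to some degree with n-gram tokenization, even if they don't go as far as advanced parsing such as Korean noun and particle handling or stop-word removal.

The code below is a sample I implemented after downloading 3.0.1; let's look at the result first.
When "아버지가방에들어가신다" is indexed, sequences of n consecutive characters are extracted and indexed, which is why a search for "아버지" (father) gets a hit. The downside, of course, is that a search for "가방" (bag) gets a hit too. (-_-;)

Optimizing index...
188 total milliseconds
Term: content:가방
Term: content:가신
Term: content:들어
Term: content:방에
Term: content:버지
Term: content:신다
Term: content:아버
Term: content:어가
Term: content:에들
Term: content:지가
Term: seqid:2
Searching for: 가방
1 total matching documents
My seq ID: 2
It had been a few years since I last did this, so it took me about 20 minutes. (-_-;;)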
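The bigram behavior in that output can be reproduced without Lucene. The Python sketch below is only an illustration of the idea, not the Lucene API. It splits text into overlapping character n-grams, builds a tiny inverted index, and assumes that a query matches when all of its n-grams are present. All function names are made up.

```python
def char_ngrams(text, n=2):
    """Overlapping character n-grams, e.g. bigrams when n is 2."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_index(docs, n=2):
    """Map each n-gram to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result
```

Indexing "아버지가방에들어가신다" this way yields the same ten bigram terms as the output above. A search for "아버지" hits via the bigrams 아버 and 버지, and "가방" hits too, because the tokenizer knows nothing about word boundaries.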

public void testLucene() {
  try {
    File index = new File("index");
    Date start = new Date();
    IndexWriter writer = new IndexWriter(FSDir…

Problem with Zend Gdata and include path

Today, I spent a lot of time installing Zend Gdata. I couldn't pass InstallationChecker.php, which failed with the message below:

Zend Framework Installation Errors : Exception thrown trying to access Zend/Loader.php using 'use_include_path' = true. Make sure you include Zend Framework in your include_path which currently contains: .:/usr/share/php:/usr/local/src/Zend-Gdata/library
I thought it was a problem related to configuration and 'include_path', but it was simply a permission problem. Solution?

Make sure httpd can read Zend/Loader.php. That's all. :/

Challenges Of Life

Now I'm downloading the BBC documentary series titled "Challenges of Life". I haven't watched it yet, but I already know that life is an endless chain of challenges.

That is common to all life on earth. I, too, am still taking on challenges (just to survive), while reading a book called Status Anxiety (Alain de Botton). :/

Dressing up Blogger for the mobile web

Blogger still doesn't offer a mobile web version. But there is a trick that tidies things up nicely. :)

The trick is to use a mobile feed reader. Using the Edit HTML tool under Layout, enter the following between the header tags:

<script>
if (navigator.platform == 'iPhone') {
  window.location = "{$YOUR_FEED_URL}";
}
</script>
Then, on mobile, it is displayed as below.

Installing php5.2 on CentOS

To enable it, run the following commands:

# vi /etc/yum.repos.d/utterramblings.repo
[utterramblings]
name=Jason's Utter Ramblings Repo
baseurl=$releasever/$basearch/
enabled=1
gpgcheck=1
gpgkey=
After that, just update your php:

# yum update php

PHP caching

<?
$cachefile = 'cache/filename.cache';
$cachetime = 60; // 60 sec

// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && (time() - $cachetime < filemtime($cachefile))) {
  include($cachefile);
  echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->";
  exit;
}
ob_start(); // start the output buffer
?>

your HTML / normal php code here.

<?
$fp = fopen($cachefile, 'w'); // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
fclose($fp); // close the file
ob_end_flush(); // Send the output to the browser
?>

Google sitemap generator for phpBB3

Problems of Tomcat Session Clustering

Today, I tried to set up session clustering between two Tomcat servers, following the cluster-howto document for tomcat-6.0.

My system is as below:
Server1: (HTTPD + tomcat1)
Server2: (only tomcat2 w/o HTTPD)

      Apache HTTPD
        /      \
       /        \
  tomcat1     tomcat2
However, I'm not familiar with web programs, so it was difficult and took a lot of time. (-_-;;) But I finally succeeded in configuring it. There were some problems with firewall settings and route configuration. I'd like to share them with you.

1) firewall settings

Check that the multicast port is on your UDP open list and that the receiver TCP port is also open on both machines! I added the rules below to iptables.

-A RH-Firewall-1-INPUT -p udp --dport 45564 -d -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 4000 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 4000 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 45564 -j ACCEPT
2) network interface fo…

Driving on Bundang-Suseo highway

I'm on vacation these days. The video was recorded when I went to a mart and bought some computer stuff.

The scene looked like a road of cherry blossoms. The computer was in my trunk, but I couldn't control my speeding instinct. :)

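Zhuangzi's Philosophy of the True Person (眞人) - The Virtue of the Wooden Rooster

A certain king was so fond of cockfighting that he took an outstanding fighting cock to Giseongja, the best gamecock trainer of the day, and asked him to make it the finest fighting cock. After ten days the king asked Giseongja, “Is the cock ready to fight?” Giseongja answered, “No. It still has a long way to go. The cock is strong, but it is arrogant and still believes itself to be the best. Until it shakes off that arrogance, it cannot be called the finest fighting cock.”

Ten days later, when the king asked again, Giseongja answered, “Not yet. It has shed its arrogance, but it reacts too readily to the other bird's sounds and shadow. Only with a composure as unmoving as Mount Tai can it be called the best.”

When the king asked again after another ten days, he said, “Not yet. It has shed its impatience, but the way it glares at its opponent is too aggressive. It must lose that aggressive glare.”

When the king asked after yet another ten days, he said, “Now it seems ready. Even when the other bird crows, it shows no reaction; it has found complete peace of mind. It has become like a rooster made of wood (木鷄). Its virtue is complete, so other cocks will now run away at the mere sight of it.”

Through this story Zhuangzi wants to say that the finest fighting cock is the wooden rooster (木鷄). And there are three conditions for becoming a wooden rooster.

First, you must give up the arrogance of thinking you are the best. This is the lesson for people who swagger about being the best.
Second, you must not react readily to others' noise and threats. This is the lesson for people who react and lose their temper at whatever anyone says.
Third, you must give up the aggressive glare at your opponent. This is the lesson for people who try to fight and compete with everyone.

In human terms, the wooden rooster is the picture of a person who has achieved complete selfhood and equanimity. Because he does not show off his own special brilliance and abilities to others, that light can shine all the more. Because he can stay as calm as a rooster made of wood, others cannot easily provoke him. (Text by Park Jae-hee)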

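Spilt Water

In China there was a man named Yeosang who loved studying
but never did a lick of work.

Yeosang had a wife, but that Yeosang fellow did nothing except read books,
never working at all, so the household was dirt poor.

Fed up, his wife finally left home.
Some time after that...,

Yeosang's abilities were recognized by the king, and he soon rose high in the world.
Then, out of the blue, the wife who had left came back,
saying she wanted to restore their bond...,

But Yeosang silently scooped water into a bowl, brought it out, and poured it on the ground in front of the yard.
Then he said, ... 'Now put that water back into the bowl.'
But the water had already soaked into the ground and could not be scooped up...

Then Yeosang said,
'Water once spilt cannot be put back into its bowl.'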


While quietly reading Superclass, a 'boring' Jonathan book, I came across a line I liked for the first time in a while.

If you want to test a man's character, give him power. - Lincoln

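Even Einstein suffered because of ○○○.

(::In a 1915 letter to an acquaintance, Einstein grumbles, “I'm plagued by low pay and tormented by stress from colleagues”::) “These days I am working under inhuman conditions. I am constantly burdened with overtime. My fellow scientists behave hatefully, trying to pick holes in my theory or competing to complete the research before I do.”
Anyone with work experience will deeply sympathize.
Especially those at big companies, in R&D, outside the mainstream of their organization, or just starting out.
Direct and indirect pressure to work overtime, stealing credit, nitpicking.. ugh, I'm sick of it.

My own case is quite a sight, too: someone once told me that it's only natural to exploit an advantageous position. lol
Of course, that person, who showed his true colors openly, is at least better than those who shamelessly talk nonsense.

Still, let's note the fact that the world Einstein lived in was the same as the one laid out before us.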

2010 HPC trends and Hama project

Obviously, the HPC (High Performance Computing) and scientific computing market is expected to keep growing. According to IDC, the current HPC market is around $10 billion, which is 20% of the total server market. The research company has forecast that the HPC market will grow to $15.6 billion by 2012.

By the way, in my opinion, it is currently non-IT companies (e.g., chemistry, bio-medical, etc.) that need these HPC technologies, rather than web service IT companies, because innovation in web services doesn't always require a high degree of skill or scientific computing. (Of course, there is some demand from the graph/network data processing side of web service IT companies.)

For this reason, I am currently considering implementing Hama as a solution aimed at the small HPC market.

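Global-Scale Web Services and Their Underlying Technologies

A while back I heard that Facebook uses Erlang to handle its enormous number of concurrent connections. There is also something called ejabberd, which Twitter is said to use. These are, roughly, messenger servers in the style of a distributed/decentralized P2P system. Why use them, you ask?

People with many followers have hundreds, thousands, even tens of thousands of them.. and for nearly real-time delivery, it is technically hard to push a message to that many people without something like this.

There is also a similar thing, pubsubhubbub, which Google Buzz is said to use.

The world's major web service companies are now concentrating on developing foundational web technologies such as Big Data Storage Systems, Fault Tolerant Architectures, High Scalability, and NoSQL, to handle that enormous global-scale data processing and real-time service.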

Interesting project, hama-mrcl (Map/Reduce + CUBLAS)

I just found an interesting project

They tried to perform matrix multiplication using MapReduce and CUBLAS. To avoid I/O bottlenecks during multiplication, a blocking/tiling algorithm based on M/R was used, and the CUDA BLAS library (CUBLAS) was used for GPU acceleration of local computations. CUBLAS is a BLAS library ported to CUDA, which enables fast computation on GPUs without direct operation of the CUDA drivers.
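The blocking/tiling idea can be shown in a few lines. The Python sketch below is a serial illustration with plain lists, not the hama-mrcl code. In the project each tile product would presumably be handled by a map task or a CUBLAS call, while here it is just the innermost loops. The function name and the `block` parameter are made up.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m) tile by tile."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Walk over tiles of the output and of the shared dimension.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Multiply one tile of a by one tile of b and accumulate into c.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = 0.0
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] += acc
    return c
```

Each tile product touches only a small piece of each input, which is what lets the pieces be read, shipped, and multiplied independently instead of streaming whole rows and columns for every output element.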

The interesting finding in this research is that pure Java is better/faster when the input (a split, or a sub-matrix in a distributed system) is small.

So.. perhaps it's not a good fit for a distributed system that consists of a lot of nodes. But I roughly guess that GPU technology could be useful for the future BSP concept of Apache Hama.

I'm not a BSP expert yet, but I really love this phrase: "the BSGP program always has a significantly lower code complexity" from Bulk–Synchronous GPU Programming.

Talkers vs. Doers

There's two kinds of people in this world when you boil it all down. You've got your talkers and you've got your doers. Most people are just talkers. All they got is talk. But when all is said and done, it's the doers who change this world. And when they do that, they change us. And that's why we never forget them.

So, which one are you? Do you just talk about it? or do you stand up and do something about it? Because believe you me, all the rest of it is just coffeehouse bullshit.

Protocol Buffers and Hadoop at Twitter

Facial Symmetry

Symmetry, especially facial symmetry, is one of a number of aesthetic traits, including averageness and youthfulness, associated with health, physical attractiveness and beauty of a person or non-human animal according to the authors of Facial Attractiveness: Gillian Rhodes, Leslie A. Zebrowitz.[2] It is also hypothesized as a factor in both interpersonal attraction and interpersonal chemistry. [Wikipedia]
Human beings are always pursuing beautiful things. What does beauty mean to you? IMO, beauty is health. I guess that feeling comes from an instinct for healthy breeding.

A simple example is facial symmetry.
We all know that facial symmetry is an important factor in human beauty; many studies have shown that a lack of facial symmetry correlates, on average, with lower beauty rankings. Why do human beings find beauty in symmetry and balance? AFAIK, facial asymmetry is affected by spinal curvature, which suggests poor health.

I've never seen animals who have asymmetry of face in …

Carl Friedrich Gauss

Johann Carl Friedrich Gauss (pronounced /ˈɡaʊs/; German: Gauß, Latin: Carolus Fridericus Gauss) (30 April 1777 – 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy and optics. Sometimes known as the Princeps mathematicorum (Latin, "the Prince of Mathematicians" or "the foremost of mathematicians") and "greatest mathematician since antiquity", Gauss had a remarkable influence in many fields of mathematics and science and is ranked as one of history's most influential mathematicians. He referred to mathematics as "the queen of sciences."

Gauss was a child prodigy. There are many anecdotes pertaining to his precocity while a toddler, and he made his first ground-breaking mathematical discoveries while still a teenager. He completed Disquisitiones Arithmeti…

Gerolamo Cardano - 1

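While reading a book called The Mathematics of Games and Gambling,
I became interested again in Cardano, who makes a cameo appearance partway through.

In the past I knew him only as a 'madman + card sharp' character who stole other people's achievements,
but it nagged at me that he seemed so much like me, and I found myself drawn in.

For a start, he is said to have been extremely outspoken and highly critical.
A blunt and pessimistic personality. (Exactly what I often hear from the people around me -_-)
In short, he was someone who simply couldn't get along with those around him or compromise with the world.
Read a little further down his bio on Wikipedia and you get gambling, out-of-nowhere astrology, suicide..
All sorts of things start turning up.

At this point one usually starts to wonder: who on earth was this man?
The question is easily answered if you pin it on things like 'illegitimate child' and 'upbringing',
but judging from my own case as well, I don't think it was necessarily dictated by environmental factors.
It's just 'nature' that can't be helped. lol

I titled this Gerolamo Cardano - 1 because I intend to make it a series.
It's late, so in the next post: Cardano's games of chance, the cubic equation, and so on ..
(and maybe a drama of his life, with a little fiction mixed into his autobiography.. )
Consider further installments booked .. See u soon.

"What I want now is rest."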

FW: Apache Hama in academic paper

HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo†‡, Edward J. Yoon, Jae-Hong Kim†, Seongwook Jin†, Jin-Soo Kim§ and Seungryoul Maeng†
† Computer Science Division, Korea Advanced Institute of Science and Technology (KAIST)
‡ Computer Science Division, Berlin University of Technology (TU Berlin)
§ School of Information and Communication, Sungkyunkwan University, South Korea
User Service Development Center, NHN Corp., South Korea


Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific compu…

comScore Reports Global Search Market Growth of 46 Percent in 2009

Google Sites Accounts for Two-Thirds of 131 Billion Searches Conducted Worldwide in December while Introduction of Bing Helps Microsoft Post Significant Gains During the Year

Reston, VA, January 22, 2010 – comScore, Inc. (NASDAQ: SCOR), a leader in measuring the digital world, today released a study on growth in the global search market in 2009. The study revealed that the U.S. remains the largest search market worldwide, while Google Sites retains a commanding position in the global search market.
“The global search market continues to grow at an extraordinary rate, with both highly developed and emerging markets contributing to the strong growth worldwide,” said Jack Flanagan, comScore executive vice president. “Search is clearly becoming a more ubiquitous behavior among Internet users that drives navigation not only directly from search engines but also within sites and across networks. If you equate the advancement of search with the ability of humans to cultivate information, then …