Now I'm a user of HBase.

Today, I changed the database of my toy project, a URL shortener service, from MySQL to Apache HBase. It seems to run well with my Java real-time web application.

Actually, it's not my first time. I contributed HQL (HBase Query Language) a long, long time ago. If they hadn't pushed back on me, I could have made something like Pig or Hive. Haha, anyway.

My HBase cluster has 5 nodes. There's no big data yet, but it is now used by some Twitter clients and a number of web sites, and the rows are increasing at almost 30 per second.

Later, I'll use the data for some of my research work, e.g., information-flow analysis, user propensity analysis, web structure mining, and trend mining with Apache Hama. :-)

[Note] Several problems where Apache Hama can be used

  • Web Graph Structure Mining
  • Social Network Analysis
    • Information Flow On Social Network (finding top-K influential nodes)
    • Evolution Of Social Network
  • High-Level Machine Learning (Bioinformatics, Cheminformatics, etc.)
.... and many others.

Serialize Printing of "Hello BSP" with Apache Hama

Serialize Printing of "Hello BSP"

Each BSP task of the HAMA cluster will print the string "Hello BSP" in serial order. This example will help you understand the concepts of the BSP computing model.

  • Each task gets its own hostname (hostname:port pair) and a sorted list containing the hostnames of all the other peers.
  • Each task prints the LOG string "Hello BSP" only when its turn comes at intervals of 5 seconds.

BSP implementation of Serialize Printing of "Hello BSP"

public class SerializePrinting {
  public static class HelloBSP extends BSP {
    public static final Log LOG = LogFactory.getLog(HelloBSP.class);
    private Configuration conf;
    private final static int PRINT_INTERVAL = 5000;

    public void bsp(BSPPeer bspPeer) throws IOException, KeeperException,
        InterruptedException {
      int num = Integer.parseInt(conf.get("bsp.peers.num"));

      int i = 0;
      for (String otherPeer : bspPeer.getAllPeerNames()) {
        // Print only when it is this task's turn in the sorted peer list
        if (bspPeer.getPeerName().equals(otherPeer)) {
          LOG.info("Hello BSP from " + i + " of " + num + ": "
              + bspPeer.getPeerName());
        }

        Thread.sleep(PRINT_INTERVAL);
        bspPeer.sync();
        i++;
      }
    }

    public Configuration getConf() {
      return conf;
    }

    public void setConf(Configuration conf) {
      this.conf = conf;
    }
  }

  public static void main(String[] args) throws InterruptedException,
      IOException {
    // BSP job configuration
    HamaConfiguration conf = new HamaConfiguration();
    // Execute locally
    // conf.set("bsp.master.address", "local");

    BSPJob bsp = new BSPJob(conf, SerializePrinting.class);
    // Set the job name
    bsp.setJobName("serialize printing");
    bsp.setBspClass(HelloBSP.class);

    BSPJobClient.runJob(bsp);
  }
}


Hide the Maven Target Directory from Open Resource Shortcut

I use m2eclipse and I am a big Maven fan. Unfortunately, resources from my target directory show up when I use the Open Resource shortcut, and I don't want them to. So how do we get around this? Simply right-click the target folder, click Properties, check the Derived checkbox, and hit the OK button.

Cheetah vs. Enzo Ferrari

Enzo Ferrari vs. cheetah:

  • 0–100 km/h: 3.65 seconds vs. 3.5 seconds (in off-road conditions)
  • variable intake manifold vs. 2 variable nostrils
  • rear-wheel drive vs. four-leg drive

NoHadoop? NoMapReduce?

After seeing this post, I bought the domains to build a community and share knowledge about these topics.

MapReduce is easy to program with, but you come to realize that it works well only under specific conditions. In short, many practical problems require a more flexible and versatile computing system. Actually, we (the Apache Hama team) were also really skeptical about using MapReduce in the areas of linear algebra, machine learning, graph algorithms, etc., and in the end we got rid of the MapReduce and HBase dependencies.

The development of technology is always inevitable, irreversible and unavoidable!

Ancient Aliens? It's only a plausible fantasy.

I watched 'Ancient Aliens' today. Some of its ideas sound plausible, but given just a wee bit of thought they can be shown to border on fantasy.

First, there is West-centered thinking: for example, the Bible, Western technologies, etc. Did you know that there are more than 100 pyramids in Manchuria? Moreover, some of them are bigger, and two thousand years older, than the Egyptian Great Pyramid. Why didn't we know about them? Only Koreans know, because the Chinese stopped and restricted the excavation works once Korean artifacts started turning up there.

Anyway, what I was trying to say is that we don't know our history well enough to talk about it. I think it is only a kind of plausible fantasy born of our lack of knowledge, like humans creating god in man's image. What do you think?

Quorum algorithm of the Zookeeper

Apache ZooKeeper is a coordination service for distributed applications, like Google's Chubby. Many projects use ZooKeeper, and we (Apache Hama) also use it for barrier synchronization in our Bulk Synchronous Parallel computing framework.

Today, I read up on Paxos and the dynamic quorum of the ZooKeeper project while considering renaming the class org.apache.hama.zookeeper.QuorumPeer. Because the documentation is sparse, I didn't know what "quorum" meant, and the term was somewhat odd to me.
But "org.apache.hama.zookeeper.QuorumPeer" is the proper name!! xD

So, what is the Quorum and why do we need a Quorum?

According to Wikipedia, Quorum is the minimum number of members of a deliberative body necessary to conduct the business of that group. Ordinarily, this is a majority of the people expected to be there, although many bodies may have a lower or higher quorum.

As you know, fault tolerance is one of the important functions of a distributed system. The quorum algorithm is used to prevent the split-brain condition. When a split-brain condition occurs, ZooKeeper uses the quorum algorithm to determine a "primary partition" and a "secondary partition". The servers in the primary group receive and process user requests, while the servers in the secondary group become read-only. When the cluster recovers from the split-brain condition, the partitions are merged into one again. Internally, ZooKeeper uses an atomic broadcast protocol instead of Paxos.
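The majority rule at the heart of this can be sketched in a few lines of Java (a toy illustration of the idea, not ZooKeeper's actual code):

```java
// Toy illustration of majority quorum: after a network split, a partition
// may continue serving writes only if it holds more than half of the ensemble.
public class QuorumCheck {
    static boolean hasQuorum(int partitionSize, int ensembleSize) {
        return partitionSize > ensembleSize / 2;
    }

    public static void main(String[] args) {
        // A 5-server ensemble splits into partitions of 3 and 2:
        System.out.println(hasQuorum(3, 5)); // the 3-server side stays primary
        System.out.println(hasQuorum(2, 5)); // the 2-server side goes read-only
    }
}
```

This is also why ensembles are usually sized with an odd number of servers: with 4 servers a 2/2 split leaves neither side with a quorum.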

Create a Twitter ReTweet Bot

Personally, I needed a Twitter RT bot to collect various tweets around a keyword or a hashtag.

I coded it simply using PHP, MySQL, and abraham's twitteroauth.

The program flow is:

1) Get the search results,
2) and retweet a tweet if it is not retweeted yet.

It runs as a cronjob.

See the below code:
$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, $key, $secret);
$response = $connection->get('search', array('q'=>'#hashtag OR "some keyword"'));

foreach ($response->results as $status) {
  $tweetid = $status->id_str;
  // Retweet only if we haven't seen this tweet before
  $result = mysql_fetch_array(mysql_query("select tid from tweets where tid = ".$tweetid.";"));
  if (empty($result['tid'])) {
    mysql_query("insert into tweets (tid) values (".$tweetid.");");
    $connection->post('statuses/retweet/'.$tweetid);
  }
}


Apache Hama: BSP based Pi Estimator

Pi Estimator

The value of PI can be calculated in a number of ways. Consider the following method of estimating PI:

  • Inscribe a circle in a square
  • Randomly generate points in the square
  • Determine the number of points in the square that are also in the circle
  • Let r be the number of points in the circle divided by the number of points in the square
  • PI ~ 4 r

Serial pseudocode for this procedure is as below:
iterations = 10000
circle_count = 0

do j = 1,iterations
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle
  then circle_count = circle_count + 1
end do

PI = 4.0*circle_count/iterations
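For reference, the serial pseudocode above translates almost line-for-line into plain Java (a standalone sketch, independent of Hama; the seed parameter is only there for reproducibility):

```java
import java.util.Random;

// Serial Monte Carlo estimate of PI, following the pseudocode above:
// sample points in the unit square and count those inside the circle.
public class SerialPi {
    static double estimate(int iterations, long seed) {
        Random rnd = new Random(seed);
        int circleCount = 0;
        for (int j = 0; j < iterations; j++) {
            double x = rnd.nextDouble();       // random point in the unit square
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0)          // inside the inscribed circle?
                circleCount++;
        }
        return 4.0 * circleCount / iterations; // PI ~ 4 r
    }

    public static void main(String[] args) {
        System.out.println(estimate(1_000_000, 42L));
    }
}
```

With a million samples the estimate usually lands within a couple of hundredths of PI; the error shrinks with the square root of the iteration count.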

The BSP implementation for Pi

A distributed strategy in HAMA with the BSP programming model is to break the loop into portions that can be executed concurrently by the tasks.

  • Each task executes locally its portion of the loop a number of times.
  • One task acts as master and collects the results through the BSP communication interface.

public class PiEstimator {
  private static String MASTER_TASK = "master.task.";

  public static class MyEstimator extends BSP {
    public static final Log LOG = LogFactory.getLog(MyEstimator.class);
    private Configuration conf;
    private String masterTask;
    private static final int iterations = 10000;

    public void bsp(BSPPeer bspPeer) throws IOException, KeeperException,
        InterruptedException {
      int in = 0, out = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
        if ((Math.sqrt(x * x + y * y) < 1.0)) {
          in++;
        } else {
          out++;
        }
      }

      byte[] tagName = Bytes.toBytes(getName().toString());
      byte[] myData = Bytes.toBytes(4.0 * (double) in / (double) iterations);
      BSPMessage estimate = new BSPMessage(tagName, myData);

      // Send the local estimate to the master task and synchronize
      bspPeer.send(bspPeer.getAddress(masterTask), estimate);
      bspPeer.sync();

      // Only the master receives messages; it averages the local estimates
      double pi = 0.0;
      BSPMessage received;
      while ((received = bspPeer.getCurrentMessage()) != null) {
        LOG.info("Received message: " + Bytes.toDouble(received.getData()));
        if (pi == 0.0) {
          pi = Bytes.toDouble(received.getData());
        } else {
          pi = (pi + Bytes.toDouble(received.getData())) / 2;
        }
      }

      if (pi != 0.0) {
        LOG.info("\nEstimated value of PI is " + pi);
      }
    }

    public Configuration getConf() {
      return conf;
    }

    public void setConf(Configuration conf) {
      this.conf = conf;
      this.masterTask = conf.get(MASTER_TASK);
    }
  }

  public static void main(String[] args) throws InterruptedException,
      IOException {
    // BSP job configuration
    HamaConfiguration conf = new HamaConfiguration();
    // Execute locally
    // conf.set("bsp.master.address", "local");

    BSPJob bsp = new BSPJob(conf, PiEstimator.class);
    // Set the job name
    bsp.setJobName("pi estimation example");
    bsp.setBspClass(MyEstimator.class);

    BSPJobClient jobClient = new BSPJobClient(conf);
    ClusterStatus cluster = jobClient.getClusterStatus(true);
    // Choose one active groom as the master task
    for (String name : cluster.getActiveGroomNames()) {
      conf.set(MASTER_TASK, name);
      break;
    }

    BSPJobClient.runJob(bsp);
  }
}

Going Wild

I ain't satisfied with my position today. Unfortunately in my career, despite a good start, I lost many things and many people around me.

Now I'll go on my way, burning with passion.

Mathematics and the Arts Are Related

The view of most people is that art and mathematics could not be more different. One is left brain, the other right brain. One is creative, the other analytical.

However, they are very closely connected. The original impetus for projective geometry came from perspective drawing, and 19th-century ideas of autonomy, freedom, and dignity led the advance into abstraction.

The picture below is an all-in-one of mathematics, art, and nature.
Isn't it fun? :)

FW: How will Hama BSP differ from Pregel?

Firstly, why did we use HBase?

Until last year, we were researching a distributed matrix/graph computing package based on Map/Reduce.

As you know, Hadoop consists of HDFS, which is designed for commodity servers as a shared-nothing model (also termed the data-partitioning model), and a distributed programming model called Map/Reduce. Map/Reduce is a high-performance parallel data processing engine, to be sure, but it's not good for complex numerical/relational processing that requires many iterations or inter-node communications. So, we used HBase as shared storage (a shared-memory model).

Why BSP instead of Map/Reduce and HBase?

However, there were still problems, as below:

  • The OS overhead of running shared storage software (HBase)
  • The limitations of HBase (especially the size of a column qualifier)
  • Growth of code complexity

Therefore, we started to consider a message-passing model, and decided to adopt the BSP (Bulk Synchronous Parallel) model, inspired by Pregel from the Google Research Blog.

What's the Pregel?

According to my understanding, Pregel is graph-specific: a large-scale graph computing framework based on the BSP model.

How will Hama BSP differ from Pregel?

Hama BSP is a computing engine based on the BSP model, like Pregel, and it'll be compatible with existing HDFS clusters, or with any file system and database in the future. However, we believe that the BSP computing model is not limited to graph problems; it can be used for widely distributed software, just as Map/Reduce is. Beyond graphs, there are many other algorithms that hit the same problems as graph processing does with Map/Reduce. Actually, the BSP model has also been researched for many years in the field of matrix computation.

Therefore, we're trying to implement a more generalized BSP computing solution. Hama will consist of the BSP computing engine and a set of examples (e.g., matrix inversion, PageRank, BFS, etc.).

You can locally test your BSP program using the TRUNK version of the Hama project.
Please subscribe to the mailing list or comment here if you have any questions, suggestions, or objections about our project.

BBC's 50 Places to Visit Before You Die

Which Ones Have You Visited? 

1 The Grand Canyon USA
2 Great Barrier Reef Australia
3 Florida USA
4 South Island New Zealand
5 Cape Town South Africa
6 Golden Temple India
7 Las Vegas USA
8 Sydney Australia
9 New York USA
10 Taj Mahal India
11 Canadian Rockies Canada
12 Uluru Australia
13 Chichen Itza Mexico
14 Machu Picchu Peru
15 Niagara Falls Canada / USA
16 Petra Jordan
17 The Pyramids Egypt
18 Venice Italy
19 Maldives Maldives
20 Great Wall China
21 Victoria Falls Zambia / Zimbabwe
22 Hong Kong Hong Kong
23 Yosemite National Park USA
24 Hawaii USA
25 Auckland New Zealand
26 Iguassu Falls Argentina / Brazil
27 Paris France
28 Alaska USA
29 Angkor Wat Cambodia
30 Himalayas Nepal / Tibet
31 Rio de Janeiro Brazil
32 Masai Mara Kenya
33 Galapagos Islands Ecuador
34 Luxor Egypt
35 Rome Italy
36 San Francisco USA
37 Barcelona Spain
38 Dubai Arab Emirates
39 Singapore Singapore
40 La Digue Seychelles
41 Sri Lanka Sri Lanka
42 Bangkok Thailand
43 Barbados Barbados
44 Iceland Iceland
45 Terracotta Army China
46 Zermatt Switzerland
47 Angel Falls Venezuela
48 Abu Simbel Egypt
49 Bali Indonesia
50 French Polynesia French Polynesia 

Vinay Deolalikar's P ≠ NP preliminary paper

A while ago, a professor at Chonbuk National University claimed P = NP, and then the whole research process and its results turned out to be a fabrication (I don't know the details). The verification of this paper isn't finished yet, but it seems many people already believed P ≠ NP anyway. I don't care if it's another hoax; just show me the new world of P = NP :/

[MEMO] bttv configuration

Edit the /etc/modprobe.d/bttv.modprobe file as below:

options bttv card=0 audiomux=1,0x0f,0,0,0x0f tuner=9

Critical bug on Google Apps??

OMG, my buddy bought a domain, and there were still existing mailboxes on Google Apps.

Do you use Google Apps? Then be careful when your domain expires, because the new owner of the domain can access your old data, including users' mailboxes, sites, etc.

Summary of the Google Pregel

The paper of Google Pregel has been published. Here's my summary of the Pregel:

  • Pregel is a scalable and fault-tolerant platform with an API that is sufficiently flexible to express arbitrary graph algorithms.
  • Map/Reduce is one distributed computing infrastructure, and Pregel is another.
    • Why did they make Pregel!? 
      • Building a custom distributed infrastructure typically requires a substantial implementation effort, which must be repeated for each new algorithm or graph representation.
      • M/R framework isn't ideal for graph algorithms because it does not support communications among nodes.
      • There is no such system for large scale graph computing.
  • It's inspired by BSP (Bulk Synchronous Parallel).
  • User-defined function compute() is as below:

void Compute(MessageIterator* msgs) {  // Receive current messages
  int mindist = IsSource(vertex_id()) ? 0 : INF;
  for (; !msgs->Done(); msgs->Next())
    mindist = min(mindist, msgs->Value());

  if (mindist < GetValue()) {
    *MutableValue() = mindist;
    OutEdgeIterator iter = GetOutEdgeIterator();
    for (; !iter.Done(); iter.Next())
      SendMessageTo(iter.Target(), mindist + iter.GetValue()); // Send data to neighbor nodes
  }
  VoteToHalt(); // Superstep synchronization
}

  • Pregel system also uses the master/worker (slave) model.
    • A master maintains the workers, recovers from worker faults, and provides a Web-UI monitoring tool for job progress.
    • A worker processes its task and communicates with the other workers.
  • Used for Shortest Path, PageRank, ..., etc.

And, I was interested in this phrase:
"Assigning vertices to machines to minimize inter-machine communication is a challenge. Partitioning of the input graph based on topology may suffice if the topology corresponds to the message traffic, but it may not."
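For context, Pregel's default assignment is topology-agnostic: a vertex goes to the worker chosen by hashing its ID, which is exactly what the quoted passage contrasts with topology-aware partitioning. A rough sketch of the idea (my own illustration, not Google's code):

```java
// Default Pregel-style partitioning: partition(v) = hash(vertexId) mod numWorkers.
// Simple and well balanced, but it ignores graph topology entirely, which is
// the trade-off the quoted passage is about.
public class HashPartitioner {
    static int partition(long vertexId, int numWorkers) {
        // Math.floorMod keeps the result non-negative even for negative hashes
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    public static void main(String[] args) {
        for (long v = 0; v < 5; v++)
            System.out.println("vertex " + v + " -> worker " + partition(v, 3));
    }
}
```

Topology-based partitioning would instead try to co-locate densely connected vertices, which only pays off when message traffic follows the edge structure.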

A distributed caching mechanism to avoid Twitter's API request limit

Recently I made a Twitter application for finding school friends. Development was simple, but the API call limit and slow speed were problematic. To solve these problems I added a caching layer that gathers and stores API result data from each client, using JavaScript and server-side scripts, and it is damn fast now!

Using a Short URL API from PHP

A previous example shortened URLs through the Short URL service's API with AJAX/jQuery; you can do the same thing simply in PHP, as below. :)

  function getShortURL($longUrl) {
    $url = "";
    $url .= urlencode($longUrl);
    $data = file_get_contents($url);
    $json = json_decode($data, true);
    return "".$json['shorturl'];
  }

  echo getShortURL("");

Developing a Chrome Extension - Getting Started (Hello, World!)

1. Create a folder somewhere on your computer to contain your extension's code.


2. Inside your extension's folder, create a text file called manifest.json, and put this in it:

Create a manifest file in JSON format. The file must be encoded in UTF-8, or non-ASCII (e.g., Korean) text will be garbled.

* Developers see the word a lot, but "manifest" is rarely used in everyday English. The dictionary gives "obvious, clear", but the nuance here is closer to the label attached to a parcel that neatly lists the delivery address, contact info, and so on.

The contents are all self-explanatory, so I'll skip the details.

{
  "name": "Hello World Extension",
  "version": "1.0",
  "description": "The first extension that I made.",
  "browser_action": {
    "default_icon": "icon.png"
  },
  "permissions": [
  ]
}

3. Copy this icon to the same folder:

As declared in the "default_icon": "icon.png" part, create icon.png and put it in the extension folder. This icon is the button icon that will be displayed in Chrome.

4. Load the extension.

 > Extensions > Load unpacked extension...; click it and select the extension folder, and you'll see the extension installed at the top right of Chrome. Easy, isn't it? :)

5. Add code to the extension.

Add "popup": "popup.html" as below and create a popup.html file in the folder; then, when you click the extension icon, popup.html appears in a popup window. You can implement the contents with an iframe or JavaScript.

"browser_action": {
  "default_icon": "icon.png",
  "popup": "popup.html"
}

The screenshot below is an extension built with nothing more than the simple knowledge introduced above: a search box inside popup.html that submits a query via a form. (And it already has 27 users; such is the power of going global.)


PageRank Implementation Using the BSP

In this post, I show how to implement PageRank using BSP.
P.S. Apache Hama's BSP framework is not ready yet.

And P.S. again: the pseudocode below is written in the style of Java multi-threaded programming. As I introduced before (Hama BSP), BSP programming is very similar to multi-threaded programming (see the BSP serialize printing example). So BSP gives developers a familiar programming model for implementing distributed applications. :-)

Anyway, let's assume that the web graph G is stored in row-sparse format as below:

Vertex: 1 2 3 4 5 6
Index: 0 1 3 4 6 8 10
IncomingEdgeList: 3 1 3 1 5 6 3 4 4 5

– Vertices V are web pages.
– index[] points to each vertex's list of incoming-edge sources.
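To make the layout concrete, here is a small Java sketch (my own illustration) that reads the incoming edges of a vertex straight out of those two arrays, using the example values above:

```java
import java.util.Arrays;

// Row-sparse (CSR-like) storage of the example web graph above:
// the incoming edges of vertex v live at positions index[v-1] .. index[v]-1
// of incomingEdgeList (vertex labels are 1-based, as in the table).
public class RowSparseGraph {
    static final int[] index = {0, 1, 3, 4, 6, 8, 10};
    static final int[] incomingEdgeList = {3, 1, 3, 1, 5, 6, 3, 4, 4, 5};

    static int[] incomingEdges(int vertex) {
        return Arrays.copyOfRange(incomingEdgeList, index[vertex - 1], index[vertex]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(incomingEdges(2))); // [1, 3]
        System.out.println(Arrays.toString(incomingEdges(4))); // [5, 6]
    }
}
```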
The PageRank Algorithm is as below:

1) If vi links to vk

– User equally likely to follow any link on page.
– Probability of moving from vi to vk = 1/out_degree(vi).

2) If vi has no outlinks…

– User equally likely to jump to any state.
– Probability = 1 / |V|

3) Weighted moves from each page

– Percentage of moves that follow a link == α.
– Percentage of moves that are jumps to another page == (1- α).
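Putting the three rules together, one full update of all ranks can be sketched serially as below (my own illustration with hypothetical array names; out-degrees are assumed precomputed, and the rank of dangling pages from rule 2 is spread uniformly):

```java
// One PageRank power-iteration step over rank[] with damping factor alpha.
// incoming[v] lists the pages linking to v; outDegree[u] is u's outlink count.
public class PageRankStep {
    static double[] step(double[] rank, int[][] incoming, int[] outDegree, double alpha) {
        int n = rank.length;
        // Rule 2: pages with no outlinks spread their rank over all pages
        double dangling = 0.0;
        for (int u = 0; u < n; u++)
            if (outDegree[u] == 0) dangling += rank[u] / n;

        double[] next = new double[n];
        for (int v = 0; v < n; v++) {
            double total = 0.0;
            for (int u : incoming[v])           // Rule 1: followed links
                total += rank[u] / outDegree[u];
            // Rule 3: alpha of moves follow links, (1 - alpha) are random jumps
            next[v] = alpha * (total + dangling) + (1 - alpha) / n;
        }
        return next;
    }

    public static void main(String[] args) {
        int[][] incoming = {{1, 2}, {2}, {0}};  // tiny 3-page example
        int[] outDegree = {1, 1, 2};
        double[] r = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        r = step(r, incoming, outDegree, 0.85);
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Note that each step preserves the total rank mass of 1, which is a handy sanity check when debugging.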

Then, the algorithm can be implemented as below:

BSP main program:

BSPPageRank pagerank[] = new BSPPageRank[vertices.length];
    // Loop over all vertices, starting one task per vertex
    for (i = 0; i < vertices.length; i++) {
      pagerank[i] = new BSPPageRank(i);
      pagerank[i].start();
    }

    // Wait for all tasks to finish
    for (i = 0; i < vertices.length; i++) {
      pagerank[i].join();
    }

Parallel part:

public void run() {
    double total = 0.0;
    int begin = index[ind];
    int end = index[ind + 1];
    // Loop over the edges pointing to vertex ind.
    for (int j = begin; j < end; j++) {
      int src = incomingEdgeList[j];
      double r = rank[src];
      double incr = r / (double) degree[src];
      total += incr;
    }
    accumulate_rank[ind] = total;

    // Call the sync() method here, if this is run on the BSP cluster
}

References: MultiThreaded Graph Library (MTGL)

VirtualBox Kernel Errors on Ubuntu

Honestly, if it weren't for online banking, I wouldn't want to install these at all... anyway, here is the error I met while installing:

root@edward-desktop:/home/edward# sudo /etc/init.d/vboxdrv setup
WARNING: All config files need .conf: /etc/modprobe.d/bttv.modprobe, it will be ignored in a future release.
 * Stopping VirtualBox kernel module                                                                                                *  done.
 * Recompiling VirtualBox kernel module                                                                                            
 * Look at /var/log/vbox-install.log to find out what went wrong
root@edward-desktop:/home/edward# cat /var/log/vbox-install.log
Attempting to install using DKMS
  removing old DKMS module vboxdrv version  3.1.6

Deleting module version: 3.1.6
completely from the DKMS tree.

Creating symlink /var/lib/dkms/vboxdrv/3.1.6/source ->

DKMS: add Completed.

Error! Your kernel source for kernel 2.6.31-21-server cannot be found at
/lib/modules/2.6.31-21-server/build or /lib/modules/2.6.31-21-server/source.
You can use the --kernelsourcedir option to tell DKMS where it's located.
Failed to install using DKMS, attempting to install without
Makefile:152: *** Error: unable to find the sources of your current Linux kernel. Specify KERN_DIR= and run Make again. 

When this happens, the simple fix is:

# apt-get install linux-headers-`uname -r`
# /etc/init.d/vboxdrv setup

How to Install VirtualBox Guest Additions in Fedora 12

First, in the VM menu (not the Guest but the chrome around it) go to Devices > Install Guest Additions. It will mount a new disc image. Then fire up terminal.

$ su
# yum install kernel-headers kernel-devel gcc
# export KERN_DIR=/usr/src/kernels/
# cd /media/VBOXADDITIONS_3.1.2_56127
# ./

This time the kernel modules should compile. Then restart the system.

Update for 32-bit Guests:

A few possible changes if this doesn’t work for you with a 32-bit guest. (It didn’t for me, so I had to play around/research a bit more.)

# uname -r

If you see the letters PAE in the output, you'll need to follow the rest of these steps. If you don't see PAE, you should be fine.

If so, first make sure your kernel is up to date, and restart afterwards:

# yum update kernel-PAE

Instead of the kernel-devel package, you'll need to install kernel-PAE-devel. That makes the second line of the example above:

# yum install kernel-headers kernel-PAE-devel gcc

If you'd already installed the kernel-devel package, you may want to remove it, as it can confuse things:

# yum remove kernel-devel

Then, everything else should be the same.

The error message "An unknown error occured" of WordPress MU

It sometimes occurs when folder permissions or group ownership prevent the files from transferring. In my case, however, the PHP configuration was the reason: allow_url_fopen should be ON in the PHP configuration (php.ini).

How to enable PHP JSON on Gentoo server?

This is the trouble I ran into while trying to use a URL shortener API from PHP...

On CentOS,
pecl install JSON
will do, but on Gentoo it spits out a pile of errors. In that case, installing with
USE="json exif" emerge dev-lang/php
works.

$url = "".urlencode("");
echo $url;
$response = str_replace(");","", str_replace("null(","",file_get_contents($url)));
$json = json_decode($response, true);
echo $json['shorturl'];

Plus, if the file_get_contents() function throws an error:

allow_url_fopen should be ON in PHP configuration (php.ini). Probably, you don’t have access to edit this file and it could also be that your hosting company has restricted changing the allow_url_fopen value. In this case, you should refer to them and ask them to set the value of allow_url_fopen to ON.

P.S. My least favorite programming languages: C, C++, PHP, and shell script.

Kick Ass (2010), was funny!!

“I can’t read your mind, but I can kick your ass”

Today, I watched the movie Kick-Ass (2010), directed by Matthew Vaughn, who also directed "Wanted". His movies always seem to satisfy the deviant desires of a salaryman like me. ★★★★★!!!

When I watched a preview of this movie, it looked too babyish. But there are many scenes of brutality, and it was very impressive to me. Especially, the rescue of Kick-Ass and Big Daddy in the final act brought tears to my eyes. (-_-;;)

Installing SciPy

Original article: Installing SciPy

If one couldn't guess from my having built this site using Django, I happen to be a big fan of using Python in various stages of software development. My focus on graphics and image processing requires more robust array and matrix datatypes and their associated operations than what Python includes in its standard library. There are two extensions to Python that provide this functionality: numpy, for efficient, native memory arrays and matrices, and scipy for numerical tools such as solvers, optimization, and Fourier transforms.

While the default installs of these modules are significantly faster than any Python-native implementation, they are still quite slow. The code included with numpy and scipy to perform this computation is not very efficient. Optimized libraries have been written for the methods that numpy and scipy rely on, so the best of both worlds would be the ease of use provided by Python and the performance of tuned, architecture-specific math methods, which is what this article explains how to install.

This article is a step-by-step set of instructions on how to install the latest Python, numpy, and scipy along with optimized versions of the native code libraries they depend on. It's most useful if your operating system does not have pre-packaged versions available, you'd like to install a potentially faster version of the modules than provided, or you don't have root access to use the package system on your platform.

Questions, comments? Email me:


I'll preface this entire process with the fact that it's a rather large pain in the ass. In many cases, you can get pre-compiled versions of the libraries for your platform. They might not be as optimized as the result of my instructions here, but could serve your needs well enough. If you just want to install them to investigate what they are capable of, I'd forgo the heavy optimized version for now. Basically, if you can avoid compiling all the source and just use packages, I advise doing so.


Windows

All the steps outlined below assume you are using an operating system similar to UNIX and have the GNU toolchain available. Unless you are attempting to do this under Cygwin or similar, the steps won't work on Windows. All the modules are pre-compiled for Windows, and using those is probably your best bet. They are linked against the same ATLAS library we are using, so I imagine they are reasonably fast.


OS X

OS X users have a slightly more involved process than Windows users, but end up with a more optimized version of the setup. The fine folks over at have collected all the modules we need into a single scipy Superpack for both PowerPC and Intel hardware. These are all built against the ActiveState version of Python 2.4, which you will need to install as well:


Linux

Several flavors of Linux contain the numpy and scipy modules in their distro sets. This is covered by the scipy installation docs.

Starting out

OK. Assuming none of the previous options cover your needs, let's get down to business. I'm not entirely sure what the full set of requirements for Python is; it's covered in the documentation on the site and in the README in the source tarball. It installs fine with the standard developer set on my SuSE 10.1 machine. You will definitely need the ability to compile C/C++, which gcc handles more than adequately. Also, because numpy and scipy are still under very active development, I grab the bleeding-edge version using Subversion, so you will need its client as well.

The only requirement that I know of that might not be installed is the g77 portion of the GNU compiler for compiling FORTRAN programs.
(Yes, this is 2007 and we're still using FORTRAN. Lots of linear algebra libraries do. Get over it.)

Having ensured we have these in place, pick a temp location to download and compile our libraries in:

cd /var/tmp
mkdir Python
cd Python

Install Python

The first order of business is to install Python, which is currently 2.5. So, get the latest version and unpack it:

tar xzf Python-2.5.tgz
cd Python-2.5
Python installs using the standard autoconf procedure, so the first step is to configure the source:

./configure --enable-shared --with-threads
If you are installing to a non-standard location, you can alter the directory prefix with the flag:

./configure --enable-shared --with-threads --prefix=$SOMELOCATION
Once that has completed checking for various modules, we compile the source and run the test suite to make sure the interpreter is functioning correctly:

make
make test
Assuming there are no errors in compilation or testing, then install the software:

make install

Path update

If you installed to a different location, and haven't already made use of other software in that location, you'll need to update your $PATH variable to select the new Python install over the standard system install.

In .cshrc add:

setenv PATH $SOMELOCATION/bin:$PATH

or in .bashrc add:

export PATH=$SOMELOCATION/bin:$PATH

Then type source .bashrc or source .cshrc, depending on which you use, and check that you have the right version. Start the Python interpreter and verify that the version is 2.5 and the location of the executable is the one we installed:

>>> import sys
>>> sys.version
'2.5 (r25:51908, Nov 1 2006, 14:57:46) \n[GCC 4.1.0 (SUSE Linux)]'
>>> sys.executable
Having verified that to be correct, you now have the latest version of Python installed.

Installing FFTW

The next task is to install the Fastest Fourier Transform in the West, better known as FFTW. This library provides efficient Fourier transforms, which I shouldn't have to tell you are very useful for signal processing tasks.

Returning to our top directory, we grab that code

cd /var/tmp/Python
tar xzf fftw-3.1.2.tar.gz
cd fftw-3.1.2
We need to build the code twice, once to create the single precision library, and once to create the double precision library. We'll start with the default, double precision:

./configure --enable-shared --enable-sse2 --enable-portable-binary \
--enable-threads

Or add the --prefix flag to put it somewhere else:

./configure --enable-shared --enable-sse2 --enable-portable-binary \
--enable-threads --prefix=$SOMELOCATION
We then build the software and install it:

make
make install
Now, we repeat the exercise for the single precision library, adding the --prefix argument if needed:

./configure --enable-shared --enable-sse --enable-float \
--enable-portable-binary --enable-threads
And once again build the software and install it:

make
make install
Having completed that, we now have a working version of FFTW.

Install ATLAS and LAPACK

The second library to install provides efficient routines for linear algebra, and it is a bit more complicated to set up than FFTW. The library consists of two parts:

  • BLAS (Basic Linear Algebra Subprograms) which covers basic vector-vector, matrix-vector, and matrix-matrix operations
  • LAPACK (Linear Algebra PACKage) which provides higher-level routines like solvers and eigenvalue decomposition

The source of LAPACK you can download from the site contains an implementation of BLAS, but that BLAS version does not have very well optimized routines. Instead, we use a different BLAS implementation called ATLAS which can automatically tune itself for whichever platform we are compiling it on. So, it ends up being a somewhat complicated method of building the full numerical package LAPACK, but replacing the parts of BLAS and LAPACK with ones provided by ATLAS, which are considerably more efficient.


Go back to the top-level temp directory where you installed Python:

cd /var/tmp/Python
Get the LAPACK source

tar xzf lapack.tgz
cd lapack-3.1.0
and build the library after copying the appropriate Makefile into place

make lapacklib
Having compiled the static LAPACK library, copy it to the same library name as the ATLAS library we will make in the next part:

cp lapack_LINUX.a liblapack.a


You have two options for building ATLAS: compiling the source yourself, tuned to your platform, or grabbing a pre-built version for your hardware architecture. While compiling and tuning ATLAS yields the absolute best performance, it also takes several hours to complete. I'll cover the latter option, as it has worked well enough for me. If you choose to build it from scratch, the process is pretty similar. From the ATLAS site,

grab the Linux binary for your architecture and drop it in the same temp location you made. As of this writing, the latest stable build is version 3.6.0, and I'll assume this is being done on a Pentium 4 class machine with SSE2 instructions.

Unzip the library:

tar xzf atlas3.6.0_Linux_P4SSE2.tar.gz

Combine ATLAS and LAPACK

From here, we need to integrate the partial ATLAS LAPACK implementation with our full LAPACK implementation. We create a temporary directory and extract all the object files from the partial LAPACK implementation contained in ATLAS:
cd Linux_P4SSE2/lib
mkdir tmp
cd tmp
ar x ../liblapack.a
We then replace ATLAS's partial LAPACK library with the full static LAPACK library we created previously, and re-insert ATLAS's optimized object files so they override the reference routines:

cp ../../../lapack-3.1.0/liblapack.a ../liblapack.a
ar r ../liblapack.a *.o
cd ..

Install the result

Having created the full ATLAS/LAPACK hybrid, copy it to the necessary location:

cd ..
cp include/* /usr/local/include
cp lib/* /usr/local/lib
or, if you don't have root, you can use the same prefix as you did for your Python install:

cp include/* $SOMELOCATION/include
cp lib/* $SOMELOCATION/lib

Installing numpy

Alright. Now we have all the parts required to build numpy, so let's check it out of its Subversion repository. From the top of our working space, grab the latest version of numpy:

cd /var/tmp/Python
svn co numpy
cd numpy
We'll need to instruct it where to find the ATLAS/LAPACK hybrid and FFTW libraries we made, so we need to edit the site config file:

pico site.cfg
and add the text

[atlas]
library_dirs = /usr/local/lib
atlas_libs = lapack, f77blas, cblas, atlas

[fftw3]
library_dirs = /usr/local/lib
include_dirs = /usr/local/include
fftw3_libs = fftw3, fftw3f
fftw3_opt_libs = fftw3_threads, fftw3f_threads
where library_dirs is where you copied the ATLAS/LAPACK and FFTW library files, and similarly include_dirs is where the include files for FFTW ended up. If you installed in the default location, the prefix is /usr/local; otherwise it's whatever you have been using for $SOMELOCATION.

From here we need to build numpy:

python setup.py build
It will dump out a lot of configure and build output. Near the top you should see text confirming that it found the ATLAS/LAPACK files, to the effect of:

libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
library_dirs = ['/usr/local/lib']
language = c
If that is found, we can install the library:

python setup.py install
Assuming you have the command python associated with the proper install on your system (if there are several) it will install it in the proper place.

Next, we test our install. Due to the way Python imports modules, if it sees the local numpy directories it will try to use those over the installed ones. This does the wrong thing and causes a massive headache while you try to figure out why nothing works right. For sanity's sake, the easiest way to avoid this very confusing outcome is to switch to some other directory entirely:

cd /
Then run python, and at the Python prompt enter:

>>> import numpy
>>> numpy.test( 10 )
And it should display some text ending in something like:

Ran 530 tests in 2.093s

Congratulations. You now have a working version of numpy.
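Beyond the test suite, a quick interactive check confirms the LAPACK-backed routines actually work (my own sketch; any small linear-algebra call will do):

```python
import numpy as np

# Solve the small linear system A x = b; numpy dispatches this to LAPACK.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)  # the exact solution is x = 2, y = 3

# Verify the solution by substituting back in.
print(np.allclose(np.dot(A, x), b))  # True
```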

Installing scipy

Installing scipy follows roughly the same process as numpy. In fact, all the hard work was already done in the site.cfg we made for numpy, as scipy scans the same file to learn where its libraries are.

Returning to our working directory, we check out scipy:

cd /var/tmp/Python
svn co scipy
cd scipy
Since it reads the site.cfg file that numpy conveniently moved into its own install directory, we can just run setup

python setup.py build
It will spit out a lot of text. Assuming that the entries in the site.cfg are correct, you should see a piece of text similar to:

libraries = ['fftw3', 'fftw3f']
library_dirs = ['/usr/local/lib']
define_macros = [('SCIPY_FFTW3_H', None)]
include_dirs = ['/usr/local/include']

libraries = ['lapack', 'f77blas', 'cblas', 'atlas']
library_dirs = ['/usr/local/lib']
language = c
If that is found, we install the library:

python setup.py install
Assuming you have the command python associated with the proper install on your system (if there are several) it will install it in the proper place.

We now test the install, once again moving somewhere else to avoid import confusion:

cd /
Then run python, and at the Python prompt enter:

>>> import scipy
>>> scipy.test( 10 )
This time, it'll spit out a lot of text for all its test results. Once done you'll get something like:

Ran 1618 tests in 67.706s

Don't freak out too badly if you get a couple of failures. They usually stem from numerical precision issues. Depending on your needs, this may or may not be a problem. I'm not an expert on fixing them, but I've never had any problems with them in my work.

Assuming you're pleased with the outcome, you now have working, optimized copies of numpy and scipy.
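As a final smoke test of the FFT side (shown here through numpy's interface for brevity; scipy.fftpack is the part actually wired to FFTW in this setup), a transform followed by its inverse should reproduce the input:

```python
import numpy as np

# A 16-sample signal: one full sine period repeated 4 times,
# i.e. frequency 4 cycles over the window.
x = np.array([0.0, 1.0, 0.0, -1.0] * 4)

# Forward FFT, then inverse: should round-trip back to x.
X = np.fft.fft(x)
x_back = np.fft.ifft(X)
print(np.allclose(x_back.real, x))  # True

# All the energy sits in the bin for frequency 4 (and its mirror).
print(np.argmax(np.abs(X[1:8])) + 1)  # 4
```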

Deleting files whose names begin with a minus symbol

Files with a minus (-) symbol in their names, like the ones below, appeared, and I wanted to delete them, but I just couldn't.

-rw-r--r--   1 edwardyoon other          0 Apr 12 01:43 --no-check-certificate
-rw-r--r--   1 edwardyoon other          0 Apr 12 01:46 -S

-bash-3.00$ rm -rf **certificate
mv: illegal option -- no-check-certificate

I wanted to delete them, but the names get parsed as command options... and I'm such an idiot. (-_-;)
I could have just deleted them like this:

-bash-3.00$ rm -rf ./-S
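The same trap doesn't exist when you go through a language API instead of the shell, since a function like os.remove treats its argument purely as a path (a side note of mine, not from the original post):

```python
import os
import tempfile

# Create a file literally named "-S", like the one left behind above.
d = tempfile.mkdtemp()
path = os.path.join(d, "-S")
open(path, "w").close()
print(os.path.exists(path))  # True

# os.remove never parses options, so no "./" prefix is needed.
os.remove(path)
print(os.path.exists(path))  # False
os.rmdir(d)
```

In the shell, `rm ./-S` works because the leading `./` stops the name from looking like an option; `rm -- -S` achieves the same by ending option parsing.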

A Multi-Threaded Pi Estimator

Hadoop ships a Pi estimator implemented in Map/Reduce.
If we processed this with Hama's BSP instead, what performance and code-complexity differences would it show compared to M/R?
The answer is predictable without even looking, and this example probably doesn't offer much merit on its own, but...

In any case, for practice I wrote a simple multi-threaded version of the Pi algorithm below.

iterations = 10000
circle_count = 0

do j = 1,iterations
  generate 2 random numbers between 0 and 1
  xcoordinate = random1
  ycoordinate = random2
  if (xcoordinate, ycoordinate) inside circle
  then circle_count = circle_count + 1
end do

PI = 4.0*circle_count/iterations
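The pseudocode above translates almost line for line into Python (my quick sketch for comparison; the Java version follows below):

```python
import random

def estimate_pi(iterations=10000, seed=42):
    """Monte Carlo estimate: fraction of random points inside the unit circle."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    circle_count = 0
    for _ in range(iterations):
        x = rng.random()  # x coordinate in [0, 1)
        y = rng.random()  # y coordinate in [0, 1)
        if x * x + y * y < 1.0:  # point falls inside the quarter circle
            circle_count += 1
    return 4.0 * circle_count / iterations

print(estimate_pi())
```

With 10,000 samples the estimate typically lands within a few hundredths of 3.14159.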

Implemented in Java, it looks like this:

public class PiEstimator {
  private double pi = 0.0;
  private final int numTasks = 10;
  private int allFinished = 0;
  private long starttime = 0;

  class PiEstimatorTask extends Thread {
    private PiEstimator parent = null;
    private static final int iterations = 100000;

    public PiEstimatorTask(PiEstimator parent) {
      this.parent = parent;
    }

    public void run() {
      int in = 0, out = 0;
      for (int i = 0; i < iterations; i++) {
        double x = 2.0 * Math.random() - 1.0, y = 2.0 * Math.random() - 1.0;
        if (Math.sqrt(x * x + y * y) < 1.0) {
          in++;
        } else {
          out++;
        }
      }
      double estimate = 4.0 * (double) in / (double) iterations;
      parent.sync(estimate);
    }
  }

  public synchronized void sync(double est) {
    long rt = System.currentTimeMillis() - starttime;
    System.out.println("Terminated at " + rt + " ms, est " + est);
    pi = (allFinished == 0) ? est : (pi + est) / 2;
    allFinished++;
  }

  public double getPi() {
    return pi;
  }

  public void run() throws Exception {
    PiEstimatorTask[] piTask = new PiEstimatorTask[numTasks];
    System.out.println("Instantiating " + numTasks + " threads");
    for (int i = 0; i < numTasks; i++) {
      piTask[i] = new PiEstimatorTask(this);
    }
    starttime = System.currentTimeMillis();
    System.out.println("Starting threads, time = 0 ms");
    for (int i = 0; i < numTasks; i++) {
      piTask[i].start();
    }
    for (int i = 0; i < numTasks; i++) {
      piTask[i].join();
    }
  }

  public static void main(String[] args) throws Exception {
    PiEstimator pi = new PiEstimator();
    pi.run();
    System.out.println("Final estimate is: " + pi.getPi());
  }
}

Lucene n-gram test code and a short explanation

Lucene used to provide only simple noun extraction, word-unit splitting, and whitespace-level tokenization for a handful of languages, so for Korean the search quality was downright awful. In other words, extracting and indexing just "검색" (search) from a sentence like "한글 검색하려면 어떻게 하나요?" was impossible without some separate Korean-language library. Recent versions, though (and even this is fairly old news), make the problem largely solvable with n-gram tokenization, even if they still lack advanced parsing such as Korean noun extraction, particle handling, and stopword removal.

The code below is a sample I wrote against the 3.0.1 download; let's look at the result first.
When "아버지가방에들어가신다" is indexed, every n-character sequence is extracted and indexed, so searching for "아버지" (father) finds it. The downside, of course, is that searching for "가방" (bag) finds it too. (-_-;)

Optimizing index...
188 total milliseconds
Term: content:가방
Term: content:가신
Term: content:들어
Term: content:방에
Term: content:버지
Term: content:신다
Term: content:아버
Term: content:어가
Term: content:에들
Term: content:지가
Term: seqid:2
Searching for: 가방
1 total matching documents
My seq ID: 2

It's been a few years since I touched this, so even this took me about 20 minutes. (-_-;;)
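What CJKAnalyzer is doing here is plain character bigram (2-gram) tokenization, which can be sketched in a few lines of Python (my illustration, not Lucene's actual code):

```python
# -*- coding: utf-8 -*-

def bigrams(text):
    """Return overlapping 2-character tokens, as CJK bigram tokenizers do."""
    return [text[i:i + 2] for i in range(len(text) - 1)]

tokens = bigrams(u"아버지가방에들어가신다")
for t in sorted(set(tokens)):
    print(t)

# "가방" (bag) is among the indexed terms, which is why searching for it
# matches even though that reading was not the intended segmentation.
print(u"가방" in tokens)  # True
print(u"아버" in tokens)  # True
```

The sorted output reproduces exactly the ten content terms shown in the index dump above.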

public void testLucene() {
    try {
      File index = new File("index");
      Date start = new Date();

      IndexWriter writer = new IndexWriter(FSDirectory.open(index),
          new CJKAnalyzer(Version.LUCENE_30), true,
          new IndexWriter.MaxFieldLength(1000000));

      Document doc = new Document();
      doc.add(new Field("seqid", "2", Field.Store.YES,
          Field.Index.NOT_ANALYZED));
      doc.add(new Field("content", "아버지가방에들어가신다", Field.Store.YES,
          Field.Index.ANALYZED));
      writer.addDocument(doc);

      System.out.println("Optimizing index...");
      writer.optimize();
      writer.close();

      Date end = new Date();
      System.out.print(end.getTime() - start.getTime());
      System.out.println(" total milliseconds");

      IndexReader reader = IndexReader.open(
          FSDirectory.open(new File("index")), true);
      TermEnum termEnum = reader.terms();
      while (termEnum.next()) {
        Term term = termEnum.term();
        System.out.println("Term: " + term);
      }

      Searcher searcher = new IndexSearcher(reader);
      Analyzer analyzer = new CJKAnalyzer(Version.LUCENE_30);
      QueryParser parser = new QueryParser(Version.LUCENE_30, "content",
          analyzer);
      System.out.println("Searching for: 가방");
      Query query = parser.parse("가방");

      TopScoreDocCollector collector = TopScoreDocCollector.create(50, false);
      searcher.search(query, collector);
      ScoreDoc[] hits = collector.topDocs().scoreDocs;

      int numTotalHits = collector.getTotalHits();
      System.out.println(numTotalHits + " total matching documents");
      collector = TopScoreDocCollector.create(numTotalHits, false);
      searcher.search(query, collector);
      hits = collector.topDocs().scoreDocs;

      for (int i = 0; i < hits.length; i++) {
        Document docss = searcher.doc(hits[i].doc);
        String seqid = docss.get("seqid");
        System.out.println("My seq ID: " + seqid);
      }

      reader.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

Problem with Zend Gdata and include path

Today, I spent a lot of time installing Zend Gdata. I couldn't pass InstallationChecker.php, which failed with the message below:

Zend Framework Installation Errors :

Exception thrown trying to access Zend/Loader.php using 'use_include_path' = true. Make sure you include Zend Framework in your include_path which currently contains: .:/usr/share/php:/usr/local/src/Zend-Gdata/library

I thought it was a problem with the configuration and 'include_path', but it was simply a permission problem. The solution?

Make sure httpd can read Zend/Loader.php. That's all. :/

Challenges Of Life

I'm currently downloading the BBC documentary series titled "Challenges of Life". I haven't watched it yet, but I already knew: life is an endless chain of challenges.

That is common to all life on earth. I, too, am still facing challenges (just to survive), while reading a book called Status Anxiety (Alain de Botton). :/

Styling Blogger for the mobile web

The Blogger folks still don't provide a mobile web version. But there's a neat trick to tidy things up. :)

Just use a mobile feed reader. In Layout, open the Edit HTML tool and insert something like the following between the header tags:

if(navigator.platform == 'iPhone') {

Then on mobile it renders as shown below.

Installing PHP 5.2 on CentOS

To enable it, run the following commands:

# vi /etc/yum.repos.d/utterramblings.repo

name=Jason's Utter Ramblings Repo

Afterwards, just update your PHP:

# yum update php

PHP caching

$cachefile = 'cache/filename.cache';
$cachetime = 60; // 60 sec
// Serve from the cache if it is younger than $cachetime
if (file_exists($cachefile) && (time() - $cachetime < filemtime($cachefile))) {
    echo "<!-- Cached ".date('jS F Y H:i', filemtime($cachefile))." -->";
    include($cachefile);
    exit;
}
ob_start(); // start the output buffer

your HTML / normal php code here.

$fp = fopen($cachefile, 'w'); // open the cache file for writing
fwrite($fp, ob_get_contents()); // save the contents of output buffer to the file
fclose($fp); // close the file
ob_end_flush(); // send the output to the browser
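The same pattern, serve the file if it exists and is younger than the timeout, otherwise regenerate and rewrite it, looks like this in Python (a sketch of the idea with a hypothetical cache path, not a translation of any particular framework):

```python
import os
import tempfile
import time

CACHE_FILE = os.path.join(tempfile.gettempdir(), "filename.cache")  # hypothetical path
CACHE_TIME = 60  # seconds

# Start from a clean slate for this demo.
if os.path.exists(CACHE_FILE):
    os.remove(CACHE_FILE)

def render_page():
    """Stand-in for the expensive HTML-generating code."""
    return "<html>generated at %.6f</html>" % time.time()

def get_page():
    # Serve from the cache if it is younger than CACHE_TIME.
    if os.path.exists(CACHE_FILE) and time.time() - CACHE_TIME < os.path.getmtime(CACHE_FILE):
        f = open(CACHE_FILE)
        try:
            return f.read()
        finally:
            f.close()
    # Otherwise regenerate, write the cache, and serve the fresh copy.
    html = render_page()
    f = open(CACHE_FILE, "w")
    f.write(html)
    f.close()
    return html

first = get_page()   # cache miss: generates and stores the page
second = get_page()  # cache hit: served straight from the file
print(first == second)  # True
```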

Problems of Tomcat Session Clustering

Today, I tried session clustering between two Tomcat servers, following the cluster-howto document for Tomcat 6.0.

My system is as below:
Server1: (HTTPD + tomcat1)
Server2: (only tomcat2 w/o HTTPD)

    Apache HTTPD
        /          \
      /              \
 tomcat1      tomcat2

However, I'm not familiar with web programming, so it was difficult and took a lot of time. (-_-;;) But I finally succeeded in configuring it. There were some problems with firewall settings and route configuration, and I'd like to share them with you.

1) firewall settings

Check that the multicast port is on your UDP open list and that the receiver TCP port is open on both machines! I added the lines below to iptables.

-A RH-Firewall-1-INPUT -p udp --dport 45564 -d -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp --dport 4000 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 4000 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 45564 -j ACCEPT

2) network interface for multicast

# route add -host dev eth0
# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
                                                UH        0 0          0 eth0
                                                U         0 0          0 eth0
                                                U         0 0          0 eth0
                                                UG        0 0          0 eth0

Driving on Bundang-Suseo highway

Recently, I've been on vacation. This video was recorded while I drove to a mart to buy some computer parts.

The scenery looked like a road of cherry blossoms. The computer was in my trunk, but I couldn't control my speeding instinct. :)

Zhuangzi's philosophy of the True Man (眞人): the virtue of the wooden rooster (木鷄之德)

A king who dearly loved cockfighting brought his finest fighting rooster to Ji Xingzi (기성자), the greatest trainer of the day, and asked him to turn it into the greatest fighting cock. Ten days later the king asked, "Is the rooster ready to fight?" Ji Xingzi replied, "No, not yet. The bird is strong but arrogant; it still believes it is the best. Until it sheds that arrogance, it cannot be called the greatest fighting cock."

Ten days later the king asked again, and Ji Xingzi answered, "Not yet. It has shed its arrogance, but it still reacts far too easily to the sounds and shadows of its opponents. Only with the composure of an unmoving mountain can it be called the greatest."

Ten days later still, when the king asked, he said, "Not yet. It has shed its impatience, but the glare it fixes on its opponents is too aggressive. It must let go of that aggressive glare."

After another ten days the king asked once more, and he said, "Now it seems ready. Even when an opponent crows at it, it shows no reaction; it has found complete equanimity. It has become a rooster of wood (木鷄). Its virtue is complete, and other roosters will flee at the mere sight of it."

Through this story, Zhuangzi is saying that the greatest fighting cock is the wooden rooster. And there are three conditions for becoming one.

First, abandon the arrogance of believing you are the best. Those who strut about as number one should learn this.
Second, do not react easily to others' words and threats. Those who flare up at every remark should learn this.
Third, abandon the aggressive glare aimed at your opponent. Those who want to fight and compete with everyone should learn this.

In human terms, the wooden rooster is a person who has achieved complete self-possession and equanimity. Because such a person does not flaunt their particular brilliance and ability, that light shines all the brighter. Because they can keep their composure like a rooster carved of wood, others cannot easily provoke them. (by Park Jae-hee)

Spilled water

In China there was a man named Lü Shang (여상),
who loved studying but did no work at all.

Lü Shang had a wife, but the man did nothing but read;
he never worked, and the household was desperately poor.

Fed up, his wife finally left home.
Some time after that...,

Lü Shang's talents were recognized by the king, and he soon rose to great success.
At that moment his runaway wife suddenly returned,
saying she wished to restore their old bond...,

But Lü Shang silently scooped water into a bowl, carried it out, and poured it onto the ground before the garden.
Then he said, "... now try returning that water to the bowl."
But the water had already soaked into the earth and could not be scooped back up...

Then Lü Shang said,
"Water once spilled cannot be returned to its bowl."


While quietly reading Superclass, 조나단's "boring" book, I came across a line I really like for the first time in a while.

"If you want to test a man's character, give him power." - Lincoln

Even Einstein suffered because of ○○○.

(:: In a 1915 letter to an acquaintance, Einstein grumbled: "I'm plagued by a meager salary and stressed by my colleagues" ::)

"These days I am working under inhuman conditions. I am always buried in overtime. My fellow scientists behave hatefully, trying to pick holes in my theory or racing to complete their research before mine."

Anyone with workplace experience will deeply sympathize.
Especially those at big companies, in R&D, outside an organization's mainstream, or the young sprouts still growing up.
Direct and indirect pressure to work overtime, credit-stealing, nitpicking... ugh, sickening.

In my own case, remarkably, there was even someone who told me that exploiting an advantageous position is only natural. Ha.
Of course, that person, who openly showed his true colors, is still one of the better ones compared to those who shamelessly talk nonsense.

Still, let's note that the world Einstein lived in was the same as the one spread out before us.

2010 HPC trends and Hama project

Obviously, the HPC (High Performance Computing) and scientific-computing market is expected to keep growing. According to IDC, the current HPC market is around $10 billion, which is 20% of the total server market. The research company forecasts that the HPC market will grow to $15.6 billion by 2012.

That said, in my opinion it is currently non-IT companies (e.g., chemistry, bio-medical, etc.) that need these HPC technologies, rather than web-service IT companies, because innovation in web services doesn't always require a high degree of skill in scientific computing. (Of course, there is some demand from graph/network data processing within web-service IT companies.)

For this reason, I'm currently considering shaping Hama as a solution aimed at the small HPC market.

Global-Scale Web Services and Their Underlying Technologies

A while ago I heard that Facebook uses Erlang to handle its enormous number of concurrent connections. There is also ejabberd, which Twitter reportedly uses. These are, roughly, distributed/decentralized, P2P-like messenger servers. Why use such things?

Popular users have hundreds, thousands, even tens of thousands of followers, and for nearly-real-time delivery it is technically difficult to hand a message to that many recipients without something like this.

There is also pubsubhubbub, a similar technology said to be used by Google Buzz.

In short, the world's major web-service companies are concentrating on developing foundational web technologies, such as big-data storage systems, fault-tolerant architectures, high scalability, and NoSQL, for global-scale data processing and real-time services.
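The fanout problem these systems solve can be seen in miniature (a toy sketch of mine, nothing like ejabberd's actual design): one publish turns into N deliveries, one per follower, which is why a dedicated message-routing layer pays off at Twitter's scale.

```python
from collections import defaultdict

followers = {
    "celebrity": ["alice", "bob", "carol"],  # in reality: tens of thousands
    "alice": ["bob"],
}
inboxes = defaultdict(list)

def publish(author, message):
    """Push-model fanout: write the message into every follower's inbox."""
    fans = followers.get(author, [])
    for follower in fans:
        inboxes[follower].append((author, message))
    return len(fans)

deliveries = publish("celebrity", "hello, world")
print(deliveries)       # 3
print(inboxes["bob"])   # [('celebrity', 'hello, world')]
```

One write by the author becomes as many inbox writes as there are followers; at scale that multiplication is what the messaging infrastructure exists to absorb.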

Interesting project, hama-mrcl (Map/Reduce + CUBLAS)

I just found an interesting project.

They tried to perform matrix multiplication using MapReduce and CUBLAS. To avoid I/O bottlenecks during multiplication, a blocking/tiling algorithm was used on top of M/R, and the CUDA BLAS library (CUBLAS) was used for GPU acceleration of the local computations. CUBLAS is a BLAS library ported to CUDA, which enables fast GPU computing without driving the CUDA layer directly.
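The blocking/tiling idea they describe, splitting the matrices into sub-blocks so each local multiply fits in fast memory (or on the GPU), can be sketched in Python (my illustration, not the hama-mrcl code):

```python
import numpy as np

def blocked_matmul(a, b, bs=2):
    """Multiply square matrices by accumulating products of bs x bs sub-blocks."""
    n = a.shape[0]
    c = np.zeros((n, n))
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                # Each small product is the unit of work M/R ships to a worker,
                # and the piece a CUBLAS call would accelerate locally.
                c[i:i+bs, j:j+bs] += np.dot(a[i:i+bs, k:k+bs], b[k:k+bs, j:j+bs])
    return c

a = np.arange(16.0).reshape(4, 4)
b = np.eye(4) * 2.0
print(np.allclose(blocked_matmul(a, b), np.dot(a, b)))  # True
```

The block products are independent, which is exactly what makes the scheme map cleanly onto M/R tasks.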

The interesting finding in this research is that pure Java is better/faster when the input (a split, or a sub-matrix in a distributed system) is small.

So perhaps it isn't a great fit for a distributed system consisting of many nodes. But I roughly guess that GPU technology could be useful for the future BSP concept of Apache Hama.

I'm not a BSP expert yet, but I really love this phrase from Bulk-Synchronous GPU Programming: "the BSGP program always has a significantly lower code complexity".

Talkers vs. Doers

There's two kinds of people in this world when you boil it all down. You've got your talkers and you've got your doers. Most people are just talkers. All they got is talk. But when all is said and done, it's the doers who change this world. And when they do that, they change us. And that's why we never forget them.

So, which one are you? Do you just talk about it? or do you stand up and do something about it? Because believe you me, all the rest of it is just coffeehouse bullshit.

Facial Symmetry

Symmetry, especially facial symmetry, is one of a number of aesthetic traits, including averageness and youthfulness, associated with health, physical attractiveness and beauty of a person or non-human animal according to the authors of Facial Attractiveness: Gillian Rhodes, Leslie A. Zebrowitz.[2] It is also hypothesized as a factor in both interpersonal attraction and interpersonal chemistry. [Wikipedia]

Human beings always pursue beautiful things. What does beauty mean to you? IMO, beauty is health. I guess that emotion serves a healthy breeding instinct.

A simple example is facial symmetry.
We all know that facial symmetry is an important factor in human beauty; many studies have shown that a lack of facial symmetry correlates, on average, with lower beauty rankings. Why do human beings perceive beauty in symmetry and balance? AFAIK, facial asymmetry is affected by spinal curvature, which suggests poor health.

I've never seen a wild animal with an asymmetric face. The jungle has a rule: "removal of the unfit and unhealthy."

See also: Beautiful people are more intelligent

Carl Friedrich Gauss

Johann Carl Friedrich Gauss (pronounced /ˈɡaʊs/; German: Gauß listen (help·info), Latin: Carolus Fridericus Gauss) (30 April 1777 – 23 February 1855) was a German mathematician and scientist who contributed significantly to many fields, including number theory, statistics, analysis, differential geometry, geodesy, geophysics, electrostatics, astronomy and optics. Sometimes known as the Princeps mathematicorum (Latin, "the Prince of Mathematicians" or "the foremost of mathematicians") and "greatest mathematician since antiquity", Gauss had a remarkable influence in many fields of mathematics and science and is ranked as one of history's most influential mathematicians. He referred to mathematics as "the queen of sciences."

Gauss was a child prodigy. There are many anecdotes pertaining to his precocity while a toddler, and he made his first ground-breaking mathematical discoveries while still a teenager. He completed Disquisitiones Arithmeticae, his magnum opus, in 1798 at the age of 21, though it would not be published until 1801. This work was fundamental in consolidating number theory as a discipline and has shaped the field to the present day. [Wikipedia]

The mathematical genius Gauss. Among the prodigies of mathematics and other fields, few stood out from early childhood the way Gauss did. This wunderkind class of course also includes mathematicians like Pascal, but he died young, one way or another. (Gauss is the protagonist here, so Pascal gets a pass.)

There is a famous anecdote about Gauss.
His schoolmaster Büttner, wanting a rest, told the class to compute the sum from 1 to 100,
and the unexpected happened: Gauss immediately handed in only the answer, 5050.

Asked how he had solved it, he explained neatly:
let 1 + 2 + 3 + ... + 98 + 99 + 100 = S; since 100 + 99 + 98 + ... + 3 + 2 + 1 = S as well,
2S = 101 + 101 + ... + 101 = 101 × 100, so S = 101 × 50 = 5050. Haha.
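The pairing argument is easy to check mechanically (a quick sketch of mine):

```python
# Brute-force sum of 1..100 versus Gauss's closed form n*(n+1)/2.
n = 100
brute = sum(range(1, n + 1))
closed = n * (n + 1) // 2
print(brute)   # 5050
print(closed)  # 5050
```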

This is the arithmetic-series sum we learn in high school.
Back in elementary school, I was probably still fumbling through my times tables. Heh.

(That said, Gauss's personality was reportedly not so great...)

And he is said to have always claimed, "The calculation was already finished before I explained it in words."
How much more fun would it have been in the style of Fist of the North Star? (-_-;;)

Gerolamo Cardano - 1

While reading the book The Mathematics of Games and Gambling,
I became interested again in Cardano, who makes a cameo appearance in the middle of it.

I used to know him only as a "madman and hustler" character who stole other people's achievements,
but he felt so uncannily similar to me that I found myself falling deep into his story.

First of all, he is said to have been extremely outspoken and highly critical.
A blunt, pessimistic personality. (Exactly what I keep hearing about myself -_-)
In short, a man who couldn't get along with the people around him and never quite compromised with the world.
Reading further down his bio on Wikipedia: gambling, astrology out of nowhere, suicide...
All sorts of things start turning up.

At this point one naturally starts to wonder: what was this man, really?
Pointing to his "illegitimate birth" or "upbringing" resolves the question easily enough,
but judging from my own case as well, I don't think it was necessarily dictated by environment.
It's inescapable "nature". Heh.

I titled this Gerolamo Cardano - 1 because I intend a series.
It's late at night, so I'll save Cardano's games of chance, the cubic equation, and so on for the next installments
(and maybe, with a bit of fiction added to his autobiography, a drama of his life as well)...
See u soon.

"What I want now is rest."

FW: Apache Hama in academic paper

HAMA: An Efficient Matrix Computation with the MapReduce Framework

Sangwon Seo†‡, Edward J. Yoon, Jae-Hong Kim†, Seongwook Jin†, Jin-Soo Kim§ and Seungryoul Maeng†
† Computer Science Division, Korea Advanced Institute of Science and Technology (KAIST)
‡ Computer Science Division, Berlin University of Technology (TU Berlin)
§ School of Information and Communication, Sungkyunkwan University, South Korea
User Service Development Center, NHN Corp., South Korea


Various scientific computations have become so complex, and thus computation tools play an important role. In this paper, we explore the state-of-the-art framework providing high-level matrix computation primitives with MapReduce through the case study approach, and demonstrate these primitives with different computation engines to show the performance and scalability. We believe the opportunity for using MapReduce in scientific computation is even more promising than the success to date in the parallel systems literature.

It's sooner than I'd planned, but a nice start. I hope this project will continue in the future. :)

comScore Reports Global Search Market Growth of 46 Percent in 2009

Google Sites Accounts for Two-Thirds of 131 Billion Searches Conducted Worldwide in December while Introduction of Bing Helps Microsoft Post Significant Gains During the Year

Reston, VA, January 22, 2010 – comScore, Inc. (NASDAQ: SCOR), a leader in measuring the digital world, today released a study on growth in the global search market in 2009. The study revealed that the U.S. remains the largest search market worldwide, while Google Sites retains a commanding position in the global search market.
“The global search market continues to grow at an extraordinary rate, with both highly developed and emerging markets contributing to the strong growth worldwide,” said Jack Flanagan, comScore executive vice president. “Search is clearly becoming a more ubiquitous behavior among Internet users that drives navigation not only directly from search engines but also within sites and across networks. If you equate the advancement of search with the ability of humans to cultivate information, then the world is rapidly becoming a more knowledgeable ecosystem.”
Top Search Markets Worldwide
The total worldwide search market boasted more than 131 billion searches conducted by people age 15 or older from home and work locations in December 2009, representing a 46-percent increase in the past year. This number represents more than 4 billion searches per day, 175 million per hour, and 2.9 million per minute. The U.S. represented the largest individual search market in the world with 22.7 billion searches, or approximately 17 percent of searches conducted globally. China ranked second with 13.3 billion searches, followed by Japan with 9.2 billion and the U.K. with 6.2 billion. Among the top ten global search markets, Russia posted the highest gains in 2009, growing 92 percent to 3.3 billion, followed by France (up 61 percent to 5.4 billion) and Brazil (up 53 percent to 3.8 billion).
Top 10 Countries by Number of Searches Conducted*
December 2009 vs. December 2008
Total Worldwide, Age 15+ - Home & Work Locations
Source: comScore qSearch

(Table columns: Searches (MM) and Percent Change; rows included the United States, the United Kingdom, South Korea, and the Russian Federation; figures omitted.)

*Searches based on "expanded search" definition, which includes searches at the top properties where search activity is observed, not only the core search engines.
Top Search Properties Worldwide
Google Sites ranked as the top search property worldwide with 87.8 billion searches in December, or 66.8 percent of the global search market. Google Sites achieved a 58-percent increase in search query volume over the past year. Yahoo! Sites ranked second globally with 9.4 billion searches (up 13 percent), followed by Chinese search engine Baidu with 8.5 billion searches (up 7 percent). Microsoft Sites saw the greatest gains among the top five properties, growing 70 percent to 4.1 billion searches, on the strength of its successful introduction of new search engine Bing. Russian search engine Yandex also achieved considerable gains, growing 91 percent to 1.9 billion searches.
Top 10 Search Properties by Searches Conducted*
December 2009 vs. December 2008
Total Worldwide, Age 15+ - Home & Work Locations
Source: comScore qSearch

(Table columns: Searches (MM) and Percent Change; rows included Google Sites, Yahoo! Sites, Microsoft Sites, NHN Corporation, and Ask Network; figures omitted.)

*Searches based on "expanded search" definition, which includes searches at the top properties where search activity is observed, not only the core search engines.

The World of the Infinite

In mathematics, the size of an infinite set, its cardinality, i.e., the number of its elements, is called its "power". For a finite set the size is simply the number of elements, but for an infinite set it is impossible to count the elements one by one, so the "power"...