Showing posts from February, 2012

Big Data, Why Matrix is important?

We feels the beauty of harmony and convenience of order from regular array that can be found in the Library or Parking lot. Like this, the matrix is applied not only to mathematical problems but also to problems in our real life as a useful concept. For examples, account book, items or goods management, encryption and decryption, population analysis, statistical data analysis, quantitative business analysis, and the transportation network analysis, ..., etc.

The matrix is everywhere, it is all around us.
The same is true of the Cyber world. The matrix is an essential part of information management and analysis. Just think of Amazon bookstore, Foursquare, Google Maps/Places, Social network services and its traffic flow networks or user in/out flows, ..., etc. Log data. The only difference is scale, Local Vs. World-wide. In shortly, Big Data! Do you love this term?

Wait! What is Matrix?

In mathematics, the matrix is an rectangular array of numbers or letters arranged in rows and columns. …

Pregel clone package on top of Apache Hama

Today, I finished testing new Graph package and its examples of Apache Hama on 2 rack 512 cores fully distributed cluster.

The new Graph APIs is the completely clone of Google's Pregel and its performance  is also quite good. Hama-0.5 release will provide really powerful BSP computing engine and lot of new features. :D

 Here's full source code of Single Source Shortest Path:
/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed o…

My guitar playing


Received new car as compensation

I was having (engine stalls) problem with ma car 2011 Z4 sDrive30i, finally received new one as compensation from the BMW. Yahoo!

The model of my new car is 2012 new Z4 sDrive 35i with twin turbo, 7-speed dual clutch transmission, alpine white body, red seats, full sound package ..., etc. options.

I dislike turbo engine (more precisely, turbo-lag) but this is quite good, except crackable alloy wheels (you must watch out for all potholes nervously) and rough engine braking (the car behind might kick your ass if you take off the gas pedal suddenly).

This car has an awesome performance, unbelievable quick response, new +/- paddle shifts, and back-fire sounds like beast... (If you listen very closely, you may hear the sound of Benz SLS amg). Overall, ★★★★☆.

Compared to (1st generation) older model Z4, this car got convenience and enough output but lost sparky and sprinter's explosive power. Shortly, this is a sporty-daily car. If you're looking for hardcore machine, you should hav…

Cassandra 책 속에 내 이름

아 못난이같지만 자랑질 좀 하자.
카싼드라 책 속에 내이름이 있음을 오늘 처음 발견했다. ㅋ

내가 글로벌 최대 출판사와 IT전문서적을 publish한 최초의 한국인이 될테니.

Stability equals death

Biologically, being alive means keeping instability. Our cells pump out sodium (Na) and take in potassium (K) until they die. Between start and end of all things, there's only instability.

Why don't we have to pursue more instable life?
(But, ... I seems not ready to give up my stable life yet.)

빅데이터는 노하우의 내재화가 핵심이다.

내가 지금껏 IT 업계에서 봐온 짜증-류의 작업은 크게 3개 정도 있다.

1) 첫째가 새벽에 출근해서 DB 만지는 것.

 가령, 블로그 서비스에 (사소할지언정) 어떤 기능이 하나 추가되거나 기획자들이 리포트를 원할 때면 필연적으로 RDBMS 스키마를 변경하거나 묵직한 쿼리를 날려야 되는 문제가 따라온다. 그러면 그냥 새벽에 ‘임시점검’ 띄워놓고 DB 작업하는 거다. 데이터가 증가하거나 장애가 뜨면 또 어떤가. 바로 이런 짜증스런 문제에서 Schema-free, ad-hoc query processing, fault tolerant 요구가 나오고 NoSQL 기술이 진화하는 것이다.

2) 두 번째, 웹 서버에 웹 로그 파일 4GB 짜리가 수십 개씩 뚝뚝 떨어진다.

 로그파일 떨어지는걸 감당못해 바로바로 압축하고 테이프에 떠서 지워가는 곳도 있을거다. 이 때, 어떤 장애가 발생하면 당근 과거 로그는 뒤져볼 수 가 없겠고, 로그레벨을 debug로 맞춰서 재현될 때까지 멍청하게 눈팅 하는거다. 그래서 거대한 분산 파일시스템, 로그 마이닝 같은 기술에 열광하는게 아닐까. 잡설 1, 미국 어느 주에서는 Facebook, Twitter 타임라인가지고 crime prediction 하기도 하고 (왠지 자살같은것도 미연에 방지할 수 있겠고) 그런다던데 ... 한국은 왠지 알바생들이 나꼼수 트위터 눈팅할 듯.

3) 세 번째, 의사결정권자는 언제나 근거자료를 원한다.

 어떤 문제나 서비스/상품을 기획해서 에스컬레이션 올리면 의사결정권자는 근거를 원한다. 그 근거는 수치로 말하는 것이 확실하다. Shut up and use the math. 이런 통계를 내려고 MySQL 깔아서 데이터 입력해놓고 쿼리문으로 조지던 개발자들 많을거다.

 뭐 여튼, 빅데이터 기술 진화는 사실 이렇게 필연적이었다고 말할 수 있겠다. 이게 뭐 꼭 오늘날 직면하게된 문제는 아니고 5년 전부터 그 증상들이 이곳저곳에서 나타나고 있었지. 양키들이 NoSQL만들때 우리는 무얼했나? 뭐든 빨리빨리 아웃풋 내놓으라고 쪼아…

Terminate AWS instances with Java SDK

BasicAWSCredentials awsCredentials = new BasicAWSCredentials( AWSAccessKeyId, SecretAccessKey); AmazonEC2Client ec2Client = new AmazonEC2Client(awsCredentials); ec2Client.setEndpoint(""); // Zone List<String> instancesToTerminate = new ArrayList<String>(); DescribeInstancesResult result = ec2Client.describeInstances(); List<Reservation> reservations = result.getReservations(); for (Reservation reservation : reservations) { List<Instance> instances = reservation.getInstances(); for (Instance instance : instances) { System.out.println("Terminating: " + instance.getInstanceId()); instancesToTerminate.add(instance.getInstanceId()); } } TerminateInstancesRequest term = new TerminateInstancesRequest(); term.setInstanceIds(instancesToTerminate); ec2Client.terminateInstances(term);