Breadth-First Search (BFS) & MapReduce

The Breadth-First Search (BFS) & MapReduce was roughly introduced from Distributed Computing Seminar. The graph is stored as a sparse matrix, finds shortest path using Map/Reduce as describe below:

Finding the Shortest Path: Intuition

- We can define the solution to this problem inductively:
- DistanceTo(startNode) = 0
- For all nodes n directly reachable from startNode, DistanceTo(n) = 1
- For all nodes n reachable from some other set of nodes S,
- DistanceTo(n) = 1 + min(DistanceTo(m), m ∈ S)


From Intuition to Algorithm

- A map task receives a node n as a key, and (D, points-to) as its value
- D is the distance to the node from the start
- points-to is a list of nodes reachable from n
∀p ∈ points-to, emit (p, D+1)
- Reduce task gathers possible distances to a given p
and selects the minimum one

According to above-mentioned idea, A map task receives a node n as a key, and (D, points-to) as its value. It means that an "input" is a set of all reachable path from 'start' to each key.

Updated: My mis-understand. It doesn't need an set of all reachable path. See page 9 - "Adjacency Matrices".

- Another classic graph representation.
M[i][j]= '1' implies a link from node i to j.
- Naturally encapsulates iteration over nodes

| 1 2 3 4
--+----------
1 | 0 1 0 1
2 | 1 0 1 1
3 | 0 1 0 0
4 | 1 0 1 0

So, It can be represented as below:

1 (2, 4)
2 (1, 3, 4)
3 (2)
4 (1, 3)

Then, MR job will iterate 'hop' times(= maximum distance). Let's assume the start node is "1".

First MR job:

1 D=0, points-to=(2, 4)
2 D=inf, points-to=(1, 3, 4)
3 D=inf, points-to=(2)
4 D=inf, points-to=(1, 3)

Result: (2,1), (4,1)

Second MR job:

1 D=0, points-to=(2, 4)
2 D=1, points-to=(1, 3, 4)
3 D=inf, points-to=(2)
4 D=1, points-to=(1, 3)

Result: (2,1), (3,2), (4,1)

Here's my test code:

public class BFS {
static Map<String, ArrayList<String>> collect = new HashMap<String, ArrayList<String>>();

public static void main(String[] args) {
String[] value = {
// key | distance | points-to
"1|0|2;4",
"2|"+Integer.MAX_VALUE+"|1;3;4",
"3|"+Integer.MAX_VALUE+"|2",
"4|"+Integer.MAX_VALUE+"|1;3",
};

mapper(value);
reducer(collect.entrySet());
}

private static void mapper(String[] value) {
for (int i = 0; i < value.length; i++) {
String line = value[i].toString();
String[] keyVal = line.split("[|]");

String Key = keyVal[0];
String sDist = keyVal[1];
String[] links = null;
if (keyVal.length > 2) {
links = keyVal[2].split(";");
int Dist = Integer.parseInt(sDist);

if (Dist != Integer.MAX_VALUE)
Dist++;

for (int x = 0; x < links.length; x++) {
if (links[x] != "") {
ArrayList<String> list;
if (collect.containsKey(links[x])) {
list = collect.get(links[x]);
} else {
list = new ArrayList<String>();
}

list.add(Dist + "|");
collect.put(links[x], list);
}
}

ArrayList<String> list;
if (collect.containsKey(Key)) {
list = collect.get(Key);
} else {
list = new ArrayList<String>();
}
list.add(sDist + "|" + keyVal[2]);
collect.put(Key, list);
}
}
}

private static void reducer(Set<Entry<String, ArrayList<String>>> entrySet) {
for (Map.Entry<String, ArrayList<String>> e : entrySet) {
Iterator<String> values = e.getValue().iterator();
int minDist = Integer.MAX_VALUE;
String link_list = "";

while (values.hasNext()) {
String[] dist_links = values.next().toString().split("[|]");
if (dist_links.length > 1)
link_list = dist_links[1];

int dist = Integer.parseInt(dist_links[0]);
minDist = Math.min(minDist, dist);
}

System.out.println(e.getKey() + " - D " + (minDist + " | " + link_list));
}
}
}

Chinese Proverb - 兎死狗烹

Do you know '兎死狗烹'? It's a chinese proverb. In metaphrase, The man use hound to hunt rabbit but shoo the hound out onto the street after hunt rabbit. it means that when needed, it's grateful, but once unneedful, it's useless and abandoned.

Sexy or Intelligent Internet

It is better to be intelligent or sexy? For scientist is important to be intelligent. For model is important sexy appereance. For school professor is important to be intelligent. It is better to be intelligent or sexy to achieve the succes in relations to other people.

Then, the web application should be intelligent or sexy? IMO, It's same with above. A search engine should be intelligent enough to bring you the best information based on what you mean. A entertainment applications should be sexy, fresh, suggestive to provide more fun.

We can find that examples of these character from google.com and youtube.com.

What are the operating system most dominantly use in korea?

I noticed that someone visit my blog with this question - "what are the operating system most dominantly use in korea?". So I would like to answer that question.

In korea, Microsoft Windows (and IE browser) is still used on as many as 90 percent of China's 40 million PCs, because of most korea web-site (e.g. internet banking, game launch sites, multi-media site, ..., etc) requires Active-X on the high speed internet infra-. It was boom w/o special reason.

So, I think that not IE stuff (e.g., IPhone, Google Chrome, Safari, FireFox) didn't much appeal to korea market.

Google Chubby And Distributed Systems

Chubby is a sort of external lock-management server for reliability and availablity, and used to solve asynchronous consensus and other problems in distributed computing.

Building Chubby was an engineering effort required
to fill the needs mentioned above; it was not research.
We claim no new algorithms or techniques. The purpose
of this paper is to describe what we did and why, rather
than to advocate it. In the sections that follow, we de-
scribe Chubby’s design and implementation, and how it
has changed in the light of experience. We describe un-
expected ways in which Chubby has been used, and fea-
tures that proved to be mistakes. We omit details that are
covered elsewhere in the literature, such as the details of
a consensus protocol or an RPC system.

Papers are focused on their overall system architecture as they claimed. Click below to see more detail:

- The Chubby lock service for loosely-coupled distributed systems
- Paxos made live

Updated:
- There is a similar open source called zookeeper.

Google PowerMeter

Google PowerMeter, now in prototype, will receive information from utility smart meters and energy management devices and provide anyone who signs up access to her home electricity consumption right on her iGoogle homepage. The graph below shows how someone could use this information to figure out how much energy is used by different household activites.




Great, It represents a kind of ubiquitous computing system.

Naver storage hosting service, called 'N drive'

Naver internally work for online file sharing and storage service, called 'N drive' which is store large amounts of user files using OwFS (Ownership-based File System). I don't know exactly when it will open, But it's a clearly shows that NHN also started to deal with cloud-related problems.

FT: IBM Creates a Cloud Computing Division

by Timothy Prickett Morgan

You know that Big Blue is getting serious about something when it creates a formal division to manage it. Last week, IBM announced that it was creating a cloud computing division just as it had also announced that key systems software would soon be available for deployment on Amazon's Elastic Compute Cloud (EC2).

Technically speaking, Erich Clementi, who is currently IBM's vice president for strategy and who was formerly a general manager of the Business Systems division (which peddles gear to small and medium businesses) and the System z mainframe business, is now also general manager of Enterprise Initiatives, which is where the company is currently parking all of its cloud computing efforts. And instead of reporting up through Systems and Technology Group or Software Group or Global Services, like the other IBM groups do, this Enterprise Initiatives division reports directly up to IBM president, chief executive officer, and chairman, Sam Palmisano.

According to a report in the eWeek, Kristof Kloeckner has been appointed as chief technology officer of the cross-group, cross-divisional cloud computing division set up last week. Kloeckner was previous vice president of strategy and technology at Software Group.

IBM has launched 13 Blue Cloud centers around the world, which lets ISVs and customers monkey around with cloud technology, such as the open source Hadoop programming tools that are variants of technology deployed by Google, Yahoo, and others. IBM's Blue Clouds are based on its System x and BladeCenter machines and use X64 processors and sport the open source Xen hypervisor to virtualize server slices on the clouds. These centers are more about helping people deploy their own internal cloud infrastructure.

But IBM knows people are going to want to use public clouds for some workloads because of the relatively cheap economics, especially the Amazon EC2 cloud and its related S3 and EBS storage clouds. (S3 is for media files, while EBS is for block-style serving like that used on host computers.) I say especially because Amazon is breaking out in front of the relatively small cloud computing pack, supplying the broadest set of operating systems and middleware for users to deploy applications upon.

Last week, IBM said that it would be deploying a set of its systems and middleware software on the EC2 cloud, all running atop slices of Novell's SUSE Linux Enterprise Server 10. Specifically, IBM is letting application developers who want to mess around with its DB2 and Informix Dynamic Server databases, its WebSphere Portal and WebSphere sMash mashup tools, and its Lotus Web Content Management software use these programs for free on EC2 slices. IBM said that in a few months it will roll out beta support for production grade instances of this software running on EC2, and presumably after some testing and the delivery of a set of Tivoli provisioning tools that can manage EC2 images, it will start selling this software. IBM plans to use the same Processor Value Unit (PVU) software pricing scheme for the software deployed on EC2 images as it does for software when it is deployed utility-style on servers inside customer data centers. You can see the PVU ratings for various EC2 slices here, and compare that to PVU ratings for at this link.


RELATED STORIES

IBM lobs biz software at Amazon cloud (The Register)
Deutsche Telekom births cloud broker (The Register)
The X Factor: Head in the Clouds
IBM sends Blue Clouds back to school (The Register)
Programmers take to the clouds (The Register)
Ruby, COBOL jump on Amazon cloud (The Register)
IBM Offers Clear Solutions for Different Cloud Types
Microsoft taps Dell to build Azure cloud (The Register)
Clouds float Dell bottom line (The Register)
HP Sets Up Cloud-HPC Computing Unit, Launches Two Server Blades
The Blue Cloud Is IBM's Commercial Cloud Computing
Google, IBM Partner on Utility Computing Cloud
Ballmer Talks Up 'Cloud Computing'

Cloud Computing?

Larry Ellison said that the computer industry is more fashion-driven than women's fashion and cloud computing is simply the latest fashion. And Richard Stallman, founder of the Free Software Foundation and creator of the computer operating system GNU, said that cloud computing was simply a trap aimed at forcing more people to buy into locked, proprietary systems that would cost them more and more over time.

Then, What do I think cloud computing is?
I agree with stallman in part that it's a really dangerous technology, But I think cloud computing is the realization of the earlier ideals of utility computing without the technical complexities or complicated deployment worries. Cloud competition has already started between the most global principal IT companies (e.g. google, y!, IBM, amazon, HP, facebook, ..., etc), netizens are already acclimatized on the cloud internet services. (at least in the korea)

Meanwhile back at the NHN, corp., they also took a conservative action (e.g. freezing pay of all staff, reduce employment, ..., etc) by recession. However IMO, they may shouldn't reduce research and development spending about creating and managing a high-performance, Internet-scale Web service if will attempting to provide more services for stepping to global markets. I'm afraid the chosen architectures aren't scalable, and code tweaking in script-level will not help much.

Naver's takeover of Wingbus

The RealTravel of south korea, Wingbus which aggregates tour stories and photos of bloggers from Naver, Daum, ..., etc. has been take over by NHN, corp.

The venture competition seems a story from the old days.