## December 18, 2007

### Probability distribution

A calculator manufacturer checks for defective products by testing 3 calculators out of every lot of 12. If a defective calculator is found, the lot is rejected.

a) Suppose 2 calculators in a lot are defective. Outline two ways of calculating the probability that the lot will be rejected. Calculate this probability.

Do I use the probability distribution here? This question involves a hypergeometric distribution.

b) The quality control department wants to have at least a 30% chance of rejecting lots that contain only one defective calculator. Is testing 3 calculators in a lot of 12 sufficient? If not, how would you suggest they alter their quality control techniques to achieve this standard? Support your answer with mathematical calculations.

a = 2 (defective), b = 10 (good), n = 12 (lot size), r = 3 (tested)
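A small sketch to check the numbers, assuming the sample of 3 is drawn without replacement (the hypergeometric setting). The function names are my own, chosen only for illustration. The two functions correspond to the two ways asked for in (a): the complement of "no defective in the sample", and the direct sum over 1 or more defectives.

```python
from fractions import Fraction
from math import comb


def p_reject_complement(defective: int, lot: int, tested: int) -> Fraction:
    """Way 1: P(reject) = 1 - P(no defective calculator in the sample)."""
    good = lot - defective
    return 1 - Fraction(comb(good, tested), comb(lot, tested))


def p_reject_direct(defective: int, lot: int, tested: int) -> Fraction:
    """Way 2: add the hypergeometric probabilities of x = 1, 2, ... defectives."""
    good = lot - defective
    total = comb(lot, tested)
    terms = (
        Fraction(comb(defective, x) * comb(good, tested - x), total)
        for x in range(1, min(defective, tested) + 1)
    )
    return sum(terms, Fraction(0))


def smallest_sample(defective: int, lot: int, target: Fraction) -> int:
    """Smallest number of calculators to test so that P(reject) >= target."""
    for tested in range(1, lot + 1):
        if p_reject_complement(defective, lot, tested) >= target:
            return tested
    raise ValueError("target cannot be reached")


if __name__ == "__main__":
    print(p_reject_complement(2, 12, 3))            # part (a), way 1
    print(p_reject_direct(2, 12, 3))                # part (a), way 2
    print(p_reject_complement(1, 12, 3))            # part (b), current scheme
    print(smallest_sample(1, 12, Fraction(3, 10)))  # part (b), suggested sample size
```

For (a), both ways give 1 - C(10,3)/C(12,3) = 1 - 120/220 = 5/11, about 45.5%. For (b), one defective gives 1 - C(11,3)/C(12,3) = 1/4 = 25%, which is below 30%. Testing 4 calculators per lot gives 1 - C(11,4)/C(12,4) = 1/3, about 33.3%, which meets the standard.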

## December 11, 2007

### nohup command

$ nohup sleep 400 &     <== command
[1] 7026
$ Sending output to nohup.out

ctrl + d
$ There are running jobs.
ctrl + d
$ ps -ef | grep user6
user6 7026 1 0 15:52:03 ? 0:00 sleep 400 <== Running!!
user6 7049 7029 14 15:53:00 pts/tg 0:00 ps -ef
user6 6769 6768 0 14:12:12 pts/tc 0:00 -sh
user6 7050 7029 3 15:53:00 pts/tg 0:00 grep user6
user6 7029 7028 0 15:52:38 pts/tg 0:00 -sh
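
The same idea can be sketched from Python. This is only a rough analogue of what the transcript shows, not how `nohup` itself is implemented: the child is started with SIGHUP ignored and its output appended to a log file, so it keeps running after the login shell goes away. The helper name `run_nohup` and the default log path are my own assumptions, and the sketch is POSIX only.

```python
import signal
import subprocess
import sys


def _ignore_sighup() -> None:
    # Runs in the child just before exec; an ignored signal stays ignored after exec.
    signal.signal(signal.SIGHUP, signal.SIG_IGN)


def run_nohup(cmd: list[str], log_path: str = "nohup.out") -> subprocess.Popen:
    """Start cmd so that it ignores SIGHUP and appends its output to log_path."""
    log = open(log_path, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no terminal input for a background job
            stdout=log,
            stderr=subprocess.STDOUT,   # send errors to the same log file
            preexec_fn=_ignore_sighup,
        )
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

Calling `run_nohup(["sleep", "400"])` would correspond to the command above. Note that `preexec_fn` is not safe in multi-threaded programs, so this is only meant for small scripts.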

## December 2, 2007

### Oh.... Proposals.

Good. I have received many proposals from the most famous global companies. ... But I have no money for airplane tickets. XD

## November 28, 2007

### It’s bitter cold.

It’s bitter cold. November is my coldest month. Sometimes it snows heavily in my mind. Because Christmas is drawing near. (Do you understand what I’m saying?)

## November 26, 2007

### Meeting with Hbase Company, Powerset

http://www.barneypell.com/archives/2007/11/my_first_trip_t.html

While there was not much time for fun, my hosts at NHN (Paul Sung and Ed Yoon) picked me up at the airport and took me out to a meal at a traditional Korean restaurant......

I had a good time. :)

## November 25, 2007

### Introducing Android

## November 24, 2007

### FT: South Korea Approves Samsung Probe

SEOUL, South Korea (AP) — South Korea's National Assembly passed a bill Friday demanding an independent investigation into allegations of bribery at the Samsung Group conglomerate.

The bill was to go to President Roh Moo-hyun for final approval. His office has said he may veto it because state prosecutors have already launched a probe into the scandal at the country's largest industrial group, which includes Samsung Electronics Co.

The single-chamber legislature, however, can override a veto if a majority of its 299 members attend a floor vote and two-thirds of them vote in favor.

A total of 155 lawmakers voted for the bill Friday, 17 cast ballots against it and 17 abstained. A total of 110 lawmakers were absent and did not vote.

The legislation calls for Roh to name an independent counsel to delve into allegations against Samsung, including that it operated slush funds to bribe influential figures such as prosecutors, judges and government officials.

Other accusations include claims Samsung manipulated evidence and witnesses in a court case over a purported deal that critics say was aimed at transferring corporate control of Samsung from the group's chairman, Lee Kun-hee, to his only son.

The lawmakers have cast doubt on whether state prosecutors could effectively carry out a probe given that some were among those accused of accepting bribes, saying in the bill that a probe by those investigators "cannot earn the people's confidence."

The allegations cited by the legislation are based on the claims of a former top Samsung legal affairs official, who this month went public to reveal the alleged wrongdoing.

Kim Yong-chul, himself a former prosecutor, said he was responsible for bribing those in the legal field and claimed that Lim Chai-jin — the nation's new top prosecutor — was among those who took payments. Lim has denied the allegation.

Two civic groups subsequently filed a criminal lawsuit against Samsung, prompting state prosecutors to open a probe.

On Thursday, Samsung, which has vociferously denied the allegations, expressed regret over the lawmakers' impending action, but said it would cooperate with an independent probe. On Friday, the business group said it stood by that comment.

The bill's passage — the seventh time a special prosecutor has been approved by the National Assembly — came after lawmakers reached a deal to combine two separate proposals into a single bill.

A coalition of liberal lawmakers, many aligned with Roh, agreed to a proposal by conservatives to also investigate their claims that Roh received Samsung money before and after the 2002 election.

The legislation does not cite Roh by name but states that those in "the highest political echelon" allegedly received illicit funds from Samsung during and after the 2002 presidential race.

"We have already said we can consider the veto rights and that is still effective," Cheon Ho-seon, Roh's spokesman, told reporters. But he added a final decision would be made after receiving the bill.

The legislation calls for Roh to appoint an independent counsel out of three candidates recommended by the Korean Bar Association. The special prosecutor, aided by 33 assistant investigators, can investigate for up to 105 days.

Huge South Korean industrial groups such as Samsung are not new to scandals. The conglomerates have regularly been accused of wielding influence as well as dubious dealings between subsidiaries to help controlling families evade taxes and transfer wealth to heirs.

## November 17, 2007

### racial discrimination

Distrust of foreigners shading into racism !!
I was a victim of their racism.

## November 6, 2007

### Today, my anger gauge 8/10.

My anger gauge 8/10.
8/10 RPM is a red zone.

## November 3, 2007

### Week in review: Go go Google

Google is big and getting bigger.

Google's shares traded over the $700 mark this week, marking a new first for the Internet giant. Just a little more than three weeks ago, Google shares passed the $600 mark and analysts were speculating its shares could climb as high as $700 within the next year. Apparently, it's been a quick year.

The stock was up following reports that Google is in "serious discussions" with Verizon Wireless to put its mobile "GPhone" software on Verizon phones. For months, people have been speculating about the GPhone.

Most people believe that it's not a specific phone, but is more likely an operating system or software that integrates many of Google's mobile services, such as Web search, Gmail, YouTube, and Google Maps, onto phones made by existing handset makers. But more than simply integrating Google services onto handsets, the new Google mobile operating system is believed to be an open platform on which application developers would have free rein to develop a slew of new applications and services.

But, as CNET News.com's Marguerite Reardon points out, Google-powered phones will be useless unless the company can strike deals with mobile operators to allow them on their networks. T-Mobile USA is rumored to be the first U.S. operator that will sign on with Google.

CNET News.com readers expressed concern that Google's mobile applications would be limited to one or two handsets offered by a single carrier.

"Great! Another new phone designed to screw over American consumers by locking it down to just one cell phone provider," one reader wrote to the News.com TalkBack forum. "Is Google really that insensitive to the market and to consumers?"

In another move that was anticipated for weeks, Google has unveiled a set of application program interfaces (APIs) that allow third-party programmers to build widgets that take advantage of personal data and profile connections on a social-networking site. But instead of limiting the project to its own social-networking property, Orkut, Google has invited other sites along for the ride--including LinkedIn, Hi5, Plaxo, Ning, and Friendster.

Google's version of this "write once, run anywhere" concept is called OpenSocial, a set of common APIs that will enable developers to create applications for social networks, blogs, and any Web sites that accept the OpenSocial code. Currently, developers have to write new programs for each site, even if the functionality will be the same on each site.

This announcement illustrates how Google is courting developers and possibly attempting to outdo Facebook in openness. Facebook opened up its platform to developers in June and the site was immediately flooded with all sorts of useful and not-so-useful apps. Google, Yahoo, and others have been heavily espousing the beauty of open platforms and making moves to that end.

Leopard on the loose
Some 30 months after Apple released Tiger, it released the Leopard operating system into the wild--a little later than originally planned due to the company's work on iPhone. And while it wasn't exactly iPhone Day, several hundred Mac fans lined up for the launch in the pouring rain outside the Apple Store on Fifth Avenue in Manhattan.

The line for Leopard appeared to be divided fairly evenly between rabid Apple fans and shoppers who'd figured they could stop by and pick it up quickly--and indeed, come launch time, the line moved fast as customers were ushered into a gauntlet of Apple Store employees (much like the iPhone launch in June) and directed straight to the cash registers when the doors opened at 6 p.m. (The scene was repeated in San Francisco, where hundreds of people lined up on Stockton Street to get their hands on the new OS.)

However, the installation process didn't always go as smoothly. Apple posted a support document over the weekend on its Web site addressing reports of interminable "blue screen" problems that caused some Mac users upgrading to Mac OS X Leopard no small degree of frustration.

Some attempts to upgrade to Leopard were stymied after the installation process was almost complete and users attempted to restart their machines. A long thread on Apple's discussion forums outlined the problems, in which their Macs would get hung up on the initial boot screen. That screen happens to be blue, inviting comparisons to the infamous Windows "blue screen of death" encountered when Windows crashes.

There are dozens of important new features in Leopard, perhaps most notably the Time Machine application that could make it easier for users to back up and restore their files. Backing up your files is generally a simple exercise with an external hard drive, but Time Machine is interesting because of the friendly way in which it lets you restore files, flying back in time (and space) to the last instance in which that file was saved.

## October 27, 2007

### Life is a game of probability

It was a bitter strife between the two rivals.

## October 19, 2007

### Top 10 Search Properties WorldWide

NHN corporation is 5th.

## October 13, 2007

### Similar Query Languages.

Y!'s Pig : http://research.yahoo.com/project/pig
Microsoft LINQ : http://msdn.microsoft.com/data/ref/linq/

But I don't like these.

### Y!'s platform for nimble universal table storage

http://research.yahoo.com/node/212

Nuggets :
- No plans to open source.
- The implemented basic relational operators do not allow for ad-hoc analysis and bulk processing. (use pig, hadoop instead)
- They have a SQL-like language but it’s very basic. (no support for joins, aggregation, etc.)
- It has active participation from the Yahoo infrastructure team.

## October 7, 2007

### Oh, God. Craig Venter

http://scotlandonsunday.scotsman.com/index.cfm?id=1601642007

AN AMERICAN researcher has claimed he is just weeks away from realising a science-fiction dream: the creation of artificial life.

Craig Venter, a controversial and flamboyant DNA scientist, said he is about to produce a synthetic living cell that is capable of reproducing itself.

If Venter delivers on his bold promise it will rank as one of the greatest scientific breakthroughs of recent years. It could open the door to a new generation of artificial life forms designed to tackle everything from disease in humans to environmental crises.

But while the Maryland-based scientist has caused excitement in scientific quarters, he has also prompted a renewed ethical debate on the acceptable limits of research into the building blocks of life. As well as concern over "playing god", some experts fear the creation of a new species could have safety implications.

Chromosomes are at the centre of Venter's breakthrough. In the simplest forms of life, every cell has a chromosome, which is a long string of DNA that "tells" the cell what kind it is, what to do and when. He has used laboratory chemicals to create an artificial chromosome, based on a "stripped-down" version of a bacterium.

The next step involves inserting the artificial chromosome into a natural cell from a bacterium. Venter said the artificial chromosome will take over its host cell, effectively becoming a new artificial form of life. Crucially, it will have the ability to reproduce itself.

Venter believes the technique will work because his team has already successfully transplanted chromosomes from one bacterial cell to another. If the technique works as expected, the next step will be to alter the genetic make-up of the synthetic chromosome to deal with specific real-world tasks. For example, it is theoretically possible to make an artificial life form to consume greenhouse gases.

Venter, a Vietnam veteran and a yachtsman, has provoked controversy in the past because of his flamboyant style and his commercial approach to science. In the 1990s, he turned the human genome project into a competition by effectively racing publicly funded scientists to complete the map of the human gene.

He said: "This will be a very important philosophical step in the history of our species. We are going from reading our genetic code to the ability to write it. That gives us the hypothetical ability to do things never contemplated before."

Venter added he had carried out an ethical review before completing the experiment. He said: "We feel that this is good science. We are not afraid to take on things that are important just because they stimulate thinking. We are dealing in big ideas. We are trying to create a new value system for life. When dealing at this scale, you can't expect everybody to be happy."

Grahame Bulfield, vice-principal of Edinburgh University and professor of genetics, said: "This is a technical tour de force rather than an intellectual breakthrough. But it opens up molecular genetics to a huge range of new possibilities and applications, and should give much more control over how it is done."

James Milner-White, professor of structural bio-informatics at Glasgow University, said: "It's potentially very exciting. I would want to know more about what is happening in the experiments and whether the life forms they create are viable. I note that they haven't mentioned that yet. If the life forms are viable, then it could be very significant."

Dr Mark Bailey, a lecturer in genetics at Glasgow University, said:

"If this work does produce viable bacteria, the next step will be to add genes to them to get them to do what you want them to do. Adding the genes is actually quite straightforward, but getting them to do what you want in the way you want is very challenging. That will take some years of work."

But the news has provoked concern among campaigners who want restraints on the research being pioneered by genetic scientists.

Pat Mooney, director of Canadian bioethics organisation ETC group, said: "Governments, and society in general, are way behind the ball. This is a wake-up call: what does it mean to create new life forms in a test tube?"

He said Venter was creating a "chassis on which you could build almost anything. It could be a contribution to humanity such as new drugs or a huge threat to humanity such as bio-weapons."

## October 6, 2007

### Sun Patches Critical Java Bugs

Sun Microsystems Inc. patched 11 vulnerabilities in the Windows, Linux and Solaris versions of its Java Runtime Environment and Java Web Start Wednesday, including several rated critical by outside researchers.

The fixes to Java Runtime Environment (JRE) 1.3.1, 1.4.2, 5.0 and 6.0 plug holes that attackers could use to bypass security restrictions, manipulate data, disclose sensitive information or compromise an unpatched machine. Among the JRE bugs, Sun said in several security advisories, are two that allow attack code from malicious sites to make network connections on machines other than the victimized computer. One possible result, according to a paper by several Stanford University researchers that was cited by Sun: circumvented firewalls.

Other vulnerabilities in JRE and Java Web Start, a framework that lets Java-based applications launch directly from a browser, could be used by attackers to read local files, overwrite local files and hide Java-generated warnings.

Although Sun does not assign threat scores or label its advisories with terms such as "critical" or "low," Danish bug tracking vendor Secunia collectively tagged the five advisories and their 11 patches as "highly critical," its second-highest ranking.

Some of the vulnerabilities are limited to specific JRE versions, but pulling action items from the advisories is difficult since Sun does not use an easy-to-understand grid as does Microsoft, for instance, to indicate affected software. Neither JRE nor Web Start includes an automatic update mechanism; users must manually download and apply the updated versions Sun has posted on its Web site.

Mention of Mac OS X was, as usual, absent in the security advisories. Sun does not post updated editions of JRE and other Java components for the Mac operating system. Instead, Apple Inc.'s implementation of Java requires that the company provide Java fixes as part of its own security updates. That's been a sticking point with some Mac users, who have expressed concern that Apple has not updated its Java code since February.

## October 4, 2007

### conversazione with Zaheda

subject : Open Source Program and Software

Agenda
6:30~7:00: dinner, reception.
7:00~8:30: conversazione.

Zaheda Bhorat,
Open Source Programs Manager, Google, Inc.

Zaheda Bhorat is Open Source Programs Manager at Google, Inc., working on projects to promote the spread of open source software both inside and outside Google. She has been responsible for programs like the Google Summer of Code, Google-O'Reilly Open Source Awards and is driving Google's support of open standards such as Open Document Format (ODF).

She has more than 15 years of experience in technology and software with expertise in open source software, web 2.0, and community building. Before joining Google, Zaheda was responsible for the open source community at OpenOffice.org while at Sun Microsystems. She built the first open source marketing community with volunteers to support the office application, and the first native language community which now boasts 100 languages. Prior to this, Zaheda was responsible for the (online) Apple Store and building online communities at Apple Computer Inc. while managing the Apple Online Service Division in Europe.

An internationally-known advocate for open source software, Zaheda speaks regularly to educate on open source topics and open standards, particularly in developing countries. She has an engineering degree and would like to encourage open source principles and methods to spread to areas outside of software.

## September 28, 2007

### Types of JDBC technology drivers

1. A JDBC-ODBC bridge provides JDBC API access via one or more ODBC drivers. Note that some ODBC native code and in many cases native database client code must be loaded on each client machine that uses this type of driver. Hence, this kind of driver is generally most appropriate when automatic installation and downloading of a Java technology application is not important. For information on the JDBC-ODBC bridge driver provided by Sun, see JDBC-ODBC Bridge Driver.

2. A native-API partly Java technology-enabled driver converts JDBC calls into calls on the client API for Oracle, Sybase, Informix, DB2, or other DBMS. Note that, like the bridge driver, this style of driver requires that some binary code be loaded on each client machine.

3. A net-protocol fully Java technology-enabled driver translates JDBC API calls into a DBMS-independent net protocol which is then translated to a DBMS protocol by a server. This net server middleware is able to connect all of its Java technology-based clients to many different databases. The specific protocol used depends on the vendor. In general, this is the most flexible JDBC API alternative. It is likely that all vendors of this solution will provide products suitable for Intranet use. In order for these products to also support Internet access they must handle the additional requirements for security, access through firewalls, etc., that the Web imposes. Several vendors are adding JDBC technology-based drivers to their existing database middleware products.

4. A native-protocol fully Java technology-enabled driver converts JDBC technology calls into the network protocol used by DBMSs directly. This allows a direct call from the client machine to the DBMS server and is a practical solution for Intranet access. Since many of these protocols are proprietary the database vendors themselves will be the primary source for this style of driver. Several database vendors have these in progress.
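
To keep the four types straight, here is a small summary sketch. It only encodes what the descriptions above say (which types need native code on each client, and which need a middleware server); the class and field names are my own and are not part of any JDBC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriverType:
    number: int
    name: str
    client_native_code: bool  # binary code must be loaded on each client machine
    middleware_server: bool   # a net server translates to the DBMS protocol


DRIVER_TYPES = [
    DriverType(1, "JDBC-ODBC bridge", True, False),
    DriverType(2, "native-API, partly Java", True, False),
    DriverType(3, "net-protocol, fully Java", False, True),
    DriverType(4, "native-protocol, fully Java", False, False),
]


def pure_java_types() -> list[int]:
    """Driver types that need no native code on the client machine."""
    return [d.number for d in DRIVER_TYPES if not d.client_native_code]


def middleware_types() -> list[int]:
    """Driver types that rely on a separate net server between client and DBMS."""
    return [d.number for d in DRIVER_TYPES if d.middleware_server]
```

Read this way, types 3 and 4 are the ones that can be downloaded and installed automatically with a Java application, and only type 3 puts a server between the client and the database.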

## September 26, 2007

### Y!'s Pig, Stepping into Apache Incubator.

http://wiki.apache.org/incubator/PigProposal

I think we (Hbase, Hbase Shell) must try harder. -__-a

## September 23, 2007

### Shrek "Songpyeon"

This is a rice-and-mugwort cake.

## September 21, 2007

### Korean traditional holiday "Chuseok" (September 22 ~ 30)

Chuseok, celebrated on the 15th day of the eighth lunar month, is an important traditional holiday. It is a celebration of the harvest and a thanksgiving for the bounty of the earth. Most people visit their hometowns to be with family, enjoy a special food called “Songpyeon”, and make wishes while viewing the full moon in the evening.

- I'll go under the water for a while. Bye Bye~

## September 19, 2007

### I got a job promotion!! :D

| Working Stiff | Crime Syndicate |
| --- | --- |
| CEO | Godfather |
| President | Capo |
| Vice President | Boss |
| Partner | Regional Boss |
| General Manager | Under Boss |
| Manager | Made |
| * Assistant Manager * | Soldier |
| Team Leader | Button Man |
| Corporate Drone | Enforcer |
| Peon | Dealer |
| Intern | Thug |

Not only does my job ROCK, but I will! Woo!
Ps. Thanks, joo. My konglish was fixed. :)

The name Altools will no longer be used due to trademark issues.
ALTools PC Utilities - www.altools.net

-_______-a go well.
I think altools was an unfashionable name.

## September 18, 2007

### ISWC + ASWC 2007 (November 11-15) in Korea

Speakers
• Ron Kaplan (CTO, PowerSet)

• Chris Welty (Research Scientist, IBM)

• http://iswc2007.semanticweb.org/program/InvitedSpeakers.asp

I'll register. :)

## September 14, 2007

### Joy of Apache open software development.

Last night, I received a proposal to be interviewed from the lab of the most popular technology company.

I wanna go global!
But it is a far country, and that makes me think it would be troublesome for me to invest the time. Also, my open source project has only just begun.

Now I have started falling in love with open source.
Mind conflict. -0-

## September 13, 2007

### Growing number of Hbase Shell members.

 Hbase Shell Hbase Shell Plans Hbase RDF

## September 12, 2007

### Patriotism

I added an example of patriotism to the Hbase Shell wiki page.

    Hbase > SELECT 'studioName:YoungGu Art'
        --> FROM movieLog_table
        --> WHERE row = 'D-War';

@^_______^@

D-War (also known as Dragon Wars) is a 2007 South Korean film directed by Shim Hyung-rae. It is a fantasy-action film that is reportedly the biggest budgeted South Korean film of all-time. -- wikipedia.

### Monopoly Castle 'Naver'

At least we, the people of Korea, still don’t know much about the Google search engine or Google's power. When I introduced 'google.com' to my sister, she said, "Is this your new homepage? I think you need some graphic design help."

But ... I love Google's massive computing engine,
and I wanna know their secrets.

 Related News : Can Google Be Beat? They Already Have Been in South Korea.

## September 11, 2007

### Deductive reasoning joke.

people = eat + sleep + work + play
pig = eat + sleep
people = pig + work + play
people - play = pig + work

Result : people who dunno play = working pig
(pig who know go to work)

## September 10, 2007

$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/udanax/.ssh/id_dsa):
Created directory '/home/udanax/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/udanax/.ssh/id_dsa.
Your public key has been saved in /home/udanax/.ssh/id_dsa.pub.
The key fingerprint is:
blah~ blah~
$ _


and, then

### Hbase Shell

Hbase Shell is a basic, command-line, and interactive 'shell' for manipulating tables in Hbase. It has support for a small set of SQL-inspired operations. Results are presented in an ASCII-table format.

The Hbase Shell aims to be to Hbase what the mysql client command-line tool is to mysqld, and what sqlplus is to Oracle.

Hbase Shell was first added to TRUNK in July, 2007.
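
To show what "ASCII-table format" means in practice, here is a minimal sketch of a renderer in the style of the mysql client. It is only an illustration, not the Hbase Shell code; the function name `ascii_table` and the column headers in the usage note are hypothetical.

```python
def ascii_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render headers and rows as a bordered ASCII table."""
    # Each column is as wide as its longest cell (header included).
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def fmt(cells: list[str]) -> str:
        return "|" + "|".join(f" {c.ljust(w)} " for c, w in zip(cells, widths)) + "|"

    lines = [rule, fmt(headers), rule]
    lines += [fmt(row) for row in rows]
    lines.append(rule)
    return "\n".join(lines)
```

For example, `print(ascii_table(["row", "column", "cell"], [["D-War", "studioName:", "YoungGu Art"]]))` draws one bordered header line and one bordered data line, with every column padded to its widest value.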

### Google Sky Gives a Close-Up View of the Universe

Armchair explorers will now have the entire universe at their fingertips, thanks to Google's latest venture, Google Sky, a new free feature that's an application in the popular Google Earth program.
Starting today, anyone with a computer can view a close-up of about 100 million galaxies and 200 million stars.
"This is an application that allows you to see the sky at very, very high resolution, as if you were just flying through the universe and seeing and visiting galaxies," said Chikai Ohazama, a Google product manager who has worked to gather data from astronomical organizations around the world.
Google has stitched together real photographs of the universe into one giant database.
"Basically you're seeing imagery that you have to have a very, very high-powered telescope to look at and we're placing that in the database," Ohazama said. "You can zoom in very, very close and see the actual spiral, a galaxy and the clusters around it."
Google already allows users to see Earth at a level of detail many spy agencies would envy. The program's satellite and street-level imagery is so advanced it has generated alarm from privacy advocates.
One of the unique features of Google Sky is that you can plug in your address and the program shows you what the sky above your home looks like.
Google Sky allows users to bookmark constellations, rotate the whole sky and zoom in to see details of black holes and stars.
It is an awe-inspiring look at the universe, not to mention a whole new way to waste time at work.

### FT: Yahoo!'s bet on Hadoop

One of the most important announcements at Oscon last week was Yahoo!'s commitment to support Hadoop. We've been writing about Hadoop on radar for a while, so it's probably not news to you that we think Hadoop is important.
Yahoo's involvement wasn't actually news either, because Yahoo! had hired Doug Cutting, the creator of hadoop, back in January. But Doug's talk at Oscon was kind of a coming out party for Hadoop, and Yahoo! wanted to make clear just how important they think the project is. In fact, I even had a call from David Filo to make sure I knew that the support is coming from the top.
Jeremy Zawodny's post about hadoop on the Yahoo! developer network does a great job of explaining why Yahoo! considers hadoop important:
For the last several years, every company involved in building large web-scale systems has faced some of the same fundamental challenges. While nearly everyone agrees that the "divide-and-conquer using lots of cheap hardware" approach to breaking down large problems is the only way to scale, doing so is not easy.
The underlying infrastructure has always been a challenge. You have to buy, power, install, and manage a lot of servers. Even if you use somebody else's commodity hardware, you still have to develop the software that'll do the divide-and-conquer work to keep them all busy.
It's hard work. And it needs to be commoditized, just like the hardware has been...
To build the necessary software infrastructure, we could have gone off to develop our own technology, treating it as a competitive advantage, and charged ahead. But we've taken a slightly different approach. Realizing that a growing number of companies and organizations are likely to need similar capabilities, we got behind the work of Doug Cutting (creator of the open source Nutch and Lucene projects) and asked him to join Yahoo to help deploy and continue working on the [then new] open source Hadoop project.
Let me unpack the two parts of this news: hadoop as an important open source project, and Yahoo!'s involvement. On the first front, I've been arguing for some time that free and open source developers need to pay more attention to Web 2.0. Web 2.0 software-as-a-service applications built on top of the LAMP stack now generate several orders of magnitude more revenue than any companies seeking to directly monetize open source. And most of the software used by those Web 2.0 companies above the commodity platform layer is proprietary. Not only that, Web 2.0 is siphoning developers and buzz away from open source.
But there are open source projects that are tackling important Web 2.0 problems "up the stack." Brad Fitzpatrick's LiveJournal scaling tools memcached, perlbal, and mogileFS come to mind, as well as OpenID. Hadoop is another critical piece of Web 2.0 infrastructure now being duplicated in open source. (I'm sure there are others, and we'd love to hear from you about them in the comments.)
OK -- but why is Yahoo!'s involvement so important? First, it indicates a kind of competitive tipping point in Web 2.0, where a large company that is a strong #2 in a space (search) realizes that open source is a great competitive weapon against their dominant competitor. It's very much the same reason why IBM got behind Eclipse, as a way of getting competitive advantage against Sun in the Java market. (If you thought they were doing it out of the goodness of their hearts rather than clear-sighted business logic, think again.) If Yahoo! is realizing that open source is an important part of their competitive strategy, you can be sure that other big Web 2.0 companies will follow. In particular, expect support of open source projects that implement software that Google treats as proprietary. (See the long discussion thread on my post about Microsoft's submission of their shared source licenses to OSI for my arguments as to why "being on the right side of history" will ultimately drive Microsoft to open source.)
Supporting Hadoop and other Apache projects not only gets Yahoo! deeply involved in open source software projects they can use, it helps give them renewed "geek cred." And of course, attracting great people is a huge part of success in the computer industry (and for that matter, any other.)
Second, and perhaps equally important, Yahoo! gives hadoop an opportunity to be tested out at scale. Some years ago, I was on the board of Doug's open source search engine effort, Nutch. Where the project foundered was in not having a large enough data set to really prove out the algorithms. Having more than a couple of hundred million pages in the index was too expensive for a non-profit open source project to manage. One of the important truths of Web 2.0 is that it ain't the personal computer era any more, Eben Moglen's arguments to the contrary notwithstanding. A lot of really important software can't even be exercised properly without very large networks of machines, very large data sets, and heavy performance demands. Yahoo! provides all of these. This means that Hadoop will work for the big boys, and not just for toy projects. And as Jeremy pointed out in his post (linked and quoted above), today's big boy may be everyday folks a few years from now, as the size and scale of Web 2.0 applications continue to increase.
BTW, in followup conversations with Doug, he pointed out that web search is not actually the killer app for hadoop, despite the fact that it is in part an implementation of the MapReduce technique made famous by Google. After all, Yahoo! has been doing web search for years without this kind of general purpose scaling platform. "Where Hadoop really shines," says Doug, "is in data exploration." Many problems, including tuning ad systems, personalization, learning what users need -- and for that matter, corporate or government data mining -- involve finding signal in a lot of noise. Doug pointed me to an interesting article on Amazon Web Services Developer Connection: Running Hadoop MapReduce on Amazon EC2 and Amazon S3. Doug said in email:
It provides an example of using Hadoop to mine one's [logfile] data.
Another trivial application for log data that's very valuable is reconstructing and analyzing user sessions. If you've got logs for months or years from hundreds of servers and you want to look at individual user sessions, e.g., how often do users visit, how long are their sessions, how do they move around the site, how often do they re-visit the same places, etc. This is a single MapReduce operation over all the logs, blasting through, sorting and collating all your logs at the transfer rate of all the drives in your cluster. You don't have to re-structure your database to measure something new. It's really as easy as 'grep sort uniq'.
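
The session reconstruction Doug describes can be sketched as a map, shuffle and reduce over log lines. This is a single-process illustration in plain Python, not Hadoop code. The log format (`timestamp user url`), the 30-minute session gap and all function names are my own assumptions.

```python
from collections import defaultdict

SESSION_GAP = 30 * 60  # seconds; an assumed cutoff, not taken from the article


def map_phase(log_lines):
    """Map: emit (user, timestamp) for each 'timestamp user url' log line."""
    for line in log_lines:
        ts, user, _url = line.split()
        yield user, int(ts)


def shuffle(pairs):
    """Shuffle: group timestamps by user and sort them, as the framework would."""
    grouped = defaultdict(list)
    for user, ts in pairs:
        grouped[user].append(ts)
    return {user: sorted(stamps) for user, stamps in grouped.items()}


def reduce_phase(user, stamps):
    """Reduce: split one user's sorted hits into sessions at long gaps."""
    sessions = [[stamps[0]]]
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > SESSION_GAP:
            sessions.append([cur])      # gap too long: start a new session
        else:
            sessions[-1].append(cur)    # still the same session
    # Summarise each session as (start, end, number of hits).
    return user, [(s[0], s[-1], len(s)) for s in sessions]


def sessionize(log_lines):
    """Run map, shuffle and reduce in one process."""
    grouped = shuffle(map_phase(log_lines))
    return dict(reduce_phase(user, stamps) for user, stamps in grouped.items())
```

On a real cluster the map and reduce functions would run in parallel over all the logs and the framework would do the sorting and grouping. Measuring something new (visit frequency, session length, revisits) only means changing the reduce step, not the stored data.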
Also, here are the slides from my talk.