Hadoop Overview - Doug Cutting and Eric Baldeschwieler
Doug Cutting - pretty much the father of Hadoop gave an overview of Hadoop history. Interesting comment was that Hadoop has achieved web-scale in early 2008...
Eric14 (Eric B...): Grid computing at Yahoo. 500M unique users per month, billions of interesting events per day. 'Data analysis is the inner loop" at Yahoo.
Y's vision and focus: On-demand shared access to vast pool of resources, support for massively parallel execution. , Data Intensive Super Computer (DISC), centrally provisioned and managed, service oriented...Y's focus is not grid computing in in terms of Globus, etc., not focused on external usage ala Amazon EC2/S3. Biggest grid is about 2,000 nodes.
Open Source Stack: Commitment to open source developent, Yahoo is an Apache Platinum Sponsor
Tools used to implement Yahoo's grid vision: Hadoop, Pig, Zookeeper (high avail directory and config sevices), Simon (cluster and app monitoring).
Simon: Very early days, internal to Yahoo right now...similar to Ganglia "but more configurable". Highly configurable aggregation system - gathering data from various nodes to produce (customizable?) reports.
HOG - Hadoop On Demand. Current Hadoop scheduler currently FIFO - jobs will run in parallel to the extent that the previous job doesn't saturate the node. HOG is built on Torque (www.clusterresources.com) to build virtual clusters, separate file systems, etc. Yahoo has taken development about as far as they want...cleaning up code, etc. Future direction for Yahoo is to invest more heavily in the Hadoop scheduler. Does HOG disrupt data locality - yup, it does. Good news: Future Hadoop work will improve rack locality handling significantly.
Hadoop, HOG, Pig all part of Apache today,
Multiple grids inside Yahoo: tens of thousands of nodes, hundreds of thousands of cores, TBs of memory, PBs of disk...ingests TBs of data daily.
M45 Project: Open Academic Cluster in collaboration with CMU: 500 nodes, 3TB RAM, 1.5P disk, high bandwith located conveniently in a semi-truck trailer
Open source project and Apache: Goal is for Hadoop to remain a viable open source project. Yahoo has invested heavily...very excited to see additional contributors and commiters. "Yahoo is very proud of what we've done with Hadoop over the past few years." Interesting metric: Megawatts of Hadoop
Hadoop Interest Growing: 400 people expressed interest in today's conference, 28 organizations registered their Hadoop usage/cluster, in use in universities on multple continents, Y is now started hiring employees with Hadoop experience.
GET INVOLVED: Fix a bug, submit a test case, write some docs, help out!
Random notes: More than 50% conf attendees running Hadoop, many with grids more than 20 nodes, and several with grids > 100 nodes.
Yahoo just announced collaboration with Computational Research Labs (CRL) in India to "jointly support cloud computing research"...CRL runs EKA - the 4th fastest supercomputer on the planet.
음성 인공지능 분야에서 스타트업이 생각해볼 수 있는 전략은 아마 다음과 같이 3가지 정도가 있을 것이다: 독자적 Vertical 음성 인공지능 Application 구축 기 음성 플랫폼을 활용한 B2B2C 형태의 비지니스 구축 기 음성 플랫폼...
점심 먹고 서점에 들러서 집어든 책이 하나 있다. "인간 본성의 법칙", 몇 장 읽어보면서 바로 몰입되는 책. 아직 다 읽지 못해서 리뷰 하긴 뭐 하지만, 과거 내 행동들에 대한 부끄러움, 감정을 배제한 이성적 의사결정에 큰 깨달음을...
The Red : 4천만 MAU를 지탱하는 서비스 설계와 데이터 처리 기술 By 카카오페이지 기술전략이사 윤 당신의 인터넷 서비스는 안정적이고 확장성 있게 설계되어 있습니까? 아파치 하마 프로젝트 의장 활동, GSoC 멘토링 경험과 7개의 기업을 거치...
\(L\)을 label의 집합, \(L’\)을 공백이 포함된 label의 집합이라고 하자. 길이 T를 갖는 sequence에 대하여, 모든 가능한 paths(network output)의 집합을 \(L’^T = \pi\) 라 하자. 실제 label...