Hadoop Overview - Doug Cutting and Eric Baldeschwieler
Doug Cutting, pretty much the father of Hadoop, gave an overview of Hadoop history. An interesting comment was that Hadoop achieved web scale in early 2008...
Eric14 (Eric B...): Grid computing at Yahoo. 500M unique users per month, billions of interesting events per day. "Data analysis is the inner loop" at Yahoo.
Y's vision and focus: On-demand shared access to a vast pool of resources, support for massively parallel execution, Data Intensive Super Computer (DISC), centrally provisioned and managed, service oriented... Y's focus is not grid computing in terms of Globus, etc., and not external usage à la Amazon EC2/S3. Biggest grid is about 2,000 nodes.
Open Source Stack: Commitment to open source development, Yahoo is an Apache Platinum Sponsor.
Tools used to implement Yahoo's grid vision: Hadoop, Pig, ZooKeeper (high-availability directory and config services), Simon (cluster and app monitoring).
Simon: Very early days, internal to Yahoo right now...similar to Ganglia "but more configurable". Highly configurable aggregation system - gathering data from various nodes to produce (customizable?) reports.
HOD - Hadoop On Demand. The current Hadoop scheduler is FIFO - jobs will run in parallel to the extent that the previous job doesn't saturate the node. HOD is built on Torque (www.clusterresources.com) to build virtual clusters, separate file systems, etc. Yahoo has taken development about as far as they want...cleaning up code, etc. Future direction for Yahoo is to invest more heavily in the Hadoop scheduler. Does HOD disrupt data locality? Yup, it does. Good news: Future Hadoop work will improve rack locality handling significantly.
Hadoop, HOD, Pig all part of Apache today.
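To make the FIFO remark above concrete, here is a minimal sketch of what "jobs run in parallel only to the extent that the previous job doesn't saturate the node" could look like. It is a toy model, not Hadoop's actual scheduler code: the function name fifo_assign, the idea of a single pool of free task slots, and the job tuples are all assumptions made for illustration.

```python
def fifo_assign(jobs, free_slots):
    """Hand out free task slots to jobs strictly in submission order.

    jobs: list of (job_name, pending_tasks) tuples, oldest job first
          (hypothetical representation, not a Hadoop data structure).
    free_slots: number of task slots currently idle.
    Returns a dict mapping job_name -> number of slots granted.
    """
    granted = {}
    for name, pending in jobs:
        if free_slots == 0:
            # An earlier job has saturated the pool; later jobs must wait.
            break
        take = min(pending, free_slots)
        if take:
            granted[name] = take
            free_slots -= take
    return granted


if __name__ == "__main__":
    # The first job leaves one slot spare, so the second job gets it.
    print(fifo_assign([("job-a", 3), ("job-b", 5)], free_slots=4))
    # The first job wants more than is available, so the second job waits.
    print(fifo_assign([("job-a", 10), ("job-b", 5)], free_slots=4))
```

In this model a later job only ever sees leftovers, which is why a single large job can starve everything queued behind it and why a smarter scheduler is an obvious next investment.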
Multiple grids inside Yahoo: tens of thousands of nodes, hundreds of thousands of cores, TBs of memory, PBs of disk...ingests TBs of data daily.
M45 Project: Open Academic Cluster in collaboration with CMU: 500 nodes, 3 TB RAM, 1.5 PB disk, high bandwidth, located conveniently in a semi-truck trailer.
Open source project and Apache: Goal is for Hadoop to remain a viable open source project. Yahoo has invested heavily...very excited to see additional contributors and committers. "Yahoo is very proud of what we've done with Hadoop over the past few years." Interesting metric: Megawatts of Hadoop.
Hadoop Interest Growing: 400 people expressed interest in today's conference, 28 organizations registered their Hadoop usage/cluster, in use in universities on multiple continents, Y has now started hiring employees with Hadoop experience.
GET INVOLVED: Fix a bug, submit a test case, write some docs, help out!
Random notes: More than 50% of conf attendees are running Hadoop, many with grids of more than 20 nodes, and several with grids > 100 nodes.
Yahoo just announced collaboration with Computational Research Labs (CRL) in India to "jointly support cloud computing research"...CRL runs EKA - the 4th fastest supercomputer on the planet.