Hadoop Overview - Doug Cutting and Eric Baldeschwieler
Doug Cutting, pretty much the father of Hadoop, gave an overview of Hadoop's history. An interesting comment was that Hadoop achieved web scale in early 2008...
Eric14 (Eric B...): Grid computing at Yahoo. 500M unique users per month, billions of interesting events per day. "Data analysis is the inner loop" at Yahoo.
Yahoo's vision and focus: on-demand shared access to a vast pool of resources, support for massively parallel execution, a Data Intensive Super Computer (DISC), centrally provisioned and managed, service oriented... Yahoo's focus is not grid computing in terms of Globus, etc., and it is not focused on external usage à la Amazon EC2/S3. The biggest grid is about 2,000 nodes.
Open Source Stack: Commitment to open source development; Yahoo is an Apache Platinum Sponsor.
Tools used to implement Yahoo's grid vision: Hadoop, Pig, ZooKeeper (high-availability directory and configuration services), and Simon (cluster and application monitoring).
Simon: Very early days, internal to Yahoo right now...similar to Ganglia "but more configurable". Highly configurable aggregation system - gathering data from various nodes to produce (customizable?) reports.
HOD - Hadoop On Demand. The current Hadoop scheduler is FIFO: jobs run in parallel only to the extent that the previous job doesn't saturate the node. HOD is built on Torque (www.clusterresources.com) to build virtual clusters, separate file systems, etc. Yahoo has taken HOD development about as far as they want (cleaning up code, etc.); the future direction for Yahoo is to invest more heavily in the Hadoop scheduler. Does HOD disrupt data locality? Yup, it does. Good news: future Hadoop work will improve rack locality handling significantly.
Hadoop, HOD, and Pig are all part of Apache today.
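To make the FIFO behaviour concrete, here is a toy model of how free task slots get filled strictly in submission order. It is a minimal sketch, not Hadoop's actual scheduler code: the `Job` class, the `assign_fifo` function and the slot counts are hypothetical, and it ignores real-world details such as job priorities, separate map/reduce slots and data locality.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Job:
    """Hypothetical job: only tracks how many tasks still need a slot."""
    name: str
    pending_tasks: int


def assign_fifo(queue: Deque[Job], free_slots: int) -> List[Tuple[str, int]]:
    """Hand out free task slots to jobs strictly in submission order.

    A later job only receives the slots that every earlier job left
    unused, so jobs overlap only when the head of the queue cannot
    saturate the available slots.
    """
    assignments: List[Tuple[str, int]] = []
    for job in queue:
        if free_slots == 0:
            break  # earlier jobs consumed every slot; later jobs wait
        take = min(job.pending_tasks, free_slots)
        if take > 0:
            job.pending_tasks -= take
            free_slots -= take
            assignments.append((job.name, take))
    return assignments


if __name__ == "__main__":
    # A small head job leaves room, so the next job runs alongside it.
    print(assign_fifo(deque([Job("A", 3), Job("B", 10)]), 8))
    # -> [('A', 3), ('B', 5)]

    # A large head job saturates the slots, so the small job must wait.
    print(assign_fifo(deque([Job("big", 100), Job("small", 2)]), 8))
    # -> [('big', 8)]
```

The second case is the pain point that motivated HOD's separate virtual clusters and Yahoo's planned investment in a smarter scheduler.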
Multiple grids inside Yahoo: tens of thousands of nodes, hundreds of thousands of cores, TBs of memory, PBs of disk...ingests TBs of data daily.
M45 Project: Open Academic Cluster in collaboration with CMU: 500 nodes, 3 TB RAM, 1.5 PB disk, high bandwidth, located conveniently in a semi-truck trailer.
Open source project and Apache: The goal is for Hadoop to remain a viable open source project. Yahoo has invested heavily and is very excited to see additional contributors and committers. "Yahoo is very proud of what we've done with Hadoop over the past few years." Interesting metric: megawatts of Hadoop.
Hadoop Interest Growing: 400 people expressed interest in today's conference, 28 organizations registered their Hadoop usage/cluster, it is in use at universities on multiple continents, and Yahoo has now started hiring employees with Hadoop experience.
GET INVOLVED: Fix a bug, submit a test case, write some docs, help out!
Random notes: More than 50% of conference attendees are running Hadoop, many with grids of more than 20 nodes and several with grids of more than 100 nodes.
Yahoo just announced a collaboration with Computational Research Labs (CRL) in India to "jointly support cloud computing research"... CRL runs EKA, the 4th fastest supercomputer on the planet.