Hadoop Overview - Doug Cutting and Eric Baldeschwieler
Doug Cutting, pretty much the father of Hadoop, gave an overview of Hadoop history. One interesting comment was that Hadoop achieved web scale in early 2008...
Eric14 (Eric B...): Grid computing at Yahoo. 500M unique users per month, billions of interesting events per day. "Data analysis is the inner loop" at Yahoo.
Y's vision and focus: On-demand shared access to a vast pool of resources, support for massively parallel execution, Data Intensive Super Computer (DISC), centrally provisioned and managed, service oriented... Y's focus is not grid computing in terms of Globus, etc., and not on external usage a la Amazon EC2/S3. Biggest grid is about 2,000 nodes.
Open Source Stack: Commitment to open source development; Yahoo is an Apache Platinum Sponsor.
Tools used to implement Yahoo's grid vision: Hadoop, Pig, Zookeeper (high-availability directory and config services), Simon (cluster and app monitoring).
Simon: Very early days, internal to Yahoo right now...similar to Ganglia "but more configurable". Highly configurable aggregation system - gathering data from various nodes to produce (customizable?) reports.
HOD - Hadoop On Demand. The current Hadoop scheduler is FIFO - jobs will run in parallel only to the extent that the previous job doesn't saturate the nodes. HOD is built on Torque (www.clusterresources.com) to build virtual clusters, separate file systems, etc. Yahoo has taken HOD development about as far as they want...cleaning up code, etc. Future direction for Yahoo is to invest more heavily in the Hadoop scheduler. Does HOD disrupt data locality? Yup, it does. Good news: Future Hadoop work will improve rack locality handling significantly.
Hadoop, HOD, and Pig are all part of Apache today.
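The FIFO behaviour noted above (a later job only runs in parallel if the earlier one leaves capacity free) can be sketched roughly as follows. This is a toy illustration, not Hadoop's actual scheduler code: the function name `fifo_assign`, the notion of a single pool of "free slots", and the job names and numbers are all assumptions made up for the example.

```python
def fifo_assign(free_slots, pending):
    """Hand out free task slots strictly in job submission order.

    pending: list of (job_name, pending_task_count), oldest job first.
    Returns a dict mapping job_name -> slots granted in this round.
    (Hypothetical helper for illustration; not a Hadoop API.)
    """
    granted = {}
    for job, tasks in pending:
        if free_slots == 0:
            break  # earlier jobs saturated the cluster; later jobs wait
        take = min(tasks, free_slots)
        if take:
            granted[job] = take
            free_slots -= take
    return granted


# Made-up numbers: a 10-slot cluster with three queued jobs.
print(fifo_assign(10, [("job-a", 6), ("job-b", 8), ("job-c", 3)]))
```

With these made-up numbers, job-a takes the 6 slots it needs, job-b gets the remaining 4, and job-c gets nothing until slots free up.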
Multiple grids inside Yahoo: tens of thousands of nodes, hundreds of thousands of cores, TBs of memory, PBs of disk...ingests TBs of data daily.
M45 Project: Open Academic Cluster in collaboration with CMU: 500 nodes, 3TB RAM, 1.5PB disk, high bandwidth, located conveniently in a semi-truck trailer.
Open source project and Apache: Goal is for Hadoop to remain a viable open source project. Yahoo has invested heavily...very excited to see additional contributors and committers. "Yahoo is very proud of what we've done with Hadoop over the past few years." Interesting metric: Megawatts of Hadoop.
Hadoop Interest Growing: 400 people expressed interest in today's conference, 28 organizations registered their Hadoop usage/cluster, Hadoop is in use in universities on multiple continents, and Y has now started hiring employees with Hadoop experience.
GET INVOLVED: Fix a bug, submit a test case, write some docs, help out!
Random notes: More than 50% of conference attendees are running Hadoop, many with grids of more than 20 nodes, and several with grids > 100 nodes.
Yahoo just announced collaboration with Computational Research Labs (CRL) in India to "jointly support cloud computing research"...CRL runs EKA - the 4th fastest supercomputer on the planet.
A New Parallel Algorithm For Evaluating The Determinant of A Matrix of Order n
Here's the link to the PPT file:
http://www.math.tu-berlin.de/EuroComb05/Talks/Poster/p12-Teimoori_Faal.ppt
I think one of these partitioning ideas could be used for Hama.