Slide1
Introduction of Apache Hama
Edward J. Yoon, October 11, 2011
<edwardyoon@apache.org>
Slide2
About Me
Founder of Apache Hama.
Committer of Apache Bigtop.
Employee of KT.
http://twitter.com/eddieyoon
Slide3
What Is Hama?
Apache Incubator Project.
BSP (Bulk Synchronous Parallel) for massive scientific computations.
Written in Java.
Currently 2 releases, 3 main committers.
Slide4
Hama Characteristics
Provides a pure BSP model.
Job submission and management interface.
Multiple tasks per node.
Checkpoint recovery.
Supports running in the cloud using Apache Whirr.
Supports running with Hadoop nextGen.
Slide5
Bulk Synchronous Parallel?
Parallel programming model introduced by Valiant.
Consists of a sequence of supersteps.
Conceptually simple and intuitive from a programming standpoint.
Used for a variety of applications, e.g., scientific computing, genetic programming, …
Slide6
Schematic diagram of a superstep
[Figure: a superstep shown as three phases: Local Computation, Communication, Barrier Synchronization; some tasks are marked Idle.]
Slide7
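The three phases of a superstep (local computation, communication, barrier synchronization) can be sketched in a few lines. This is only an illustration using Python threads, not Hama's API: the function name `run_supersteps`, the ring-style message pattern, and the in-memory inboxes are all assumptions made for the example.

```python
import threading


def run_supersteps(num_tasks=4, num_supersteps=3):
    """Run a toy BSP job; returns each task's final value (hypothetical example)."""
    barrier = threading.Barrier(num_tasks)
    lock = threading.Lock()
    # inboxes[s][t] holds the messages task t may read during superstep s.
    inboxes = [[[] for _ in range(num_tasks)] for _ in range(num_supersteps + 1)]
    results = [0] * num_tasks

    def task(task_id):
        value = task_id + 1
        for step in range(num_supersteps):
            # 1. Local computation: fold in messages sent during the previous superstep.
            value += sum(inboxes[step][task_id])
            # 2. Communication: send the current value to the next task in a ring.
            dest = (task_id + 1) % num_tasks
            with lock:
                inboxes[step + 1][dest].append(value)
            # 3. Barrier synchronization: nobody starts the next superstep early.
            barrier.wait()
        results[task_id] = value

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_supersteps())
```

Messages sent in one superstep only become visible in the next one, which is why the inboxes are indexed by superstep here.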
Internals
Hadoop RPC is used for BSP tasks to communicate with each other.
Collection and bundling of messages is used as a technique to reduce network overhead and contention.
ZooKeeper is used for barrier synchronization.
Slide8
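The idea behind message bundling can be sketched as follows: outgoing messages are collected and grouped by destination, so each destination gets one transfer instead of one per message. The names `bundle_messages`, `flush`, and the `send` callback are hypothetical stand-ins for the RPC layer, not Hama's actual interfaces.

```python
from collections import defaultdict


def bundle_messages(outgoing):
    """Group (destination, payload) pairs into one bundle per destination."""
    bundles = defaultdict(list)
    for dest, payload in outgoing:
        bundles[dest].append(payload)
    return dict(bundles)


def flush(outgoing, send):
    """Send one bundle per destination; returns the number of transfers made.

    `send` is a hypothetical callable standing in for a single RPC call.
    """
    bundles = bundle_messages(outgoing)
    for dest, payloads in bundles.items():
        send(dest, payloads)
    return len(bundles)


if __name__ == "__main__":
    queued = [("task-1", "a"), ("task-2", "b"), ("task-1", "c")]
    transfers = flush(queued, lambda dest, payloads: print(dest, payloads))
    print("transfers:", transfers)
```

With three queued messages for two destinations, only two transfers are made.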
Pi Calculation
Each task locally executes its portion of the loop a number of times.
One task acts as master and collects the results through the BSP communication interface.
Slide9
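The Pi calculation described above can be sketched with threads standing in for BSP tasks. The slide does not say which numerical method is used, so this example assumes a midpoint-rule integration of 4/(1+x^2) over [0, 1]; the function name `estimate_pi` and the shared `master_inbox` list are likewise assumptions, not Hama's API.

```python
import threading


def estimate_pi(num_tasks=4, intervals=100_000):
    """Estimate Pi in BSP style (assumed method: midpoint-rule integration)."""
    barrier = threading.Barrier(num_tasks)
    lock = threading.Lock()
    master_inbox = []  # messages addressed to the master (task 0)
    result = {}

    def task(task_id):
        # Local computation: this task handles every num_tasks-th loop iteration.
        width = 1.0 / intervals
        partial = 0.0
        for i in range(task_id, intervals, num_tasks):
            x = (i + 0.5) * width
            partial += 4.0 / (1.0 + x * x)
        # Communication: send the partial result to the master.
        with lock:
            master_inbox.append(partial * width)
        # Barrier synchronization: all partial results are delivered after this.
        barrier.wait()
        # The master collects and combines the results.
        if task_id == 0:
            result["pi"] = sum(master_inbox)

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["pi"]


if __name__ == "__main__":
    print(estimate_pi())
```

The master reads its inbox only after the barrier, so every task's partial result is guaranteed to have arrived.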
Structural Analysis of Network Traffic Flows
Traffic flows in KT clouds.
Traffic engineering, anomaly detection, traffic forecasting, and capacity planning.
Currently, BSP jobs are running experimentally on 512 multi-core machines.
Slide10
Random Communication Benchmarks
Benchmarked on 16 1U servers using 10 tasks per server.
X axis is the time (sec.) of BSP job execution (32 supersteps).
Y axis is the number of messages to be sent to random BSP tasks in each superstep.
Slide11
What’s Next?
Support Input/Output Formatter like MapReduce.
Message compression for high performance.
Add some frameworks on top of Hama.
Slide12
More Information
http://incubator.apache.org/hama
http://wiki.apache.org/hama