Slide 1: Apache Hadoop YARN: Yet Another Resource Negotiator
Wei-Chiu Chuang, 10/17/2013
Permission to copy/distribute/adapt the work, except the figures, which are copyrighted by ACM.
Slide 2
Master: JobTracker (JT); Worker: TaskTracker (TT)
Fixed # of map slots and reduce slots
[Diagram: a Hadoop client submits jobs to the Master Node (JobTracker), which runs tasks on the Worker Nodes (TaskTrackers).]
Slide 3: The Problem
Hadoop is being used for all kinds of tasks beyond its original design.
Tight coupling of a specific programming model with the resource management infrastructure.
Centralized handling of jobs' control flow.
Slide 4: Hadoop Design Criteria
Scalability
Slide 5: Hadoop on Demand
Requirements so far: Scalability, Multi-tenancy, Serviceability
Multi-tenancy: a shared pool of nodes for all jobs; HoD allocates Hadoop clusters of fixed size on the shared pool.
Serviceability: HoD sets up a new cluster for every job, so old and new Hadoop versions co-exist (Hadoop has a short, 3-month release cycle).
Slide 6: Failure of HoD
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness
Locality Awareness: the JobTracker tries to place tasks close to the input data, but HoD's node allocator is not aware of locality.
Slide 7: Failure of HoD
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization
High Cluster Utilization: HoD does not resize the cluster between stages, and users allocate more nodes than needed; competing for resources results in longer latency to start a job.
Slide 8: Problem with Shared Cluster
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization, Reliability/Availability
Reliability/Availability: a failure in the single JobTracker can bring down the entire cluster, and there is overhead in tracking multiple jobs in a larger, shared cluster.
Slide 9: Challenge of Multi-tenancy
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization, Reliability/Availability, Secure and auditable operation
Secure and auditable operation: authentication.
Slide 10: Challenge of Multi-tenancy
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization, Reliability/Availability, Secure and auditable operation, Support for Programming Model Diversity
Support for Programming Model Diversity: iterative computation, different communication patterns.
Slide 11: YARN
Requirements so far: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization, Reliability/Availability, Secure and auditable operation, Support for Programming Model Diversity, Flexible Resource Model
Flexible Resource Model: in classic Hadoop the numbers of map and reduce slots are fixed; this is easy, but lowers utilization.
Slide 12: YARN
Requirements: Scalability, Multi-tenancy, Serviceability, Locality Awareness, High Cluster Utilization, Reliability/Availability, Secure and auditable operation, Support for Programming Model Diversity, Flexible Resource Model, Backward Compatibility
Backward Compatibility: the system behaves similarly to the old Hadoop.
Slide 13: YARN
Separates resource management functions from the programming model.
MapReduce becomes just one of the applications (Dryad, etc.).
Binary compatible / source compatible with existing MapReduce applications.
Slide 14: YARN: Architecture
Slide 15: Resource Manager
One per cluster; central, global view.
Enables global properties: fairness, capacity, locality.
Container: a logical bundle of resources (CPU/memory).
Job requests are submitted to the RM; to start a job, the RM finds a container in which to spawn the AM.
No static resource partitioning.
Slide 16: Resource Manager (cont')
The RM only handles an overall resource profile for each application; local optimization and internal flow are up to the application.
Preemption: the RM can request resources back from an application.
Applications can checkpoint/snapshot instead of having jobs explicitly killed, or migrate computation to other containers.
Slide 17: Application Master
The head of a job; itself runs as a container.
Requests resources from the RM: number of containers, resources per container, locality preferences, etc. (see the API sketch below).
Resource consumption can change dynamically.
Can run any user code (Dryad, MapReduce, Tez, REEF, etc.).
Requests are "late-binding".
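The request/heartbeat cycle above can be made concrete with Hadoop's AMRMClient API. The following is a minimal sketch, not taken from the talk: the container size, count, priority, and the empty host/tracking-URL values are illustrative assumptions.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmResourceRequestSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    // The AM talks to the RM through the AMRMClient wrapper.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    // Register with the RM; from now on the AM must heartbeat periodically.
    rmClient.registerApplicationMaster("", 0, "");

    // Resource profile per container: 1 GB memory, 1 virtual core (illustrative).
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);

    // Ask for 4 containers; node/rack arguments are locality hints (null = anywhere).
    for (int i = 0; i < 4; i++) {
      rmClient.addContainerRequest(
          new ContainerRequest(capability, null /* nodes */, null /* racks */, priority));
    }

    // Heartbeat; granted containers arrive asynchronously. Requests are
    // "late-binding": the AM decides what to run only once a container is granted.
    List<Container> allocated = rmClient.allocate(0.0f).getAllocatedContainers();
    System.out.println("Granted " + allocated.size() + " containers so far");

    // When the job is done, unregister cleanly.
    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    rmClient.stop();
  }
}
```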
Slide 18: MapReduce AM
Optimizes for locality among map tasks with identical resource requirements by selecting a task whose input data is close to the container.
The AM determines the semantics of the success or failure of the container.
Slide 19: Node Manager
The "worker" daemon; one per node; registers with the RM.
Reports its resources (memory, CPU, etc.).
Launches containers from a Container Launch Context (CLC): environment variables, commands, etc. (see the sketch below).
Configures the environment for task execution; handles garbage collection and authentication.
Auxiliary services, e.g., serving intermediate data between map and reduce tasks.
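To make the Container Launch Context concrete, here is a minimal sketch of the AM-side call that hands a CLC to a Node Manager via Hadoop's NMClient API. The environment variable, the shell command, and the assumption that `container` was already granted by the RM are all illustrative, not part of the talk.

```java
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class LaunchContainerSketch {
  // 'container' is assumed to have been allocated to the AM by the RM.
  static void launch(Container container) throws Exception {
    Configuration conf = new YarnConfiguration();

    // Client-side wrapper for talking to Node Managers.
    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    // Container Launch Context: environment variables, commands, local resources, ...
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    Map<String, String> env = Collections.singletonMap("TASK_ID", "42"); // illustrative
    ctx.setEnvironment(env);
    ctx.setCommands(Collections.singletonList(
        "echo hello-from-container 1>/tmp/stdout 2>/tmp/stderr")); // illustrative command

    // The NM configures the environment and spawns the process described by the CLC.
    nmClient.startContainer(container, ctx);
  }
}
```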
Slide 20: YARN framework/application writers
1. Submit the application by passing a Container Launch Context (CLC) for the Application Master to the RM (client-side API sketched below).
2. When the RM starts the AM, the AM registers with the RM and periodically advertises its liveness and requirements over the heartbeat protocol.
3. Once the RM allocates a container, the AM can construct a CLC to launch the container on the corresponding NM. It may also monitor the status of the running container and stop it when the resource should be reclaimed. Monitoring the progress of work done inside the container is strictly the AM's responsibility.
4. Once the AM is done with its work, it unregisters from the RM and exits cleanly.
5. Optionally, framework authors may add control flow between their own clients to report job status and expose a control plane.
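Step 1 of this walkthrough corresponds roughly to Hadoop's YarnClient API. A minimal client-side sketch follows; the application name, the AM command (`my.example.MyApplicationMaster`), the queue, and the resource sizes are illustrative assumptions.

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitApplicationSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(conf);
    yarnClient.start();

    // Ask the RM for a new application id and an empty submission context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("sketch-app");

    // CLC describing how to launch the Application Master itself.
    ContainerLaunchContext amCLC = Records.newRecord(ContainerLaunchContext.class);
    amCLC.setCommands(Collections.singletonList(
        "java -Xmx512m my.example.MyApplicationMaster")); // illustrative AM command
    appContext.setAMContainerSpec(amCLC);

    // Resources for the AM's own container (illustrative sizes) and target queue.
    appContext.setResource(Resource.newInstance(1024, 1));
    appContext.setQueue("default");

    // Hand the request to the RM; it will find a container to spawn the AM.
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);
  }
}
```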
Slide 21: Fault tolerance and availability
RM failure: recover using persistent storage; kill all containers, including the AMs'; relaunch the AMs.
NM failure: the RM detects it, marks the containers on that node as killed, and reports this to the AMs.
AM failure: the RM kills the AM's container and restarts it (see the sketch below).
Container failure: the framework is responsible for recovery.
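As a small, hedged illustration of AM restart: when building the submission context (as in the earlier client sketch), a framework can let the RM relaunch a failed AM a bounded number of times. The attempt count below is an arbitrary example, and the cluster-wide `yarn.resourcemanager.am.max-attempts` setting caps it.

```java
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.util.Records;

public class AmRetrySketch {
  static ApplicationSubmissionContext withRetries() {
    // In practice this context comes from YarnClientApplication, as in the
    // submission sketch above; a fresh record is used here to stay self-contained.
    ApplicationSubmissionContext appContext =
        Records.newRecord(ApplicationSubmissionContext.class);

    // If the AM's container fails, let the RM relaunch the AM up to 3 times
    // (illustrative value; capped cluster-wide by yarn.resourcemanager.am.max-attempts).
    appContext.setMaxAppAttempts(3);
    return appContext;
  }
}
```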
Slide 22: YARN at Yahoo!
In a 2500-node cluster, throughput improves from 77 K jobs/day to 150 K jobs/day.
Slide 23: YARN at Yahoo!
In a 2500-node cluster, throughput improves from 4 M tasks/day to 10 M tasks/day.
Slide 24: YARN at Yahoo!
Why? The removal of the static split between map and reduce slots.
Essentially, after moving to YARN, CPU utilization almost doubled.
"Upgrading to YARN was equivalent to adding 1000 machines [to this 2500-machine cluster]."
Slide 25: Applications/Frameworks
Pig, Hive, Oozie: decompose a DAG job into multiple MapReduce jobs.
Apache Tez: DAG execution framework.
Spark
Dryad
Giraph: vertex-centric graph computation framework; fits naturally within the YARN model.
Storm: distributed real-time processing engine (parallel stream processing).
REEF: simplifies implementing an ApplicationMaster.
Hoya: HBase clusters on YARN.
Slide 26: Evaluation
Sorting
MapReduce benchmarks
Preemption
With Apache Tez
REEF
Slide 27: Sorting
2100 nodes, each with two 2.3 GHz hex-core Xeon E5-2630 CPUs, 64 GB memory, and 12x3 TB disks.
Record holder.
Slide 28: MapReduce benchmarks
Compares Hadoop 2.1.0 (YARN) against 1.2.1 on a 260-node cluster.
Each slave node runs 2.27 GHz Intel Xeon CPUs totalling 16 cores, with 38 GB of physical memory and 6x1 TB 7200 RPM disks.
Slide 29: Benefit of preemption
Described in more detail in the Natjam paper (also in SoCC '13).
Slide 30: With Apache Tez
Slide 31: REEF