Apollo, Weize Sun, Feb. 17th

Uploaded On 2024-01-03

Presentation Transcript

1. Apollo
   Weize Sun
   Feb. 17th, 2017

2. Critical properties of Apollo
   - Distributed and coordinated scheduling framework
   - Assign tasks to the server with minimal estimated completion time
   - Provide near-future states of servers
   - Correction mechanism
   - Opportunistic scheduling

3. Capacity Management and Tokens
   - Apollo uses a token-based mechanism.
   - Each token is the right to execute a task with a predefined amount of resources.
   - The more tokens a job has, the more tasks it can run.
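
The token rule above can be illustrated with a minimal sketch. The `Job` class, its field names, and the fixed token count are assumptions made for illustration only; the slides do not describe how Apollo actually grants or tracks tokens.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job holding a fixed number of tokens."""
    name: str
    tokens: int        # tokens granted to the job (assumed fixed here)
    running: int = 0   # tasks currently holding a token

    def try_start_task(self) -> bool:
        """Start a task only if the job still has a free token."""
        if self.running < self.tokens:
            self.running += 1
            return True
        return False

    def finish_task(self) -> None:
        """Release the token held by a finished task."""
        if self.running == 0:
            raise RuntimeError("no running task to finish")
        self.running -= 1


job = Job("example", tokens=2)
started = [job.try_start_task() for _ in range(3)]  # third task has no token
```

With two tokens, the third task is refused until one of the running tasks finishes and releases its token.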

4. Architectural Overview

5. Job Manager (Scheduler)
   - Every JM is responsible for one job.
   - Each JM receives global cluster information through the collaboration of the RM (Resource Monitor) and the PNs (Process Nodes).

6. Process Node
   - Manages a queue of tasks assigned to the server.
   - Cooperation between PNs and the JM:
     - When the JM gives a task to a PN, it passes the resource requirements, the estimated runtime, and the required files.
     - The PN provides feedback to the JM to help improve the accuracy of task runtime estimation.
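
As a rough sketch of the JM/PN exchange described above, the following models a task request and a PN queue. All names (`TaskRequest`, `ProcessNode`, the field names) are hypothetical, and estimating the wait as the sum of queued runtimes is a simplifying assumption, not something stated in the slides.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskRequest:
    """What a JM passes to a PN (field names are illustrative)."""
    task_id: str
    cpu_cores: int
    memory_gb: float
    estimated_runtime_s: float
    required_files: list = field(default_factory=list)


class ProcessNode:
    """Hypothetical PN holding a FIFO queue of assigned tasks."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, request):
        self.queue.append(request)

    def expected_wait_s(self):
        # Assumption: a new task waits for every queued task to finish.
        return sum(r.estimated_runtime_s for r in self.queue)

    def complete_next(self, actual_runtime_s):
        """Finish the head task and return feedback for the JM."""
        request = self.queue.popleft()
        return {
            "task_id": request.task_id,
            "estimated_runtime_s": request.estimated_runtime_s,
            "actual_runtime_s": actual_runtime_s,
        }


pn = ProcessNode()
pn.enqueue(TaskRequest("t1", 1, 2.0, 30.0, ["input.part0"]))
pn.enqueue(TaskRequest("t2", 2, 4.0, 45.0))
wait = pn.expected_wait_s()
feedback = pn.complete_next(actual_runtime_s=36.0)
```

The feedback record pairs the estimate with the observed runtime, which is the kind of signal a JM could use to refine later estimates.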

7. Resource Monitor
   - Provides a global view of cluster status.
   - Collects information from PNs.
   - Builds the wait-time matrix.
   - Note: the RM is not critical.
     - If the RM goes down, Apollo is not badly hurt.
     - A JM can still make a locally optimal decision based on the feedback from PNs.
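
To make the role of the wait-time information concrete, here is a small sketch of choosing the server with the minimal estimated completion time. It assumes the estimate is simply expected wait plus expected runtime per server; the function name `pick_server`, the server names, and the numbers are all made up for illustration.

```python
def pick_server(wait_time_s, runtime_s):
    """Return (server, estimate) minimizing wait + runtime.

    wait_time_s: server -> expected wait for this task's resource needs
    runtime_s:   server -> expected runtime of the task on that server
    """
    estimates = {s: wait_time_s[s] + runtime_s[s] for s in wait_time_s}
    best = min(estimates, key=estimates.get)
    return best, estimates[best]


# Illustrative numbers only.
wait_time_s = {"pn1": 30.0, "pn2": 5.0, "pn3": 0.0}
runtime_s = {"pn1": 10.0, "pn2": 20.0, "pn3": 40.0}
server, eta = pick_server(wait_time_s, runtime_s)
```

Here the idle server `pn3` is not chosen, because its longer runtime outweighs the short wait on `pn2`.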

8. Task Priority and Stable Matching
   - The JM analyzes the task DAG of its job.
   - The JM makes independent scheduling designs and picks the best one.
   - Note: the breakdown of some scheduling decisions does not hurt overall decision making.
   - The JM uses stable matching to limit the search space.
   - What if two JMs make decisions that collide?
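
The stable-matching idea can be sketched with textbook deferred acceptance. This is not Apollo's actual algorithm: it assumes one task per server, that tasks prefer servers with lower estimated completion time, and that a server keeps the proposing task with the lowest estimate on that server. The name `stable_match` and the cost table are hypothetical.

```python
def stable_match(cost):
    """cost[task][server] = estimated completion time; one task per server."""
    # Each task ranks servers from lowest to highest estimate.
    prefs = {t: sorted(c, key=c.get) for t, c in cost.items()}
    next_choice = {t: 0 for t in cost}
    holder = {}                       # server -> task currently accepted
    free = list(cost)
    while free:
        task = free.pop()
        if next_choice[task] >= len(prefs[task]):
            continue                  # task has exhausted its candidates
        server = prefs[task][next_choice[task]]
        next_choice[task] += 1
        current = holder.get(server)
        if current is None:
            holder[server] = task
        elif cost[task][server] < cost[current][server]:
            holder[server] = task
            free.append(current)      # displaced task proposes again
        else:
            free.append(task)         # rejected; try the next server
    return {task: server for server, task in holder.items()}


cost = {
    "t1": {"s1": 10, "s2": 20},
    "t2": {"s1": 12, "s2": 30},
}
assignment = stable_match(cost)
```

Both tasks want `s1`; the server keeps `t1`, whose estimate is lower, and `t2` falls back to `s2`. Each task only walks its own ranked list, which is one way a matching step can bound the search space.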

9. Correction Mechanism
   - Unlike other systems, Apollo applies its correction process after a task has been dispatched to a server.
   - The JM reassigns the task to another server if the wait time is much greater than estimated or a much better placement is found.
   - Apollo also uses randomization to reduce collisions.
   - Apollo assigns weights to the different wait-time matrices to reflect their accuracy.
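
A hedged sketch of the two correction triggers and the randomization idea follows. The thresholds `wait_factor` and `min_gain`, the `spread` of the jitter, and both function names are assumptions chosen for illustration; the slides give no concrete values or formulas.

```python
import random


def should_reschedule(estimated_wait_s, observed_wait_s, current_eta_s,
                      best_alternative_eta_s, wait_factor=2.0, min_gain=0.2):
    """Decide whether a dispatched task should be reassigned."""
    # Trigger 1: the task has waited far longer than estimated.
    too_slow = observed_wait_s > wait_factor * estimated_wait_s
    # Trigger 2: another placement is better by at least min_gain.
    much_better = best_alternative_eta_s < (1.0 - min_gain) * current_eta_s
    return too_slow or much_better


def jittered(eta_s, rng, spread=0.05):
    """Perturb an estimate slightly so that independent schedulers are
    less likely to all pick the same server."""
    return eta_s * (1.0 + rng.uniform(-spread, spread))


rng = random.Random(0)
decisions = [
    should_reschedule(10.0, 25.0, 100.0, 95.0),  # waited too long
    should_reschedule(10.0, 12.0, 100.0, 95.0),  # neither trigger fires
    should_reschedule(10.0, 12.0, 100.0, 70.0),  # much better placement
]
noisy_eta = jittered(100.0, rng)
```

Requiring a clear margin before reassigning avoids moving tasks for marginal gains, at the cost of tolerating slightly suboptimal placements.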

10. Opportunistic Scheduling
   - There are two kinds of tasks in Apollo: regular tasks and opportunistic tasks.
   - Apollo adopts randomized allocation.
   - Apollo runs regular tasks first and uses the remaining resources to run opportunistic tasks.
   - [Figure: Resource, split between Regular Tasks and Opportunistic Tasks]
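
The regular-first policy can be sketched as follows. It assumes a single scalar resource per server and picks opportunistic tasks in random order as a stand-in for randomized allocation; `fill_server` and the task tuples are hypothetical.

```python
import random


def fill_server(capacity, regular, opportunistic, rng):
    """Place regular tasks first, then opportunistic ones in random order.

    Each task is a (name, resource_needed) pair.
    Returns the running task names and the resource left over.
    """
    running, free = [], capacity
    for name, need in regular:
        if need <= free:
            running.append(name)
            free -= need
    candidates = list(opportunistic)
    rng.shuffle(candidates)           # randomized pick among opportunistic tasks
    for name, need in candidates:
        if need <= free:
            running.append(name)
            free -= need
    return running, free


rng = random.Random(42)
running, free = fill_server(
    capacity=8,
    regular=[("r1", 3), ("r2", 3)],
    opportunistic=[("o1", 1), ("o2", 1), ("o3", 1)],
    rng=rng,
)
```

The two regular tasks always run; the two units left over go to whichever opportunistic tasks come first in the shuffled order.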