
Jonathan - PowerPoint Presentation

Uploaded On 2017-05-14

Presentation Transcript

Slide1

Jonathan Eastep, David Wingate, Marco D. Santambrogio, Anant Agarwal

Smartlocks: Self-Aware Synchronization

Slide2

Multicores are Complex

The Good
  Get performance scaling back on track with Moore’s Law

The Bad
  System complexities are skyrocketing
  Difficult to program multicores and utilize their performance

Slide3

Asymmetric Multicore is Worse

The Problem
  Different capabilities, clock speeds = new layer of complexity
  Programmers aren’t used to reasoning about asymmetry

[Figure: Asymmetric Multicore with Core 0, Core 1, Core 2, Core 3]

Why Asymmetric Multicore?
  Improving Power / Performance
  Increasing Manufacturing yield

Slide4

Self-Aware Computing Can Help

A promising recent approach to systems complexity management
  Monitor themselves, adapting as necessary to meet their goals

Self-aware systems
  Goal-Oriented Computing, Ward et al., CSAIL
  IBM K42 Operating System (OS w/ online reconfig.)
  Oracle Automatic Workload Repository (DB tuning)
  Intel RAS Technologies for Enterprise (hw fault tol.)

Slide5

Smartlocks Overview

Self-Aware technology applied to synchronization, resource sharing, programming models
C/C++ spin-lock library for multicore
Uses heuristics and machine learning to internally adapt its algorithms / behaviors
Reward signal provided by application monitor
Key innovation: Lock Acquisition Scheduling

[Figure: Lock Scheduler choosing among Waiters t1, t2, t3]

Slide6

Lock Acquisition Scheduling is the Key!

Thought experiment: 2 slow cores, 1 fast

[Figure labels: 4 CS, 4 CS, improvement]

Slide7

Talk Outline

Motivation
Smartlocks Architecture
Smartlocks Interface
Smartlock Design
Results
Conclusion

Slide8

Smartlocks Architecture

Each Smartlock self-optimizes as the app runs
Take reward from application monitoring framework
Reinforcement Learning adapts lock scheduling policy

[Figure: architecture diagram. Labels: Application; Smartlock; Pthreads; Application Monitor: Heartbeats; Reward: Heart Rate; ML lock sched.; Lock Scheduling → PR Lock Priorities]

Slide9

Do Scheduling with PR Locks

Priority Lock (PR Lock)
  Releases lock to waiters preferentially (ordered by priority)
  Each potential lock holder (thread) has a priority
  To acquire, thread registers in wait priority queue
  Usually priority settings are set statically

Lock Acquisition Scheduling
  Augments PR Lock w/ ML engine to dynamically control priority settings
  Scheduling policy = the set of thread priorities
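The PR lock behavior described above can be sketched as a small spin-lock in which waiters register themselves and the releaser hands the lock to the highest-priority waiter. Everything here (the class name PriorityLock, the per-thread flag arrays, the linear scan in place of a real priority queue, the tie-break toward the lowest id) is an assumption for illustration, not the Smartlocks implementation.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical sketch of a priority (PR) spin-lock; not the Smartlocks source.
class PriorityLock {
public:
    explicit PriorityLock(int max_lockers)
        : waiting_(max_lockers), granted_(max_lockers), priority_(max_lockers) {
        for (int i = 0; i < max_lockers; ++i) {
            waiting_[i].store(false);
            granted_[i].store(false);
            priority_[i].store(0);
        }
    }

    // Priorities are usually static; an external scheduler (for example an
    // ML engine) could call this at any time to change the policy.
    void set_priority(int id, int p) { priority_[id].store(p); }

    // First step of acquire: register in the wait set.
    void register_waiter(int id) { waiting_[id].store(true); }

    // Highest-priority registered waiter, or -1 if nobody is waiting.
    // Ties go to the lowest id.
    int next_waiter() const {
        int best = -1;
        for (int i = 0; i < static_cast<int>(waiting_.size()); ++i) {
            if (waiting_[i].load() &&
                (best < 0 || priority_[i].load() > priority_[best].load())) {
                best = i;
            }
        }
        return best;
    }

    void acquire(int id) {
        register_waiter(id);
        for (;;) {
            if (granted_[id].exchange(false)) return;  // handed over by release()
            bool expected = false;
            if (held_.compare_exchange_strong(expected, true)) {  // lock was free
                waiting_[id].store(false);
                return;
            }
            std::this_thread::yield();
        }
    }

    // Only the current holder may call release(). The lock stays held while
    // it is handed directly to the chosen waiter.
    void release(int /*id*/) {
        int next = next_waiter();
        if (next < 0) {
            held_.store(false);
        } else {
            waiting_[next].store(false);
            granted_[next].store(true);
        }
    }

private:
    std::atomic<bool> held_{false};
    std::vector<std::atomic<bool>> waiting_, granted_;
    std::vector<std::atomic<int>> priority_;
};
```

Because release() reads the priorities only at hand-off time, a separate thread can rewrite them between acquisitions without touching the lock's fast path, which is the hook that Lock Acquisition Scheduling relies on.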

[Figure: Lock Scheduling sets the PR Lock Priorities. Labels: p_t0, p_t2, p_t6, p_t8, p_t1, p_t11 and p_t1, p_t2, p_t3, ..., p_tn. Legend: t_i = thread i; p_ti = priority of t_i]

Slide10

Talk Outline

Motivation
Smartlocks Architecture
Smartlocks Interface
Smartlock Design
Results
Conclusion

Slide11

Smartlocks Interface

Similar to pthread mutexes
Difference is interface for external monitor
Smartlock queries monitor for reward signal
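A hedged usage sketch of the four entry points listed on this slide. The library itself is not reproduced here, so the smartlock class below is a stand-in that only mirrors the prototype shapes and uses a plain test-and-set spin internally; the members of monitor, the meaning of id as a per-thread locker id, and the helper add_under_lock are all assumptions.

```cpp
#include <atomic>

// Hypothetical monitor type: the slide names `monitor` but not its members.
struct monitor {
    double reward = 0.0;  // assumed field, e.g. a smoothed heart rate
};

// Stand-in with the same four entry points as the slide's prototypes.
// Internals are a plain test-and-set spin-lock, NOT the adaptive library.
class smartlock {
public:
    smartlock(int max_lockers, monitor *m_ptr) {
        // A real implementation would presumably size its wait structures
        // and keep the monitor pointer; both are ignored in this sketch.
        (void)max_lockers;
        (void)m_ptr;
    }
    ~smartlock() {}

    void acquire(int id) {
        (void)id;  // assumed to identify the calling thread
        while (flag_.test_and_set(std::memory_order_acquire)) { /* spin */ }
    }

    void release(int id) {
        (void)id;
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Typical call pattern: bracket the critical section with acquire/release,
// passing the caller's own id each time.
long add_under_lock(smartlock &lock, int thread_id, long &shared_total, long amount) {
    lock.acquire(thread_id);
    shared_total += amount;  // critical section
    long snapshot = shared_total;
    lock.release(thread_id);
    return snapshot;
}
```

The call pattern matches a pthread mutex except for the extra id argument and the monitor pointer passed at construction.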

Function Prototype                                       Description
smartlock::smartlock(int max_lockers, monitor *m_ptr)    Creates a Smartlock
smartlock::~smartlock()                                  Destroys a Smartlock
void smartlock::acquire(int id)                          Acquires the lock
void smartlock::release(int id)                          Releases the lock

Slide12

Talk Outline

Motivation
Smartlocks Architecture
Smartlocks Interface
Smartlock Design
Results
Conclusion

Slide13

Smartlocks Design Challenges

Major Scheduling Challenges

The Timeliness Challenge
  Scheduling too slowly could negate benefit of scheduling
  Where do you get compute resources to optimize?

The Quality Challenge
  Finding policies with best long-term effects
  No model of system to guide direct optimization methods
  Efficiently searching an exponential policy space
  Overcoming stochastic / partially observable dynamics

Slide14

Meeting the Timeliness Challenge

Run adaptation algorithms in decoupled helper thread
Relax scheduling frequency to once every few locks
For efficiency, use PR locks as scheduling mechanism
  ML engine updates priorities; PR lock runs decoupled

Slide15

Meeting the Quality Challenge

Machine Learning, Reinforcement Learning
  Need not know *how* to accomplish task, just *when* you have
  Good at learning actions that maximize long-term benefit
  Natural for application engineers to construct reward signal
  Addresses issues like stochastic / partially observable dynamics

Policy Gradients
  Computationally cheap, fast, and straightforward to implement
  Need no model of the system (we don’t have one!)

Stochastic Soft-Max Policy
  Relaxes exponential discrete action space into differentiable one
  Effective, natural way to balance exploration vs. exploitation
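One way to picture a stochastic soft-max policy is sketched below: each thread gets its own row of k real-valued preferences, a soft-max turns the row into probabilities, and a priority level is sampled per thread. The per-thread factorization, the function names, and the use of the likelihood-ratio (score function) gradient are assumptions chosen for illustration; the talk does not spell out its exact parameterization.

```cpp
#include <algorithm>
#include <cstddef>
#include <cmath>
#include <random>
#include <vector>

// Soft-max over one thread's k priority-level preferences (must be non-empty).
std::vector<double> softmax(const std::vector<double> &logits) {
    double max_logit = logits[0];
    for (double v : logits) max_logit = std::max(max_logit, v);

    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < logits.size(); ++j) {
        probs[j] = std::exp(logits[j] - max_logit);  // shift for numerical stability
        sum += probs[j];
    }
    for (double &p : probs) p /= sum;
    return probs;
}

// Sample one priority level per thread. theta[i][j] is thread i's preference
// for level j, so n*k real parameters stand in for the k^n joint settings.
std::vector<int> sample_priorities(const std::vector<std::vector<double>> &theta,
                                   std::mt19937 &rng) {
    std::vector<int> levels;
    for (const auto &row : theta) {
        std::vector<double> p = softmax(row);
        std::discrete_distribution<int> pick(p.begin(), p.end());
        levels.push_back(pick(rng));
    }
    return levels;
}

// Gradient of log pi(level | row) with respect to that row's preferences:
// one-hot(level) minus the soft-max probabilities.
std::vector<double> grad_log_prob(const std::vector<double> &row, int level) {
    std::vector<double> g = softmax(row);
    for (std::size_t j = 0; j < g.size(); ++j) {
        g[j] = (static_cast<int>(j) == level ? 1.0 : 0.0) - g[j];
    }
    return g;
}
```

Equal preferences give a uniform distribution (pure exploration); as one preference grows, its level is sampled almost every time (exploitation), so a single set of parameters controls the balance.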

Slide16

The RL Problem Formulation

Goal: learn a policy π(action | θ)
  Action = PR lock priority settings (exponential space)
    k priority levels, n threads → k^n possible priority settings
  θ are learned parameters
  Reward is, e.g., heart rate smoothed over a small window

Thus π is a distribution over thread prioritizations
  At each timestep, we sample and execute a prioritization

Optimization objective: average reward η
  Depends on the policy, which depends on θ
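For concreteness, the objective can be written in the standard average-reward form, with θ the policy parameters, π the policy, and η the average reward. The equation from the original slide did not survive extraction, so the symbols r_t (reward at timestep t), a_t (prioritization sampled at timestep t), and the action set 𝒜 are assumptions, and this is a sketch of the usual formulation rather than the authors' exact expression.

```latex
\eta(\theta) \;=\; \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[\, \sum_{t=1}^{T} r_t \;\middle|\; a_t \sim \pi(\cdot \mid \theta) \right],
\qquad
\theta^{*} \;=\; \arg\max_{\theta}\, \eta(\theta),
\qquad
|\mathcal{A}| \;=\; k^{n}
```

As a size check, k = 4 priority levels and n = 5 threads already give 4^5 = 1024 joint priority settings, which is why the search runs over θ and not over the settings themselves.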

[Slide equation not recovered; annotation: maximize]

Slide17

Use Policy Gradients Approach

Approach: policy gradients
Idea: estimate the gradient of average reward η with respect to policy parameters θ
Approximate with importance sampling
Take a step in the gradient direction

Slide18

Talk Outline

Motivation
Smartlocks Architecture
Smartlocks Interface
Smartlock Design
Results
Conclusion

Slide19

Experimental Setup 1

Simulated 6-core single-ISA asymmetric multicore w/ dynamic clock speeds

Throughput benchmark
  Work-pile programming model (no stealing)
  1 producer, 4 workers
  Record how long to perform n total work items
  Fast cores finish work faster; if they spin it’s bad

Two thermal throttling events

Slide20

Performance as a Function of Time

Workload 1: Worker 0 @ 3.16GHz, others @ 2.11GHz
Workload 2: Worker 3 @ 3.16GHz, others @ 2.11GHz
Workload 3: Same as Workload 1

[Figure: performance over time. Curves: best w/ pri., Smartlock, best w/o pri., TAS. Annotations: gap, adaptation time-scale]

Slide21

Policy as a Learned Function of Time

Workload 1: Worker 0 @ 3.16GHz, others @ 2.11GHz
Workload 2: Worker 3 @ 3.16GHz, others @ 2.11GHz
Workload 3: Same as Workload 1

[Figure: Policy as a Learned Function of Time]

Slide22

Experimental Setup 2

Hardware asymmetry using cpufrequtils
  8-core Intel Xeon machine
  {2.11, 2.11, 2.11, 2.11, 2.11, 2.11, 3.16, 3.16} GHz
  1 core reserved for Machine Learning (not required: helper thread could share a core)

Splash2
  First results: Radiosity
    Computes equilibrium dist. of light in scene
    Parallelism via work queues with stealing
    Work items imbalanced (function of input scene)
    Heartbeat for every work item completed

Slide23

Radiosity Performance vs. Policy

Benchmark
  Study how lock scheduling affects performance
  ~20% difference between best and worst policy
  TAS (uniformly random) is in the middle
  Smartlock within 3% of best policy

[Figure: Radiosity execution time (seconds) by policy, lower is better. Label: Smartlocks]

Slide24

Smartlocks is Bigger Than This

Smartlock adapts each aspect of a lock
  Protocol: picks from {TAS, TASEB, Ticket, MCS, PR Lock}
  Wait Strategy: picks from {spin, spin with backoff}
  Scheduling Policy: arbitrary, optimized by RL engine

Smartlocks has an adaptation component for each
This talk focuses on Lock Acquisition Scheduler
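The adaptable aspects listed above can be pictured as a small configuration space. The sketch below encodes the protocol and wait-strategy choices as enums and shows one plausible reading of "spin with backoff" as a capped doubling pause. The type names, the LockConfig struct, the next_pause helper, and the doubling rule are assumptions for illustration, not the library's definitions.

```cpp
#include <algorithm>
#include <cstdint>

// Choice names come from the slide; the enum encoding is an assumption.
enum class Protocol { TAS, TASEB, Ticket, MCS, PRLock };
enum class WaitStrategy { Spin, SpinWithBackoff };

// One point in the space a Smartlock adapts over. The scheduling policy
// (the set of thread priorities) is omitted here.
struct LockConfig {
    Protocol protocol;
    WaitStrategy wait;
};

// Pause length, in spin iterations, to wait before the next retry.
// Plain spin never pauses; backoff doubles the previous pause up to `cap`.
std::uint32_t next_pause(WaitStrategy wait, std::uint32_t previous,
                         std::uint32_t cap) {
    if (wait == WaitStrategy::Spin) return 0;
    if (previous == 0) return std::min<std::uint32_t>(1, cap);
    return previous > cap / 2 ? cap : previous * 2;  // avoids overflow
}
```

With five protocols and two wait strategies there are only ten static combinations, so the exponential part of the search lies in the scheduling policy, which is where the RL engine is applied.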

[Figure: Smartlock components: Protocol Selector, Wait Strategy Selector, Lock Acquisition Scheduler, Application Monitor]

Slide25

Conclusion

Smartlocks is a self-aware software library for synchronization / resource-sharing
Ideal for multicores / applications with dynamic asymmetry
Lock Acquisition Scheduling is the key innovation
Smartlocks is open source (COMING SOON!)
  Code: http://github.com/Smartlocks/Smartlocks
  Project web-page: http://groups.csail.mit.edu/carbon/smartlocks