Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support
Man Cao, Minjia Zhang, Michael D. Bond

Dynamic Analyses for Parallel Programs
- Data race detector, record & replay, transactional memory, deterministic execution, etc.
- Performance is usually bad: several times slower
- Fundamental difficulties?

Cross-thread dependences
- Crucial for dynamic analyses and systems
- Capturing cross-thread dependences
  - Detecting: e.g. data race detector, dependence recorder
  - Controlling: e.g. transactional memory, deterministic execution
- Diagram: threads T1 and T2 access the same field; one executes "o.f = …" and the other "… = o.f"

Typical approach
- Per-object metadata (state), e.g. last writer/reader thread
- At each object access, atomically:
  - Check current state
  - Analysis-specific action
  - Update state if needed
  - Perform the access
- How to guarantee atomicity?
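
As a rough illustration of the per-access steps listed above, the following Python sketch keeps a last-writer/readers record per object and runs the check, the analysis-specific action, the state update, and the access in sequence. The names Metadata, TrackedObject, instrumented_write, and instrumented_read are hypothetical, and the sketch deliberately leaves out how the steps are made atomic, which is the question the slide raises.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical per-object state: last writer and readers since that write."""
    last_writer: object = None
    readers: set = field(default_factory=set)


@dataclass
class TrackedObject:
    f: object = None
    meta: Metadata = field(default_factory=Metadata)


def instrumented_write(obj, thread_id, value, on_dependence):
    m = obj.meta
    # 1. Check current state: which other threads wrote or read last?
    others = ({m.last_writer} | m.readers) - {thread_id, None}
    # 2. Analysis-specific action, here reporting each cross-thread dependence.
    for other in sorted(others):
        on_dependence(other, thread_id)
    # 3. Update state.
    m.last_writer, m.readers = thread_id, set()
    # 4. Perform the access.
    obj.f = value


def instrumented_read(obj, thread_id, on_dependence):
    m = obj.meta
    # Report a write-to-read dependence the first time this thread reads
    # a value written by another thread.
    if m.last_writer not in (None, thread_id) and thread_id not in m.readers:
        on_dependence(m.last_writer, thread_id)
    m.readers.add(thread_id)
    return obj.f
```

Without synchronization, another thread could change the metadata between steps 1 and 4, so the reported dependences could be wrong; the sketch is only safe when run sequentially.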

Pessimistic Synchronization
Used by most existing work:
- Data race detector [FastTrack, Flanagan & Freund, 2009]
- Atomicity violation detector [Velodrome, Flanagan et al., 2008]
- Record & replay [Instant Replay, LeBlanc et al., 1987] [Chimera, Lee et al., 2012]

Pessimistic Synchronization
- LockMetadata()
- Check and compute new metadata
- Analysis-specific actions
- Program access
- UnlockAndUpdateMetadata()
Synchronization on every access: 6X slowdown on average
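
The pessimistic scheme can be sketched as follows. This is a minimal illustration, not the authors' implementation: PessimisticObject and its fields are hypothetical names, and a Python threading.Lock stands in for whatever low-level mechanism a real runtime would use to lock metadata. The sync_ops counter only makes visible that every access pays for one lock acquisition.

```python
import threading


class PessimisticObject:
    """Object whose metadata is locked on every access (hypothetical names)."""

    def __init__(self):
        self.f = None
        self.last_writer = None           # per-object metadata
        self._meta_lock = threading.Lock()
        self.sync_ops = 0                 # number of lock acquisitions so far

    def write(self, thread_id, value, action=None):
        with self._meta_lock:             # lock the metadata
            self.sync_ops += 1
            prev = self.last_writer       # check and compute new metadata
            if action is not None and prev not in (None, thread_id):
                action(prev, thread_id)   # analysis-specific action
            self.f = value                # program access
            self.last_writer = thread_id  # update metadata; the lock is
                                          # released on leaving the with block
```

Four threads writing 500 times each perform 2000 lock acquisitions, even if the threads never actually interleave on the object.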

Optimistic Synchronization
Used to improve performance:
- Biased locking [Lock Reservation, Kawachiya et al., 2002] [Bulk Rebiasing, Russell & Detlefs, 2006]
- Distributed memory system [Shasta, Scales et al., 1996]
- Framework support [Octet, Bond et al., 2013]

Optimistic Synchronization
- Avoid synchronization for non-conflicting accesses
- Heavyweight coordination for conflicting accesses

Optimistic Synchronization (Cont.)
Diagram with two threads, T1 and T2, built up in steps:
- wr o.f with a write check, twice
- read check
- safe point
- Analysis-specific action
- Change metadata
- rd o.f
26% overhead on average, with outliers
Expensive if there are many conflicting accesses
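
To make the distinction between non-conflicting and conflicting accesses concrete, here is a small sequential simulation. It is a sketch under simplifying assumptions: each object has a single owning thread, read-shared states are ignored, the first access to an object is treated as claiming ownership without conflict, and the function name count_transitions is hypothetical. It performs no real synchronization; it only classifies the accesses in a trace.

```python
def count_transitions(trace):
    """Classify accesses in a trace of (thread_id, object_id) pairs.

    An access is counted as non-conflicting when the accessing thread already
    owns the object or the object has no owner yet, and as conflicting when
    ownership must move away from another thread.
    Returns (non_conflicting, conflicting).
    """
    owner = {}
    non_conflicting = conflicting = 0
    for thread_id, obj in trace:
        current = owner.get(obj)
        if current is None or current == thread_id:
            non_conflicting += 1   # fast path: no synchronization needed
        else:
            conflicting += 1       # slow path: coordinate with the owner
        owner[obj] = thread_id
    return non_conflicting, conflicting
```

For the trace T1, T1, T2, T1 on one object followed by T1 on a second object, the function counts three non-conflicting and two conflicting accesses.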

Optimistic synchronization performs best if there are few conflicting accesses.

Pessimistic synchronization is cheaper for conflicting accesses.

Drink from both glasses?
Goal:
- Optimistic sync. for most non-conflicting accesses
- Pessimistic sync. for most conflicting accesses
Our approach:
- Hybrid state model
- Adaptive policy
- Support for detecting and controlling cross-thread dependences

Adaptive Policy
- Decides when to perform Pess → Opt and Opt → Pess transitions
- Cost-benefit model: formulates the problem
- Online profiling: efficiently collects information and approximates the cost-benefit model
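
One way to picture an adaptive policy driven by online profiling is a pair of per-object counters that trigger a mode switch. The sketch below is an assumption for illustration only: the slides do not give the actual policy, and the class name AdaptiveState, the counters, and both thresholds are hypothetical.

```python
class AdaptiveState:
    """Per-object profile with a hypothetical threshold policy."""

    def __init__(self, to_pess_after=4, to_opt_after=16):
        self.mode = "opt"
        self.conflicts = 0   # conflicting transitions seen while optimistic
        self.quiet = 0       # consecutive non-conflicting accesses while pessimistic
        self.to_pess_after = to_pess_after
        self.to_opt_after = to_opt_after

    def record(self, conflicting):
        """Profile one access and return the mode to use from now on."""
        if self.mode == "opt":
            if conflicting:
                self.conflicts += 1
                if self.conflicts >= self.to_pess_after:
                    self.mode, self.quiet = "pess", 0     # Opt -> Pess
        else:
            self.quiet = 0 if conflicting else self.quiet + 1
            if self.quiet >= self.to_opt_after:
                self.mode, self.conflicts = "opt", 0      # Pess -> Opt
        return self.mode
```

With thresholds of 2 and 3, two conflicting accesses move an object to pessimistic mode, and three consecutive non-conflicting accesses move it back.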

Cost-benefit model
- Compares the total time spent in transitions if an object were optimistic or pessimistic
- Whichever takes less time is beneficial
- Relies only on the numbers (or just the ratio) of non-conflicting and conflicting transitions
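
The comparison described above can be sketched as a small function. The per-transition costs are placeholder values in arbitrary units, chosen only so that an optimistic non-conflicting transition is cheapest and an optimistic conflicting transition is most expensive; they are assumptions, not measurements from the talk, and the name cheaper_mode is hypothetical.

```python
def cheaper_mode(non_conflicting, conflicting,
                 opt_fast=1.0, opt_conflict=1000.0, pess=30.0):
    """Return 'opt' or 'pess', whichever has the lower estimated total cost.

    opt_fast, opt_conflict, and pess are assumed per-transition costs in
    arbitrary units. A pessimistic object pays the same cost on every
    transition, conflicting or not.
    """
    opt_total = non_conflicting * opt_fast + conflicting * opt_conflict
    pess_total = (non_conflicting + conflicting) * pess
    return "opt" if opt_total <= pess_total else "pess"
```

Rearranging the inequality shows why only the ratio matters: with these costs, optimistic wins exactly when non_conflicting / conflicting is at least (opt_conflict - pess) / (pess - opt_fast), which is 970 / 29 for the placeholder values, so scaling both counts by the same factor never changes the answer.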

Evaluation
- Implementation: Jikes RVM 3.1.3
- Parallel programs: DaCapo Benchmarks 2006 & 2009, SPEC JBB 2000 & 2005
- Platform: 32 cores (AMD Opteron 6272)

Performance
[Figures on three slides; not reproduced in this transcript]

Framework support
- Detecting cross-thread dependences: dependence recorder
- Key challenge: identify the source location of a happens-before edge for a pessimistic conflicting transition
- Current solution requires acquiring a lock and writing to the remote thread's log
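
A minimal sketch of what writing to a remote thread's log under a lock could look like is given below. The log layout, the entry tuples, and the names ThreadLog and record_edge are all assumptions made for illustration; the slides do not describe the recorder's data structures.

```python
import threading


class ThreadLog:
    """Per-thread event log that other threads may append to (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.entries = []
        self.lock = threading.Lock()


def record_edge(source, sink):
    """Record a happens-before edge from the source thread to the sink thread.

    The source location is taken to be the source log's current length, read
    while holding the source log's lock so that it cannot move concurrently.
    """
    with source.lock:                      # lock the remote thread's log
        src_pos = len(source.entries)
        source.entries.append(("edge-out", sink.name))
    with sink.lock:
        sink.entries.append(("edge-in", source.name, src_pos))
    return src_pos
```

The two locks are taken one after the other rather than nested, so two threads recording edges in opposite directions cannot deadlock in this sketch.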

Framework support (Cont.)
- Controlling cross-thread dependences: enforcing region serializability (in progress)
- Key challenge: need to keep pessimistic objects locked until the end of a region
- Possible solution: defer unlocking of pessimistic objects until program lock releases
  - Helps dependence recorder
  - Simplifies instrumentation
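
The deferred-unlocking idea can be illustrated with a per-thread list of held object locks that is drained when the program releases a lock. This is a sketch under assumptions: RegionLocks and its methods are hypothetical names, Python locks stand in for pessimistic metadata locks, and the sketch ignores the deadlock handling a real system would need when several threads hold object locks across a region.

```python
import threading


class RegionLocks:
    """Tracks object locks one thread holds until its region ends (hypothetical)."""

    def __init__(self):
        self.held = []

    def access(self, obj_lock):
        # Acquire the object's lock on first use in this region and keep it,
        # instead of unlocking right after the access.
        if not any(lock is obj_lock for lock in self.held):
            obj_lock.acquire()
            self.held.append(obj_lock)

    def program_lock_release(self):
        # Region boundary: release every deferred object lock at once and
        # report how many were released.
        released = len(self.held)
        for lock in reversed(self.held):
            lock.release()
        self.held.clear()
        return released
```

Repeated accesses to the same object inside a region then cost one acquisition instead of one per access, which is the instrumentation saving the slide alludes to.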

Conclusion & Future work
Hybrid, adaptive synchronization achieves better performance:
- Never significantly degrades performance
- Sometimes improves performance substantially
Future directions:
- Explore different adaptive policies (e.g. aggregate profiling)
- Reduce instrumentation cost by deferring unlock operations of pessimistic synchronization
- Apply to controlling cross-thread dependences