Lecture 5: Synchronization - Topics: synchronization primitives and optimizations (PDF document)

danya
Uploaded On 2021-10-08

Presentation Transcript

1 Lecture 5: Synchronization
• Topics: synchronization primitives and optimizations

2 Synchronization
• The simplest hardware primitive that greatly facilitates synchronization implementations (locks, barriers, etc.) is an atomic read-modify-write
• Atomic exchange: swap contents of register and memory
• Special case of atomic exchange: test & set: transfer memory location into register and write 1 into memory
• lock:  t&s  register, location
         bnz  register, lock
         CS                        ; critical section
         st   location, #0

3 Improving Lock Algorithms
• The basic lock implementation is inefficient because the waiting process is constantly attempting writes, which causes heavy invalidate traffic
• Test & Set with exponential back-off: if you fail again, double your wait time and try again
• Test & Test & Set: read the value, if it has not changed, don’t bother doing the test&set; heavy bus traffic only when the lock is released
• Different implementations trade off one of these lock properties: latency, traffic, scalability, storage, fairness

4 Load-Linked and Store Conditional
• LL-SC is an implementation of atomic read-modify-write with very high flexibility
• LL: read a value and update a table indicating you have read this address, then perform any amount of computation
• SC: attempt to store a result into the same memory location, the store will succeed only if the table indicates that no other process attempted a store since the local LL
• SC implementations may not generate bus traffic if the SC fails; hence, more efficient than test&test&set

5 Load-Linked and Store Conditional
lockit: LL     R2, 0(R1)    ; load linked, generates no coherence traffic
        BNEZ   R2, lockit   ; not available, keep spinning
        DADDUI R2, R0, #1   ; put value 1 in R2
        SC     R2, 0(R1)    ; store-conditional succeeds if no one
                            ; updated the lock since the last LL
        BEQZ   R2, lockit   ; confirm that SC succeeded, else keep trying

6 Further Reducing Bandwidth Needs
• Even with LL-SC, heavy traffic is generated on a lock release and there are no fairness guarantees
• Ticket lock: every arriving process atomically picks up a ticket and increments the ticket counter (with an LL-SC), the process then keeps checking the now-serving variable to see if its turn has arrived, after finishing its turn it increments the now-serving variable. Is this really better than the LL-SC implementation?
• Array-Based lock: instead of using a “now-serving” variable, use a “now-serving” array and each process waits on a different variable: fair, low latency, low bandwidth, high scalability, but higher storage

7 Barriers
• Barriers require each process to execute a lock and unlock to increment the counter and then spin on a shared variable
• If multiple barriers use the same variable, deadlock can arise because some process may not have left the earlier barrier; sense-reversing barriers can solve this problem
• A tree can be employed to reduce contention for the lock and shared variable
• When one process issues a read request, other processes can snoop and update their invalid entries

8 Barrier Implementation
LOCK(bar.lock);
if (bar.counter == 0)
    bar.flag = 0;
mycount = ++bar.counter;    /* pre-increment, so the last of p arrivals sees p */
UNLOCK(bar.lock);
if (mycount == p) {
    bar.counter = 0;
    bar.flag = 1;
}
else
    while (bar.flag == 0) { };

9 Sense-Reversing Barrier Implementation
local_sense = !(local_sense);
LOCK(bar.lock);
mycount = ++bar.counter;    /* pre-increment, so the last of p arrivals sees p */
UNLOCK(bar.lock);
if (mycount == p) {
    bar.counter = 0;
    bar.flag = local_sense;
}
else {
    while (bar.flag != local_sense) { };
}
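
The "Improving Lock Algorithms" slide describes test & test & set and exponential back-off only in words. The following is a minimal C11 sketch of how the two ideas might be combined. It is not taken from the lecture: the names `tts_lock_t`, `tts_init`, `tts_try_acquire`, `tts_acquire`, `tts_release`, and the cap `TTS_MAX_DELAY` are hypothetical, and `atomic_exchange` stands in for the hardware test&set instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define TTS_MAX_DELAY 1024u   /* assumed cap on the back-off loop, chosen arbitrarily */

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} tts_lock_t;

static void tts_init(tts_lock_t *l) {
    atomic_init(&l->held, 0);
}

/* One test&set attempt: atomically write 1 and report whether the old value was 0. */
static bool tts_try_acquire(tts_lock_t *l) {
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

static void tts_acquire(tts_lock_t *l) {
    unsigned delay = 1;
    for (;;) {
        /* Test: spin on plain reads while the lock is held, so no writes are issued. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) { }
        /* Test&set: the lock looked free, so attempt the atomic write. */
        if (tts_try_acquire(l))
            return;
        /* Another process won the race: wait, then double the wait time. */
        for (volatile unsigned i = 0; i < delay; i++) { }
        if (delay < TTS_MAX_DELAY)
            delay *= 2;
    }
}

static void tts_release(tts_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

A caller would wrap its critical section between `tts_acquire(&l)` and `tts_release(&l)`. The busy-wait delay loop is only a placeholder; a real implementation would likely use a pause instruction or a timed sleep.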
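
The ticket lock from "Further Reducing Bandwidth Needs" can be sketched in the same style. This is an illustration under assumptions, not the lecture's code: the type `ticket_lock_t` and the function names are hypothetical, and the C11 `atomic_fetch_add` replaces the LL-SC sequence the slide mentions for picking up a ticket.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock: two counters, both starting at 0. */
typedef struct {
    atomic_uint next_ticket;   /* ticket handed to the next arriving process */
    atomic_uint now_serving;   /* ticket that currently owns the lock */
} ticket_lock_t;

static void ticket_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0u);
    atomic_init(&l->now_serving, 0u);
}

/* Atomically pick up a ticket and increment the ticket counter. */
static unsigned ticket_take(ticket_lock_t *l) {
    return atomic_fetch_add_explicit(&l->next_ticket, 1u, memory_order_relaxed);
}

/* Check the now-serving variable against our ticket. */
static bool ticket_is_turn(ticket_lock_t *l, unsigned ticket) {
    return atomic_load_explicit(&l->now_serving, memory_order_acquire) == ticket;
}

static void ticket_acquire(ticket_lock_t *l) {
    unsigned ticket = ticket_take(l);
    /* Waiters only read now_serving; the holder writes it once, on release. */
    while (!ticket_is_turn(l, ticket)) { }
}

/* After finishing its turn, the holder increments now-serving. */
static void ticket_release(ticket_lock_t *l) {
    atomic_fetch_add_explicit(&l->now_serving, 1u, memory_order_release);
}
```

Because tickets are granted in arrival order, the lock is handed over first-come, first-served, which is the fairness property the plain LL-SC lock lacks. All waiters still spin on the same `now_serving` variable, which is what the array-based lock changes.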
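
The sense-reversing barrier pseudocode leaves `LOCK`, `UNLOCK`, and the `bar` structure undefined. One possible self-contained C11 rendering is sketched below. It deliberately swaps the lock-protected increment for an atomic fetch-and-add, so it is a variant of the slide's code rather than a transcription; `barrier_t`, `barrier_init`, and `barrier_wait` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical barrier state shared by p processes. */
typedef struct {
    atomic_int  counter;   /* arrivals at the current barrier episode */
    atomic_bool flag;      /* release flag, compared against local_sense */
    int         p;         /* number of participating processes */
} barrier_t;

static void barrier_init(barrier_t *b, int p) {
    atomic_init(&b->counter, 0);
    atomic_init(&b->flag, false);
    b->p = p;
}

/* local_sense is private to each process and must start as false.
   Returns true for the last process to arrive, false for the others. */
static bool barrier_wait(barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    /* Atomic increment replaces LOCK / counter++ / UNLOCK; +1 gives the new count. */
    int mycount = atomic_fetch_add_explicit(&b->counter, 1, memory_order_acq_rel) + 1;
    if (mycount == b->p) {
        /* Last arrival: reset the counter, then release everyone by flipping flag. */
        atomic_store_explicit(&b->counter, 0, memory_order_relaxed);
        atomic_store_explicit(&b->flag, *local_sense, memory_order_release);
        return true;
    }
    /* Everyone else spins until flag matches this episode's sense. */
    while (atomic_load_explicit(&b->flag, memory_order_acquire) != *local_sense) { }
    return false;
}
```

Each process keeps its own `bool local_sense = false;` and passes its address on every call. Because the expected value of `flag` alternates between episodes, a process that is slow to leave one barrier cannot be confused by the next one, which is the reuse problem the "Barriers" slide describes.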