Wait-Free Queues with Multiple Enqueuers and Dequeuers
Alex Kogan, Erez Petrank
Computer Science, Technion, Israel
Outline
- Queue data structure
- Progress guarantees
- Previous work on concurrent queues
- Review of the MS-queue
- Our ideas in a nutshell
- Review of the KP-queue
- Performance results
- Performance optimizations
- Summary
FIFO queues
- One of the most fundamental and common data structures
[Figure: a queue holding 3 and 2; enqueue adds 9 at the tail, dequeue removes 5 from the head]
Concurrent FIFO queues
- A concurrent implementation supports "correct" concurrent adding and removing of elements
  - correct = linearizable
- Access to the shared memory must be synchronized
[Figure: several threads operate on a queue holding 3 and 2: one enqueues 9 while multiple threads dequeue concurrently; with more dequeuers than elements, one dequeue finds the queue empty]
Non-blocking synchronization
- No thread is blocked waiting for another thread to complete
  - e.g., no locks / critical sections
- Progress guarantees:
  - Obstruction-freedom: progress is guaranteed only in the eventual absence of interference
  - Lock-freedom: among all threads trying to apply an operation, one will succeed
  - Wait-freedom: a thread completes its operation in a bounded number of steps
Lock-freedom
- Among all threads trying to apply an operation, one will succeed
  - opportunistic approach: make attempts until succeeding
  - global progress, but all but one of the threads may starve
- Many efficient and scalable lock-free queue implementations exist
Wait-freedom
- A thread completes its operation in a bounded number of steps, regardless of what other threads are doing
- A highly desired property of any concurrent data structure
  - but commonly regarded as inefficient and too costly to achieve
- Particularly important in several domains:
  - real-time systems
  - systems operating under an SLA
  - heterogeneous environments
Related work: existing wait-free queues
- Limited concurrency:
  - one enqueuer and one dequeuer [Lamport'83]
  - multiple enqueuers, one concurrent dequeuer [David'04]
  - multiple dequeuers, one concurrent enqueuer [Jayanti & Petrovic'05]
- Universal constructions [Herlihy'91]
  - a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object
  - expensive, impractical implementations
- (Almost) no experimental results
Related work: lock-free queue [Michael & Scott'96]
- One of the most scalable and efficient lock-free implementations
- Widely adopted by industry
  - part of the Java Concurrency package
- Relatively simple and intuitive implementation
- Based on a singly-linked list of nodes
[Figure: a linked list of nodes 12, 4, 17 with head and tail pointers]
MS-queue brief review: enqueue
[Figure: enqueue(9) appends a new node with a CAS on the last node's next pointer, then a second CAS swings the tail to the new node]
MS-queue brief review: enqueue (contended)
[Figure: two threads concurrently enqueue 9 and 5; only one CAS on the last node's next pointer succeeds, and the losing thread retries after the tail is advanced]
MS-queue brief review: dequeue
[Figure: dequeue returns 12 with a single CAS that advances the head pointer past the current dummy node]
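To make the two-CAS structure just reviewed concrete, here is a minimal Java sketch in the MS-queue style. This is an illustration, not the authors' code: the class and field names are placeholders, and `dequeue` returns null on an empty queue.

```java
import java.util.concurrent.atomic.AtomicReference;

// A minimal MS-queue-style lock-free queue: a singly-linked list with a dummy
// node; head and tail are advanced with CAS retry loops.
class MSQueue<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MSQueue() {
        Node<T> dummy = new Node<>(null);            // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {                               // opportunistic: retry until a CAS succeeds
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    // First CAS: link the new node after the current last node
                    if (last.next.compareAndSet(null, node)) {
                        // Second CAS: swing the tail; may fail, another thread fixes it
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);  // help finish a pending enqueue
                }
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;    // queue is empty
                    tail.compareAndSet(last, next);   // tail is lagging; help
                } else {
                    T value = next.value;
                    // CAS on head removes a node; the old dummy is left for the GC
                    if (head.compareAndSet(first, next)) return value;
                }
            }
        }
    }
}
```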
Our idea (in a nutshell)
- Based on the lock-free queue by Michael & Scott
- Helping mechanism
  - each operation is applied in a bounded time
- "Wait-free" implementation scheme
  - each operation is applied exactly once
Helping mechanism
- Each operation is assigned a dynamic age-based priority
  - inspired by the doorway mechanism used in the Bakery mutex
- Each thread accessing the queue:
  - chooses a monotonically increasing phase number
  - writes down its phase and operation info in a special state array
  - helps all threads with a non-larger phase to apply their operations
- State entry (one per thread): phase: long, pending: boolean, enqueue: boolean, node: Node
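A plausible Java rendering of this state entry, as a sketch: `OpDesc` is an illustrative name for the descriptor, the field layout follows the slide, and `Node` is the queue's node type (sketched after the internal-structures slides).

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Illustrative descriptor for one operation; immutable, replaced as a whole with CAS.
class OpDesc {
    final long phase;       // age-based priority chosen when the operation starts
    final boolean pending;  // true until the operation has been applied
    final boolean enqueue;  // true for enqueue, false for dequeue
    final Node node;        // node to insert (enqueue) / node claimed (dequeue), or null

    OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

// One entry per thread; state.get(i) describes thread i's current operation:
// AtomicReferenceArray<OpDesc> state = new AtomicReferenceArray<>(NUM_THREADS);
```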
Helping mechanism in action

  thread   phase   pending   enqueue   node
    0        4      true      true     ref
    1        9      false     true     null
    2        9      true      true     ref
    3        3      false     true     ref
Helping mechanism in action

  thread   phase   pending   enqueue   node
    0        4      true      true     ref
    1        9      false     true     null
    2        9      true      true     ref
    3       10      true      true     ref

Thread 3 (phase 10): "I need to help!" (threads 0 and 2 have pending operations with phases 4 and 9, both not larger than 10)
Helping mechanism in action

  thread   phase   pending   enqueue   node
    0        4      true      true     ref
    1        9      false     true     null
    2        9      true      true     ref
    3       10      true      true     ref

Thread 3 (phase 10): "I do not need to help!" (thread 1's entry has phase 9 but is not pending)
Helping mechanism in action

  thread   phase   pending   enqueue   node
    0        4      true      true     ref
    1        9      false     true     null
    2       11      true      false    null
    3       10      true      true     ref

Thread 3 (phase 10): "I do not need to help!" (thread 2's pending dequeue has phase 11 > 10)
Thread 2 (phase 11): "I need to help!" (thread 3's pending enqueue has phase 10 <= 11)
Helping mechanism in action
- The number of operations that may linearize before any given operation is bounded
  - hence, wait-freedom
[State array as on the previous slide]
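As a sketch, the scan described above might look as follows in Java. `state` is the array sketched earlier; `helpEnq` and `helpDeq` (sketched with the enqueue/dequeue walkthroughs below) apply a specific thread's pending operation.

```java
// Sketch: before applying its own operation with phase myPhase, a thread first
// helps every pending operation whose phase is not larger than myPhase.
void help(long myPhase) {
    for (int i = 0; i < state.length(); i++) {
        OpDesc desc = state.get(i);
        if (desc != null && desc.pending && desc.phase <= myPhase) {
            if (desc.enqueue) {
                helpEnq(i, desc.phase);   // apply thread i's pending enqueue
            } else {
                helpDeq(i, desc.phase);   // apply thread i's pending dequeue
            }
        }
    }
}
```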
Optimized helping
- The basic scheme has two drawbacks:
  - the number of steps executed by each thread on every operation depends on n (the number of threads), even when there is no contention
  - it creates scenarios where many threads help the same operations (e.g., when many threads access the queue concurrently), producing a large amount of redundant work
- Optimization: help one thread at a time, in a cyclic manner (see the sketch below)
  - faster threads help slower peers in parallel
  - reduces the amount of redundant work
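A hedged sketch of this optimization; the helping index would be thread-local in a real implementation, and the method names are carried over from the earlier sketches:

```java
// Sketch of cyclic helping: each thread helps at most one peer per operation,
// advancing a private index so that slower threads are eventually helped.
int nextToHelp = 0;  // thread-local in practice

void helpOne(long myPhase) {
    OpDesc desc = state.get(nextToHelp);
    if (desc != null && desc.pending && desc.phase <= myPhase) {
        if (desc.enqueue) helpEnq(nextToHelp, desc.phase);
        else              helpDeq(nextToHelp, desc.phase);
    }
    nextToHelp = (nextToHelp + 1) % state.length();  // cycle over all threads
}
```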
How to choose the phase numbers
- Every time a thread t_i chooses a phase number, it is greater than the phase chosen by any thread that completed its choice before t_i
  - this defines a logical order on operations and provides wait-freedom
- Like in the Bakery mutex: scan through state and take the maximal phase value + 1
  - requires O(n) steps
- Alternative: use an atomic counter
  - requires O(1) steps
  - (both options are sketched below)
[Figure: a thread scans state entries with phases 4, 3, 5 and picks 6]
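Both options can be sketched in Java as follows; `state` is the array from the earlier sketch, and the counter field is an assumption based on the slide:

```java
import java.util.concurrent.atomic.AtomicLong;

// Option 1 (Bakery-style doorway): O(n) scan for the maximal phase, then +1.
long choosePhaseByScan() {
    long max = -1;
    for (int i = 0; i < state.length(); i++) {
        OpDesc desc = state.get(i);
        if (desc != null && desc.phase > max) max = desc.phase;
    }
    return max + 1;
}

// Option 2: a shared atomic counter yields a fresh, larger phase in O(1) steps.
final AtomicLong phaseCounter = new AtomicLong(-1);
long choosePhaseByCounter() {
    return phaseCounter.incrementAndGet();
}
```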
"Wait-free" design scheme
- Break each operation into three atomic steps that:
  - can be executed by different threads
  - cannot be interleaved
- Step 1: initial change of the internal structure
  - concurrent operations realize that there is an operation-in-progress
- Step 2: updating the state of the operation-in-progress as being performed (linearized)
- Step 3: fixing the internal structure
  - finalizing the operation-in-progress
Internal structures
[Figure: the queue as a linked list of nodes with head and tail pointers, alongside the per-thread state array; each state entry holds (phase, pending, enqueue, node), e.g., (9, false, false, null), (4, false, true, null), (9, false, true, null)]
Internal structures
- Each node holds enqTid: int, the ID of the thread that performs / has performed the insertion of the node into the queue
[Figure: nodes annotated with (value, enqTid, deqTid); two elements were enqueued by Thread 0, one by Thread 1]
Internal structures
- Each node also holds deqTid: int, the ID of the thread that performs / has performed the removal of the node from the queue
[Figure: the first node's deqTid is 1, meaning this element was dequeued by Thread 1; deqTid = -1 marks nodes that have not been dequeued]
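Put together, the node layout implied by these slides might look like this in Java; a sketch, with field names following the slides:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Queue node: value, next pointer, and the IDs of the threads that insert
// (enqTid) and remove (deqTid) it; deqTid == -1 means "not yet removed".
class Node {
    final int value;
    final AtomicReference<Node> next = new AtomicReference<>(null);
    final int enqTid;                                    // set once, at creation
    final AtomicInteger deqTid = new AtomicInteger(-1);  // claimed later with a CAS

    Node(int value, int enqTid) {
        this.value = value;
        this.enqTid = enqTid;
    }
}
```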
enqueue operation (uncontended: thread 2 performs enqueue(6))
- Creating a new node: thread 2 allocates a node holding value 6 with enqTid = 2
- Announcing a new operation: thread 2 installs a new state[2] entry (phase 10, pending = true, enqueue = true, node = ref to the new node)
- Step 1 (initial change of the internal structure): CAS the last node's next pointer from null to the new node
- Step 2 (updating the state of the operation-in-progress as being performed): CAS state[2] to an entry with pending = false
- Step 3 (fixing the internal structure): CAS the tail to point to the new node
[Figures: the queue 12, 4, 17 with per-node (value, enqTid, deqTid) fields and the state array, shown after each step]
enqueue operation (helping: thread 0 starts enqueue(3) while thread 2's enqueue(6) is in progress)
- Thread 2's enqueue has executed Step 1 (node 6 is linked after 17) but not yet Steps 2 and 3
- Thread 0 creates its new node (value 3, enqTid = 0) and announces its operation: state[0] = (phase 11, pending = true, enqueue = true, node = ref)
- Thread 0 notices the operation-in-progress and helps complete it:
  - Step 2: CAS state[2] to pending = false, marking thread 2's enqueue as performed
  - Step 3: CAS the tail to point to node 6, fixing the internal structure
- Only then does thread 0 execute Step 1 of its own operation: CAS node 6's next pointer to the new node 3
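The three steps above can be sketched as helper methods in Java. This follows the slides' step structure rather than the authors' exact code; `isStillPending` checks that a descriptor is still pending with a non-larger phase.

```java
// Sketch: apply thread tid's pending enqueue; any thread may execute this.
void helpEnq(int tid, long phase) {
    while (isStillPending(tid, phase)) {
        Node last = tail.get();
        Node next = last.next.get();
        if (last == tail.get()) {
            if (next == null) {
                // Step 1: link the announced node after the current last node
                if (isStillPending(tid, phase)
                        && last.next.compareAndSet(null, state.get(tid).node)) {
                    helpFinishEnq();
                    return;
                }
            } else {
                helpFinishEnq();  // another enqueue is half-done; finish it first
            }
        }
    }
}

// Sketch: complete Steps 2 and 3 of whichever enqueue linked the node after tail.
void helpFinishEnq() {
    Node last = tail.get();
    Node next = last.next.get();
    if (next != null) {
        int tid = next.enqTid;              // owner of the half-finished enqueue
        OpDesc cur = state.get(tid);
        if (last == tail.get() && cur.node == next) {
            // Step 2: mark the operation as performed (the linearization point)
            OpDesc done = new OpDesc(cur.phase, false, true, next);
            state.compareAndSet(tid, cur, done);
            // Step 3: fix the structure by swinging the tail over the new node
            tail.compareAndSet(last, next);
        }
    }
}

boolean isStillPending(int tid, long phase) {
    OpDesc desc = state.get(tid);
    return desc != null && desc.pending && desc.phase <= phase;
}
```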
dequeue operation (thread 2 performs a dequeue; the queue holds 12, 4, 17)
- Announcing a new operation: thread 2 installs state[2] = (phase 10, pending = true, enqueue = false, node = null)
- Updating state to refer to the first node: CAS state[2] so that its node field points at the current first node
- Step 1 (initial change of the internal structure): CAS the first node's deqTid from -1 to 2
- Step 2 (updating the state of the operation-in-progress as being performed): CAS state[2] to pending = false
- Step 3 (fixing the internal structure): CAS the head to the next node
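A corresponding sketch for the dequeue side, again following the slides' step structure; the empty-queue and lagging-tail cases are elided here for brevity.

```java
// Sketch: apply thread tid's pending dequeue (non-empty-queue path only).
void helpDeq(int tid, long phase) {
    while (isStillPending(tid, phase)) {
        Node first = head.get();
        Node next = first.next.get();
        if (first != head.get()) continue;
        if (next == null) return;                // empty-queue handling elided
        OpDesc cur = state.get(tid);
        if (!isStillPending(tid, phase)) return;
        if (cur.node != first) {
            // Point the descriptor at the current first (dummy) node
            OpDesc prep = new OpDesc(cur.phase, true, false, first);
            if (!state.compareAndSet(tid, cur, prep)) continue;
        }
        // Step 1: claim the first node for this dequeuer
        first.deqTid.compareAndSet(-1, tid);
        helpFinishDeq();
    }
}

// Sketch: complete Steps 2 and 3 for whichever dequeue claimed the first node.
void helpFinishDeq() {
    Node first = head.get();
    Node next = first.next.get();
    int tid = first.deqTid.get();
    if (tid != -1) {
        OpDesc cur = state.get(tid);
        if (first == head.get() && next != null) {
            // Step 2: mark the operation as performed; keep the node reference
            // so the owner can read the dequeued value from node.next
            OpDesc done = new OpDesc(cur.phase, false, false, cur.node);
            state.compareAndSet(tid, cur, done);
            // Step 3: fix the structure by advancing the head
            head.compareAndSet(first, next);
        }
    }
}
```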
Performance evaluation

              Machine 1                  Machine 2                  Machine 3
Architecture  two 2.5 GHz quad-core      two 1.6 GHz quad-core
              Xeon E5420 processors      Xeon E5310 processors
# threads     8                          8                          8
RAM           16 GB                      16 GB                      16 GB
OS            CentOS 5.5 Server          Ubuntu 8.10 Server         RedHat Enterprise 5.3 Server
Java          Sun's Java SE Runtime 1.6.0 update 22, 64-bit Server VM (all machines)
Benchmarks
- Enqueue-Dequeue benchmark:
  - the queue is initially empty
  - each thread iteratively performs an enqueue and then a dequeue
  - 1,000,000 iterations per thread
- 50%-Enqueue benchmark:
  - the queue is initialized with 1000 elements
  - each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue
  - 1,000,000 operations per thread
(A minimal harness for the Enqueue-Dequeue benchmark is sketched below.)
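A minimal sketch of such a harness in Java; `MSQueue` refers to the earlier sketch, and the class name and structure are placeholders rather than the authors' benchmark code:

```java
import java.util.concurrent.CyclicBarrier;

// Sketch of the Enqueue-Dequeue benchmark: every thread alternates an enqueue
// and a dequeue; the reported number is the completion time of the whole run.
class EnqDeqBenchmark {
    static final int ITERATIONS = 1_000_000;

    static long runNanos(MSQueue<Integer> queue, int numThreads) throws Exception {
        CyclicBarrier startLine = new CyclicBarrier(numThreads + 1);
        Thread[] workers = new Thread[numThreads];
        for (int t = 0; t < numThreads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    startLine.await();             // start all threads together
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                for (int i = 0; i < ITERATIONS; i++) {
                    queue.enqueue(i);
                    queue.dequeue();
                }
            });
            workers[t].start();
        }
        startLine.await();
        long start = System.nanoTime();
        for (Thread w : workers) w.join();         // wait for all threads to finish
        return System.nanoTime() - start;
    }
}
```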
Tested algorithms
- Compared implementations:
  - MS-queue
  - Base wait-free queue
  - Optimized wait-free queue
    - Opt 1: optimized helping (help one thread at a time)
    - Opt 2: atomic counter-based phase calculation
- Measure completion time as a function of # threads
Enqueue-Dequeue benchmark
TBD: add figures

The impact of optimizations
TBD: add figures
Optimizing further: false sharing
- Created on accesses to the state array
- Resolved by stretching the state with dummy pads (a sketch follows)
TBD: add figures
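One hedged way to realize the padding in Java is to stride the state array so that neighboring threads' entries fall on different cache lines; the stride of 16 references is an assumption, sized for 64-byte cache lines, not a value taken from the slides.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch: stretch the state array so adjacent threads' entries do not share a
// cache line; only every STRIDE-th slot is used, the rest act as dummy pads.
class PaddedState {
    static final int STRIDE = 16;  // assumption: 16 * 8-byte refs covers a cache line

    private final AtomicReferenceArray<OpDesc> slots;

    PaddedState(int numThreads) {
        slots = new AtomicReferenceArray<>(numThreads * STRIDE);
    }

    OpDesc get(int tid) {
        return slots.get(tid * STRIDE);
    }

    boolean compareAndSet(int tid, OpDesc expect, OpDesc update) {
        return slots.compareAndSet(tid * STRIDE, expect, update);
    }
}
```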
Optimizing further: memory management
- Every attempt to update state is preceded by an allocation of a new record
  - these records can be reused when the attempt fails
  - (more) validation checks can be performed to reduce the number of failed attempts
- When an operation is finished, remove the reference from state to a list node
  - helps the garbage collector
Implementing the queue without GC
- Apply the Hazard Pointers technique [Michael'04]
  - each thread is associated with hazard pointers: single-writer multi-reader registers used by threads to point to objects they may access later
  - when an object should be deleted, a thread stores its address in a special stack
  - once in a while, the thread scans the stack and recycles an object only if no hazard pointers point to it
- In our case, the technique can be applied with a slight modification in the dequeue method
Summary
- First wait-free queue implementation supporting multiple enqueuers and dequeuers
- Wait-freedom incurs an inherent trade-off:
  - it bounds the completion time of a single operation
  - it has a cost in the "typical" case
- The additional cost can be reduced to a tolerable level
- The proposed design scheme may be applicable to other wait-free data structures
Thank you!
Questions?