Wait-Free Queues with Multiple Enqueuers and Dequeuers - PowerPoint Presentation

Uploaded by mitsue-stanley on 2015-09-29.

Presentation Transcript

Wait-Free Queues with Multiple Enqueuers and Dequeuers

Alex Kogan, Erez Petrank
Computer Science, Technion, Israel

Outline

- Queue data structure
- Progress guarantees
- Previous work on concurrent queues
- Review of the MS-queue
- Our ideas in a nutshell
- Review of the KP-queue
- Performance results
- Performance optimizations
- Summary

FIFO queues

One of the most fundamental and common data structures.

[Diagram: a queue holding 5, 3, 2; enqueue inserts 9 at the tail while dequeue removes from the head.]

Concurrent FIFO queues

A concurrent implementation supports "correct" concurrent adding and removing of elements, where correct = linearizable. Access to the shared memory must be synchronized.

[Diagram: several threads dequeue concurrently while another thread enqueues 9 into a queue holding 3 and 2; one dequeuer finds the queue empty.]

Non-blocking synchronization

No thread is blocked waiting for another thread to complete, e.g., no locks / critical sections.

Progress guarantees:
- Obstruction-freedom: progress is guaranteed only in the eventual absence of interference.
- Lock-freedom: among all threads trying to apply an operation, one will succeed.
- Wait-freedom: a thread completes its operation in a bounded number of steps.

Lock-freedom

Among all threads trying to apply an operation, one will succeed:
- an opportunistic approach: make attempts until succeeding
- global progress, but all threads except one may starve

There are many efficient and scalable lock-free queue implementations.
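The opportunistic "attempt until succeeding" pattern can be illustrated with a minimal CAS retry loop. This is a hypothetical counter example for illustration, not part of the queue algorithm on the slides:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal illustration of the lock-free retry pattern:
// an attempt may fail under contention, but some thread always succeeds.
class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int increment() {
        while (true) {
            int old = value.get();                    // read the current value
            if (value.compareAndSet(old, old + 1))    // attempt; fails if another thread won
                return old + 1;                       // global progress is guaranteed
        }
    }

    int get() { return value.get(); }
}
```

A slow thread can lose its CAS indefinitely while faster threads keep winning: the system as a whole progresses, but an individual thread may starve — exactly the gap that wait-freedom closes.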

Wait-freedom

A thread completes its operation in a bounded number of steps, regardless of what other threads are doing.

A highly desired property of any concurrent data structure, but commonly regarded as inefficient and too costly to achieve.

Particularly important in several domains:
- real-time systems
- systems operating under an SLA
- heterogeneous environments

Related work: existing wait-free queues

Limited concurrency:
- one enqueuer and one dequeuer [Lamport'83]
- multiple enqueuers, one concurrent dequeuer [David'04]
- multiple dequeuers, one concurrent enqueuer [Jayanti & Petrovic'05]

Universal constructions [Herlihy'91]:
- a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object
- expensive, impractical implementations

(Almost) no experimental results.

Related work: lock-free queue [Michael & Scott'96]

One of the most scalable and efficient lock-free implementations. Widely adopted by industry: part of the Java Concurrency package. A relatively simple and intuitive implementation, based on a singly-linked list of nodes.

[Diagram: a linked list of nodes 12, 4, 17, with head pointing at the front of the list and tail at the last node.]

MS-queue brief review: enqueue

[Diagram: to enqueue 9 into a queue holding 12, 4, 17, one CAS appends the new node to the last node's next pointer and a second CAS swings the tail to the new node.]

[Diagram: two concurrent enqueues (9 and 5) race with CAS on the same next pointer; the loser retries its CAS on the new last node.]

MS-queue brief review: dequeue

[Diagram: to dequeue, a CAS swings the head past the first node, and the value 12 is returned.]
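The reviewed algorithm can be sketched compactly in Java. This is a minimal reading of the classic Michael & Scott queue (class and method names are mine); production code would also need memory-reclamation support outside a garbage-collected runtime:

```java
import java.util.concurrent.atomic.AtomicReference;

// Compact sketch of the Michael & Scott lock-free queue.
class MSQueue<T> {
    static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head, tail;

    MSQueue() {
        Node<T> sentinel = new Node<>(null);   // dummy node; head always points at it
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    void enqueue(T v) {
        Node<T> node = new Node<>(v);
        while (true) {                                       // opportunistic retry loop
            Node<T> last = tail.get(), next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    if (last.next.compareAndSet(null, node)) { // CAS 1: link the node
                        tail.compareAndSet(last, node);        // CAS 2: swing the tail
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);  // help a lagging enqueue finish
                }
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get(), last = tail.get(), next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;       // queue is empty
                    tail.compareAndSet(last, next);      // tail lags; help it
                } else {
                    T v = next.value;
                    if (head.compareAndSet(first, next)) // CAS: swing the head
                        return v;
                }
            }
        }
    }
}
```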

Our idea (in a nutshell)

Based on the lock-free queue by Michael & Scott, plus:
- a helping mechanism: each operation is applied in a bounded time
- a "wait-free" implementation scheme: each operation is applied exactly once

Helping mechanism

Each operation is assigned a dynamic age-based priority, inspired by the doorway mechanism used in the Bakery mutex.

Each thread accessing the queue:
- chooses a monotonically increasing phase number
- writes down its phase and operation info in a special state array
- helps all threads with a non-larger phase to apply their operations

One state entry per thread, with fields:
- phase: long
- pending: boolean
- enqueue: boolean
- node: Node
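The state entry can be modeled as an immutable record, so helpers can replace a thread's whole entry with a single CAS. A minimal sketch, assuming the field names from the slide (the class names `OpDesc` and `StateArray` are mine):

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the per-thread state entry from the slide.
// The entry is immutable; helpers swap in a whole new entry atomically.
class OpDesc {
    final long phase;       // age-based priority chosen in the doorway
    final boolean pending;  // true while the operation is not yet applied
    final boolean enqueue;  // true for enqueue, false for dequeue
    final Object node;      // node to insert (enqueue) or node observed (dequeue)

    OpDesc(long phase, boolean pending, boolean enqueue, Object node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

class StateArray {
    final AtomicReferenceArray<OpDesc> state;

    StateArray(int nThreads) {
        state = new AtomicReferenceArray<>(nThreads);
        for (int i = 0; i < nThreads; i++)
            state.set(i, new OpDesc(-1, false, true, null)); // no pending operation
    }

    // A thread running with phase myPhase helps every thread
    // whose operation is pending with a non-larger phase.
    boolean needsHelp(int tid, long myPhase) {
        OpDesc d = state.get(tid);
        return d.pending && d.phase <= myPhase;
    }
}
```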

Helping mechanism in action

Example state array, one entry per thread (columns: phase, pending, enqueue, node):

  thread   phase   pending   enqueue   node
  0        4       true      true      ref
  1        9       false     true      null
  2        9       true      true      ref
  3        3       false     true      ref

- Thread 3 starts a new enqueue and publishes (phase 10, pending true, enqueue true, node ref). "I need to help!": it must help threads 0 and 2, whose pending operations have phases 4 and 9, both non-larger than 10.
- "I do not need to help!": thread 1's entry is not pending, so no help is needed there.
- Thread 2 then starts a dequeue with phase 11, publishing (11, true, false, null); it in turn helps threads 0 and 3, whose pending phases are non-larger than 11, but not thread 1.

The number of operations that may linearize before any given operation is bounded; hence, wait-freedom.

Optimized helping

The basic scheme has two drawbacks:
- the number of steps executed by each thread on every operation depends on n (the number of threads), even when there is no contention
- it creates scenarios where many threads help the same operations, e.g., when many threads access the queue concurrently, resulting in a large amount of redundant work

Optimization: help one thread at a time, in a cyclic manner. Faster threads help slower peers in parallel, which reduces the amount of redundant work.

How to choose the phase numbers

Every time a thread t_i chooses a phase number, it is greater than the number of any thread that made its choice before t_i. This defines a logical order on operations and provides wait-freedom.

Like in the Bakery mutex:
- scan through state
- calculate the maximal phase value + 1
- requires O(n) steps

Alternative: use an atomic counter, which requires O(1) steps.

[Diagram: state entries with phases 4, 3, and 5; the next thread picks 6.]
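The two ways of choosing a phase can be sketched side by side. A hedged sketch with illustrative names (`PhaseChooser`, `choosePhaseByScan`); the scan variant follows the Bakery-style doorway, the counter variant is the O(1) alternative:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Two ways to pick a monotonically increasing phase number.
class PhaseChooser {
    final AtomicLongArray phases;                 // the phase field of each state entry
    final AtomicLong counter = new AtomicLong(0); // shared counter for the O(1) variant

    PhaseChooser(int nThreads) { phases = new AtomicLongArray(nThreads); }

    // Bakery-style doorway: scan all entries for the maximum, then add 1.
    // Costs O(n) steps per operation.
    long choosePhaseByScan(int tid) {
        long max = 0;
        for (int i = 0; i < phases.length(); i++)
            max = Math.max(max, phases.get(i));
        long phase = max + 1;
        phases.set(tid, phase);
        return phase;
    }

    // Alternative: one atomic fetch-and-increment, O(1) steps.
    long choosePhaseByCounter(int tid) {
        long phase = counter.incrementAndGet();
        phases.set(tid, phase);
        return phase;
    }
}
```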

"Wait-free" design scheme

Break each operation into three atomic steps, which can be executed by different threads but cannot be interleaved:
1. Initial change of the internal structure; concurrent operations realize that there is an operation-in-progress.
2. Updating the state of the operation-in-progress as being performed (linearized).
3. Fixing the internal structure, finalizing the operation-in-progress.

Internal structures

The queue is a linked list of nodes reachable from head and tail; the state array holds one entry (phase, pending, enqueue, node) per thread.

Each node is extended with enqTid: int, which holds the ID of the thread that performs / has performed the insertion of the node into the queue. In the example, the nodes with values 1 and 2 were enqueued by thread 0, and the node with value 4 was enqueued by thread 1.

Each node also carries deqTid: int, which holds the ID of the thread that performs / has performed the removal of the node from the queue (-1 while the node has not been dequeued). In the example, the element with value 1 was dequeued by thread 1.

enqueue operation

Thread 2 enqueues 6 into a queue holding 12, 4, 17 (each node carries value, enqTid, deqTid):

1. Creating a new node: (value 6, enqTid 2, deqTid -1).
2. Announcing a new operation: thread 2 publishes (phase 10, pending true, enqueue true, node ref) in its state entry.
3. Step 1, initial change of the internal structure: a CAS appends the new node to the next pointer of the last node.
4. Step 2, updating the state of the operation-in-progress as being performed: a CAS on thread 2's state entry resets pending to false.
5. Step 3, fixing the internal structure: a CAS swings the tail to the new node.

If thread 0 starts enqueue(3) with phase 11 while thread 2's operation is still in flight, it creates and announces its own node, then notices the operation-in-progress: thread 2's node is already linked (its Step 1 is visible), so thread 0 first completes Step 2 and Step 3 on thread 2's behalf, and only then performs Step 1 for its own node.
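The enqueue walkthrough above can be sketched in Java roughly as follows. This is a simplified reading of the slides (names such as `SketchQueue` and `helpFinishEnq` are mine, the thread count is fixed, and dequeue is omitted), not the authors' code:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Simplified sketch of the wait-free enqueue: announce, help, three atomic steps.
class SketchQueue<T> {
    static final int NUM_THREADS = 4; // assumption: fixed, known thread count

    static class Node<T> {
        final T value;
        final int enqTid;  // ID of the inserting thread (-1 for the sentinel)
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value, int enqTid) { this.value = value; this.enqTid = enqTid; }
    }

    static class OpDesc<T> {
        final long phase; final boolean pending; final boolean enqueue; final Node<T> node;
        OpDesc(long phase, boolean pending, boolean enqueue, Node<T> node) {
            this.phase = phase; this.pending = pending;
            this.enqueue = enqueue; this.node = node;
        }
    }

    final AtomicReference<Node<T>> head, tail;
    final AtomicReferenceArray<OpDesc<T>> state = new AtomicReferenceArray<>(NUM_THREADS);

    SketchQueue() {
        Node<T> sentinel = new Node<>(null, -1);
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
        for (int i = 0; i < NUM_THREADS; i++)
            state.set(i, new OpDesc<>(-1, false, true, null));
    }

    void enq(int tid, T value) {
        long phase = maxPhase() + 1;  // doorway: larger than any earlier choice
        state.set(tid, new OpDesc<>(phase, true, true, new Node<>(value, tid)));
        help(phase);
        helpFinishEnq();
    }

    long maxPhase() {
        long max = -1;
        for (int i = 0; i < NUM_THREADS; i++) max = Math.max(max, state.get(i).phase);
        return max;
    }

    void help(long phase) {  // help every pending enqueue with a non-larger phase
        for (int i = 0; i < NUM_THREADS; i++) {
            OpDesc<T> desc = state.get(i);
            if (desc.pending && desc.phase <= phase && desc.enqueue) helpEnq(i, desc.phase);
        }
    }

    void helpEnq(int tid, long phase) {
        while (isStillPending(tid, phase)) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    // Step 1: initial change of the internal structure
                    if (isStillPending(tid, phase)
                            && last.next.compareAndSet(null, state.get(tid).node)) {
                        helpFinishEnq();
                        return;
                    }
                } else {
                    helpFinishEnq(); // another enqueue is in progress; finish it first
                }
            }
        }
    }

    boolean isStillPending(int tid, long phase) {
        OpDesc<T> d = state.get(tid);
        return d.pending && d.phase <= phase;
    }

    void helpFinishEnq() {
        Node<T> last = tail.get();
        Node<T> next = last.next.get();
        if (next != null) {
            int tid = next.enqTid;  // the linked node names its owner
            OpDesc<T> cur = state.get(tid);
            if (last == tail.get() && cur.node == next) {
                // Step 2: mark the operation-in-progress as performed
                state.compareAndSet(tid, cur,
                        new OpDesc<>(cur.phase, false, true, next));
            }
            tail.compareAndSet(last, next); // Step 3: fix the internal structure
        }
    }
}
```

Because every step is a CAS on shared structures, a stalled enqueuer's operation can be carried to completion by any helper that observes it.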

dequeue operation

Thread 2 dequeues from a queue holding 12, 4, 17:

1. Announcing a new operation: thread 2 publishes (phase 10, pending true, enqueue false, node null) in its state entry.
2. Updating state to refer to the first node: a CAS sets thread 2's state entry to point to the node at the head.
3. Step 1, initial change of the internal structure: a CAS writes 2 into the deqTid field of that first node.
4. Step 2, updating the state of the operation-in-progress as being performed: a CAS on thread 2's state entry resets pending to false.
5. Step 3, fixing the internal structure: a CAS swings the head to the next node.

Performance evaluation

Setup:

Architecture: two 2.5 GHz quad-core Xeon E5420 processors; two 1.6 GHz quad-core Xeon E5310 processors
# threads: 8 / 8 / 8
RAM: 16 GB / 16 GB / 16 GB
OS: CentOS 5.5 Server / Ubuntu 8.10 Server / RedHat Enterprise 5.3 Server
Java: Sun's Java SE Runtime 1.6.0 update 22, 64-bit Server VM

Benchmarks

Enqueue-Dequeue benchmark:
- the queue is initially empty
- each thread iteratively performs an enqueue and then a dequeue
- 1,000,000 iterations per thread

50%-Enqueue benchmark:
- the queue is initialized with 1000 elements
- each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue
- 1,000,000 operations per thread
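The Enqueue-Dequeue benchmark structure can be sketched as below. This uses `java.util.concurrent.ConcurrentLinkedQueue` (the MS-queue-based implementation mentioned earlier) as a stand-in for the compared queues, with thread and iteration counts as parameters rather than the deck's 8 threads and 1,000,000 iterations:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the Enqueue-Dequeue benchmark: each thread alternates
// enqueue and dequeue, so the queue stays near-empty throughout.
class EnqueueDequeueBenchmark {
    static long run(Queue<Integer> queue, int nThreads, int iterations)
            throws InterruptedException {
        long start = System.nanoTime();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    queue.offer(i);  // enqueue ...
                    queue.poll();    // ... then dequeue
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return System.nanoTime() - start;  // completion time for all threads
    }
}
```

The measured quantity is the completion time of the whole run, plotted as a function of the number of threads.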

Tested algorithms

Compared implementations:
- MS-queue
- Base wait-free queue
- Optimized wait-free queue
  - Opt 1: optimized helping (help one thread at a time)
  - Opt 2: atomic counter-based phase calculation

We measure completion time as a function of the number of threads.

Enqueue-Dequeue benchmark

TBD: add figures

The impact of optimizations

TBD: add figures

Optimizing further: false sharing

False sharing is created on accesses to the state array. It is resolved by stretching the state entries with dummy pads.

TBD: add figures
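Stretching an entry with dummy pads can be sketched as below. This is a hypothetical layout (`PaddedSlot` is my name), assuming a common 64-byte cache line; the goal is that two adjacent threads' slots never share a line:

```java
// Sketch: pad each state slot so that entries of neighboring threads
// land on different cache lines and do not falsely share.
class PaddedSlot {
    volatile Object entry;             // the actual per-thread state record
    long p1, p2, p3, p4, p5, p6, p7;   // dummy pads (7 * 8 bytes) against false sharing

    // Touch the pads so tooling does not flag them as unused.
    long padSum() { return p1 + p2 + p3 + p4 + p5 + p6 + p7; }
}
```

On a modern JDK the same effect is usually achieved with the JVM's `@Contended` annotation instead of hand-written pad fields.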

Optimizing further: memory management

Every attempt to update state is preceded by an allocation of a new record:
- these records can be reused when the attempt fails
- (more) validation checks can be performed to reduce the number of failed attempts

When an operation is finished, remove the reference from state to a list node, to help the garbage collector.

Implementing the queue without GC

Apply the Hazard Pointers technique [Michael'04]:
- each thread is associated with hazard pointers: single-writer multi-reader registers used by threads to point to objects they may access later
- when an object should be deleted, a thread stores its address in a special stack
- once in a while, the thread scans the stack and recycles an object only if no hazard pointers point to it

In our case, the technique can be applied with a slight modification in the dequeue method.

Summary

The first wait-free queue implementation supporting multiple enqueuers and dequeuers.

Wait-freedom incurs an inherent trade-off:
- it bounds the completion time of a single operation
- it has a cost in the "typical" case

The additional cost can be reduced to a tolerable level, and the proposed design scheme might be applicable to other wait-free data structures.

Thank you! Questions?