
idempotent - PowerPoint Presentation

briana-ranney
Uploaded On 2016-07-03



Presentation Transcript

Slide1

idempotent (ī-dəm-pō-tənt) adj. 1 of, relating to, or being a mathematical quantity which when applied to itself equals itself; 2 of, relating to, or being an operation under which a mathematical quantity is idempotent.

idempotent processing (ī-dəm-pō-tənt prə-ses-iŋ) n. the application of only idempotent operations in sequence; said of the execution of computer programs in units of only idempotent computations, typically to achieve restartable behavior.

Slide2

Static Analysis and Compiler Design for Idempotent Processing

Marc de Kruijf, Karthikeyan Sankaralingam, Somesh Jha

PLDI 2012, Beijing

Slide3

Example

source code:

int sum(int *array, int len) {
  int x = 0;
  for (int i = 0; i < len; ++i)
    x += array[i];
  return x;
}

Slide4

Example

assembly code:

  R2 = load [R1]
  R3 = 0
LOOP:
  R4 = load [R0 + R2]
  R3 = add R3, R4
  R2 = sub R2, 1
  bnez R2, LOOP
EXIT:
  return R3

what can interrupt execution: faults, exceptions, mis-speculations

Slide5

Example

assembly code:

  R2 = load [R1]
  R3 = 0
LOOP:
  R4 = load [R0 + R2]
  R3 = add R3, R4
  R2 = sub R2, 1
  bnez R2, LOOP
EXIT:
  return R3

Bad Stuff Happens!

Slide6

Example

assembly code:

  R2 = load [R1]
  R3 = 0
LOOP:
  R4 = load [R0 + R2]
  R3 = add R3, R4
  R2 = sub R2, 1
  bnez R2, LOOP
EXIT:
  return R3

R0 and R1 are unmodified

convention: use checkpoints/buffers

just re-execute!

Slide7

It’s Idempotent!

idempoh… what…?

int sum(int *data, int len) {
  int x = 0;
  for (int i = 0; i < len; ++i)
    x += data[i];
  return x;
}
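A minimal sketch of why plain re-execution is enough here, written in Python rather than the slides' C and assembly. The names `Fault`, `sum_region`, and `run_with_retry` and the fault-injection scheme are hypothetical and not from the talk. The region only reads its input and writes fresh local state, so a fault partway through can be handled by restarting from the region entry, with no checkpoint.

```python
class Fault(Exception):
    """Stands in for a hardware fault, exception, or mis-speculation."""


def sum_region(data, fault_at=None):
    """Idempotent region: reads `data`, writes only fresh local state."""
    x = 0
    for i, value in enumerate(data):
        if fault_at is not None and i == fault_at:
            raise Fault(f"injected fault at iteration {i}")
        x += value
    return x


def run_with_retry(data, fault_at):
    """Recover by re-executing the region from its entry; no checkpoint."""
    attempts = 0
    pending_fault = fault_at
    while True:
        attempts += 1
        try:
            return sum_region(data, pending_fault), attempts
        except Fault:
            pending_fault = None  # assumption: the fault is transient


data = [3, 1, 4, 1, 5]
result, attempts = run_with_retry(data, fault_at=2)
print(result, attempts)  # 14 2
```

The partial sum from the failed attempt is simply discarded. Because nothing the region reads was overwritten, the second attempt sees the same inputs as the first.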

Slide8

Idempotent Processing

idempotent regions All The Time

Slide9

Idempotent Processing

executive summary

[figure: normal compiler vs. custom compiler]

idempotence inhibited by clobber antidependences

how? cut semantic clobber antidependences

low runtime overhead (typically 2-12%)

Slide10

Presentation Overview

❶ Idempotence
❷ Algorithm
❸ Results

Slide11

What is Idempotence?

is this idempotent? Yes

Slide12

What is Idempotence?

how about this? No

Slide13

What is Idempotence?

maybe this? Yes

Slide14

What is Idempotence?

operation sequence    dependence chain      idempotent?
(figure)              write                 Yes
(figure)              read, write           No
(figure)              write, read, write    Yes

it’s all about the data dependences

Slide15

What is Idempotence?

operation sequence    dependence chain      idempotent?
(figure)              write, read           Yes
(figure)              read, write           No
(figure)              write, read, write    Yes

it’s all about the data dependences
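The Yes/No answers above follow from one rule: a region stops being idempotent once it writes a location that it read earlier without having written it first. The sketch below checks that rule over a list of (operation, location) pairs. The function name `is_idempotent`, the helper `on_x`, and the tuple encoding are assumptions made for illustration, not the authors' analysis.

```python
def is_idempotent(ops):
    """Return True if no location is written after an exposed read of it.

    ops: list of (op, location) pairs, op being "read" or "write".
    An exposed read is a read of a location the region has not yet written.
    """
    written = set()  # locations already written in this region
    exposed = set()  # locations read before any write in this region
    for op, loc in ops:
        if op == "read":
            if loc not in written:
                exposed.add(loc)
        elif op == "write":
            if loc in exposed:
                return False  # the region's own input gets overwritten
            written.add(loc)
        else:
            raise ValueError(f"unknown operation: {op}")
    return True


def on_x(*ops):
    """Helper: apply every operation to a single location named "x"."""
    return [(op, "x") for op in ops]


print(is_idempotent(on_x("write")))                   # True
print(is_idempotent(on_x("read", "write")))           # False
print(is_idempotent(on_x("write", "read", "write")))  # True
```

In the third sequence the read is not exposed, because the region itself produced the value being read, so re-execution regenerates it.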

Clobber Antidependence: antidependence with an exposed read

Slide16

Semantic Idempotence

two types of program state

(1) local (“pseudoregister”) state:
    can be renamed to remove clobber antidependences*
    does not semantically constrain idempotence

(2) non-local (“memory”) state:
    cannot “rename” to avoid clobber antidependences
    semantically constrains idempotence
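A small illustration of the renaming point for local state, using a Python dict as a stand-in for a pseudoregister file. The register names and the functions `region_clobbering`, `region_renamed`, and `execute_twice` are hypothetical. Running a region twice models a fault detected at the end of the region followed by re-execution.

```python
def region_clobbering(regs):
    """Reads r0, then overwrites r0: a clobber antidependence on local state."""
    regs["r0"] = regs["r0"] + 1


def region_renamed(regs):
    """Same computation after renaming: the result goes to a fresh name, r1."""
    regs["r1"] = regs["r0"] + 1


def execute_twice(region, regs):
    """Model a fault detected at the end of the region, then re-execution."""
    region(regs)
    region(regs)
    return regs


print(execute_twice(region_clobbering, {"r0": 10}))  # {'r0': 12}
print(execute_twice(region_renamed, {"r0": 10}))     # {'r0': 10, 'r1': 11}
```

The renamed version gives the same answer however many times it runs, because its input `r0` survives. A memory location cannot be given a fresh name in the same way, which is why non-local state is what constrains idempotence.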

semantic idempotence = no non-local clobber antidep.

* preserve local state by renaming and careful allocation

Slide17

Presentation Overview

❶ Idempotence
❷ Algorithm
❸ Results

Slide18

Region Construction Algorithm

steps one, two, and three

Step 1: transform function
  remove artificial dependences, remove non-clobbers

Step 2: construct regions around antidependences
  cut all non-local antidependences in the CFG

Step 3: refine for correctness & performance
  account for loops, optimize for dynamic behavior

Slide19

Step 1: Transform

not one, but two transformations

Transformation 1: SSA for pseudoregister antidependences

But we still have a problem:

[diagram: clobber antidependences, region boundaries, and region identification, linked by “depends on”]

Slide20

Step 1: Transform

not one, but two transformations

Transformation 1: SSA for pseudoregister antidependences

Transformation 2: Scalar replacement of memory variables

  before:  [x] = a;  b = [x];  [x] = c;
  after:   [x] = a;  b = a;    [x] = c;
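The rewrite above can be sketched as store-to-load forwarding over a toy instruction list. This is only an illustration under strong assumptions: straight-line code, SSA value names, and no aliasing between distinct address names. The tuple encoding and the name `forward_stores` are hypothetical and are not the authors' LLVM implementation.

```python
def forward_stores(instrs):
    """Replace each load that follows a store to the same address with a copy.

    instrs: list of ("store", addr, src) or ("load", dst, addr) tuples.
    Assumes straight-line code, SSA value names, and that distinct
    address names never alias.
    """
    last_stored = {}  # addr -> name of the value most recently stored there
    out = []
    for instr in instrs:
        kind = instr[0]
        if kind == "store":
            _, addr, src = instr
            last_stored[addr] = src
            out.append(instr)
        elif kind == "load" and instr[2] in last_stored:
            _, dst, addr = instr
            out.append(("copy", dst, last_stored[addr]))
        else:
            out.append(instr)  # load with no known prior store, or other op
    return out


before = [("store", "x", "a"), ("load", "b", "x"), ("store", "x", "c")]
after = forward_stores(before)
print(after)
# [('store', 'x', 'a'), ('copy', 'b', 'a'), ('store', 'x', 'c')]
```

Once the load of `[x]` is gone, the final store to `[x]` no longer follows any read of `[x]`, so there is no antidependence left to consider.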

non-clobber antidependences … GONE!

Slide21

Step 1: Transform

not one, but two transformations

Transformation 1: SSA for pseudoregister antidependences

Transformation 2: Scalar replacement of memory variables

[diagram: clobber antidependences, region boundaries, and region identification, linked by “depends on”]

Slide22

Region Construction Algorithm

steps one, two, and three

Step 1: transform function
  remove artificial dependences, remove non-clobbers

Step 2: construct regions around antidependences
  cut all non-local antidependences in the CFG

Step 3: refine for correctness & performance
  account for loops, optimize for dynamic behavior

Slide23

Step 2: Cut the CFG

cut, cut, cut…

construct regions by “cutting” non-local antidependences

[figure: CFG with an antidependence marked]

Slide24

Step 2: Cut the CFG

but where to cut…?

optimal region size?

[rough sketch: overhead vs. region size, with sources of overhead]

larger is (generally) better: large regions amortize the cost of input preservation

Slide25

Step 2: Cut the CFG

but where to cut…?

goal: the minimum set of cuts that cuts all antidependence paths

intuition: minimum cuts → fewest regions → large regions

approach: a series of reductions:
  minimum vertex multi-cut (NP-complete)
  minimum hitting set among paths
  minimum hitting set among “dominating nodes”
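The slides defer the actual reductions to the paper, so the following is not the authors' algorithm. It is a generic greedy approximation to minimum hitting set, included only to show the shape of the problem: each antidependence path is a set of candidate cut points, and every path must contain at least one chosen cut. The function name `greedy_hitting_set` and the node labels are hypothetical.

```python
def greedy_hitting_set(paths):
    """Pick nodes until every path contains at least one picked node.

    paths: iterable of node collections, one per antidependence path.
    Greedy choice: the node that hits the most not-yet-cut paths,
    with ties broken by node order so the result is deterministic.
    Nodes must be hashable and mutually comparable (e.g. strings).
    """
    remaining = [set(p) for p in paths]
    if any(not p for p in remaining):
        raise ValueError("an empty path cannot be cut")
    cuts = []
    while remaining:
        counts = {}
        for path in remaining:
            for node in path:
                counts[node] = counts.get(node, 0) + 1
        best = min(counts, key=lambda n: (-counts[n], n))
        cuts.append(best)
        remaining = [p for p in remaining if best not in p]
    return cuts


paths = [["B", "C", "D"], ["C", "E"], ["F", "G"]]
print(greedy_hitting_set(paths))  # ['C', 'F']
```

Here one cut at `C` covers two paths at once, which matches the intuition above: fewer cuts mean fewer, larger regions. A greedy choice is not guaranteed to be minimum in general.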

details in paper…

Slide26

Region Construction Algorithm

steps one, two, and three

Step 1: transform function
  remove artificial dependences, remove non-clobbers

Step 2: construct regions around antidependences
  cut all non-local antidependences in the CFG

Step 3: refine for correctness & performance
  account for loops, optimize for dynamic behavior

Slide27

Step 3: Loop-Related Refinements

loops affect correctness and performance

correctness: Not all local antidependences removed by SSA…
  loop-carried antidependences may clobber
  depends on boundary placement; handled as a post-pass

performance: Loops tend to execute multiple times…
  to maximize region size, place cuts outside of loop
  algorithm modified to prefer cuts outside of loops

details in paper…

Slide28

Presentation Overview

❶ Idempotence
❷ Algorithm
❸ Results

Slide29

Results

compiler implementation
  – Paper compiler implementation in LLVM v2.9
  – LLVM v3.1 source code release in July timeframe

experimental data
  (1) runtime overhead
  (2) region size
  (3) use case

Slide30

Runtime Overhead

as a percentage

[bar chart: percent overhead by benchmark suite (gmean); gmean values shown: 7.6 and 7.7]

Slide31

Region Size

average number of instructions

[bar chart: dynamic region size by benchmark suite (gmean); gmean value shown: 28]

Slide32

Use Case

hardware fault recovery

[bar chart: percent overhead by benchmark suite (gmean); gmean values shown: 8.2, 24.0, 30.5]

Slide33

Presentation Overview

❶ Idempotence
❷ Algorithm
❸ Results

Slide34

Summary & Conclusions

summary

idempotent processing
  – large (low-overhead) idempotent regions all the time

static analysis, compiler algorithm
  – (a) remove artifacts (b) partition (c) compile

low overhead
  – 2-12% runtime overhead typical

Slide35

Summary & Conclusions

conclusions

several applications already demonstrated
  – CPU hardware simplification (MICRO ’11)
  – GPU exceptions and speculation (ISCA ’12)
  – hardware fault recovery (this paper)

future work
  – more applications, hybrid techniques
  – optimal region size?
  – enabling even larger region sizes

Slide36

Back-up Slides

Slide37

Error recovery

dealing with side-effects

exceptions
  – generally no side-effects beyond out-of-order-ness
  – fairly easy to handle

mis-speculation (e.g. branch misprediction)
  – compiler handles for pseudoregister state
  – for non-local memory, store buffer assumed

arbitrary failure (e.g. hardware fault)
  – ECC and other verification assumed
  – variety of existing techniques; details in paper

Slide38

Optimal Region Size?

it depends… (rough sketch not to scale)

[rough sketch: overhead vs. region size, with curves for detection latency, register pressure, and re-execution time]

Slide39

Prior Work

relating to idempotence

Technique                   Year    Domain
Sentinel Scheduling         1992    Speculative memory re-ordering
Fast Mutual Exclusion       1992    Uniprocessor mutual exclusion
Multi-Instruction Retry     1995    Branch and hardware fault recovery
Atomic Heap Transactions    1999    Atomic memory allocation
Reference Idempotency       2006    Reducing speculative storage
Restart Markers             2006    Virtual memory in vector machines
Data-Triggered Threads      2011    Data-triggered multi-threading
Idempotent Processors       2011    Hardware simplification for exceptions
Encore                      2011    Hardware fault recovery
iGPU                        2012    GPU exception/speculation support

Slide40

Detailed Runtime Overhead

as a percentage

[bar chart: percent overhead for suites (gmean) and outliers; gmean values shown: 7.6 and 7.7]

outliers: non-idempotent inner loops + high register pressure

Slide41

Detailed Region Size

average number of instructions

[bar chart: region size for suites (gmean) and outliers; values shown: 28 (gmean), 116, 45, >1,000,000]

outliers: limited aliasing information