
Slide1

Comparing GCs and Allocation

Richard Jones, Antony Hosking and Eliot Moss, 2012

Presented by Yarden Marton, 18.11.14

Slide2

Comparing different garbage collectors.
Allocation – methods and considerations.

Outline

Slide3

Comparing GCs

What is the best GC?

When we say “best” do we mean:

Best throughput?

Shortest pause times?

Good space utilization?

A compromise combination?

Slide4

Comparing GCs

More to consider:

Application dependency

Heap space availability

Heap size

Slide5

Slide6

Throughput
Pause time
Space
Implementation

Comparing GCs - Aspects

Slide7

Throughput
Pause time
Space
Implementation

Comparing GCs - Aspects

Slide8

Primary goal for 'batch' applications or for systems already experiencing delays.
Does a faster collector mean a faster application? Not necessarily: the mutators pay part of the cost.

Throughput

Slide9

Throughput

Algorithmic complexity:
Mark-sweep: cost of tracing and sweeping phases; requires visiting every object.
Copying: cost of the tracing phase only; requires visiting only live objects.

Slide10

Throughput

Is copying collection faster? Not necessarily; it depends on:
The number of instructions executed to visit an object
Locality
Lazy sweeping

Slide11

Throughput
Pause time
Space
Implementation

Comparing GCs - Aspects

Slide12

Pause Time

Important for interactive applications, transaction processors and more.
'Stop-the-world' collectors pause all mutators during collection.
This makes reference counting immediately attractive. However:
Recursive freeing when a reference count drops to zero is costly.
Both improvements to reference counting reintroduce a stop-the-world pause.

Slide13

Throughput
Pause time
Space
Implementation

Comparing GCs - Aspects

Slide14

Space

Important for:
Tight physical constraints on memory
Large applications
All collectors incur space overhead:
Reference count fields
Additional heap space
Heap fragmentation
Auxiliary data structures
Room for garbage

Slide15

Space

Completeness – reclaiming all dead objects eventually.
Basic reference counting is incomplete (it cannot reclaim cycles).
Promptness – reclaiming all dead objects at each collection cycle.
Basic tracing collectors are prompt (but at a cost).
Modern high-performance collectors typically trade immediacy for performance.

Slide16

Throughput
Pause time
Space
Implementation

Comparing GCs - Aspects

Slide17

Implementation

GC algorithms are difficult to implement, especially concurrent algorithms. Errors can manifest themselves long afterwards.
Tracing:
Advantage: simple collector-mutator interface
Disadvantage: determining the roots is complicated
Reference counting:
Advantage: can be implemented in a library
Disadvantage: processing overheads, and the correctness of every reference-count manipulation is essential
In general, copying and compacting collectors are more complex than non-moving collectors.

Slide18

Adaptive Systems

Commercial systems often offer a choice between GCs, with a large number of tuning options.
Researchers have developed systems that adapt to the environment:
Java run-time (Soman et al [2004])
Singer et al [2007a]
Sun's Ergonomic tuning

Slide19

Advice For Developers

Know your application:

- Measure its behavior

- Track the size and lifetime distributions of the objects it uses.

Experiment with the different collector configurations on offer.

Slide20

Considered two styles of collection:
Direct: reference counting.
Indirect: tracing collection.
Next: an abstract framework for a wide variety of collectors.

A Unified Theory of GC

Slide21

GC can be expressed as a fixed-point computation that assigns reference counts ρ(n) to nodes n ∈ Nodes.
Nodes with a non-zero count are retained; the rest should be reclaimed.
Use of abstract data structures whose implementations can vary.
W – a work list of objects to be processed; when it is empty, the algorithms terminate.

Abstract GC

Slide22

atomic collectTracing():
    rootsTracing(W)        // find root objects
    scanTracing(W)         // mark reachable objects
    sweepTracing()         // free dead objects

rootsTracing(R):
    for each fld in Roots
        ref ← *fld
        if ref ≠ null
            R ← R + [ref]

scanTracing(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) + 1
        if ρ(src) = 1
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract Tracing GC Algorithm

Slide23

sweepTracing():
    for each node in Nodes
        if ρ(node) = 0
            free(node)
        else
            ρ(node) ← 0

New():
    ref ← allocate()
    if ref = null
        collectTracing()
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract Tracing GC Algorithm (Continued)
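A minimal executable rendering of the abstract tracing collector, as a Python sketch (not from the slides); the dict-based heap and the example graph are assumptions inferred from the walkthrough on the next slides, where Roots refer to B and C, C refers to A and B, and D is unreachable.

# Hypothetical Python sketch; rc plays the role of the abstract counts ρ.
def collect_tracing(nodes, roots, pointers, rc):
    work = list(roots)                 # rootsTracing(W)
    while work:                        # scanTracing(W)
        src = work.pop()
        rc[src] += 1
        if rc[src] == 1:               # first visit: scan its pointers
            for ref in pointers[src]:
                if ref is not None:
                    work.append(ref)
    freed = []
    for node in nodes:                 # sweepTracing()
        if rc[node] == 0:
            freed.append(node)         # free(node)
        else:
            rc[node] = 0               # reset live counts for the next cycle
    return freed

pointers = {"A": [], "B": [], "C": ["A", "B"], "D": []}
rc = {n: 0 for n in pointers}
print(collect_tracing(list(pointers), ["B", "C"], pointers, rc))   # ['D']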

Slide24

[Diagram – tracing example: objects A–D with Roots referring to B and C; all counts ρ = 0; work list W empty]

Slide25

[Diagram: after rootsTracing, W = [B, C]; all counts still 0]

Slide26

[Diagram: B scanned, ρ(B) = 1; W = [C]]

Slide27

[Diagram: C scanned, ρ(C) = 1; C's referents pushed, W = [A, B]]

Slide28

[Diagram: A scanned, ρ(A) = 1; W = [B]]

Slide29

[Diagram: B visited again, ρ(B) = 2; W is empty, so scanning terminates]

Slide30

[Diagram: after sweepTracing, the unreachable D is freed and the remaining counts are reset to 0]

Slide31

atomic collectCounting(I, D):
    applyIncrements(I)     // apply the deferred increments
    scanCounting(D)        // decrement, recursively
    sweepCounting()        // free dead objects

applyIncrements(I):
    while not isEmpty(I)
        ref ← remove(I)
        ρ(ref) ← ρ(ref) + 1

scanCounting(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) − 1
        if ρ(src) = 0
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract Reference Counting GC Algorithm

Slide32

sweepCounting():
    for each node in Nodes
        if ρ(node) = 0
            free(node)

New():
    ref ← allocate()
    if ref = null
        collectCounting()
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract Reference Counting GC Algorithm (Continued)

Slide33

inc(ref):
    if ref ≠ null
        I ← I + [ref]

dec(ref):
    if ref ≠ null
        D ← D + [ref]

atomic Write(src, i, dst):
    inc(dst)
    dec(src[i])
    src[i] ← dst

Abstract Reference Counting GC Algorithm (Continued)
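The same buffered reference-counting scheme as a runnable Python sketch (an illustration, not the book's code): I and D buffer the increments and decrements, write() applies the barrier, and collect() applies the buffers and sweeps.

class RCHeap:
    def __init__(self, pointers):
        self.pointers = pointers             # node -> list of referents
        self.rc = {n: 0 for n in pointers}
        self.I, self.D = [], []              # deferred inc / dec buffers

    def inc(self, ref):
        if ref is not None:
            self.I.append(ref)

    def dec(self, ref):
        if ref is not None:
            self.D.append(ref)

    def write(self, src, i, dst):            # Write(src, i, dst)
        self.inc(dst)
        self.dec(self.pointers[src][i])
        self.pointers[src][i] = dst

    def collect(self):                       # collectCounting(I, D)
        while self.I:                        # applyIncrements(I)
            self.rc[self.I.pop()] += 1
        while self.D:                        # scanCounting(D)
            src = self.D.pop()
            self.rc[src] -= 1
            if self.rc[src] == 0:
                for ref in self.pointers[src]:
                    if ref is not None:
                        self.D.append(ref)
        return [n for n, c in self.rc.items() if c == 0]   # sweepCounting()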

Slide34

[Diagram – reference-counting example: objects A–D, all counts 0; buffered increment list I and decrement list D pending; collectCounting() starts with applyIncrements(I)]

Slide35

[Diagram: first increment applied, ρ(A) = 1; the remaining entries of I are still pending]

Slide36

[Diagram: all increments applied, counts (A, B, C, D) = (2, 3, 1, 1); decrement list D still pending]

Slide37

[Diagram: scanCounting(D) in progress, counts now (1, 3, 1, 0); D's count reached 0, so its referents are queued for decrement]

Slide38

[Diagram: scanCounting(D) finished, counts (1, 2, 1, 0)]

Slide39

[Diagram: sweepCounting() frees D, whose count is 0]

Slide40

atomic collectDrc(I, D):
    rootsTracing(I)        // add root objects to I
    applyIncrements(I)
    scanCounting(D)        // decrement, recursively
    sweepCounting()        // free dead objects
    rootsTracing(D)        // keep the invariant
    applyDecrements(D)

New():
    ref ← allocate()
    if ref = null
        collectDrc(I, D)
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract Deferred Reference Counting GC Algorithm

Slide41

atomic Write(src, i, dst):
    if src ≠ Roots
        inc(dst)
        dec(src[i])
    src[i] ← dst

applyDecrements(D):
    while not isEmpty(D)
        ref ← remove(D)
        ρ(ref) ← ρ(ref) − 1

Abstract Deferred Reference Counting GC Algorithm (Continued)
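A sketch of only the deferred-RC differences, reusing the hypothetical RCHeap above: writes into root slots skip the barrier, and each collection temporarily counts the roots, sweeps, then removes the root counts again to restore the invariant.

ROOTS = "Roots"                        # sentinel for the root 'object'

def write_deferred(heap, src, i, dst):
    if src != ROOTS:                   # root updates are not counted
        heap.inc(dst)
        heap.dec(heap.pointers[src][i])
    heap.pointers[src][i] = dst        # assumes pointers[ROOTS] holds root slots

def collect_drc(heap, roots):
    heap.I.extend(roots)               # rootsTracing(I)
    dead = heap.collect()              # applyIncrements, scanCounting, sweep
    heap.D.extend(roots)               # rootsTracing(D): keep the invariant
    while heap.D:                      # applyDecrements(D): no recursion
        heap.rc[heap.D.pop()] -= 1
    return dead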

Slide42

[Diagram – deferred RC example: objects A–D, all counts 0; collectDrc() starts with rootsTracing(I)]

Slide43

[Diagram: the root referents are added to I; applyIncrements(I) begins, counts still 0]

Slide44

[Diagram: after applyIncrements(I), counts (A, B, C, D) = (2, 3, 1, 1); scanCounting(D) begins]

Slide45

[Diagram: after scanCounting(D), counts (1, 2, 1, 0); sweepCounting() begins]

Slide46

[Diagram: sweepCounting() frees D; rootsTracing(D) queues the roots for decrement]

Slide47

[Diagram: only A, B and C remain, with counts (1, 2, 1); applyDecrements(D) begins]

Slide48

[Diagram: after applyDecrements(D), the stored counts are (1, 1, 0) – root references are no longer counted]

Slide49

Comparing GCs Summary

GC performance depends on many aspects; therefore, no GC has an absolute advantage over the others.
Garbage collection can be expressed in an abstract way, which highlights the similarities and differences between collectors.

Slide50

Allocation

Three aspects to memory management:
Allocation of memory in the first place
Identification of live data
Reclamation for future use
Allocation and reclamation of memory are tightly linked.
Several key differences between automatic and explicit memory management, in terms of allocating and freeing:
A GC frees space all at once
A system with GC has more information when allocating
With GC, users tend to write programs in a different style

Slide51

Uses a large free chunk of memory. Given a request for n bytes, it allocates that much from one end of the free chunk.

sequentialAllocate(n):
    result ← free
    newFree ← result + n
    if newFree > limit
        return null
    free ← newFree
    return result

Sequential Allocation
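A bump-pointer version of sequentialAllocate as a Python sketch; the integer addresses and chunk bounds are illustrative.

class SequentialAllocator:
    def __init__(self, base, size):
        self.free = base                 # next free address
        self.limit = base + size         # end of the free chunk

    def allocate(self, n):
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None                  # exhausted: caller must collect
        self.free = new_free
        return result

alloc = SequentialAllocator(base=0, size=1024)
print(alloc.allocate(100), alloc.allocate(100))   # 0 100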

Slide52

[Diagram: before a request for n bytes, the heap is split at free into allocated and available space, bounded by limit; afterwards, free has advanced past the n new bytes plus any alignment padding, and result points at the newly allocated cell]

Slide53

Properties:
Simple
Efficient
Better cache locality
May be less suitable for non-moving collectors

Sequential Allocation

Slide54

A data structure records the location and size of free cells of memory. The allocator considers each free cell in turn and, according to some policy, chooses one to allocate.
Three basic types of free-list allocation:
First-fit
Next-fit
Best-fit

Free-list Allocation

Slide55

First-fit Allocation

Use the first cell that can satisfy the allocation request.
The cell is split unless the remainder is too small.

firstFitAllocate(n):
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

Slide56

listAllocate(prev, curr, n):
    result ← curr
    if shouldSplit(size(curr), n)
        remainder ← result + n
        next(remainder) ← next(curr)
        size(remainder) ← size(curr) − n
        next(prev) ← remainder
    else
        next(prev) ← next(curr)
    return result

listAllocateAlt(prev, curr, n):        // allocate from the end of the cell
    if shouldSplit(size(curr), n)
        size(curr) ← size(curr) − n
        result ← curr + size(curr)
    else
        next(prev) ← next(curr)
        result ← curr
    return result

First-fit Allocation
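A first-fit sketch in Python over a free-list of (address, size) pairs kept in address order; MIN_REMAINDER stands in for shouldSplit and is an assumption. A real allocator would thread the list through the free cells themselves.

MIN_REMAINDER = 16                       # assumed smallest useful split

def first_fit_allocate(free_list, n):
    for i, (addr, size) in enumerate(free_list):
        if size >= n:
            if size - n >= MIN_REMAINDER:
                free_list[i] = (addr + n, size - n)   # split the cell
            else:
                del free_list[i]                      # use it whole
            return addr
    return None                                       # no cell fits

# Mirrors the example on the next slide (sizes in KB):
cells = [(0, 150), (200, 100), (400, 170), (800, 300), (1200, 50)]
print(first_fit_allocate(cells, 120))    # 0: the 150KB cell splits, 30KB left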

Slide57

[Diagram – first-fit example: free cells of 150KB, 100KB, 170KB, 300KB and 50KB; a 120KB request is satisfied from the first (150KB) cell, leaving a 30KB remainder]

Slide58

[Diagram: a 50KB request is satisfied from the 100KB cell, leaving 50KB; free cells are now 30KB, 50KB, 170KB, 300KB, 50KB]

Slide59

[Diagram: a 200KB request skips ahead to the 300KB cell, leaving 100KB; free cells are now 30KB, 50KB, 170KB, 100KB, 50KB]

Slide60

Small remainder cells accumulate near the front of the list, slowing down allocation.
In terms of space utilization, first-fit may behave similarly to best-fit.
An issue is where in the list to enter a newly freed cell.
It is usually most natural to build the list in address order, as mark-sweep does.

First-fit Allocation

Slide61

A variation of first-fit.
Method: start the search for a cell of suitable size from the point in the list where the last search succeeded. When reaching the end of the list, start over from the beginning.
Idea: reduce the need to iterate repeatedly past the small cells at the head of the list.
Drawbacks:
Fragmentation
Poor locality in accessing the list
Poor locality of the allocated objects

Next-fit Allocation

Slide62

nextFitAllocate(n):
    start ← prev
    loop
        curr ← next(prev)
        if curr = null                 // wrap around to the head
            prev ← addressOf(head)
            curr ← next(prev)
        if prev = start
            return null                // completed a full circuit
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

Next-fit Allocation Algorithm
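The same free-list sketch with a next-fit cursor; keeping the cursor in a mutable default argument is just a compact way to retain state between calls in this illustration.

def next_fit_allocate(free_list, n, cursor=[0]):
    for step in range(len(free_list)):
        i = (cursor[0] + step) % len(free_list)      # wrap past the end
        addr, size = free_list[i]
        if size >= n:
            if size - n >= MIN_REMAINDER:
                free_list[i] = (addr + n, size - n)
            else:
                del free_list[i]
            cursor[0] = i % max(len(free_list), 1)   # resume here next time
            return addr
    return None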

Slide63

[Diagram – next-fit example: free cells of 150KB, 100KB, 170KB, 300KB and 50KB; a 120KB request is satisfied from the 150KB cell, leaving 30KB, and the search cursor stays there]

Slide64

[Diagram: a 20KB request continues from the cursor and is satisfied from the 100KB cell, leaving 80KB]

Slide65

[Diagram: a 50KB request continues from the cursor and is satisfied from the 170KB cell, leaving 120KB]

Slide66

Method: find the cell whose size most closely matches the allocation request.
Idea:
Minimize waste
Avoid splitting large cells unnecessarily
Bad worst case

Best-fit Allocation

Slide67

bestFitAllocate(n):
    best ← null
    bestSize ← ∞
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null || size(curr) = n
            if curr ≠ null
                bestPrev ← prev
                best ← curr
            else if best = null
                return null
            return listAllocate(bestPrev, best, n)
        else if size(curr) < n || bestSize < size(curr)
            prev ← curr
        else
            best ← curr
            bestPrev ← prev
            bestSize ← size(curr)
            prev ← curr

Best-fit Allocation Algorithm
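The corresponding best-fit sketch over the same (address, size) free-list: scan the whole list, remember the tightest fit, and stop early on an exact match.

def best_fit_allocate(free_list, n):
    best = None
    for i, (addr, size) in enumerate(free_list):
        if size == n:
            best = i                                  # exact fit: stop early
            break
        if size > n and (best is None or size < free_list[best][1]):
            best = i
    if best is None:
        return None                                   # no cell fits
    addr, size = free_list[best]
    if size - n >= MIN_REMAINDER:
        free_list[best] = (addr + n, size - n)        # split the cell
    else:
        del free_list[best]
    return addr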

Slide68

[Diagram – best-fit example: free cells of 150KB, 100KB, 170KB, 300KB and 50KB; a 90KB request picks the closest fit, the 100KB cell, leaving 10KB]

Slide69

[Diagram: a 50KB request is an exact fit for the 50KB cell, which is removed from the list]

Slide70

[Diagram: a 100KB request picks the 150KB cell, leaving 50KB]

Slide71

Use of a balanced binary tree, sorted by size (for best-fit) or by address (for first-fit or next-fit).
If sorted by size, only one cell of each size need be entered.
Example: a Cartesian tree for first/next-fit:
Indexed by address (primary key) and size (secondary key)
Total order by address
Organized as a heap for the sizes

Speeding Free-list Allocation

Slide72

Searching the Cartesian tree under a first-fit policy:

firstFitAllocateCartesian(n):
    parent ← null
    curr ← root
    loop
        if left(curr) ≠ null && max(left(curr)) ≥ n
            parent ← curr
            curr ← left(curr)
        else if prev < curr && size(curr) ≥ n
            prev ← curr
            return treeAllocate(curr, parent, n)
        else if right(curr) ≠ null && max(right(curr)) ≥ n
            parent ← curr
            curr ← right(curr)
        else
            return null

Speeding Free-list Allocation

Slide73

Fragmentation: dispersal of free memory across a possibly large number of small free cells.
Negative effects:
Can prevent allocation from succeeding
May cause a program to use more address space, more resident pages and more cache lines
Fragmentation is impractical to avoid:
Usually the allocator cannot know what the future request sequence will be
Even given a known request sequence, optimal allocation is NP-hard
There is usually a trade-off between allocation speed and fragmentation.

Fragmentation

Slide74

Idea: use multiple free-lists whose members are segregated by size, in order to speed allocation.
Usually there is a fixed number k of size values s₀ < s₁ < … < sₖ₋₁, and k+1 free-lists f₀, …, fₖ.
For a free cell b on list fᵢ: size(b) = sᵢ for i < k, and size(b) > sₖ₋₁ for i = k.
When requesting a cell of size b ≤ sₖ₋₁, the allocator rounds the request up to the smallest sᵢ such that b ≤ sᵢ.
sᵢ is called a size class.

Segregated-fits Allocation

Slide75

segregatedFitAllocate(j):
    result ← remove(freeLists[j])
    if result = null
        large ← allocateBlock()
        if large = null
            return null
        initialize(large, sizes[j])
        result ← remove(freeLists[j])
    return result

List fₖ, for cells larger than sₖ₋₁, is organized using one of the basic single-list algorithms.
Per-cell overheads for large cells are a bit higher, but in total this is negligible.
The main advantage: for size classes other than sₖ, allocation typically requires constant time.

Segregated-fits Allocation
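A segregated-fits sketch in Python. The size classes and block size are illustrative assumptions; an empty class is repopulated by carving a whole block into equal cells, i.e. the block-based approach described a few slides below.

SIZE_CLASSES = [16, 32, 64, 128]         # s0 < s1 < ... < s(k-1), assumed
BLOCK = 1024

class SegregatedFits:
    def __init__(self):
        self.free_lists = {s: [] for s in SIZE_CLASSES}
        self.next_block = 0               # bump pointer for fresh blocks

    def allocate_block(self):
        base = self.next_block
        self.next_block += BLOCK
        return base

    def allocate(self, n):
        cls = next((s for s in SIZE_CLASSES if n <= s), None)
        if cls is None:
            return None                   # large request: belongs on f_k (not shown)
        if not self.free_lists[cls]:      # repopulate the size class
            base = self.allocate_block()
            self.free_lists[cls] = list(range(base, base + BLOCK, cls))
        return self.free_lists[cls].pop()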

Slide76

[Diagram: free-lists f₀, f₁, …, fₖ₋₁ hold cells of sizes s₀, s₁, …, sₖ₋₁ respectively; list fₖ holds the cells larger than sₖ₋₁]

Slide77

In simple free-list allocators, waste comes from free cells that are too small to satisfy a request; this is called external fragmentation.
In segregated-fits allocation, space is wasted inside an individual cell because the requested size was rounded up; this is called internal fragmentation.

More on Fragmentation

Slide78

An important consideration: how to populate each free-list of a segregated-fits allocator.
Two approaches:
Dedicating whole blocks to particular sizes
Splitting

Populating size classes

Slide79

Choose some block size B, a power of two. The allocator is provided with blocks.
If the request is larger than one block, multiple contiguous blocks are allocated.
For a size class s < B, we populate the free-list fₛ by allocating a block and immediately slicing it into cells of size s.
The cells' metadata is stored on the block.

Big Bag of Pages

Block-based allocation

Slide80

Disadvantage:
Fragmentation – average waste of half a block (worst case (B−s)/B)
Advantages:
Reduced per-cell metadata
Simple and efficient in the common case

Big Bag of Pages

Block-based allocation

Slide81

Like simple free-list schemes, split a cell if that is the only way to satisfy a request.
Improvement: return the remaining portion to a suitable free-list (if possible).
For example, the buddy system:
Size classes are powers of two
A cell of size 2^(i+1) can be split into two cells of size 2^i
Cells can be combined in the opposite direction (only if the two small cells were split from the same large cell)

Splitting
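A buddy-system sketch matching the example that follows (16KB minimum cell, 128KB maximum); the set-based free-lists are an assumption. A cell's buddy is found by XOR-ing its address with its size, and freeing coalesces buddies as far as possible.

KB = 1024
MIN_CELL, MAX_CELL = 16 * KB, 128 * KB

class BuddyAllocator:
    def __init__(self):
        self.free = {s: set() for s in (16 * KB, 32 * KB, 64 * KB, 128 * KB)}
        self.free[MAX_CELL].add(0)           # one free 128KB cell at address 0

    def allocate(self, n):
        size = MIN_CELL
        while size < n:
            size *= 2                        # round up to a power-of-two class
        s = size
        while s <= MAX_CELL and not self.free[s]:
            s *= 2                           # find a cell we can split
        if s > MAX_CELL:
            return None
        addr = self.free[s].pop()
        while s > size:                      # split down to the right size
            s //= 2
            self.free[s].add(addr + s)       # free the upper buddy
        return addr, size

    def release(self, addr, size):
        while size < MAX_CELL and (addr ^ size) in self.free[size]:
            self.free[size].remove(addr ^ size)   # coalesce with the buddy
            addr &= ~size                         # keep the lower address
            size *= 2
        self.free[size].add(addr)

buddy = BuddyAllocator()
addr, size = buddy.allocate(20 * KB)     # splits 128KB; a 32KB cell is used
buddy.release(addr, size)                # coalesces back to one 128KB cell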

Slide82

[Diagram – buddy-system example (minimum cell 16KB, maximum 128KB): one free 128KB cell; a 20KB allocation request arrives]

The Buddy System

Slide83

[Diagram: the 128KB cell splits into two 64KB buddies, and one of them into two 32KB buddies; the 20KB request occupies a 32KB cell]

Slide84

[Diagram: a 10KB request arrives; a free 32KB cell splits into two 16KB buddies and the request occupies one of them]

Slide85

[Diagram: the 10KB object is freed (its 16KB cell carried 6KB of internal waste); the cell returns to the free-list]

Slide86

[Diagram: the 20KB object is freed; the freed cells coalesce with their free buddies back into a 32KB cell]

Slide87

[Diagram: the two free 64KB buddies coalesce back into the original 128KB cell]

Slide88

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide89

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide90

Allocated objects may require special alignment; for example, double-word alignment for a floating-point double.
Making the granule a double-word is wasteful.
The header of a Java array takes three words: one word is wasted or skipped.

Alignment
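The usual power-of-two rounding used to align addresses or sizes, as a tiny Python illustration:

def align_up(addr, alignment):           # alignment must be a power of two
    return (addr + alignment - 1) & ~(alignment - 1)

print(align_up(13, 8))                   # 16
print(align_up(16, 8))                   # 16: already aligned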

Slide91

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide92

Some collection schemes require a minimum amount of space in each cell, e.g. for:
A forwarding address
Lock/status bits
In that case, the allocator will allocate more words than requested.

Size Constraints

Slide93

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide94

An additional header or boundary tag is associated with each cell, outside the storage available to the program.
It indicates the cell's size and its allocated/free status.
It is one or two words long.
A bitmap can be used instead.

Boundary Tags

Slide95

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide96

Heap parsability: the ability to advance from cell to cell in the heap.
An object's header (one or two words) holds:
Type
Hash code
Synchronization information
Mark bit
The header comes before the data; the reference refers to the first element/field.

Heap

Parsability

Slide97

Slide98

How to handle alignment gaps?
Zero all free space in advance, or
Devise a distinct range of values to write at the start of each gap
Parsing is easier with a bitmap indicating where each object starts, but it requires additional space and time.

Heap

Parsability

Slide99

Alignment
Size constraints
Boundary tags
Heap parsability
Locality

Allocation’s Additional Considerations

Slide100

During allocation:
Address-ordered free-lists and sequential allocation exhibit good locality.
During freeing:
Goal: objects freed together should end up near each other.
Empirically, objects allocated at the same time often become unreachable at about the same time.

Locality

Slide101

With multiple threads allocating, most steps in allocation need to be atomic, which can become a bottleneck.
Basic solution: each thread has its own allocation area.
A global pool and smart chunk handling keep the per-thread areas supplied.

Allocation in Concurrent Systems
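A sketch of the per-thread approach in Python: each thread bump-allocates from its own chunk and synchronizes only to take a fresh chunk from the shared pool. The chunk size and integer 'addresses' are assumptions.

import threading

CHUNK = 64 * 1024

class PerThreadAllocator:
    def __init__(self):
        self.lock = threading.Lock()
        self.next_chunk = 0
        self.tl = threading.local()            # per-thread free/limit

    def _refill(self):
        with self.lock:                        # the only synchronized step
            base = self.next_chunk
            self.next_chunk += CHUNK
        self.tl.free, self.tl.limit = base, base + CHUNK

    def allocate(self, n):                     # assumes n <= CHUNK
        if not hasattr(self.tl, "free") or self.tl.free + n > self.tl.limit:
            self._refill()                     # slow path: take a new chunk
        result = self.tl.free
        self.tl.free += n                      # fast path: no locking
        return result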

Slide102

Allocation Summary

Methods:

Sequential

Free-list: First-fit, Next-fit and Best-fit.

Segregated-fits

Various additional considerations to note: alignment, size constraints, boundary tags, heap parsability and locality.