Slide 1: Comparing GCs and Allocation
Richard Jones, Antony Hosking and Eliot Moss, 2012
Presented by Yarden Marton, 18.11.14
Slide 2: Outline
- Comparing different garbage collectors
- Allocation: methods and considerations
Slide 3: Comparing GCs
What is the "best" GC? When we say "best", do we mean:
- Best throughput?
- Shortest pause times?
- Good space utilization?
- Some compromise combination of these?
Slide 4: Comparing GCs
More to consider:
- Application dependency
- Heap space availability
- Heap size
Slide 6: Comparing GCs – Aspects: Throughput, Pause time, Space, Implementation
Slide 7: Comparing GCs – Aspects: Throughput, Pause time, Space, Implementation
Slide 8: Throughput
- The primary goal for 'batch' applications, or for systems experiencing delays.
- Does a faster collector mean a faster application? Not necessarily.
- The mutators pay part of the cost of collection.
Slide 9: Throughput
Algorithmic complexity:
- Mark-sweep: cost of the tracing and sweeping phases; sweeping requires visiting every object.
- Copying: cost of the tracing phase only; requires visiting only live objects.
Slide 10: Throughput
Is copying collection faster? Not necessarily; it depends on:
- The number of instructions executed to visit an object
- Locality
- Lazy sweeping (which reduces the sweep cost of mark-sweep)
Slide 11: Comparing GCs – Aspects: Throughput, Pause time, Space, Implementation
Slide 12: Pause Time
- Important for interactive applications, transaction processors and more.
- 'Stop-the-world' collectors halt all mutators for the whole collection.
- Hence an immediate attraction to reference counting. However:
  - Recursive freeing once a reference count drops to zero is costly.
  - Both improvements of reference counting reintroduce a stop-the-world pause.
Slide 13: Comparing GCs – Aspects: Throughput, Pause time, Space, Implementation
Slide 14: Space
Important when there are tight physical constraints on memory, and for large applications.
All collectors incur space overhead:
- Reference count fields
- Additional heap space
- Heap fragmentation
- Auxiliary data structures
- Room for garbage not yet reclaimed
Slide 15: Space
- Completeness: reclaiming all dead objects eventually.
  - Basic reference counting is incomplete (it cannot reclaim cycles).
- Promptness: reclaiming all dead objects at each collection cycle.
  - Achieved by basic tracing collectors (but at a cost).
- Modern high-performance collectors typically trade immediacy for performance.
Slide 16: Comparing GCs – Aspects: Throughput, Pause time, Space, Implementation
Slide 17: Implementation
- GC algorithms are difficult to implement, especially concurrent algorithms; errors can manifest themselves long afterwards.
- Tracing:
  - Advantage: a simple collector–mutator interface.
  - Disadvantage: determining the roots is complicated.
- Reference counting:
  - Advantage: can be implemented in a library.
  - Disadvantage: processing overhead, and every reference-count manipulation must be performed correctly.
- In general, copying and compacting collectors are more complex than non-moving collectors.
Slide 18: Adaptive Systems
- Commercial systems often offer a choice between GCs, with a large number of tuning options.
- Researchers have developed systems that adapt to the environment:
  - Java run-time (Soman et al [2004])
  - Singer et al [2007a]
  - Sun's Ergonomic tuning
Slide 19: Advice for Developers
- Know your application: measure its behavior, and track the size and lifetime distributions of the objects it uses.
- Experiment with the different collector configurations on offer.
Slide 20: A Unified Theory of GC
- We have considered two styles of collection: direct (reference counting) and indirect (tracing).
- Next: an abstract framework for a wide variety of collectors.
Slide 21: Abstract GC
- GC can be expressed as a fixed-point computation that assigns a reference count ρ(n) to each node n ∈ Nodes.
- Nodes with a non-zero count are retained; the rest should be reclaimed.
- Uses abstract data structures whose implementations can vary.
- W: a work list of objects to be processed; the algorithms terminate when it is empty.
Slide 22: Abstract Tracing GC Algorithm

    atomic collectTracing():
        rootsTracing(W)      // find root objects
        scanTracing(W)       // mark reachable objects
        sweepTracing()       // free dead objects

    rootsTracing(R):
        for each fld in Roots
            ref ← *fld
            if ref ≠ null
                R ← R + [ref]

    scanTracing(W):
        while not isEmpty(W)
            src ← remove(W)
            ρ(src) ← ρ(src) + 1
            if ρ(src) = 1
                for each fld in Pointers(src)
                    ref ← *fld
                    if ref ≠ null
                        W ← W + [ref]
Slide 23: Abstract Tracing GC Algorithm (continued)

    sweepTracing():
        for each node in Nodes
            if ρ(node) = 0
                free(node)
            else
                ρ(node) ← 0

    New():
        ref ← allocate()
        if ref = null
            collectTracing()
            ref ← allocate()
            if ref = null
                error "Out of memory"
        ρ(ref) ← 0
        return ref
Slides 24–30: animated example of the abstract tracing algorithm on a four-object heap. B and C are roots; C also references A and B; D references A but is itself unreachable. rootsTracing places B and C on the work list W; scanTracing repeatedly removes an object, increments its count and, on a first visit, pushes its referents, ending with ρ(A)=1, ρ(B)=2, ρ(C)=1, ρ(D)=0; sweepTracing then frees D and resets the survivors' counts to zero.
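The abstract tracing loop can be sketched concretely. The following Python models the heap as a dict from node name to its children and ρ as a plain counter map; this encoding, and the function name, are illustrative assumptions, not the book's code.

```python
# Sketch of the abstract tracing collector: rho is the abstract
# reference-count map that scanTracing builds by counting visits.
def collect_tracing(roots, pointers):
    rho = {n: 0 for n in pointers}            # all counts start at zero
    work = list(roots)                        # rootsTracing(W)
    while work:                               # scanTracing(W)
        src = work.pop()
        rho[src] += 1
        if rho[src] == 1:                     # first visit: trace children
            work.extend(ref for ref in pointers[src] if ref is not None)
    dead = [n for n in pointers if rho[n] == 0]   # sweepTracing() frees these
    return rho, dead

# The four-object example from the slides: B and C are roots, C points to
# A and B, and D (which points to A) is unreachable.
pointers = {"A": [], "B": [], "C": ["A", "B"], "D": ["A"]}
rho, dead = collect_tracing(["B", "C"], pointers)
print(rho, dead)   # counts A=1, B=2, C=1, D=0; only D is reclaimed
```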
Slide 31: Abstract Reference Counting GC Algorithm

    atomic collectCounting(I, D):
        applyIncrements(I)   // apply buffered increments
        scanCounting(D)      // decrement recursively
        sweepCounting()      // free dead objects

    applyIncrements(I):
        while not isEmpty(I)
            ref ← remove(I)
            ρ(ref) ← ρ(ref) + 1

    scanCounting(W):
        while not isEmpty(W)
            src ← remove(W)
            ρ(src) ← ρ(src) − 1
            if ρ(src) = 0
                for each fld in Pointers(src)
                    ref ← *fld
                    if ref ≠ null
                        W ← W + [ref]
Slide 32: Abstract Reference Counting GC Algorithm (continued)

    sweepCounting():
        for each node in Nodes
            if ρ(node) = 0
                free(node)

    New():
        ref ← allocate()
        if ref = null
            collectCounting()
            ref ← allocate()
            if ref = null
                error "Out of memory"
        ρ(ref) ← 0
        return ref
Slide 33: Abstract Reference Counting GC Algorithm (continued)

    inc(ref):
        if ref ≠ null
            I ← I + [ref]

    dec(ref):
        if ref ≠ null
            D ← D + [ref]

    atomic Write(src, i, dst):
        inc(dst)
        dec(src[i])
        src[i] ← dst
Slides 34–39: animated example of the abstract reference-counting algorithm on the same four-object heap. applyIncrements drains the increment buffer I, raising the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1; scanCounting then drains the decrement buffer D, recursively decrementing the children of any object whose count reaches zero, leaving ρ(A)=1, ρ(B)=2, ρ(C)=1, ρ(D)=0; finally sweepCounting frees D.
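The increment/decrement buffers above can be sketched in Python. The list-based buffers and the graph encoding below are illustrative assumptions; the point is the recursive decrement in scanCounting.

```python
# Sketch of the abstract reference-counting collector: I buffers pending
# increments, D pending decrements.
def collect_counting(rho, I, D, pointers):
    while I:                                  # applyIncrements(I)
        rho[I.pop()] += 1
    while D:                                  # scanCounting(D)
        src = D.pop()
        rho[src] -= 1
        if rho[src] == 0:                     # dead: children lose a reference
            D.extend(ref for ref in pointers[src] if ref is not None)
    freed = [n for n in pointers if rho[n] == 0]   # sweepCounting()
    return freed

# A references B and C, and B references C; the one external reference to A
# is created (increment) and later dropped (decrement), so everything dies.
pointers = {"A": ["B", "C"], "B": ["C"], "C": []}
rho = {"A": 0, "B": 0, "C": 0}
freed = collect_counting(rho, ["A", "B", "C", "C"], ["A"], pointers)
print(freed)   # recursive decrements free A, then B and C
```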
Slide 40: Abstract Deferred Reference Counting GC Algorithm

    atomic collectDrc(I, D):
        rootsTracing(I)      // add root objects to I
        applyIncrements(I)
        scanCounting(D)      // decrement recursively
        sweepCounting()      // free dead objects
        rootsTracing(D)      // keep the invariant: counts ignore roots
        applyDecrements(D)

    New():
        ref ← allocate()
        if ref = null
            collectDrc(I, D)
            ref ← allocate()
            if ref = null
                error "Out of memory"
        ρ(ref) ← 0
        return ref
Slide 41: Abstract Deferred Reference Counting GC Algorithm (continued)

    atomic Write(src, i, dst):
        if src ≠ Roots
            inc(dst)
            dec(src[i])
        src[i] ← dst

    applyDecrements(D):
        while not isEmpty(D)
            ref ← remove(D)
            ρ(ref) ← ρ(ref) − 1
Slides 42–48: animated example of the deferred reference-counting algorithm. rootsTracing(I) first adds the roots to the increment buffer; applyIncrements raises the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1; scanCounting applies the buffered decrements, leaving ρ(A)=1, ρ(B)=2, ρ(C)=1, ρ(D)=0; sweepCounting frees D; finally rootsTracing(D) and applyDecrements(D) re-subtract the root references, restoring the invariant that stored counts ignore the roots (final counts A=1, B=1, C=0).
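The deferred scheme can be sketched the same way. Below, stored counts deliberately ignore root references; the buffers, graph encoding, and the tiny example scenario are illustrative assumptions.

```python
# Sketch of deferred reference counting: writes to roots are not counted;
# at each collection the roots are traced into the increment buffer and
# subtracted again afterwards to restore the invariant.
def collect_drc(rho, roots, I, D, pointers):
    for ref in I + list(roots):               # rootsTracing(I); applyIncrements
        rho[ref] += 1
    work = list(D)                            # scanCounting(D)
    while work:
        src = work.pop()
        rho[src] -= 1
        if rho[src] == 0:
            work.extend(pointers[src])
    freed = [n for n in pointers if rho[n] == 0]   # sweepCounting()
    for ref in roots:                         # rootsTracing(D); applyDecrements
        rho[ref] -= 1
    return freed

# A is a root referencing B (that write buffered an increment for B);
# the single counted heap reference to C was deleted (buffered decrement),
# so only C should be freed.
pointers = {"A": ["B"], "B": [], "C": []}
rho = {"A": 0, "B": 0, "C": 1}    # stored counts ignore root references
freed = collect_drc(rho, ["A"], ["B"], ["C"], pointers)
print(freed, rho)   # C freed; A's stored count returns to 0
```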
Slide 49: Comparing GCs – Summary
- GC performance depends on many aspects, so no collector has an absolute advantage over the others.
- Garbage collection can be expressed in an abstract way, which highlights the similarities and differences between collectors.
Slide 50: Allocation
Three aspects to memory management:
- Allocation of memory in the first place
- Identification of live data
- Reclamation for future use
Allocation and reclamation of memory are tightly linked.
Several key differences between automatic and explicit memory management, in terms of allocating and freeing:
- A GC frees space all at once.
- A system with GC has more information when allocating.
- With GC, users tend to write programs in a different style.
Slide 51: Sequential Allocation
- Uses a single large free chunk of memory.
- Given a request for n bytes, allocates that much from one end of the free chunk.

    sequentialAllocate(n):
        result ← free
        newFree ← result + n
        if newFree > limit
            return null
        free ← newFree
        return result
Slide 52: diagram of sequential allocation. A request for n bytes advances the free pointer past the newly allocated cell (plus any alignment padding) toward limit; the space between free and limit remains available.
Slide 53: Sequential Allocation
Properties:
- Simple
- Efficient
- Better cache locality
- May be less suitable for non-moving collectors
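The bump-pointer scheme above is short enough to render directly in Python; the integer addresses and class wrapper are illustrative assumptions.

```python
# Bump-pointer (sequential) allocation sketch; "free" and "limit" mirror
# the pseudocode's variables.
class SequentialAllocator:
    def __init__(self, base, size):
        self.free = base              # next unused address
        self.limit = base + size      # end of the free chunk
    def allocate(self, n):
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None               # exhausted: time to collect
        self.free = new_free
        return result

alloc = SequentialAllocator(0, 100)
addrs = [alloc.allocate(60), alloc.allocate(30), alloc.allocate(30)]
print(addrs)   # [0, 60, None] — the third request does not fit
```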
Slide 54: Free-list Allocation
- A data structure records the location and size of free cells of memory.
- The allocator considers each free cell in turn and, according to some policy, chooses one to allocate.
- Three basic types of free-list allocation: first-fit, next-fit and best-fit.
Slide55First-fit Allocation
Use the first cell that can satisfy the allocation request.
A split of the cell may occur unless
the remainder
is too
small.
firstFitAllocate
(n
):
prev
←
adressOf
(head)
loop
curr
← next(
prev
)
if
curr
= null
return
null
else if
size(
curr
) < n
prev
←
curr
else
return
listAllocate
(
prev
,
curr
, n)
Slide 56: First-fit Allocation

    listAllocate(prev, curr, n):
        result ← curr
        if shouldSplit(size(curr), n)
            remainder ← result + n
            next(remainder) ← next(curr)
            size(remainder) ← size(curr) − n
            next(prev) ← remainder
        else
            next(prev) ← next(curr)
        return result

    listAllocateAlt(prev, curr, n):      // allocate from the end of the cell
        if shouldSplit(size(curr), n)
            size(curr) ← size(curr) − n
            result ← curr + size(curr)
        else
            next(prev) ← next(curr)
            result ← curr
        return result
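The search-and-split policy can be sketched over a Python list of (addr, size) pairs — a simplification of the linked free list in the pseudocode, chosen here only for illustration.

```python
# First-fit sketch: take the first free cell large enough, splitting off
# the remainder when the fit is not exact.
def first_fit_allocate(free_list, n):
    for i, (addr, size) in enumerate(free_list):
        if size >= n:
            if size > n:
                free_list[i] = (addr + n, size - n)   # split, keep remainder
            else:
                free_list.pop(i)                      # exact fit: unlink cell
            return addr
    return None                                       # no cell is big enough

cells = [(0, 150), (200, 100), (400, 300)]
a = first_fit_allocate(cells, 120)    # splits the first (150-unit) cell
print(a, cells)   # 0 [(120, 30), (200, 100), (400, 300)]
```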
Slides 57–59: first-fit example. Free list: 150KB, 100KB, 170KB, 300KB, 50KB. A 120KB request splits the first (150KB) cell, leaving 30KB; a 50KB request then splits the 100KB cell, leaving 50KB; a 200KB request splits the 300KB cell, leaving 100KB.
Slide 60: First-fit Allocation
- Small remainder cells accumulate near the front of the list, slowing down allocation.
- In terms of space utilization, it may behave similarly to best-fit.
- An issue is where in the list to enter a newly freed cell.
- It is usually most natural to build the list in address order, as mark-sweep does.
Slide 61: Next-fit Allocation
- A variation of first-fit.
- Method: start the search for a cell of suitable size from the point in the list where the last search succeeded; on reaching the end of the list, start over from the beginning.
- Idea: reduce the need to iterate repeatedly past the small cells at the head of the list.
- Drawbacks: fragmentation; poor locality in accessing the list; poor locality of the allocated objects.
Slide 62: Next-fit Allocation Algorithm

    nextFitAllocate(n):
        start ← prev
        loop
            curr ← next(prev)
            if curr = null
                prev ← addressOf(head)      // wrap around to the list head
                curr ← next(prev)
            if prev = start
                return null
            else if size(curr) < n
                prev ← curr
            else
                return listAllocate(prev, curr, n)
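The roving-pointer idea can be sketched with an index over a list of [addr, size] cells; this representation is an illustrative assumption, not the book's linked-list code.

```python
# Next-fit sketch: the search resumes from where the last one succeeded,
# wrapping past the end of the list.
class NextFitAllocator:
    def __init__(self, cells):
        self.cells = cells
        self.pos = 0                  # index where the last search succeeded
    def allocate(self, n):
        count = len(self.cells)
        for k in range(count):
            i = (self.pos + k) % count                # wrap around
            addr, size = self.cells[i]
            if size >= n:
                if size > n:
                    self.cells[i] = [addr + n, size - n]
                    self.pos = i
                else:
                    del self.cells[i]                 # exact fit: unlink
                    self.pos = i % max(len(self.cells), 1)
                return addr
        return None

nf = NextFitAllocator([[0, 150], [200, 100], [400, 170]])
a = nf.allocate(120)   # served from the 150 cell, leaving a 30 remainder
b = nf.allocate(50)    # the 30 remainder fails, so the 100 cell is used
print(a, b, nf.cells)
```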
Slides 63–65: next-fit example on the same free list (150KB, 100KB, 170KB, 300KB, 50KB). A 120KB request splits the first cell, leaving 30KB. A subsequent 20KB request resumes from the roving pointer and is served from the 100KB cell (leaving 80KB) rather than from the 30KB remainder at the head; a 50KB request is then served from the 170KB cell, leaving 120KB.
Slide 66: Best-fit Allocation
- Method: find the cell whose size most closely matches the allocation request.
- Idea: minimize waste, and avoid splitting large cells unnecessarily.
- Has a bad worst case (the whole list may be scanned).
Slide 67: Best-fit Allocation Algorithm

    bestFitAllocate(n):
        best ← null
        bestSize ← ∞
        prev ← addressOf(head)
        loop
            curr ← next(prev)
            if curr = null || size(curr) = n
                if curr ≠ null
                    bestPrev ← prev
                    best ← curr
                else if best = null
                    return null
                return listAllocate(bestPrev, best, n)
            else if size(curr) < n || bestSize < size(curr)
                prev ← curr
            else
                best ← curr
                bestPrev ← prev
                bestSize ← size(curr)
                prev ← curr
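The whole-list scan with an early exit on an exact match can be sketched as follows; the (addr, size) representation is again an illustrative assumption.

```python
# Best-fit sketch: take the cell whose size most closely matches the
# request, stopping early on an exact fit.
def best_fit_allocate(free_list, n):
    best, best_size = None, None
    for i, (addr, size) in enumerate(free_list):
        if size == n:                      # exact match: stop searching
            best, best_size = i, size
            break
        if size > n and (best is None or size < best_size):
            best, best_size = i, size
    if best is None:
        return None
    addr, size = free_list[best]
    if size > n:
        free_list[best] = (addr + n, size - n)    # split off the remainder
    else:
        free_list.pop(best)                       # exact fit: unlink
    return addr

# As in the slides' example, a 90-unit request picks the 100-unit cell.
cells = [(0, 150), (200, 100), (400, 170), (600, 300), (950, 50)]
a = best_fit_allocate(cells, 90)
print(a, cells)   # 200, with a 10-unit remainder left behind
```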
Slides 68–70: best-fit example on the same free list. A 90KB request is served from the 100KB cell (the closest fit), leaving 10KB; a 50KB request exactly fits the 50KB cell, which is unlinked; a 100KB request is then served from the 150KB cell.
Slide 71: Speeding Free-list Allocation
- Use a balanced binary tree of the free cells, sorted by size (for best-fit) or by address (for first-fit or next-fit).
- If sorted by size, it is enough to enter only one cell of each size.
- Example: a Cartesian tree for first-/next-fit:
  - Indexed by address (primary key) and size (secondary key)
  - Totally ordered by address
  - Organized as a heap on the sizes
Slide 72: Speeding Free-list Allocation
Searching the Cartesian tree under a first-fit policy:

    firstFitAllocateCartesian(n):
        parent ← null
        curr ← root
        loop
            if left(curr) ≠ null && max(left(curr)) ≥ n
                parent ← curr
                curr ← left(curr)
            else if prev < curr && size(curr) ≥ n
                prev ← curr
                return treeAllocate(curr, parent, n)
            else if right(curr) ≠ null && max(right(curr)) ≥ n
                parent ← curr
                curr ← right(curr)
            else
                return null
Slide 73: Fragmentation
- Fragmentation: the dispersal of free memory across a possibly large number of small free cells.
- Negative effects: it can prevent allocation from succeeding, and it may cause a program to use more address space, more resident pages and more cache lines.
- Fragmentation is impractical to avoid:
  - The allocator usually cannot know what the future request sequence will be.
  - Even given a known request sequence, computing an optimal allocation is NP-hard.
- There is usually a trade-off between allocation speed and fragmentation.
Slide 74: Segregated-fits Allocation
- Idea: use multiple free-lists whose members are segregated by size, in order to speed allocation.
- Usually there is a fixed number k of size values s0 < s1 < … < sk−1, and k+1 free lists f0, …, fk.
- For a free cell b on list fi, size(b) > sk−1 if i = k.
- When requesting a cell of size b ≤ sk−1, the allocator rounds the request up to the smallest si such that b ≤ si.
- si is called a size class.
Slide 75: Segregated-fits Allocation

    segregatedFitAllocate(j):
        result ← remove(freeLists[j])
        if result = null
            large ← allocateBlock()
            if large = null
                return null
            initialize(large, sizes[j])
            result ← remove(freeLists[j])
        return result

- List fk, for cells larger than sk−1, is organized using one of the basic single-list algorithms.
- Per-cell overheads for large cells are a bit higher, but in total this is negligible.
- The main advantage: for every size class except the last, allocation typically requires constant time.
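The rounding-up and per-class free lists can be sketched as follows; the block-refill stub, class sizes, and integer addresses are illustrative assumptions.

```python
# Segregated-fits sketch: one free list per size class; a request is
# rounded up to the smallest class that fits, and an empty list is
# refilled by carving a fresh block into cells of that class's size.
class SegregatedFits:
    def __init__(self, classes, cells_per_block=4):
        self.classes = sorted(classes)            # s_0 < s_1 < ... < s_{k-1}
        self.free_lists = {s: [] for s in self.classes}
        self.cells_per_block = cells_per_block
        self.next_addr = 0
    def _refill(self, s):
        # stand-in for allocateBlock()/initialize(): slice a block into cells
        for _ in range(self.cells_per_block):
            self.free_lists[s].append(self.next_addr)
            self.next_addr += s
    def allocate(self, n):
        for s in self.classes:
            if n <= s:                            # round up to class s
                if not self.free_lists[s]:
                    self._refill(s)
                return self.free_lists[s].pop()
        return None                               # larger than s_{k-1}

sf = SegregatedFits([16, 32, 64])
a = sf.allocate(20)    # rounded up to the 32-byte class
print(a)
```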
Slide 76: diagram of segregated free-lists — lists f0, …, fk−1 hold cells of sizes s0, …, sk−1 respectively, while list fk holds cells larger than sk−1.
Slide 77: More on Fragmentation
- In simple free-list allocators, free cells too small to satisfy a request constitute external fragmentation.
- In segregated-fits allocation, space wasted inside an individual cell because the requested size was rounded up constitutes internal fragmentation.
Slide 78: Populating Size Classes
An important consideration: how to populate each free-list of a segregated-fits allocator. Two approaches:
- Dedicating whole blocks to particular sizes
- Splitting
Big Bag of Pages
Block-based allocation
Slide80Disadvantage:Fragmentation, average waste of half a block (worst case (B-s)/B).Advantages:Reduced per-cell metadataSimple and efficient for the common case
Big Bag of Pages
Block-based allocation
Slide 81: Splitting
- As in simple free-list schemes, split a cell if that is the only way to satisfy a request.
- Improvement: return the remaining portion to a suitable free-list (if possible).
- Example — the buddy system:
  - Size classes are powers of two.
  - A cell of size 2^(i+1) can be split into two cells of size 2^i.
  - Cells can be combined in the opposite direction, but only if the two small cells were split from the same large cell (they are "buddies").
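The split/coalesce mechanics can be sketched as follows. The dict-of-free-lists representation and the XOR buddy computation (a buddy's address differs from its partner's in exactly one bit) are illustrative assumptions.

```python
# Buddy-system sketch: power-of-two size classes; allocation splits a
# larger block down to the needed class, and freeing coalesces a block
# with its buddy whenever both halves are free.
class BuddyAllocator:
    def __init__(self, total, min_size):
        self.min = min_size
        self.total = total
        self.free = {total: [0]}              # size -> free block addresses
    def allocate(self, n):
        size = self.min
        while size < n:
            size *= 2                         # round request up to 2^i
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                            # smallest available block
        if s > self.total:
            return None
        addr = self.free[s].pop()
        while s > size:                       # split down, freeing buddies
            s //= 2
            self.free.setdefault(s, []).append(addr + s)
        return addr
    def free_block(self, addr, size):
        while size < self.total:
            buddy = addr ^ size               # buddy differs in one bit
            if buddy in self.free.get(size, []):
                self.free[size].remove(buddy) # coalesce with the buddy
                addr = min(addr, buddy)
                size *= 2
            else:
                break
        self.free.setdefault(size, []).append(addr)

b = BuddyAllocator(128, 16)
x = b.allocate(20)       # rounded up to 32; 128 splits into 64 + 32 + 32
b.free_block(x, 32)      # coalesces all the way back to one 128 block
print(x, b.free)
```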
Slides 82–87: buddy-system example on a 128KB heap with a 16KB minimum cell size. A 20KB request splits the 128KB block into a 64KB block and two 32KB buddies and is served from a 32KB cell (12KB lost to internal fragmentation); a 10KB request splits a 32KB cell again and is served from a 16KB cell (6KB lost). As the two cells are freed, each block coalesces with its free buddy step by step until the single 128KB block is restored.
Slide 88: Allocation – Additional Considerations: Alignment, Size constraints, Boundary tags, Heap parsability, Locality
Slide 89: Allocation – Additional Considerations: Alignment, Size constraints, Boundary tags, Heap parsability, Locality
Slide 90: Alignment
- Allocated objects may require special alignment, e.g. double-word alignment for double-word floating-point values.
- Making the granule a double-word is wasteful.
- Example: the header of an array in Java takes three words, so one word must be wasted or skipped to align the elements.
Slide 91: Allocation – Additional Considerations: Alignment, Size constraints, Boundary tags, Heap parsability, Locality
Slide 92: Size Constraints
- Some collection schemes require a minimum amount of space in each cell, e.g. for a forwarding address or a lock/status word.
- In that case the allocator must allocate more words than requested.
Allocation’s Additional Considerations
Slide94Additional header or boundary tag associated with each cell.Found outside the storage available to the program.Indicates size and allocated/free statusIs one or two words longPossible use of bitmap instead
Boundary Tags
Slide95AlignmentSize constraintsBoundary tagsHeap parsabilityLocality
Allocation’s Additional Considerations
Slide 96: Heap Parsability
- The ability to advance from cell to cell in the heap.
- An object's header (one or two words) holds the type, hash code, synchronization information and mark bit.
- The header comes before the data; the object reference refers to the first element/field.
Slide 98: Heap Parsability
How to handle alignment gaps?
- Zero all free space in advance, or
- Devise a distinct range of values to write at the start of each gap.
Parsing is easier with a bitmap indicating where each object starts, but it requires additional space and time.
Slide 99: Allocation – Additional Considerations: Alignment, Size constraints, Boundary tags, Heap parsability, Locality
Slide 100: Locality
- During allocation: address-ordered free-lists and sequential allocation exhibit good locality.
- During freeing: the goal is for objects freed together to end up near each other. Empirically, objects allocated at the same time often become unreachable at about the same time.
Slide 101: Allocation in Concurrent Systems
- With multiple threads allocating, most steps in allocation need to be atomic, which can become a bottleneck.
- Basic solution: give each thread its own allocation area.
- Use a global pool and hand out chunks to threads carefully.
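The per-thread-area idea can be sketched as follows: each thread bump-allocates from a private chunk and only takes the global lock to fetch a new chunk. Class names, sizes, and the integer-address model are illustrative assumptions.

```python
# Per-thread allocation buffers: the global heap hands out chunks under a
# lock; within its chunk, a thread allocates without any synchronization.
import threading

class ChunkedHeap:
    def __init__(self, size, chunk):
        self.lock = threading.Lock()
        self.free, self.limit, self.chunk = 0, size, chunk
    def get_chunk(self):
        with self.lock:                     # the only synchronized step
            if self.free + self.chunk > self.limit:
                return None
            addr = self.free
            self.free += self.chunk
            return addr

class ThreadAllocator:
    def __init__(self, heap):
        self.heap, self.free, self.limit = heap, 0, 0
    def allocate(self, n):
        if self.free + n > self.limit:      # local buffer exhausted
            base = self.heap.get_chunk()
            if base is None:
                return None
            self.free, self.limit = base, base + self.heap.chunk
        result = self.free
        self.free += n
        return result

heap = ChunkedHeap(1000, 100)
t = ThreadAllocator(heap)
print(t.allocate(30), t.allocate(30), t.allocate(50))
```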
Slide 102: Allocation Summary
Methods:
- Sequential
- Free-list: first-fit, next-fit and best-fit
- Segregated-fits
Plus various additional considerations to keep in mind.