Slide 1
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
Snehasish Kumar, Hongzhou Zhao†, Arrvindh Shriraman, Eric Matthews∗, Sandhya Dwarkadas†, Lesley Shannon∗
School of Computing Sciences, Simon Fraser University
†Department of Computer Science, University of Rochester
∗School of Engineering Science, Simon Fraser University
2012 IEEE/ACM 45th Annual International Symposium on Microarchitecture
Slide 2: Overview
Problem being addressed
Prior work
Challenges encountered
Proposed solution
Results
Review of the paper
Slide 3: Problem Statement
Current caches are designed around a fixed block size
Block granularity is chosen for the average spatial locality across a general mix of workloads
Many common applications exhibit far lower spatial locality than this design point
Unused words occupy 17%–80% of a 64K L1 cache and 1%–79% of a 1MB private LLC
Unused-word transfers comprise 11% of on-chip cache hierarchy energy consumption.
Slide 4: Prior Work
Sector Cache:
Aims at minimizing bandwidth
Fetches sub-blocks on demand
Works well for applications with low to moderate spatial locality
Reduces mispredicted spatial prefetches thus reducing bandwidth usage and energy consumption
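The sub-block fetching described above can be sketched in a few lines of Python; this is a toy model, not the original hardware design, and `fetch_from_memory` is a hypothetical stand-in for the next level of the hierarchy.

```python
def fetch_from_memory(tag, sub):
    """Hypothetical next-level fetch; returns a placeholder word."""
    return (tag, sub)

class SectorLine:
    """Toy model of a sector cache line: one tag, a valid bit per sub-block."""

    def __init__(self, tag, n_subblocks):
        self.tag = tag
        self.valid = [False] * n_subblocks
        self.data = [None] * n_subblocks

    def access(self, sub):
        """Return (hit, data); on a miss, fetch only the requested sub-block."""
        if self.valid[sub]:
            return True, self.data[sub]
        self.data[sub] = fetch_from_memory(self.tag, sub)
        self.valid[sub] = True
        return False, self.data[sub]
```

Because only referenced sub-blocks are fetched, mispredicted spatial prefetches (and their bandwidth) are avoided, at the cost of extra misses when spatial locality is high.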
Slide 5: Prior Work
Slide 6: Challenges to Static Design
Small cache lines fetch fewer unused words
but impose significant performance penalties by forgoing spatial prefetching in applications with high spatial locality
Larger cache lines effectively prefetch neighboring words
but increase the number of unused words and the network bandwidth consumed
Determining a single fixed cache line granularity that is optimal at hardware design time is a challenge
Slide 7: Amoeba-Cache
A novel cache architecture that supports fine-grained (per-miss) dynamic adjustment of cache block size and the number of blocks per set
Filters out unused words in a block and prevents them from being inserted into the cache, allowing the resulting free space to hold other useful blocks
Adapts to the available spatial locality
Slide 8: Amoeba-Cache
Slide 9: Amoeba-Cache
How can the number of tags grow and shrink as the number of blocks per set varies with block granularity?
Eliminates the dedicated tag array
Tags and data are kept together in a single data array
A tag bitmap indicates which words in the data array are tags
Valid bits are also stored as a bitmap
Slide 10: Amoeba-Cache
Slide 11: Amoeba-Cache
Data lookup:
The tag bitmap activates only the words in the data array that hold tags for comparison
The minimum Amoeba block size is 2 words (1 tag + 1 data), so adjacent words cannot both be tags
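A minimal sketch of this lookup, assuming a simplified model in which each set is a flat word array and each tag word encodes a (tag, start, size) triple with its data words following contiguously; the names and encoding are illustrative, not taken from the paper's hardware.

```python
WORDS_PER_SET = 8  # toy set size in words

def lookup(set_words, tag_bitmap, addr_tag, offset):
    """Return the data word on a hit, or None on a miss."""
    for i in range(WORDS_PER_SET):
        if not (tag_bitmap >> i) & 1:
            continue  # the tag bitmap selects only tag words for comparison
        tag, start, size = set_words[i]
        if tag == addr_tag and start <= offset < start + size:
            # data words follow their tag contiguously in the data array
            return set_words[i + 1 + (offset - start)]
    return None
```

Since the minimum block is 2 words (a tag plus one data word), two adjacent words are never both tags, which bounds how many comparisons the bitmap can activate per set.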
Slide 12: Amoeba-Cache
Block Insertion:
The valid bitmap is used to find empty slots within the set
1 means allocated, 0 means empty
For an incoming block of m words, a run of m consecutive 0s is searched for
The replacement algorithm is triggered repeatedly until enough space is created
To reclaim space from an Amoeba block, the tag and valid bits in the bitmaps corresponding to that block are cleared
Uses an LRU policy to choose a way within the cache and randomly picks a candidate block from within the set for replacement
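The bitmap bookkeeping for insertion and reclamation might look like the following sketch, assuming the bitmaps are plain integers with bit i corresponding to word i of the set; the helper names are hypothetical.

```python
WORDS_PER_SET = 8  # toy set size in words

def find_free_run(valid_bitmap, m):
    """Find m consecutive 0 bits (empty words); return the start index or -1."""
    run = 0
    for i in range(WORDS_PER_SET):
        if (valid_bitmap >> i) & 1:
            run = 0          # word is allocated, restart the run
        else:
            run += 1
            if run == m:
                return i - m + 1
    return -1  # no room: the caller would trigger replacement and retry

def reclaim(valid_bitmap, tag_bitmap, start, size):
    """Evict a block spanning [start, start+size): clear its tag and valid bits."""
    mask = ((1 << size) - 1) << start
    return valid_bitmap & ~mask, tag_bitmap & ~(1 << start)
```

On eviction, `reclaim` frees the block's words, and `find_free_run` is retried until the incoming block's m words fit.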
Slide 13: Partial Misses
Low probability (5 in 1K accesses)
Identify the overlapping blocks
Evict them to the MSHR
Allocate space for the entire requested block
Issue the miss request
Copy the merged block into the cache
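At word granularity, the net effect of the steps above is that only words not already present need to be fetched; a toy sketch under that assumption (the helper is illustrative, not from the paper):

```python
def partial_miss_words(cached_ranges, req_start, req_size):
    """Words of the requested range that must still be fetched from the next
    level, given the (start, size) ranges of the overlapping cached
    sub-blocks that were evicted into the MSHR."""
    requested = set(range(req_start, req_start + req_size))
    present = set()
    for start, size in cached_ranges:
        present |= set(range(start, start + size)) & requested
    return sorted(requested - present)
```

On refill, the words held in the MSHR are merged with the fetched words to form the full block before it is copied into the newly allocated space.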
Slide 14: Results
Result 1:
Amoeba-Cache increases cache capacity by harvesting space from unused words and can achieve an 18% reduction in both L1 and L2 miss rate.
Result 2:
Amoeba-Cache adaptively sizes the cache block granularity and reduces L1 ↔ L2 bandwidth by 46% and L2 ↔ Memory bandwidth by 38%.
Slide 15: Results
Slide 16: Results
Result 3: Boosts performance by 10% on commercial applications while saving 11% of on-chip memory hierarchy energy. The off-chip L2 ↔ Memory interface sees a mean energy reduction of 41% across all workloads.
Slide 17: Review of the Paper
Connects proposed work with prior work
Builds up the proposed idea gradually with sufficient examples
Algorithms explained well with control flow diagrams
Lots of comparative graphs to support the results
The maximum region size (RMAX) is stated differently in the text (bytes) and in the diagram (words)
The rationale for using the metric 1/(MissRate × Bandwidth) to determine block granularity could have been supported better
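As a toy illustration of that metric: for each candidate granularity the predictor computes 1/(MissRate × Bandwidth) and picks the maximum, trading extra misses against extra traffic. The numbers below are invented for illustration, not measurements from the paper.

```python
# candidate granularity (words) -> (miss rate, bandwidth in words/access);
# these figures are made up for the example
candidates = {
    1: (0.20, 1.0),
    4: (0.07, 2.5),
    8: (0.06, 4.0),
}

def utility(miss_rate, bandwidth):
    """Higher is better: penalizes both extra misses and extra traffic."""
    return 1.0 / (miss_rate * bandwidth)

best = max(candidates, key=lambda g: utility(*candidates[g]))
```

Here the mid-size block wins: the 1-word block suffers too many misses, while the 8-word block moves too many unused words.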
Slide 18: Thank you!