
The Memory Hierarchy: Cache, Main Memory, and Virtual Memory

Lecture for CPSC 5155

Edward Bosworth, Ph.D.

Computer Science Department

Columbus State University

The Simple View of Memory
The simplest view of memory is that presented at the ISA (Instruction Set Architecture) level. At this level, memory is a monolithic addressable unit. This view suffices for all programming uses.

The Multi-Level View of Memory
Real memory has at least three levels, each of which can be elaborated further.
The fact that most cache memories are now multi-level does not change the basic design issues.

The More Realistic View

Generic Primary / Secondary Memory
In each case, we have a fast primary memory backed by a bigger secondary memory. The "actors" in the two cases are as follows:

Technology        Primary              Secondary             Block
Cache Memory      SRAM Cache           DRAM Main Memory      Cache Line
Virtual Memory    DRAM Main Memory     Disk                  Page
Access Time       T_P (Primary Time)   T_S (Secondary Time)

Effective Access Time
Effective Access Time: T_E = h · T_P + (1 – h) · T_S,
where h (the primary hit rate) is the fraction of memory accesses satisfied by the primary memory; 0.0 ≤ h ≤ 1.0.
This can be extended to multi-level caches and mixed memory with cache and virtual memory.

Examples: Cache Memory
Suppose a single cache fronting a main memory, which has 80 nanosecond access time. Suppose the cache memory has access time 10 nanoseconds.
If the hit rate is 90%, then
T_E = 0.9 · 10.0 + (1 – 0.9) · 80.0 = 0.9 · 10.0 + 0.1 · 80.0 = 9.0 + 8.0 = 17.0 ns.
If the hit rate is 99%, then
T_E = 0.99 · 10.0 + (1 – 0.99) · 80.0 = 0.99 · 10.0 + 0.01 · 80.0 = 9.9 + 0.8 = 10.7 ns.
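The same calculation can be written as a short function. This is a minimal sketch in Python (the function name and structure are my own, not from the lecture):

```python
def effective_access_time(hit_rate, t_primary, t_secondary):
    """T_E = h * T_P + (1 - h) * T_S."""
    return hit_rate * t_primary + (1.0 - hit_rate) * t_secondary

# Values from the example: a 10 ns cache in front of an 80 ns main memory.
print(effective_access_time(0.90, 10.0, 80.0))  # 17.0 ns
print(effective_access_time(0.99, 10.0, 80.0))  # 10.7 ns
```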

Memory Technology
Static RAM (SRAM): 0.5 ns – 2.5 ns, $2000 – $5000 per GB
Dynamic RAM (DRAM): 50 ns – 70 ns, $20 – $75 per GB
Magnetic disk: 5 ms – 20 ms, $0.20 – $2 per GB
Ideal memory: the access time of SRAM with the capacity and cost/GB of disk


Principle of Locality

Programs access a small proportion of their address space at any time

Temporal locality

Items accessed recently are likely to be accessed again soon

e.g., instructions in a loop, induction variables

Spatial locality

Items near those accessed recently are likely to be accessed soon

e.g., sequential instruction access, array data


Taking Advantage of Locality

Memory hierarchy

Store everything on disk

Copy recently accessed (and nearby) items from disk to smaller DRAM memory

Main memory

Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory

Cache memory attached to CPU


Memory Hierarchy Levels

Block (aka line): unit of copying

May be multiple words

If accessed data is present in upper level

Hit: access satisfied by upper level

Hit ratio: hits/accesses

If accessed data is absent

Miss: block copied from lower level

Time taken: miss penalty

Miss ratio: misses/accesses = 1 – hit ratio
Then accessed data supplied from upper level

Cache Memory
Cache memory: the level of the memory hierarchy closest to the CPU.
Given accesses X_1, …, X_(n–1), X_n:
How do we know if the data is present?
Where do we look?


Direct Mapped Cache

Location determined by address

Direct mapped: only one choice

(Block address) modulo (#Blocks in cache)

#Blocks is a power of 2

Use low-order address bits
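As a short sketch of the placement rule (my own illustration, not from the slides), for an 8-block cache:

```python
NUM_BLOCKS = 8                        # number of blocks in the cache, a power of 2
block_address = 22                    # 10110 in binary
index = block_address % NUM_BLOCKS    # equivalently: block_address & (NUM_BLOCKS - 1)
print(f"{index:03b}")                 # 110 -- the low-order 3 address bits
```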


Tags and Valid Bits

How do we know which particular block is stored in a cache location?

Store block address as well as the data

Actually, only need the high-order bits

Called the tag

What if there is no data in a location?

Valid bit: 1 = present, 0 = not present

Initially 0

The Dirty Bit
In some contexts, it is important to mark the primary memory if the data have been changed since being copied from the secondary memory.
For historical reasons, this bit is called the "dirty bit", denoted D.
If D = 0, the block does not need to be written back to secondary memory prior to being replaced. This is an efficiency consideration.

The Cache Line Tag
The primary and secondary memories are divided into equally sized blocks. Suppose a cache line size of M = 2^m bytes. This would be the size of a primary memory block.
In an n-bit address, the lower m bits would be the offset in the block and (n – m) bits would identify the block. The upper (n – m) bits are the block address.

The Direct Mapped Cache
Suppose that the direct mapped cache has K = 2^k cache lines. The full memory address can be divided as shown below. The lower k bits of the block address always determine the cache line. For this reason, these bits are not part of the tag.

Bits            n – k – m        k        m
Cache View      Tag              Line     Offset
Address View    Block Address             Offset
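As a concrete check, here is a small sketch in Python that splits an address into these three fields; the function name is my own, and the test values (n = 5, k = 3, m = 0, word address 22) come from the example on the next slide:

```python
def split_address(addr, k, m):
    """Return (tag, line, offset) for a direct-mapped cache with 2**k lines of 2**m bytes."""
    offset = addr & ((1 << m) - 1)        # low m bits: offset within the line
    line = (addr >> m) & ((1 << k) - 1)   # next k bits: cache line number
    tag = addr >> (m + k)                 # remaining n - k - m bits: the tag
    return tag, line, offset

print(split_address(22, k=3, m=0))        # (2, 6, 0), i.e. tag 10, line 110, no offset bits
```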

Simple Example from the Text
In this next example, each cache line holds only one entry. There are 8 = 2^3 cache lines. Consider 5-bit addresses: n = 5, k = 3, m = 0 (1 = 2^0). The tag thus has 2 bits.
NOTE: The number of entries in a cache line and also the number of lines in the cache must both be a power of 2. Otherwise, the addressing is impossible.

Cache Example
8 blocks, 1 word/block, direct mapped. Initial state:

Index   V   Tag   Data
000     N
001     N
010     N
011     N
100     N
101     N
110     N
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
22          10 110        Miss       110

Index   V   Tag   Data
000     N
001     N
010     N
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
26          11 010        Miss       010

Index   V   Tag   Data
000     N
001     N
010     Y   11    Mem[11010]
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
22          10 110        Hit        110
26          11 010        Hit        010

Index   V   Tag   Data
000     N
001     N
010     Y   11    Mem[11010]
011     N
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
16          10 000        Miss       000
3           00 011        Miss       011
16          10 000        Hit        000

Index   V   Tag   Data
000     Y   10    Mem[10000]
001     N
010     Y   11    Mem[11010]
011     Y   00    Mem[00011]
100     N
101     N
110     Y   10    Mem[10110]
111     N

Cache Example

Word addr   Binary addr   Hit/miss   Cache block
18          10 010        Miss       010

Index   V   Tag   Data
000     Y   10    Mem[10000]
001     N
010     Y   10    Mem[10010]
011     Y   00    Mem[00011]
100     N
101     N
110     Y   10    Mem[10110]
111     N
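The whole sequence above can be replayed in a few lines. This is a rough simulation sketch of my own (not from the textbook); it assumes word addressing and an 8-line, 1-word-per-line, direct-mapped cache:

```python
NUM_LINES = 8                     # 2**3 lines, so the index is the low 3 address bits
cache = [None] * NUM_LINES        # each entry holds the tag of the resident block, or None

for addr in [22, 26, 22, 26, 16, 3, 16, 18]:
    index = addr % NUM_LINES      # low-order 3 bits select the cache line
    tag = addr // NUM_LINES       # high-order 2 bits form the tag
    if cache[index] == tag:
        outcome = "Hit"
    else:
        outcome = "Miss"
        cache[index] = tag        # on a miss, copy the block in from main memory
    print(f"addr {addr:2d} = {addr:05b}  line {index:03b}  {outcome}")
```

Running it reproduces the hit/miss pattern in the tables: misses on 22, 26, 16, 3, and 18, and hits on the repeated accesses to 22, 26, and 16.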

Another Direct-Mapped Cache
Assume a byte-addressable memory with 32-bit addresses.
Assume 256 cache lines. As 256 = 2^8, 8 bits of the address are used to select the cache line.
Assume each cache line holds 16 bytes. As 16 = 2^4, 4 bits of the address are used to specify the offset within the cache line.
The main memory is divided into blocks of size 16 bytes, each the size of a cache line.

Fields in the Memory Address
Divide the 32-bit address into three fields: a 20-bit explicit tag, an 8-bit line number, and a 4-bit offset within the cache line.
Consider the address 0x00AB7129. It would have
Tag = 0x00AB7
Line = 0x12
Offset = 0x9
Block Address = 0x00AB712

Bits            31 – 12          11 – 4   3 – 0
Cache View      Tag              Line     Offset
Address View    Block Address             Offset
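The field extraction can be checked with a few shifts and masks. A short sketch (mine, not the lecture's) for this 20/8/4 split:

```python
addr = 0x00AB7129

offset = addr & 0xF            # low 4 bits: offset within the 16-byte line
line   = (addr >> 4) & 0xFF    # next 8 bits: which of the 256 cache lines
tag    = addr >> 12            # remaining 20 bits: the tag
block_address = addr >> 4      # tag and line together: the 28-bit block address

print(hex(tag), hex(line), hex(offset), hex(block_address))
# 0xab7 0x12 0x9 0xab712
```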

Associative Caches
In a direct-mapped cache, each memory block from the main memory can be mapped into exactly one location in the cache.
Other cache organizations allow some flexibility in memory block placement.
One option for flexible placement is called an associative cache, based on content addressable memory.

Associative Memory
In associative memory, the contents of the memory are searched in one memory cycle.
Consider an array of 256 entries, indexed from 0 to 255 (or 0x0 to 0xFF).
Standard search strategies require either 128 tries on average (unordered) or 8 tries (binary search).
In content addressable memory, only one search is required.

Associative Search
Associative memory would find the item in one search. Think of the control circuitry as "broadcasting" the data value to all memory cells at the same time. If one of the memory cells has the value, it raises a Boolean flag and the item is found.
Some associative memories allow duplicate values and resolve multiple matches. Cache designs do not allow duplicate values.

The Associative Match
This shows a single word in a 4-bit content addressable memory and the circuit to generate the match signal (asserted low).

The Associative Cache
Again, a 32-bit address with 16-byte cache lines (4 bits for the offset in the cache line).
The number of cache lines is not important for address handling in associative caches.
The address will divide as follows:
28 bits for the cache tag, and
4 bits for the offset in the cache line.
The cache tags will be stored in associative memory connected to the cache.
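In software, the tag store of a fully associative cache can be modeled as a lookup keyed by the 28-bit tag. This is a rough sketch of my own (the names and the dict-based model are assumptions, not the hardware described on the slide):

```python
BLOCK_SIZE = 16     # bytes per cache line, so 4 offset bits

cache = {}          # tag -> 16-byte block, standing in for the associative tag store

def read_byte(addr, main_memory):
    """Return the byte at addr, filling the cache line on a miss."""
    tag = addr >> 4              # 28-bit tag: everything above the offset
    offset = addr & 0xF          # 4-bit offset within the line
    if tag not in cache:         # miss: fetch the whole 16-byte block
        base = tag << 4
        cache[tag] = main_memory[base:base + BLOCK_SIZE]
    return cache[tag][offset]
```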

The Associative Cache Line
A cache line in this arrangement would have the following format for our sample address. Here we assume that the CPU has not written to the cache line, so the dirty bit is D = 0.

D bit   V bit   Tag         16 indexed entries
0       1       0x00AB712   M[0xAB7120] … M[0xAB712F]

Set Associative Cache
An N-way set-associative cache uses direct mapping, but allows a set of N memory blocks to be stored in the line. This allows some of the flexibility of a fully associative cache, without the complexity of a large associative memory for searching the cache.
Suppose a 4-way set-associative cache with 16 bytes per memory block. Each cache line has 4 sets of 16 bytes, or 64 bytes.

Sample 2-way Set Associative
Consider addresses 0xCD4128 and 0xAB7129. Each would be stored in cache line 0x12. Set 0 of this cache line would have one block, and set 1 would have the other.

        D   V   Tag     Contents
Set 0   1   1   0xCD4   M[0xCD4120] to M[0xCD412F]
Set 1   0   1   0xAB7   M[0xAB7120] to M[0xAB712F]
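A quick check (my own sketch, assuming the same 256-line, 16-byte-line geometry as before) confirms that both addresses land in line 0x12 with different tags:

```python
for addr in (0xCD4128, 0xAB7129):
    offset = addr & 0xF             # 4-bit offset within the block
    line = (addr >> 4) & 0xFF       # 8-bit cache line number
    tag = addr >> 12                # remaining high-order bits form the tag
    print(hex(addr), "-> tag", hex(tag), "line", hex(line), "offset", hex(offset))
# Both map to line 0x12 with tags 0xcd4 and 0xab7, so a 2-way cache holds both at once.
```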

Associative Caches
Fully associative
Allow a given block to go in any cache entry
Requires all entries to be searched at once
Comparator per entry (expensive)
n-way set associative
Each set contains n entries
Block number determines which set: (Block number) modulo (#Sets in cache)
Search all entries in a given set at once
n comparators (less expensive)

Spectrum of Associativity
For a cache with 8 entries

Associativity Example
Compare 4-block caches: direct mapped, 2-way set associative, fully associative.
Block access sequence: 0, 8, 0, 6, 8

Direct mapped:

Block address   Cache index   Hit/miss   Cache content after access (blocks 0, 1, 2, 3)
0               0             miss       Mem[0], –, –, –
8               0             miss       Mem[8], –, –, –
0               0             miss       Mem[0], –, –, –
6               2             miss       Mem[0], –, Mem[6], –
8               0             miss       Mem[8], –, Mem[6], –

Associativity Example (continued)

2-way set associative:

Block address   Cache index   Hit/miss   Set 0 contents     Set 1 contents
0               0             miss       Mem[0]
8               0             miss       Mem[0], Mem[8]
0               0             hit        Mem[0], Mem[8]
6               0             miss       Mem[0], Mem[6]
8               0             miss       Mem[8], Mem[6]

(Set 1 stays empty because every block address in the sequence maps to set 0.)

Fully associative:

Block address   Hit/miss   Cache content after access
0               miss       Mem[0]
8               miss       Mem[0], Mem[8]
0               hit        Mem[0], Mem[8]
6               miss       Mem[0], Mem[8], Mem[6]
8               hit        Mem[0], Mem[8], Mem[6]
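The three organizations can be compared with one small LRU simulation. This is my own sketch (the textbook slides do not give code); it treats each organization as a number of sets times a number of ways:

```python
def simulate(num_sets, ways, sequence):
    """Replay a block-address sequence; each set is kept as an LRU-ordered list."""
    sets = [[] for _ in range(num_sets)]
    results = []
    for block in sequence:
        s = sets[block % num_sets]       # (block number) modulo (#sets) picks the set
        if block in s:
            results.append("hit")
            s.remove(block)              # refresh LRU position
        else:
            results.append("miss")
            if len(s) == ways:
                s.pop(0)                 # evict the least recently used block
        s.append(block)                  # most recently used goes at the end
    return results

seq = [0, 8, 0, 6, 8]
print("direct mapped:    ", simulate(4, 1, seq))   # 5 misses
print("2-way set assoc.: ", simulate(2, 2, seq))   # 4 misses, 1 hit
print("fully associative:", simulate(1, 4, seq))   # 3 misses, 2 hits
```

The counts match the three tables above: five misses for direct mapped, four for 2-way set associative, and three for fully associative.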


How Much Associativity

Increased associativity decreases miss rate

But with diminishing returns

Simulation of a system with 64KB D-cache, 16-word blocks, SPEC2000

1-way: 10.3%

2-way: 8.6%

4-way: 8.3%

8-way: 8.1%


Set Associative Cache Organization