Slide 1: ECE 463/563, Fall '18: Virtual Memory
Prof. Eric Rotenberg
Fall 2018, ECE 463/563, Microprocessor Architecture
Slide 2: Virtual Memory

- Every program has its own virtual memory
  - Large virtual address space
  - Divided into virtual pages
- When a program runs, it needs physical memory
  - Physical memory is actual storage:
    - DRAM: main memory
    - Hard disk: overflow storage for main memory
  - The Operating System (O/S) manages physical memory as a shared resource among many running programs
- When a program first accesses a particular virtual page, the O/S is invoked to allocate a physical page in main memory that will now correspond to the program's virtual page
  - Upon starting a program: the O/S "loader" allocates initial physical pages for the program's text (code) and data segments (globals, stack)
  - During program execution: the O/S "page-fault handler" allocates physical pages for first-time accesses to new virtual pages (e.g., heap, stack). The handler also swaps physical pages between main memory (in DRAM) and the overflow storage for main memory (on hard disk).
Slide 3: Virtual Memory (cont.)

[Figure: Program #1's virtual memory (virtual page numbers, VPNs 0-8) and Program #2's virtual memory (VPNs 0-7) map onto physical memory: DRAM (main memory), divided into physical pages identified by physical page numbers (PPNs), and the hard disk, which provides overflow storage for main memory (swap space, alongside the file system). One virtual page corresponds to one physical page.]
Slide 4: Virtual Memory (cont.)

[Figure: The same diagram, now drawn as explicit VPN-to-PPN mapping tables: each program's VPNs are listed alongside the PPNs they map to; some virtual pages reside in DRAM and others in the swap space on disk.]
Slide 5: Virtual Memory (cont.)

[Figure: The same diagram after more pages have been allocated (additional PPNs, e.g., 10-12, in use). A newly referenced virtual page's mapping is still unknown ("??"), so the O/S must find it a physical page, possibly by swapping another page out to the swap space on disk.]
Slide 6: O/S Page Tables

- The O/S maintains a page table (PT) per running program
  - The PT is a software data structure used to translate the program's virtual addresses to physical addresses
- The PT is searched using the virtual address
- There is one PT entry for each virtual page used by the program. Contents of a PT entry:
  - Whether the corresponding physical page is in main memory (DRAM) or in swap space (hard disk)
  - If in main memory: the PT entry provides the physical page number
  - If in swap space: the PT entry provides the location on disk
  - A PT entry typically has other information too (recency of access, protection bits, etc.)
Slide 7: Virtual-to-Physical Address Translation

[Figure: A running program generates a virtual address. The O/S page table for the running program translates the virtual address to a physical address, which is then used to access the L1 I/D caches (the entry point to the memory hierarchy).]
Slide 8: Virtual-to-Physical Address Translation (cont.)

[Figure: Flowchart for translating a VPN to a PPN, with three scenarios:
- Scenario 1: First time the VPN is referenced (first ever access to the VPN): a physical page is allocated in DRAM, yielding the PPN.
- Scenario 2: The VPN has been referenced before and its page is in DRAM: the PPN is returned directly.
- Scenario 3: The VPN has been referenced before but its page is on disk: the page is swapped in from disk to DRAM (a "page fault"), yielding the PPN.]
Slide 9: Overhead of Virtual Memory

- The program counter is a virtual address
  - Each instruction fetch requires address translation
- Loads and stores generate virtual addresses
  - Each load and store requires address translation
- For every instruction fetch, load, and store, call the O/S to search the page table???
  - This would have unacceptable performance
  - The O/S address translation function takes 10s to 100s of instructions and several memory accesses (to access the page table). The exact overhead depends on the scenario, page table organization, etc.
- There has to be a better way…
Slide 10: Translation Lookaside Buffer (TLB)

- The TLB is a small cache of recently used address translations
- The TLB is defined in the ISA, because software and hardware collaborate w.r.t. the TLB
- Hardware role: hardware provides and accesses the TLB
  - Provides: the TLB is a hardware table
  - Accesses: hardware searches the TLB for the desired translation. If the TLB doesn't have the translation, hardware calls the O/S.
- Software role: software manages the TLB
  - When there is a TLB miss, the O/S is responsible for handling the miss. Once it has a translation, it writes the translation into the TLB, at a location of its choosing.
Slide 11: Virtual-to-Physical Address Translation, with TLB

[Figure: A running program's virtual address is translated by the TLB (1 cycle) into a physical address, which accesses the L1 I/D caches (the entry point to the memory hierarchy). On a TLB miss, an exception switches the CPU from executing the application to executing the O/S TLB miss exception handler (10s-100s of cycles), which searches the O/S page table for the running program and uses a TLB-write instruction to put the translation into the TLB.]
Slide 12: TLB Organization

- TLB organization
  - Can be direct-mapped, set-associative, or fully-associative
  - Many modern RISC ISAs define it to be fully-associative since software manages it
- Unified versus split TLBs
  - Unified: one TLB for both instruction and data address translation
  - Split: separate TLBs for instruction and data address translation
    - I-TLB: TLB for instruction address translation (program counter). The I-TLB sits alongside the L1 I-cache.
    - D-TLB: TLB for data address translation (loads and stores). The D-TLB sits alongside the L1 D-cache.
Slide 13: Using the TLB for translation

- Example: fully-associative TLB

[Figure: Each TLB entry holds a valid bit (v), a virtual page number (VPN), and a physical page number (PPN). The incoming VPN X is compared (=?) against every entry's VPN in parallel; on a match in a valid entry, that entry's PPN X' is read out.]
Slide 14: TLB increases hit time

[Figure: Without virtual memory, an address accesses the L1 cache directly (1 cycle). With virtual memory, the virtual address first accesses the TLB (1 cycle) to produce the physical address, which then accesses the L1 cache (1 cycle), for a two-cycle hit time.]
Slide 15: Using the TLB for translation: A closer look

- Observation: What if the index bits were entirely contained in the page offset bits?
  - Then the first part of the cache access (indexing) would not wait on the TLB

[Figure: Example with page size = 4KB, so # page offset bits = log2(4KB) = 12. The 32-bit virtual address is split into the virtual page number (bits 31-12) and the page offset (bits 11-0). The TLB translates the virtual page number into the physical page number; the page offset passes through unchanged. The cache interprets the resulting physical address as tag, index, and block offset.]
Slide 16: Accessing TLB and Cache in Parallel

- Cache hit time reduces from two cycles to one!
  - Because the cache can now be indexed in parallel with the TLB access (only the final tag match uses the output from the TLB)
- But some constraints...

[Figure: The index and block offset come from the page offset (bits 11-0) and select a set in the tag and data arrays while, in parallel, the TLB translates the virtual page number (bits 31-12) into the physical page number. The physical page number is then compared (=?) against the tag read from the tag array, and a matching way drives word select in the data array.]
Slide 17: Constraint: Size of 1 cache way

- Constraint for a "physically-indexed cache with parallel cache/TLB access": the index and block offset bits must be contained within the page offset bits
- Therefore: the total amount of storage in 1 way of the cache should not exceed the page size

[Figure: An N-way set-associative cache: each way holds (# sets × block size) bytes; a set spans one block in each of ways 1 through N.]
Slide 18: Page size / associativity tradeoff

- From the previous slide: (# sets × block size) ≤ page size
- Cache size equation: cache size = # sets × block size × associativity
- Therefore: associativity ≥ (cache size / page size)
- Example: MC88110
  - Page size = 4KB
  - I$, D$ both: 8KB 2-way set-associative
  - (8KB / 4KB) = 2 ways
- Example: VAX series
  - Page size = 512B
  - For a 16KB cache, need assoc. = (16KB / 512B) = 32-way set-associative!
- Moral: sometimes associativity is thrust upon you
Slide 19: Backup Slides

The following slides are for interest only.
Slide 20: Notation on following slide

- X: virtual page number to be translated into a physical page number
- X': physical page number corresponding to X
- V: virtual page number of a victim page
- PT[X].resident
  - If true: X is in DRAM, at PT[X].ppn. Thus X' = PT[X].ppn.
  - If false: X is not in DRAM. PT[X].ppn is bogus; PT[X].disk_loc indicates the disk location.

// PT is a hash table (unbounded array) of page table entries.
// Key (index) of the hash table is the virtual page number.
struct page_table_entry {
    bool resident;           // 'true' if in DRAM, 'false' if on disk (swap space)
    unsigned int ppn;        // if resident==true, this is the physical page number
    disk_loc_type disk_loc;  // if resident==false, this is the location on disk
};
Slide 21: Virtual-to-Physical Address Translation

[Figure: Flowchart translating virtual page X into physical page X':
- Is X in PT?
  - No (first ever access to X):
    - Is there a free DRAM page?
      - Yes: PT[X].resident = true; PT[X].ppn = freelist.pop()
      - No: SwapOut(V); PT[V].resident = false; PT[V].disk_loc = …; PT[X].resident = true; PT[X].ppn = PT[V].ppn
  - Yes: check PT[X].resident
    - true: X' = PT[X].ppn
    - false: swap in X from disk to DRAM ("page fault")
      - Is there a free DRAM page?
        - Yes: PT[X].resident = true; PT[X].ppn = freelist.pop(); SwapIn(X)
        - No: SwapOut(V); PT[V].resident = false; PT[V].disk_loc = …; PT[X].resident = true; PT[X].ppn = PT[V].ppn; SwapIn(X)
- In all cases the result is X' = PT[X].ppn.]