Slide1
Agile Paging: Exceeding the Best of Nested and Shadow Paging
Jayneel Gandhi, Mark D. Hill, Michael M. Swift
Slide2
Executive Summary
Problem:
Virtualization is valuable but has high overheads with larger workloads
(up to 70% slower than native)
Existing Choices:
Nested Paging: slow page walk but fast page table updates
Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (at most 4% slower than native)
Slide3
Outline
Motivation
Agile Paging
Results
Summary
Slide4
Virtualization Overview
Benefits:
Foundation of our cloud infrastructure
Provides on-demand virtual instances
Helps server consolidation
Problem:
Overheads of virtualizing memory are high
Up to 70% slower than unvirtualized
[Diagram: APPs on a Guest OS, on the VMM, on the Hardware]
Slide5
Virtualizing Memory
[Diagram: APPs on a Guest OS, on the VMM, on the Hardware]
Guest Virtual Address (gVA) → Guest Physical Address (gPA): Guest Page Table (Guest OS)
Guest Physical Address (gPA) → Host Physical Address (hPA): Nested Page Table (VMM)
Slide6
Virtualizing Memory
gVA → gPA (Guest Page Table) → hPA (Nested Page Table)
Two techniques to manage both page tables:
Nested Paging – Hardware
Shadow Paging – Software
Evaluated on two axes:
Page Walk Latency & Page Table Updates
Slide7
Unvirtualized x86-64 Translation
Virtual Address (VA) → Physical Address (PA), walking the OS page table from CR3
At most mem accesses = 4
Slide8
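1. Nested Paging – Hardware
gVA → gPA (Guest Page Table) → hPA (Nested Page Table)
The walk starts from gCR3; every guest page table pointer is a gPA that must itself be translated through the Nested Page Table
At most mem accesses = 5 + 5 + 5 + 5 + 4 = 24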
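The 24-access worst case on this slide can be reproduced with a small calculation. This is a minimal sketch, assuming four-level guest and nested page tables and no TLB or page-walk caching; the function names are hypothetical and only illustrate the arithmetic.

```python
def native_walk_accesses(levels: int = 4) -> int:
    """Worst-case memory accesses for an unvirtualized page walk."""
    return levels


def nested_walk_accesses(guest_levels: int = 4, nested_levels: int = 4) -> int:
    """Worst-case memory accesses for a two-dimensional (nested) page walk.

    Assumption: no caching of any kind, so every pointer is fully translated.
    """
    # Each guest level costs one nested walk (to translate the guest-physical
    # pointer) plus one read of the guest page table entry itself.
    per_guest_level = nested_levels + 1
    # The final guest-physical address of the data needs one more nested walk.
    return guest_levels * per_guest_level + nested_levels


if __name__ == "__main__":
    print("native:", native_walk_accesses())
    print("nested:", nested_walk_accesses())
```

With the default arguments the two functions return 4 and 24, the figures quoted on the unvirtualized and nested paging slides.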
Longer Page Walk
Slide9
2. Shadow Paging – Software
[Diagram: APPs on a Guest OS, on the VMM, on the Hardware]
VMM combines the Guest Page Table (gVA → gPA) and the Nested Page Table (gPA → hPA) into a Shadow Page Table (gVA → hPA)
Guest Page Table is marked Read Only (RO)
Slide10
2. Shadow Paging – Software
gVA → hPA through the Shadow Page Table, starting from sCR3
Shadow Page Table stands in for Guest Page Table (Read Only) + Nested Page Table
Shorter Page Walk
At most mem accesses = 4
Slide11
Page Table Updates
1. Nested Paging: in-place fast update of the Guest Page Table
2. Shadow Paging: slow mediated update (a write to the Read Only Guest Page Table causes a VMM trap)
Slide12
Key Observation
Guest Virtual Address Space
Fully static address space: Shadow Paging preferred
Fully dynamic address space: Nested Paging preferred
Reality: a small fraction of the address space is dynamic
Slide13
Key Observation
[Diagram: Guest Page Table rooted at gCR3, with parts marked Shadow and parts marked Nested]
Slide14
Outline
Motivation
Agile Paging
Results
Summary
Slide15
Agile Paging
Start page walk in shadow mode
-- Achieving fast TLB misses
Optionally switch to nested mode
-- Allowing fast in-place updates
Two parts of design: 1. Mechanism 2. Policy
Slide16
1. Mechanism
[Diagram: Shadow Page Table rooted at sCR3, Guest Page Table (Read only) rooted at gCR3, and Nested Page Table; two entries are marked "1"]
gVA → gPA → hPA
Slide17
1. Mechanism: Example Page Walk
gVA → hPA: the walk starts from sCR3 in shadow mode and finishes in the guest page table (gCR3) in nested mode
At most mem accesses = 1 + 1 + 1 + 5 = 8
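The access count for this example walk can be generalized to other switch points. The sketch below assumes what the terms on this slide suggest: each level walked in shadow mode costs one access, and each level walked in nested mode costs five (four nested page table reads plus one guest page table read). The function name and constants are hypothetical.

```python
LEVELS = 4        # four-level x86-64 page tables
SHADOW_STEP = 1   # one read per level while in shadow mode
NESTED_STEP = 5   # assumed: 4 nested reads + 1 guest entry read per level


def agile_walk_accesses(shadow_levels: int) -> int:
    """Worst-case accesses when the first `shadow_levels` levels use shadow mode.

    shadow_levels = 4 is a pure shadow walk; shadow_levels = 3 switches to
    nested mode for the last level, as in the example on this slide.
    """
    if not 0 <= shadow_levels <= LEVELS:
        raise ValueError("shadow_levels must be between 0 and 4")
    nested_levels = LEVELS - shadow_levels
    return shadow_levels * SHADOW_STEP + nested_levels * NESTED_STEP


if __name__ == "__main__":
    for shadow_levels in range(LEVELS, -1, -1):
        print(shadow_levels, "shadow levels:", agile_walk_accesses(shadow_levels))
```

Calling `agile_walk_accesses(3)` gives the 8 accesses of the example. A fully nested walk is not covered by this function: it also has to translate gCR3, which brings it to 24.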
Switch modes at level 4 of guest page table
Slide18
2. Policy: Shadow → Nested
Start → Shadow
Shadow → Shadow (1 Write): write to page table (VMM trap)
Shadow (1 Write) → Nested: write to page table (VMM trap)
Nested: subsequent writes (no VMM traps)
Slide19
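2. Policy: Nested → Shadow
Start → Shadow
Shadow → Shadow (1 Write): write to page table (VMM trap)
Shadow (1 Write) → Nested: write to page table (VMM trap)
Nested: subsequent writes (no VMM traps)
Nested → Shadow: on timeout, move non-dirty parts of the guest page table back to shadow mode
Use dirty bits to track writes to guest page table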
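The two policy slides describe a small state machine, which can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the class name, the per-page granularity, and the choice to clear the dirty flag at each timeout are hypothetical details filled in for the example.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"
    SHADOW_ONE_WRITE = "shadow (1 write)"
    NESTED = "nested"


class PageTablePagePolicy:
    """Mode tracking for one guest page table page (hypothetical granularity)."""

    def __init__(self) -> None:
        self.mode = Mode.SHADOW
        self.dirty = False      # models the dirty bit used while in nested mode
        self.vmm_traps = 0

    def on_guest_write(self) -> None:
        if self.mode is Mode.SHADOW:
            # First write: trapped by the VMM, stay in shadow mode.
            self.vmm_traps += 1
            self.mode = Mode.SHADOW_ONE_WRITE
        elif self.mode is Mode.SHADOW_ONE_WRITE:
            # Second write: trapped by the VMM, switch to nested mode.
            self.vmm_traps += 1
            self.mode = Mode.NESTED
            self.dirty = False
        else:
            # Nested mode: in-place update with no VMM trap.
            self.dirty = True

    def on_timeout(self) -> None:
        # Assumed policy: pages not written since the last timeout move back.
        if self.mode is Mode.NESTED:
            if not self.dirty:
                self.mode = Mode.SHADOW
            self.dirty = False


if __name__ == "__main__":
    page = PageTablePagePolicy()
    for _ in range(5):
        page.on_guest_write()
    print(page.mode, page.vmm_traps)
```

In this sketch only the first two writes trap to the VMM; later writes are absorbed in nested mode, and a page that stays clean across a timeout interval returns to shadow mode.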
Slide20
Outline
Motivation
Agile Paging
Results
Summary
Slide21
Methodology
Measure cost of page walks on real hardware
Intel 12-core Sandy Bridge with 96GB memory
64-entry L1 TLB + 512-entry L2 TLB, 4-way associative, for 4KB pages
32-entry L1 TLB, 4-way associative, for 2MB pages
Prototype VMM and emulate hardware in Linux v3.12.13
BadgerTrap for online analysis of TLB misses and emulation of agile paging
Linear model to predict performance
Workloads
Big-memory workloads, SPEC 2006, BioBench, PARSEC
Slide22
Performance Results
Measured using performance counters
Modeled based on emulator: BadgerTrap
Solid bottom bar: Page walk overhead
Hashed top bar: VMM overheads
B: Unvirtualized
N: Nested Paging
S: Shadow Paging
A: Agile Paging
Slide23
Performance Results
Nested Paging has high overheads of TLB misses
Effect of longer page walk
Nested Paging overheads shown: 28%, 18%, 19%, 6%
Slide24
Performance Results
Shadow Paging has high overheads of VMM interventions
Shadow Paging overheads shown: 11%, 30%, 6%, 70%
Nested Paging overheads shown: 28%, 18%, 19%, 6%
Slide25
Performance Results
Agile paging consistently performs better than both techniques
Nested Paging overheads shown: 28%, 18%, 19%, 6%
Shadow Paging overheads shown: 11%, 30%, 6%, 70%
Agile Paging overheads shown: 2%, 4%, 2%, 3%
Slide26
Summary
Problem:
Virtualization is valuable but has high overheads with larger workloads
(up to 70% slower than native)
Existing Choices:
Nested Paging: slow page walk but fast page table updates
Shadow Paging: fast page walk but slow page table updates
Can we get the best of both for the same address space (or the same page walk)?
Yes, Agile Paging: use shadow paging and sometimes switch to nested paging within the same page walk (at most 4% slower than native)
Slide27
Questions?
Slide28
Can we get best of both worlds?
                        Nested Paging     Shadow Paging        Agile Paging
Dimensions              2D                1D                   1D
# of memory accesses    24                4                    ~4-5
Page table updates      Fast, in-place    Slow, out of place   Fast, in-place
Slide29
Short-Lived Processes
Issue:
The cost of creating shadow page table is high
Solution:
Start shadow mode after 1 sec for agile paging
Give user mode access to run only in nested mode
Slide30
Accessed/Dirty Bits
Issue:
Shadow mode is slow for setting A/D bits
Coherence between shadow and guest page tables causes VMM traps
Solution:
Hardware Optimization
Intel sets accessed/dirty bits on both guest and nested page tables
Broadwell supports multiple page table walkers per core
We propose to write A/D bits on all three page tables by hardware
Slide31
Context-Switches
Issue:
Intra-guest context switches with shadow mode are slower
Guest OS does not know the shadow page table exists, so each context switch causes a VMM trap
Solution:
Hardware Optimization
Add a small VMM-managed cache of guest CR3 → shadow CR3
Looked up by hardware for a matching entry on context switch
On a hit, no VMM trap is required
Slide32
Why does agile paging work?
Switch Level   Shadow   L4     L3     L2   L1   Nested   Avg.
Mem. Acc.      4        8      12     16   20   24
graph500       99.8%    0.2%   -      -    -    -        4.01
memcached      88.2%    4.5%   7.3%   -    -    -        4.76
canneal        94.7%    4.6%   0.7%   -    -    -        4.24
dedup          91.4%    2.2%   6.4%   -    -    -        4.60
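The Avg. column is the average of the per-switch-level access counts, weighted by the fraction of page walks that switch at each level. The short sketch below recomputes it from the figures on this slide; the variable and function names are hypothetical.

```python
# Memory accesses per walk for each switch level (from the slide).
MEM_ACCESSES = {"Shadow": 4, "L4": 8, "L3": 12, "L2": 16, "L1": 20, "Nested": 24}

# Percentage of page walks per switch level (from the slide; "-" entries omitted).
WALK_PERCENT = {
    "graph500": {"Shadow": 99.8, "L4": 0.2},
    "memcached": {"Shadow": 88.2, "L4": 4.5, "L3": 7.3},
    "canneal": {"Shadow": 94.7, "L4": 4.6, "L3": 0.7},
    "dedup": {"Shadow": 91.4, "L4": 2.2, "L3": 6.4},
}


def average_accesses(percent_by_level: dict) -> float:
    """Weighted average of memory accesses per page walk."""
    return sum(
        (percent / 100.0) * MEM_ACCESSES[level]
        for level, percent in percent_by_level.items()
    )


if __name__ == "__main__":
    for workload, percents in WALK_PERCENT.items():
        print(f"{workload}: {average_accesses(percents):.2f}")
```

For graph500, for example, the sum is 0.998 × 4 + 0.002 × 8 = 4.008, which rounds to the 4.01 listed in the table.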
Brings average number of memory accesses down to ~4-5 from 24
Slide33
Transparent Huge Page (2MB)
Solid bottom bar: Page walk overhead
Hashed top bar: VMM overheads
Bar labels, in slide order: 13%, 4%, 10%, 14%, 14%, 5%, 6%, 68%, 2%, 3%, 2%, 2%
B: Unvirtualized
N: Nested Paging
S: Shadow Paging
A: Agile Paging
Slide34
Design Components
Hardware
Three page table pointers
Point to each of the page tables
Enhanced page table walker
Interprets switching bit
Bridges the two state machines
VMM
Manage three page tables
Incremental from shadow paging
Policies for changing modes
Encapsulate policies in VMM