Slide 1
Efficient Virtual Memory for Big Memory Servers
U Wisc and HP Labs, ISCA'13
Architecture Reading Club Summer'13
Slide 2
Key points
Big-memory workloads: memcached, databases, graph analysis
Analysis shows TLB misses can account for up to 51% of execution time
The rich features of paged VM are not needed by most of these applications
Proposal: Direct Segments
Paged VM as usual where needed
Segmentation where possible
For big-memory workloads, this eliminates 99% of data TLB misses!
Slide 3
Main Memory Mgmt
Trends: the amount of physical memory has grown from a few MBs to a few GBs, and now to several TBs
But over the same period the size of the DTLB has remained fairly unchanged: Pentium III – 72 entries, Pentium 4 – 64, Nehalem – 96, Ivy Bridge – 100
Workloads in the days gone by were also nicer (higher locality)
So: higher memory capacity + constant TLB size + misbehaving apps = more TLB misses
Slide 4
So how bad is it, really?
Slide 5
Main Features of Paged VM
Feature | Analysis | Verdict
Swapping | No swapping | Not required
Per-page access permissions | 99% of pages are read-write | Overkill
Fragmentation mgmt. | Very little OS-visible fragmentation | Per-page reallocation is not important
Slide 6
Main Memory Allocation
Slide 7
Paged VM – why is it needed?
Shared memory regions for inter-process communication
Code regions protected by per-page R/W permissions
Copy-on-write uses per-page R/W for a lazy implementation of fork
Guard pages at the ends of thread stacks
[Figure: process virtual address space – paging is valuable for code, constants, shared memory, mapped files, guard pages, and the stack; paging is not needed for the dynamically allocated heap region]
Slide 8
Direct Segments
A hybrid of paged and segmented memory, side by side (not one layered on top of the other).
Slide 9
Address Translation
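The translation rule can be sketched in Python. This is a minimal model, not the hardware: the dict-based page table stands in for the normal TLB-lookup/page-walk path.

```python
# Direct-segment address translation: an address that falls inside
# [BASE, LIMIT) bypasses the TLB entirely (PA = VA + OFFSET); every
# other address goes through normal paging. The dict page table here
# is a stand-in for the TLB / hardware page-table walker.

PAGE_SIZE = 4096

def translate(va, base, limit, offset, page_table):
    if base <= va < limit:
        return va + offset              # direct segment: no TLB lookup
    vpn, page_off = divmod(va, PAGE_SIZE)
    ppn = page_table[vpn]               # normal TLB lookup / page walk
    return ppn * PAGE_SIZE + page_off
```

With BASE/LIMIT covering the heap and OFFSET set to the segment's physical start minus BASE, every heap access translates with one comparison and one addition.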
Slide 10
OS Support : Handling Physical Memory
Set up the direct-segment registers
BASE = start VA of the direct segment
LIMIT = end VA of the direct segment
OFFSET = start PA of the direct segment − BASE (so that PA = VA + OFFSET)
Save and restore the register values as part of process metadata on context switch
Create a contiguous physical memory region
Reserve it at startup – big-memory apps are cognizant of their memory requirement at startup
Or use memory compaction – the latency is insignificant for long-running jobs
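The register bookkeeping above can be sketched as follows; the `DSState` class and `switch_to` function names are illustrative, not from the paper, and `HW_REGS` stands in for the real hardware registers.

```python
# Per-process direct-segment state: the three register values are
# ordinary process metadata, saved with the process and loaded into
# the (stand-in) hardware registers on every context switch.

class DSState:
    def __init__(self, va_start, va_end, pa_start):
        self.base = va_start
        self.limit = va_end
        self.offset = pa_start - va_start   # PA = VA + OFFSET

HW_REGS = {}                                # stand-in for real registers

def switch_to(proc_state):
    """Load one process's direct-segment registers on context switch."""
    HW_REGS["base"] = proc_state.base
    HW_REGS["limit"] = proc_state.limit
    HW_REGS["offset"] = proc_state.offset
```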
Slide 11
OS Support : Handling Virtual Memory
Primary regions
Abstraction presented to the application: a contiguous virtual address range backed by a direct segment
What goes in the primary region: dynamically allocated R/W memory
The application can indicate what it needs to put in the primary region
The size of the primary region is set to a very high value, enough to accommodate the whole of physical memory if need be
64-bit x86 provides 128 TB of user virtual address space, so you pretty much never run out of VA space
Slide 12
Evaluation
Methodology
Implement primary regions in the kernel
Count the TLB misses that would have been served by the (non-existent) direct segment
x86 uses a hardware page-table walker, so they trap all TLB misses by duping the system into believing that the PTE residing in memory is invalid
In the handler they:
touch the page containing the faulting address
mark the PTE invalid again
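The classification step of this trick can be sketched as below; `seg_base`/`seg_limit` describe the hypothetical direct segment, and the kernel-side trap mechanism itself is not modeled.

```python
# Every PTE is kept "invalid", so each TLB miss traps to a handler.
# The handler records whether a would-be direct segment covers the
# faulting address, touches the page, and re-invalidates the PTE.
# Only the counting is modeled here.

def classify_misses(faulting_addrs, seg_base, seg_limit):
    """Return (covered, total): how many of the trapped TLB misses a
    direct segment spanning [seg_base, seg_limit) would eliminate."""
    covered = sum(seg_base <= va < seg_limit for va in faulting_addrs)
    return covered, len(faulting_addrs)
```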
Slide 13
Results
Slide 14
Results
Slide 15
Why not large pages ?
Huge pages do not automatically scale
A new page size and/or more TLB entries are needed, and TLBs still depend on access locality
Page sizes are fixed, ISA-defined, and sparse (e.g., 4 KB, 2 MB, 1 GB), and mappings must be aligned at page-size boundaries
Multiple page sizes introduce TLB tradeoffs: fully associative vs. set-associative designs
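The sparse-size point can be made concrete by counting the mappings needed to cover a region at each fixed x86 page size (sizes from the slide; ceiling division accounts for padding up to the next page boundary).

```python
# TLB entries (mappings) needed to cover a region with each of the
# fixed x86 page sizes; larger pages need far fewer entries, but any
# remainder is padded out to a full page.

PAGE_SIZES = {"4KB": 4 << 10, "2MB": 2 << 20, "1GB": 1 << 30}

def entries_needed(region_bytes, page_bytes):
    return -(-region_bytes // page_bytes)   # ceiling division
```

A 1 GB heap needs 262,144 separate 4 KB mappings, but only 512 mappings at 2 MB or a single one at 1 GB, which is why huge pages help yet still tie TLB reach to the ISA's few fixed sizes.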
Slide 16
Virtual Memory Basics
[Figure: virtual memory basics – each core's accesses go through a TLB (Translation Lookaside Buffer) backed by a per-process page table; Process 1 and Process 2 each map their own virtual address space onto physical memory]
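The hit/miss flow in this diagram can be modeled with a dict-based TLB in front of a dict-based page table; this is a sketch of the concept, not real hardware.

```python
# Baseline paged translation: check the TLB first; on a miss, walk the
# page table and fill the TLB so the next access to the same page hits.

PAGE = 4096

def lookup(va, tlb, page_table):
    """Translate va to a physical address, filling the TLB on a miss."""
    vpn, off = divmod(va, PAGE)
    if vpn not in tlb:                  # TLB miss: page-table walk
        tlb[vpn] = page_table[vpn]      # fill the TLB entry
    return tlb[vpn] * PAGE + off        # TLB hit path
```

Every miss here costs a page-table walk, which is exactly the overhead the direct segment removes for the heap.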