Efficient Virtual Memory for Big Memory Servers

Uploaded by pasty-toler on 2016-06-12


Presentation Transcript

Slide 1

Efficient Virtual Memory for Big Memory Servers

U Wisc and HP Labs, ISCA'13

Architecture Reading Club Summer'13

Slide 2

Key points

Big-memory workloads: Memcached, databases, graph analysis

Analysis shows TLB misses can account for up to 51% of execution time

The rich features of paged VM are not needed by most applications

Proposal: Direct Segments

Paged VM as usual where needed

Segmentation where possible

For big-memory workloads, this eliminates 99% of data TLB misses!

Slide 3

Main Memory Mgmt

Trends: the amount of physical memory has grown from a few MBs to a few GBs, and now to several TBs

But at the same time the size of the DTLB has remained fairly unchanged: Pentium III – 72 entries, Pentium 4 – 64, Nehalem – 96, Ivy Bridge – 100

Also, workloads were better behaved in days gone by (higher locality)

So: higher memory capacity + constant TLB size + lower-locality apps = more TLB misses

Slide 4

So how bad is it really?

Slide 5

Main Features of Paged VM

Feature               | Analysis                             | Verdict
--------------------- | ------------------------------------ | ---------------------------------------
Swapping              | No swapping                          | Not required
Per-page access perms | 99% of pages are read-write          | Overkill
Fragmentation mgmt.   | Very little OS-visible fragmentation | Per-page reallocation is not important

Slide 6

Main Memory Allocation

Slide 7

Paged VM – why is it needed?

Shared memory regions for inter-process communication (IPC)

Code regions protected by per-page R/W permissions

Copy-on-write uses per-page R/W for a lazy implementation of fork()

Guard pages at the end of thread stacks


[Figure: virtual address space layout. Paging valuable: code, constants, shared memory, mapped files, guard pages, stack. Paging not needed: the dynamically allocated heap region.]

Slide 8

Direct Segments

Hybrid Paged + Segmented memory (not one on top of the other).

Slide 9

Address Translation

Slide 10

OS Support: Handling Physical Memory

Setup Direct Segment registers:

BASE = start VA of the Direct Segment

LIMIT = end VA of the Direct Segment

OFFSET = start PA of the Direct Segment – BASE

Save and restore register values as part of process metadata on context switch

Create a contiguous physical memory region:

Reserve it at startup – big-memory apps are cognizant of their memory requirement at startup

Memory compaction – latency insignificant for long-running jobs

Slide 11

OS Support: Handling Virtual Memory

Primary regions – the abstraction presented to the application: a contiguous virtual address space backed by a Direct Segment

What goes in the primary region: dynamically allocated R/W memory

The application can indicate what it needs to put in the primary region

The size of the primary region is set to a very high value to accommodate the whole of physical memory if need be

64-bit VAs support 128 TB of VM, so we pretty much never run out of VA space

Slide 12

Evaluation

Methodology:

Implement primary regions in the kernel

Find the number of TLB misses that would be served by the (non-existent) direct segments

x86 uses a hardware page-table walker, so they trap all TLB misses by duping the system into believing that the PTE residing in memory is invalid

In the handler: touch the page at the faulting address, then mark the PTE invalid again

Slide 13

Results

Slide 14

Results

Slide 15

Why not large pages ?

Huge pages do not automatically scale: they need new page sizes and/or more TLB entries, and TLB effectiveness still depends on access locality

Page sizes are fixed, ISA-defined, and sparse (e.g., 4 KB, 2 MB, 1 GB), and mappings must be aligned at page-size boundaries

Multiple page sizes introduce TLB tradeoffs: fully associative vs. set-associative designs

Slide 16

Virtual Memory Basics


[Figure: virtual memory basics – the core consults the TLB (Translation Lookaside Buffer) on the way to the cache; each process's virtual address space maps to physical memory through its page table.]