Redundant Memory Mappings for Fast Access to Large Memories

Uploaded On 2017-12-21


Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Ünsal




Presentation Transcript

Slide1

Redundant Memory Mappings for Fast Access to Large Memories

Vasileios Karakostas, Jayneel Gandhi, Furkan Ayar, Adrián Cristal, Mark D. Hill, Kathryn S. McKinley, Mario Nemirovsky, Michael M. Swift, Osman S. Ünsal

Slide2

Executive Summary

Problem: Virtual memory overheads are high (up to 41%)
Proposal: Redundant Memory Mappings
  Propose a compact representation called range translation
  Range translation: an arbitrarily large contiguous mapping
  Effectively cache, manage and facilitate range translations
  Retain flexibility of 4KB paging
Result: Reduces overheads of virtual memory to less than 1%

Slide3

Outline

Motivation Virtual Memory Refresher + Key Technology TrendsPrevious ApproachesGoals + Key ObservationDesign: Redundant Memory MappingsResultsConclusion3Slide4

Virtual Memory Refresher

[Figure: Process 1 and Process 2, virtual address space, page table, TLB (Translation Lookaside Buffer), physical memory]

Challenge: How to reduce costly page walks?

Slide5

Two Technology Trends

[Figure; footnote: *Inflation-adjusted 2011 USD, from: jcmit.com]

TLB reach is limited

Year   Processor   L1 DTLB entries
1999   Pent. III   72
2001   Pent. 4     64
2008   Nehalem     96
2012   IvyBridge   100
2015   Broadwell   100

Slide6

0. Page-based Translation
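
With page-based translation each TLB entry maps exactly one page, so the memory a TLB can cover (its reach) is the entry count times the page size. The following is a minimal sketch of that arithmetic, using the 100-entry L1 DTLB count listed under "TLB reach is limited". The helper name `tlb_reach_bytes` is illustrative, and applying the same entry count to 2MB pages is a hypothetical comparison rather than a description of any specific processor.

```python
KB = 1024
MB = 1024 * KB

def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Reach = number of TLB entries x bytes mapped by each entry."""
    return entries * page_size

reach_4k = tlb_reach_bytes(100, 4 * KB)  # 100 entries, 4KB pages
reach_2m = tlb_reach_bytes(100, 2 * MB)  # same entry count, 2MB pages (hypothetical)

print(reach_4k // KB, "KB")              # 400 KB
print(reach_2m // MB, "MB")              # 200 MB
print(reach_2m // reach_4k)              # 512: a 2MB page maps 512x more than a 4KB page
```

Even with large pages, reach still grows only linearly with the number of entries, which is the limitation the later slides address.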

[Figure: a TLB entry maps one virtual page (VPN0) in virtual memory to one physical frame (PFN0) in physical memory]

Slide7

1. Multipage Mapping

Sub-blocked TLB/CoLT, Clustered TLB [ASPLOS’94, MICRO’12 and HPCA’14]

[Figure: one TLB entry maps VPN(0-3) in virtual memory to PFN(0-3) in physical memory; entry fields labeled Bitmap and Map]

Slide8

2. Large Pages

[Transparent Huge Pages and libhugetlbfs]

[Figure: a Large Page TLB entry maps VPN0 in virtual memory to PFN0 in physical memory]

Slide9

3. Direct Segments

Direct Segment [ISCA’13 and MICRO’14]

[Figure: a direct segment entry (BASE, LIMIT) → OFFSET; the region between BASE and LIMIT in virtual memory maps to physical memory at OFFSET]

Slide10

Can we get best of many worlds?

[Comparison table; cell values not preserved in the transcript]
Columns: Multipage Mapping, Large Pages, Direct Segments, Our Proposal
Rows: Flexible alignment, Arbitrary reach, Multiple entries, Transparent to applications, Applicable to all workloads

Slide11

Key Observation

[Figure: virtual memory and physical memory]

Slide12

Key Observation

Large contiguous regions of virtual memory
Limited in number: only a handful

[Figure: virtual memory regions (Code, Heap, Stack, Shared Lib.) mapped to physical memory]

Slide13

Compact Representation: Range Translation

[Figure: Range Translation 1 maps a region of virtual memory to physical memory, described by BASE1, LIMIT1 and OFFSET1]
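
A range translation can be sketched as a BASE, LIMIT, OFFSET triple plus one protection field for the whole range. The example below is a minimal illustration under stated assumptions: addresses are byte addresses, LIMIT is exclusive, OFFSET is the physical address minus the virtual address, and the class and field names are hypothetical rather than taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12  # 4KB pages (assumed)

@dataclass(frozen=True)
class RangeTranslation:
    base: int    # first virtual address covered (inclusive)
    limit: int   # end of the virtual range (exclusive, assumed)
    offset: int  # physical address minus virtual address
    prot: str    # uniform protection for the whole range, e.g. "rw"

    def covers(self, va: int) -> bool:
        return self.base <= va < self.limit

    def translate(self, va: int) -> Optional[int]:
        """Return the physical address, or None if va is outside the range."""
        return va + self.offset if self.covers(va) else None

# One entry describing a 1GB virtual region placed at physical 0x1_0000_0000.
rt = RangeTranslation(base=0x4000_0000, limit=0x8000_0000,
                      offset=0x1_0000_0000 - 0x4000_0000, prot="rw")

pa = rt.translate(0x4000_1234)
pages_covered = (rt.limit - rt.base) >> PAGE_SHIFT
print(hex(pa), pages_covered)
```

A single entry here stands in for 262144 separate 4KB page mappings, which is what makes the representation compact.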

Range Translation: a mapping between contiguous virtual pages and contiguous physical pages with uniform protection

Slide14

Redundant Memory Mappings

[Figure: virtual memory mapped to physical memory by Range Translations 1 to 5]

Map most of a process’s virtual address space redundantly with a modest number of range translations in addition to page mappings

Slide15

Outline

Motivation
Design: Redundant Memory Mappings
  A. Caching Range Translations
  B. Managing Range Translations
  C. Facilitating Range Translations
Results
Conclusion

Slide16

A. Caching Range Translations

[Figure: virtual address bits V47 … V12 translated to physical address bits P47 … P12; components shown: L1 DTLB, L2 DTLB, Range TLB, Page Table Walker, Enhanced Page Table Walker]

Slide17

A. Caching Range Translations

[Figure: V47 … V12 to P47 … P12; label: Hit; components: L1 DTLB, L2 DTLB, Range TLB, Enhanced Page Table Walker]

Slide18

A. Caching Range Translations

[Figure: V47 … V12 to P47 … P12; labels: Miss, Hit, Refill; components: L1 DTLB, L2 DTLB, Range TLB, Enhanced Page Table Walker]

Slide19

A. Caching Range Translations

[Figure: V47 … V12 to P47 … P12; labels: Miss, Hit, Refill; components: L1 DTLB, L2 DTLB, Range TLB, Enhanced Page Table Walker]

Slide20

A. Caching Range Translations

[Figure: V47 … V12 to P47 … P12; labels: Miss, Hit, Refill; components: L1 DTLB, L2 DTLB, Range TLB. Each Range TLB entry (Entry 1 to Entry N) holds BASE, LIMIT, OFFSET and Protection; the virtual address is compared against each entry's BASE (≤) and LIMIT (>)]

L1 TLB Entry Generator Logic: (Virtual Address + OFFSET), Protection

Slide21

A. Caching Range Translations
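
The lookup sequence built up across the "A. Caching Range Translations" slides can be approximated in software. This is a rough model under several assumptions that are not stated in the slides: 4KB pages, dictionaries standing in for the L1 and L2 DTLBs, a short list standing in for the Range TLB (hardware would compare all entries in parallel), page-aligned OFFSET values, and a `walk` callback standing in for the enhanced page table walker. All names are illustrative.

```python
PAGE_SHIFT = 12                      # 4KB pages (assumed)
PAGE_MASK = (1 << PAGE_SHIFT) - 1

def lookup(va, l1, l2, range_tlb, walk):
    """Return (physical address, where the translation was found)."""
    vpn, page_off = va >> PAGE_SHIFT, va & PAGE_MASK
    if vpn in l1:                                    # L1 DTLB hit
        return (l1[vpn][0] << PAGE_SHIFT) | page_off, "L1"
    if vpn in l2:                                    # L2 DTLB hit: refill L1
        l1[vpn] = l2[vpn]
        return (l1[vpn][0] << PAGE_SHIFT) | page_off, "L2"
    for base, limit, offset, prot in range_tlb:      # Range TLB: BASE <= VA < LIMIT
        if base <= va < limit:
            # L1 TLB entry generator: (virtual address + OFFSET), protection
            l1[vpn] = ((va + offset) >> PAGE_SHIFT, prot)
            return va + offset, "RangeTLB"
    pfn, prot = walk(vpn)                            # all levels missed: walk
    l1[vpn] = (pfn, prot)
    return (pfn << PAGE_SHIFT) | page_off, "walk"

l1, l2 = {}, {0x10: (0x99, "r")}
range_tlb = [(0x4000_0000, 0x8000_0000, 0xC000_0000, "rw")]
page_table = {0x20: (0x55, "rw")}

r1 = lookup(0x10ABC, l1, l2, range_tlb, page_table.__getitem__)      # L2 hit
r2 = lookup(0x4000_1234, l1, l2, range_tlb, page_table.__getitem__)  # Range TLB hit
r3 = lookup(0x4000_1238, l1, l2, range_tlb, page_table.__getitem__)  # L1 hit after refill
r4 = lookup(0x20010, l1, l2, range_tlb, page_table.__getitem__)      # page table walk
```

The third lookup hits in the L1 DTLB only because the Range TLB hit on the second lookup generated and installed an ordinary page-sized entry.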

[Figure: V47 … V12 to P47 … P12; labels: Miss, Miss, Miss; components: L1 DTLB, L2 DTLB, Range TLB, Enhanced Page Table Walker]

Slide22

B. Managing Range Translations

Stores all the range translations in an OS-managed structure
Per-process, like the page table

[Figure: Range Table rooted at CR-RT with entries RTA to RTG]

Slide23

B. Managing Range Translations
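
One way to picture a per-process range table is as a set of non-overlapping range translations kept in BASE order, so that a walk can find the range containing an address with a binary search. The sketch below uses a sorted Python list purely as a stand-in. The slides say only that the structure is OS-managed and per-process, so the layout, class name and method names here are assumptions.

```python
import bisect

class RangeTable:
    """Toy per-process range table; assumes inserted ranges never overlap."""

    def __init__(self):
        self._bases = []    # sorted BASE values, used as search keys
        self._entries = []  # (base, limit, offset, prot), same order as _bases

    def insert(self, base, limit, offset, prot):
        i = bisect.bisect_left(self._bases, base)
        self._bases.insert(i, base)
        self._entries.insert(i, (base, limit, offset, prot))

    def walk(self, va):
        """Return the range translation covering va, or None."""
        i = bisect.bisect_right(self._bases, va) - 1  # last entry with BASE <= va
        if i >= 0:
            base, limit, _, _ = self._entries[i]
            if base <= va < limit:
                return self._entries[i]
        return None

table = RangeTable()
table.insert(0x7000_0000, 0x7010_0000, 0x1000, "rw")
table.insert(0x1000_0000, 0x5000_0000, 0x2000_0000, "rw")

hit = table.walk(0x1234_5678)   # inside the second range inserted
gap = table.walk(0x6000_0000)   # between the two ranges
low = table.walk(0x0)           # below every BASE
```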

On an L2 + Range TLB miss, what structure to walk?
A) Page Table
B) Range Table
C) Both A) and B)
D) Either?

Is a virtual page part of a range? Not known at a miss

Slide24

B. Managing Range Translations

Redundancy to the rescue
One bit in the page table entry denotes that the page is part of a range

Sequence on a miss: page table walk, insert into L1 TLB, application resumes memory access; if the page is part of a range, range table walk (background) and insert into Range TLB
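
The miss sequence can be sketched as a small handler: the page table walk always yields a usable translation, and the range bit in the page table entry decides whether a range table walk is worthwhile. This is a simplified model under assumptions not given in the slides: a dictionary page table mapping VPN to (PFN, range bit), a list as the range table, and a "background" walk that simply runs after the L1 insert instead of concurrently. All names are hypothetical.

```python
PAGE_SHIFT = 12  # 4KB pages (assumed)

def handle_miss(va, page_table, range_table, l1_tlb, range_tlb):
    """Handle an L2 + Range TLB miss; return the list of actions taken."""
    vpn = va >> PAGE_SHIFT
    pfn, in_range = page_table[vpn]        # page table walk
    l1_tlb[vpn] = pfn                      # insert into L1 TLB
    events = ["l1_insert", "resume"]       # application can resume here
    if in_range:                           # range bit set in the page table entry
        for entry in range_table:          # range table walk (background in hardware)
            base, limit, _offset = entry
            if base <= va < limit:
                range_tlb.append(entry)    # insert into Range TLB
                events.append("range_tlb_insert")
                break
    return events

page_table = {0x40001: (0x100001, True), 0x20: (0x55, False)}
range_table = [(0x4000_0000, 0x8000_0000, 0xC000_0000)]
l1_tlb, range_tlb = {}, []

e1 = handle_miss(0x4000_1234, page_table, range_table, l1_tlb, range_tlb)
e2 = handle_miss(0x20010, page_table, range_table, l1_tlb, range_tlb)
```

Because every page in a range is also mapped by the page table, the handler never has to guess which structure to walk first.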

[Figure: page table rooted at CR-3; range table rooted at CR-RT with entries RTA to RTG]

Slide25

C. Facilitating Range Translations

Demand Paging

[Figure: virtual memory mapped to physical memory]

Does not facilitate physical page contiguity for range creation

Slide26

C. Facilitating Range Translations
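
Whether range translations can be created depends on how physical pages are handed out. The toy model below counts how many ranges are needed to describe a VPN-to-PFN mapping. It compares a demand-paging-like policy (the next free frame is taken at each first touch, in touch order) with an eager-paging-like policy (one contiguous block is assigned when the virtual region is allocated). The allocator behavior, the touch order and the function names are simplifying assumptions for illustration only.

```python
def count_ranges(mapping):
    """Count maximal runs where consecutive VPNs map to consecutive PFNs."""
    ranges, prev = 0, None
    for vpn in sorted(mapping):
        pfn = mapping[vpn]
        if prev is None or vpn != prev[0] + 1 or pfn != prev[1] + 1:
            ranges += 1              # contiguity broken: a new range starts here
        prev = (vpn, pfn)
    return ranges

def demand_paging(touch_order, first_free_pfn):
    """Hand out the next free frame at each first touch."""
    return {vpn: first_free_pfn + i for i, vpn in enumerate(touch_order)}

def eager_paging(vpns, first_free_pfn):
    """Back the whole region with one contiguous block at allocation time."""
    lo = min(vpns)
    return {vpn: first_free_pfn + (vpn - lo) for vpn in vpns}

vpns = list(range(100, 108))
touch_order = [100, 104, 101, 105, 102, 106, 103, 107]  # two interleaved streams

demand = count_ranges(demand_paging(touch_order, 5000))
eager = count_ranges(eager_paging(vpns, 5000))
print(demand, eager)  # 8 1
```

In this toy case the interleaved touch order leaves every page in its own range under the demand-like policy, while the eager-like policy yields a single range.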

Eager Paging

[Figure: virtual memory mapped to physical memory]

Allocate physical pages when virtual memory is allocated
Increases range sizes → Reduces number of ranges

Slide27

Outline

Motivation
Design: Redundant Memory Mappings
Results
  Methodology
  Performance Results
  Virtual Contiguity
Conclusion

Slide28

Methodology

Measure cost of page walks on real hardware
  Intel 12-core Sandy Bridge with 96GB memory
  64-entry L1 TLB + 512-entry L2 TLB, 4-way associative, for 4KB pages
  32-entry L1 TLB, 4-way associative, for 2MB pages
Prototype Eager Paging and Emulator in Linux v3.15.5
  BadgerTrap for online analysis of TLB misses and to emulate the Range TLB
  Linear model to predict performance
Workloads
  Big-memory workloads, SPEC 2006, BioBench, PARSEC

Slide29

Comparisons

4KB: Baseline using 4KB paging
THP: Transparent Huge Pages using 2MB paging [Transparent Huge Pages]
CTLB: Clustered TLB with cluster of 8 4KB entries [HPCA’14]
DS: Direct Segments [ISCA’13 and MICRO’14]
RMM: Our proposal: Redundant Memory Mappings [ISCA’15]

Slide30

Performance Results

Measured using performance counters
Modeled based on emulator
5/14 workloads
Rest in paper
Assumptions:
  CTLB: 512-entry fully-associative
  RMM: 32-entry fully-associative
  Both in parallel with L2

Slide31

Performance Results

Overheads of using 4KB pages are very high

Slide32

Performance Results

Clustered TLB works well, but limited by 8x reach

Slide33

Performance Results

2MB page helps with 512x reach: Overheads not very low

Slide34

Performance Results

Direct Segment perfect for some but not all workloads

Slide35

Performance Results

RMM achieves low overheads robustly across all workloads

Slide36

Why low overheads? Virtual Contiguity
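
The coverage column on this slide counts how many ranges are needed to cover more than 99% of a process's memory. Here is a sketch of how such a count can be computed from a list of range sizes, taking the largest ranges first. The sizes in the example are made up for illustration and are not measurements from the talk, and the function name is hypothetical.

```python
def ranges_for_coverage(sizes, fraction=0.99):
    """Smallest number of ranges (largest first) covering more than `fraction` of memory."""
    total = sum(sizes)
    covered = 0
    for n, size in enumerate(sorted(sizes, reverse=True), start=1):
        covered += size
        if covered > fraction * total:
            return n
    return len(sizes)

# Hypothetical range sizes in pages: a couple of large ranges and a tail of small ones.
sizes_pages = [900_000, 95_000, 3_000, 1_000, 500, 300, 200]
needed = ranges_for_coverage(sizes_pages)
print(len(sizes_pages), needed)  # 7 ranges in total, 2 needed for >99% coverage
```

When a few ranges dominate the footprint, a small Range TLB can hold most of the mapped memory even if the total number of ranges is larger.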

Benchmark   Paging: 4KB + 2MB THP   Ideal RMM: # of ranges   Ideal RMM: # of ranges to cover more than 99% of memory
cactusADM   1365 + 333              112                      49
canneal     10016 + 359             77                       4
graph500    8983 + 35725            86                       3
mcf         1737 + 839              55                       1
tigr        28299 + 235             16                       3

1000s of TLB entries required
Only 10s-100s of ranges per application
Only a few ranges for 99% coverage

Slide37

Summary

Problem: Virtual memory overheads are high
Proposal: Redundant Memory Mappings
  Propose a compact representation called range translation
  Range translation: an arbitrarily large contiguous mapping
  Effectively cache, manage and facilitate range translations
  Retain flexibility of 4KB paging
Result: Reduces overheads of virtual memory to less than 1%

Slide38

Questions?