Presentation Transcript

Slide1

Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin

Slide2

2

Background and Problem

[Diagram: Cores 0 through N, each with its own prefetcher, share an on-chip cache and a memory controller; DRAM Banks 0 through K sit off-chip. The shared cache, memory controller, and DRAM banks are the shared memory resources; the chip boundary separates on-chip from off-chip components.]

Slide3

Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques

3

Slide4

Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of on-chip interconnect
- Fair management of multiple shared resources

4

Slide5

Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
  - Network Fair Queuing (Nesbit et al., MICRO'06)
  - Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA'08)
- Fair management of on-chip interconnect
- Fair management of multiple shared resources
  - Fairness via Source Throttling (Ebrahimi et al., ASPLOS'10)

5

Slide6

Background and Problem

6

Fair memory scheduling technique: Network Fair Queuing (NFQ)
- Improves fairness and performance with no prefetching
- Significant degradation of performance and fairness in the presence of prefetching

[Chart: results with No Prefetching vs. Aggressive Stream Prefetching]

Slide7
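The virtual-finish-time idea behind NFQ can be sketched as follows: each thread carries a virtual clock, a request's virtual finish time is that clock plus its service cost divided by the thread's bandwidth share, and the scheduler serves the request with the earliest virtual finish time. This is a minimal illustrative sketch, assuming unit service cost and equal shares; the function names are hypothetical, not from the talk.

```python
def nfq_order(requests, service_cost=1.0, shares=None):
    """Serve requests in NFQ order.
    requests: list of (thread_id, request_id) in arrival order.
    shares: dict thread_id -> bandwidth share (defaults to equal shares)."""
    vclock = {}                      # per-thread virtual clock
    pending = list(requests)
    order = []
    while pending:
        def vfinish(req):
            w = shares.get(req[0], 1.0) if shares else 1.0
            return vclock.get(req[0], 0.0) + service_cost / w
        nxt = min(pending, key=vfinish)   # earliest virtual finish time wins
        vclock[nxt[0]] = vfinish(nxt)     # advance the winner's clock
        pending.remove(nxt)
        order.append(nxt)
    return order
```

Because each served request advances only its own thread's clock, a thread that floods the controller (e.g., with aggressive prefetches) falls behind in virtual time and cannot crowd out a lighter thread.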

Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of on-chip interconnect
- Fair management of multiple shared resources

Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques

7

Slide8

Background and Problem

Prior work addresses inter-application interference caused by prefetches:
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO'09): dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers
- Even with controlled prefetching, fairness techniques should be made prefetch-aware

8

Slide9

Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

9

Slide10

Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA'08]

Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread's bank-level parallelism

Principle 2: Request Batching
- Marks a fixed number of oldest requests from each thread to form a "batch"
- Eliminates starvation & provides fairness

10

[Diagram: requests from threads T0-T3 queued at Bank 0 and Bank 1; the oldest requests from each thread are marked to form a batch.]

Slide11
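The two principles above can be sketched with a simple per-bank queue model; the data layout, the `cap` value, and the function names are illustrative, not from the paper.

```python
def form_batch(queues, cap=5):
    """Principle 2: mark up to `cap` oldest requests from each thread.
    queues: dict thread_id -> list of request ids, oldest first.
    Returns the set of marked (thread_id, request_id) pairs."""
    return {(tid, r) for tid, reqs in queues.items() for r in reqs[:cap]}

def pick_for_bank(bank_queue, marked):
    """Per-bank selection: batched (marked) requests beat unmarked ones,
    oldest first within each class. bank_queue: list of (thread_id, req_id)
    in arrival order."""
    for req in bank_queue:               # oldest first
        if req in marked:
            return req                   # a marked request always wins
    return bank_queue[0] if bank_queue else None
```

Because no new requests are marked until the current batch drains, a request can be delayed by at most one batch's worth of other requests, which is what bounds starvation.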

Impact of Prefetching on Parallelism-Aware Batch Scheduling

Policy (a): Include prefetches and demands alike when generating a batch
Policy (b): Prefetches are not included alongside demands when generating a batch

11

Slide12

Impact of Prefetching on Parallelism-Aware Batch Scheduling

12

[Diagram: two service-order timelines for DRAM Banks 1 and 2, with demands (D1, D2) and prefetches (P1, P2) from Cores 1 and 2; P2 prefetches are accurate, P1 prefetches are inaccurate. Under Policy (a), "Mark Prefetches in PAR-BS," the accurate P2 prefetches are serviced in time and hit, saving cycles, but the inaccurate P1 prefetches in the batch delay other requests, adding stall time. Under Policy (b), "Don't Mark Prefetches in PAR-BS," demands are protected from the inaccurate prefetches, but the accurate P2 prefetches arrive too late and miss, so compute still stalls.]

Slide13

Impact of Prefetching on Parallelism-Aware Batch Scheduling

Policy (a): Include prefetches and demands alike when generating a batch
- Pros: Accurate prefetches will be more timely
- Cons: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others

Policy (b): Prefetches are not included alongside demands when generating a batch
- Pros: Inaccurate prefetches cannot unfairly delay demands of other cores
- Cons: Accurate prefetches will be less timely, so less performance benefit from prefetching

13

Slide14

Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

14

Slide15

Prefetch-Aware Shared Resource Management

Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

15

Slide16

Prefetch-Aware Shared Resource Management

Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

16

Slide17

Prefetch-aware PARBS (P-PARBS)

17

[Diagram, Policy (a) "Mark Prefetches in PAR-BS": demands and prefetches of Cores 1 and 2 are batched together at Banks 1 and 2. Core 2's accurate prefetches (P2) are serviced in time and hit, but Core 1's inaccurate prefetches (P1) occupy batch slots ahead of demands, adding stall time.]

Slide18

Prefetch-aware PARBS (P-PARBS)

18

[Diagram, Policy (b) "Don't Mark Prefetches in PAR-BS" vs. our policy "Mark Accurate Prefetches": under Policy (b), demands are serviced ahead of all prefetches and some cycles are saved, but Core 2's accurate prefetches (P2) arrive too late and miss, so the cores still stall. Marking only the accurate prefetches batches P2 alongside the demands, restoring its timeliness (hits) while keeping the inaccurate P1 prefetches out of the batch.]

Underlying prioritization policies need to distinguish between prefetches based on accuracy

Slide19
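The takeaway above can be sketched as a marking rule: demands always join the batch, while a prefetch joins only if its core's measured prefetch accuracy clears a threshold. The accuracy estimate (useful prefetches / issued prefetches) and the 0.7 threshold are illustrative assumptions, not values from the talk.

```python
def prefetch_accuracy(useful, issued):
    """Fraction of issued prefetches that were actually used by a demand."""
    return useful / issued if issued else 0.0

def should_mark(is_prefetch, core_useful, core_issued, threshold=0.7):
    """Batch-marking rule: demands are always marked; a prefetch is marked
    only if its core's prefetcher has been measured as accurate."""
    if not is_prefetch:
        return True                     # demands always join the batch
    return prefetch_accuracy(core_useful, core_issued) >= threshold
```

With this rule, a core whose prefetcher is mostly wrong cannot fill the batch with useless requests, while a core with an accurate prefetcher keeps the timeliness benefit of Policy (a).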

Prefetch-Aware Shared Resource Management

Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

19

Slide20

[Diagram: service order at Banks 1 and 2. Core 1 is memory non-intensive; Core 2 is memory intensive. Without demand boosting, Core 2's many demands and prefetches are serviced first and Core 1's demands are serviced last; with demand boosting, Core 1's demands are serviced first.]

Demand boosting eliminates starvation of memory non-intensive applications

Slide21
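The boosting rule the diagram illustrates can be sketched as a three-class priority: boosted demands of memory non-intensive cores first, then ordinary demands, then prefetches, oldest first within each class. The request encoding and the exact class ordering are an illustrative simplification of the mechanism, not its definitive form.

```python
def priority_class(req):
    """Lower class number = serviced earlier.
    req: (age, is_prefetch, from_non_intensive_core)."""
    age, is_prefetch, non_intensive = req
    if non_intensive and not is_prefetch:
        return 0    # boosted: demand of a memory non-intensive core
    if not is_prefetch:
        return 1    # ordinary demand
    return 2        # prefetch

def schedule(requests):
    """Service order: boosted demands, then demands, then prefetches;
    oldest (smallest age) first within each class."""
    return sorted(requests, key=lambda r: (priority_class(r), r[0]))
```

Even if an intensive core keeps the queues full, the non-intensive core's rare demands jump to the head of the service order, so it cannot be starved.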

Prefetch-Aware Shared Resource Management

Three key ideas:
- Fair memory controllers: Extend underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
- Fairness via source throttling: Coordinate core and prefetcher throttling decisions
- Demand boosting for memory non-intensive applications

21

Slide22
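One way to sketch the coordination of core and prefetcher throttling: when a core is flagged as interfering with others (as in Fairness via Source Throttling), first attribute the interference to its prefetches or its demands, then turn down the matching knob. The decision rule, state fields, and halving step below are illustrative assumptions, not the paper's exact mechanism.

```python
def throttle_down(state, interference_from_prefetches):
    """Turn down the knob that matches the cause of interference.
    state: {'pref_level': int prefetcher aggressiveness,
            'demand_quota': int outstanding-demand budget}."""
    new = dict(state)
    if interference_from_prefetches and new['pref_level'] > 0:
        new['pref_level'] -= 1                              # calm the prefetcher
    else:
        new['demand_quota'] = max(1, new['demand_quota'] // 2)  # slow the core
    return new
```

Coordinating the two decisions avoids the pathology where the source-throttling mechanism slows a core's demands even though its prefetcher is what is actually polluting the shared resources.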

Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion

22

Slide23

Evaluation Methodology

x86 cycle-accurate simulator

Baseline processor configuration:
- Per-core: 4-wide issue, out-of-order, 256-entry ROB
- Shared (4-core system): 128 MSHRs; 2MB, 16-way L2 cache
- Main memory: DDR3 1333 MHz; latency of 15 ns per command (tRP, tRCD, CL); 8B-wide core-to-memory bus

23

Slide24

System Performance Results

24

[Chart: system performance improvements of 11%, 10.9%, and 11.3% for the three evaluated prefetch-aware techniques.]

Slide25

Max Slowdown Results

25

[Chart: maximum slowdown reductions of 9.9%, 18.4%, and 14.5% for the three evaluated prefetch-aware techniques.]

Slide26

Conclusion

- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications

Our mechanisms improve both fair memory schedulers and source throttling, in both system performance and fairness, by more than 10%

26

Slide27

Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin