Prefetch-Aware Shared-Resource Management for Multi-Core Systems. Eiman Ebrahimi, Chang Joo Lee, Onur Mutlu, Yale N. Patt. HPS Research Group, The University of Texas at Austin.
Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin
Background and Problem

[Figure: A multi-core chip with per-core prefetchers (Core 0 … Core N), a shared cache, and a memory controller connected to off-chip DRAM banks 0 … K. The shared cache, memory controller, and DRAM banks are the shared memory resources; the chip boundary separates on-chip from off-chip components.]
Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques.
Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources
Background and Problem

Understand the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
  - Network Fair Queuing (Nesbit et al., MICRO 2006)
  - Parallelism-Aware Batch Scheduling (Mutlu et al., ISCA 2008)
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources
  - Fairness via Source Throttling (Ebrahimi et al., ASPLOS 2010)
Background and Problem

Fair memory scheduling example: Network Fair Queuing (NFQ) improves fairness and performance with no prefetching, but both performance and fairness degrade significantly in the presence of prefetching.

[Chart: fairness and performance of NFQ with no prefetching versus with aggressive stream prefetching.]
Background and Problem

Understanding the impact of prefetching on previously proposed shared-resource management techniques:
- Fair cache management techniques
- Fair memory controllers
- Fair management of the on-chip interconnect
- Fair management of multiple shared resources

Goal: Devise general mechanisms for taking prefetch requests into account in fairness techniques.
Background and Problem

Prior work addresses inter-application interference caused by prefetches:
- Hierarchical Prefetcher Aggressiveness Control (Ebrahimi et al., MICRO 2009) dynamically detects interference caused by prefetches and throttles down overly aggressive prefetchers.

Even with controlled prefetching, fairness techniques should be made prefetch-aware.
Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Parallelism-Aware Batch Scheduling (PAR-BS) [Mutlu & Moscibroda, ISCA 2008]

Principle 1: Parallelism-awareness
- Schedules requests from each thread to different banks back to back
- Preserves each thread's bank-level parallelism

Principle 2: Request batching
- Marks a fixed number of the oldest requests from each thread to form a "batch"
- Eliminates starvation and provides fairness

[Figure: Bank 0 and Bank 1 request queues holding requests from threads T0, T1, T2, and T3; the oldest requests from each thread are marked to form a batch.]
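The two PAR-BS principles can be sketched in software. This is a hypothetical Python model, not the paper's hardware: the names `Request`, `MARK_CAP`, and `next_request` are illustrative, and the per-thread ranking (computed shortest-job-first in PAR-BS) is taken as a given input.

```python
# Toy model of PAR-BS batching and prioritized scheduling (illustrative only).
from collections import defaultdict

MARK_CAP = 5  # max oldest requests marked per thread per bank (illustrative value)

class Request:
    def __init__(self, thread, bank, marked=False):
        self.thread, self.bank, self.marked = thread, bank, marked

def form_batch(queues):
    """Mark up to MARK_CAP oldest requests per thread in each bank queue."""
    for bank, queue in queues.items():
        per_thread = defaultdict(int)
        for req in queue:  # queues are ordered oldest-first
            if per_thread[req.thread] < MARK_CAP:
                req.marked = True
                per_thread[req.thread] += 1

def next_request(queue, thread_rank):
    """Pick the next request: marked (batched) requests first,
    then higher-ranked threads (lower rank value), then oldest."""
    return min(
        range(len(queue)),
        key=lambda i: (not queue[i].marked, thread_rank[queue[i].thread], i),
    )
```

Once every marked request has been serviced, a real scheduler would form a new batch; that loop is omitted here.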
Impact of Prefetching on Parallelism-Aware Batch Scheduling

- Policy (a): Include prefetches and demands alike when generating a batch
- Policy (b): Do not include prefetches alongside demands when generating a batch
Impact of Prefetching on Parallelism-Aware Batch Scheduling

[Figure: Service-order timelines at DRAM Bank 1 and Bank 2 for Core 1 and Core 2, which issue demands (D1, D2) and prefetches (P1, P2); the legend distinguishes accurate from inaccurate prefetches. Policy (a), mark prefetches in PAR-BS: accurate prefetches are serviced early enough that later accesses hit, saving cycles, but inaccurate prefetches batched alongside demands force the other core to stall. Policy (b), don't mark prefetches in PAR-BS: accurate prefetches are serviced too late, the corresponding accesses miss, and both cores stall.]
Impact of Prefetching on Parallelism-Aware Batch Scheduling

Policy (a): Include prefetches and demands alike when generating a batch
- Pro: Accurate prefetches will be more timely
- Con: Inaccurate prefetches from one thread can unfairly delay demands and accurate prefetches of others

Policy (b): Prefetches are not included alongside demands when generating a batch
- Pro: Inaccurate prefetches cannot unfairly delay demands of other cores
- Con: Accurate prefetches will be less timely, so there is less performance benefit from prefetching
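Resolving this trade-off requires some per-core measure of prefetch usefulness. Below is a minimal sketch of a prefetch accuracy estimator, assuming interval-based sent/used counters of the kind common in prefetcher feedback schemes; the class name, the 0.7 threshold, and the interval reset are illustrative assumptions, not details from the talk.

```python
# Hypothetical per-core prefetch accuracy estimator (illustrative only).
class PrefetchAccuracyTracker:
    def __init__(self, threshold=0.7):
        self.threshold = threshold  # assumed cutoff for "accurate"
        self.sent = 0               # prefetches issued this interval
        self.used = 0               # prefetched lines hit by a demand before eviction

    def on_prefetch_issued(self):
        self.sent += 1

    def on_prefetch_used(self):
        self.used += 1

    def accuracy(self):
        return self.used / self.sent if self.sent else 0.0

    def is_accurate(self):
        return self.accuracy() >= self.threshold

    def new_interval(self):
        # Reset counters periodically so the estimate tracks phase changes.
        self.sent = self.used = 0
```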
Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Prefetch-Aware Shared Resource Management

Three key ideas:
1. Fair memory controllers: extend the underlying prioritization policies to distinguish between prefetches based on prefetch accuracy
2. Fairness via source throttling: coordinate core and prefetcher throttling decisions
3. Demand boosting for memory non-intensive applications
Prefetch-Aware Shared Resource Management

First key idea (fair memory controllers): extend the underlying prioritization policies to distinguish between prefetches based on prefetch accuracy.
Prefetch-Aware PAR-BS (P-PARBS)

[Figure: Policy (a), mark prefetches in PAR-BS. Service order at DRAM Bank 1 and Bank 2 for Core 1 and Core 2 (demands D1, D2; prefetches P1, P2; the legend distinguishes accurate from inaccurate prefetches). Marking all prefetches makes the accurate prefetches timely, turning later accesses into hits, but inaccurate prefetches in the batch still stall the other core.]
Prefetch-Aware PAR-BS (P-PARBS)

[Figure: Policy (b), don't mark prefetches in PAR-BS, versus our policy, mark accurate prefetches. Under Policy (b), accurate prefetches are serviced too late, accesses that could have hit instead miss, and the issuing core stalls. Under our policy, accurate prefetches are marked into the batch and serviced in time to produce hits and save cycles, while inaccurate prefetches stay unmarked and cannot delay other cores' demands.]

Underlying prioritization policies need to distinguish between prefetches based on accuracy.
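The marking rule above can be sketched as a small extension of batch formation: demands are always eligible for marking, while a prefetch is eligible only if its core's estimated prefetch accuracy clears a threshold. This is a hypothetical Python sketch; `MARK_CAP`, `ACC_THRESHOLD`, and the `accuracy` map are illustrative assumptions, not values from the talk.

```python
# Sketch of accuracy-aware batch marking for P-PARBS (illustrative only).
from collections import defaultdict

MARK_CAP = 5        # max marked requests per core (assumed)
ACC_THRESHOLD = 0.7 # accuracy cutoff for marking prefetches (assumed)

def mark_requests(queue, accuracy):
    """Mark batch-eligible requests in an oldest-first bank queue.

    queue: list of dicts {'core', 'is_prefetch', 'marked'}.
    accuracy: maps core id -> estimated prefetch accuracy for that core.
    """
    per_core = defaultdict(int)
    for req in queue:
        # Demands are always eligible; prefetches only when accurate.
        eligible = (not req['is_prefetch']) or accuracy[req['core']] >= ACC_THRESHOLD
        if eligible and per_core[req['core']] < MARK_CAP:
            req['marked'] = True
            per_core[req['core']] += 1
```

This preserves Policy (a)'s timeliness benefit for accurate prefetches while keeping Policy (b)'s protection of demands from inaccurate ones.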
Prefetch-Aware Shared Resource Management

Next key idea: demand boosting for memory non-intensive applications.
[Figure: Service order at Bank 1 and Bank 2, where Core 1 is memory non-intensive and Core 2 is memory intensive (legend: Core 1 demands, Core 2 demands, Core 2 prefetches). Without demand boosting, Core 2's many demands and prefetches are serviced first and Core 1's few demands are serviced last. With demand boosting, Core 1's demands are serviced first.]

Demand boosting eliminates starvation of memory non-intensive applications.
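Demand boosting can be sketched as a priority override in the scheduler: demands from cores classified as memory non-intensive are serviced ahead of all other requests, so a flood of another core's demands and prefetches cannot starve them. Everything here is an illustrative sketch; the intensity classification is assumed to come from elsewhere (e.g., per-core miss counters, not shown).

```python
# Sketch of demand boosting in a bank scheduler (illustrative only).
def schedule_order(queue, non_intensive_cores):
    """Return the queue reordered so boosted demands are serviced first.

    queue: oldest-first list of dicts {'core', 'is_prefetch'}.
    non_intensive_cores: set of core ids classified as memory non-intensive.
    """
    def priority(item):
        i, req = item
        boosted = (not req['is_prefetch']) and req['core'] in non_intensive_cores
        return (not boosted, i)  # boosted demands first, then arrival order
    return [req for _, req in sorted(enumerate(queue), key=priority)]
```

Because only demands of non-intensive cores are boosted, and such cores by definition have few outstanding requests, the boost cannot itself become a source of starvation for the intensive cores.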
Prefetch-Aware Shared Resource Management

Remaining key idea (fairness via source throttling): coordinate core and prefetcher throttling decisions.
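One plausible reading of this coordination, sketched as a toy decision rule: when a source-throttling mechanism such as FST flags a core as interfering too much with others, first check whether that interference is mostly useless prefetch traffic, and if so turn down the prefetcher rather than the core's demand rate. The thresholds and inputs below are assumptions for illustration, not details from the talk.

```python
# Toy coordination rule for core vs. prefetcher throttling (illustrative only).
def throttle_decision(prefetch_fraction, prefetch_accuracy,
                      frac_threshold=0.5, acc_threshold=0.7):
    """Decide which knob to turn for a core flagged as too interfering.

    prefetch_fraction: fraction of the core's memory traffic that is prefetches.
    prefetch_accuracy: estimated accuracy of that core's prefetcher.
    Thresholds are assumed values.
    """
    if prefetch_fraction > frac_threshold and prefetch_accuracy < acc_threshold:
        return "throttle_prefetcher"  # interference is mostly useless prefetches
    return "throttle_core"            # interference comes from demands (or useful prefetches)
```

Throttling the prefetcher in the first case removes the interference without penalizing the core's demand performance; throttling the core when its prefetches are accurate avoids discarding useful prefetch benefit.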
Outline

- Problem Statement
- Motivation for Special Treatment of Prefetches
- Prefetch-Aware Shared Resource Management
- Evaluation
- Conclusion
Evaluation Methodology

x86 cycle-accurate simulator. Baseline processor configuration:
- Per core: 4-wide issue, out-of-order, 256-entry ROB
- Shared (4-core system): 128 MSHRs; 2 MB, 16-way L2 cache
- Main memory: DDR3-1333; latency of 15 ns per command (tRP, tRCD, CL); 8-byte-wide core-to-memory bus
System Performance Results

[Chart: the prefetch-aware variants improve system performance by 11%, 10.9%, and 11.3% over the corresponding baseline fairness techniques.]
Max Slowdown Results

[Chart: the prefetch-aware variants reduce maximum slowdown by 9.9%, 18.4%, and 14.5% relative to the corresponding baseline fairness techniques.]
Conclusion

- State-of-the-art fair shared-resource management techniques can be harmful in the presence of prefetching
- Their underlying prioritization techniques need to be extended to differentiate prefetches based on accuracy
- Core and prefetcher throttling should be coordinated with source-based resource management techniques
- Demand boosting eliminates starvation of memory non-intensive applications
- Our mechanisms improve both fair memory schedulers and source throttling, in both system performance and fairness, by more than 10%
Prefetch-Aware Shared-Resource Management for Multi-Core Systems

Eiman Ebrahimi*, Chang Joo Lee*+, Onur Mutlu‡, Yale N. Patt*

* HPS Research Group, The University of Texas at Austin
‡ Computer Architecture Laboratory, Carnegie Mellon University
+ Intel Corporation, Austin