Uploaded On 2020-07-03


Slide1

We estimate an application’s virtual execution time as the duration for which the application would have run had it used all the resources on the chip to execute the same number of instructions as it did in the actual execution.
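As a minimal sketch of this definition, assume execution is split into intervals and that, for each interval, there is an estimate of the instruction rate the application would achieve with every cache and bandwidth unit on the chip (for example, the full-allocation entry of its pTable). The virtual time is then the sum, over intervals, of instructions retired divided by that full-chip rate. `Interval`, `full_chip_ips`, and `virtual_time` are hypothetical names, not taken from the poster.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    instructions: int     # instructions the application actually retired in this interval
    full_chip_ips: float  # hypothetical estimate: instructions/second with all chip resources


def virtual_time(intervals: List[Interval]) -> float:
    """Seconds the application would have needed, with all chip resources,
    to retire the same instructions it retired in the actual execution."""
    total = 0.0
    for iv in intervals:
        if iv.full_chip_ips <= 0:
            raise ValueError("full-chip IPS estimate must be positive")
        # Each interval contributes its work divided by the full-chip rate for its phase.
        total += iv.instructions / iv.full_chip_ips
    return total


if __name__ == "__main__":
    trace = [Interval(2_000_000, 4e9), Interval(1_000_000, 2e9)]
    print(virtual_time(trace))  # 0.001
```

Summing per interval, rather than dividing the total instruction count by one average rate, lets the estimate follow program phases whose full-chip performance differs.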

Tapestry: Reducing Interference on Manycore Processors for IaaS Clouds

Dynamically Partitioned Manycore Architecture

Online Performance Estimation (pTables)

Higher-Order Decisions

1. Dynamically Partitioned Manycore Architecture
   > CacheBlocs – Scalable Non-associative Cache Partitioning
   > Prefetcher Throttling – Efficient Bandwidth Utilization
2. Online Performance Estimation (pTables)
   > Flattened Partial LRU Vector (FPLV) – Shadow Cache Statistics
   > Shadow Prefetcher – Shadow Prefetcher Statistics
3. Higher-Order Decisions
   > Virtual Time Metering (VTM) – Accurately charge customers
   > Simultaneous Performance Optimization Table (SPOT) – Maximize throughput while maintaining fairness

Novel Microarchitectural Components in Tapestry

Basic Problem: Excessive resource interference hinders the adoption of manycores in IaaS Clouds

Our Approach: Tapestry

Anshuman Gupta and Michael Bedford Taylor
CSE Department, University of California at San Diego

I. While existing techniques can overcharge consumers by as much as 12x, Tapestry provides fair metering.

Overview

Results

The Details

III. Tapestry improves overall throughput by as much as 1.8x.

IV. With increasing application load, Tapestry provides progressively better overall throughput as well as worst-case performance.

Area and Energy Costs in Tapestry

Manycore Architectures have many potential benefits for Infrastructure-as-a-Service (IaaS) Clouds

Virtual Time Metering (VTM)

Simultaneous Performance Optimization Table (SPOT)

The Fair Slowdown Metric, an approximate geometric mean of the applications’ virtual times, increases throughput while maintaining fairness when it is maximized.
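To illustrate why a geometric mean rewards fairness, the sketch below computes it exactly from per-application virtual times accumulated over the same wall-clock interval; the hardware approximation mentioned above is not modeled, and `fair_slowdown_metric` is a hypothetical name. A skewed allocation can raise the plain sum of virtual times yet lower the metric, because one starved application drags the product down.

```python
import math
from typing import List


def fair_slowdown_metric(virtual_times: List[float]) -> float:
    """Geometric mean of the applications' virtual times, all measured over the
    same wall-clock interval. Larger is better."""
    if not virtual_times or any(v <= 0 for v in virtual_times):
        raise ValueError("need at least one positive virtual time")
    # Averaging logarithms avoids overflow/underflow of a long product.
    return math.exp(sum(math.log(v) for v in virtual_times) / len(virtual_times))


if __name__ == "__main__":
    balanced = [0.5, 0.5]  # total virtual time 1.0
    skewed = [0.9, 0.2]    # total virtual time 1.1, but one application starves
    print(fair_slowdown_metric(balanced) > fair_slowdown_metric(skewed))  # True
```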

Performance Tables (pTables) store the performance estimates of all applications for a spectrum of last-level cache and memory bandwidth allocations.
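One way to picture a pTable is as a small two-dimensional array per application, indexed by discrete cache and bandwidth allocations. The sketch below assumes allocations are counted in whole "units" and that each entry holds an instructions-per-second estimate; the class, its methods, and the unit granularity are hypothetical.

```python
from typing import List


class PTable:
    """Hypothetical pTable for one application: a performance estimate (here,
    instructions per second) for every (cache units, bandwidth units) allocation."""

    def __init__(self, cache_units: int, bw_units: int):
        self.cache_units = cache_units
        self.bw_units = bw_units
        # est[c - 1][b - 1] holds the estimate for c cache units and b bandwidth units.
        self.est: List[List[float]] = [[0.0] * bw_units for _ in range(cache_units)]

    def _check(self, cache: int, bw: int) -> None:
        if not (1 <= cache <= self.cache_units and 1 <= bw <= self.bw_units):
            raise IndexError("allocation outside the table")

    def update(self, cache: int, bw: int, ips: float) -> None:
        self._check(cache, bw)
        self.est[cache - 1][bw - 1] = ips

    def lookup(self, cache: int, bw: int) -> float:
        self._check(cache, bw)
        return self.est[cache - 1][bw - 1]

    def full_chip(self) -> float:
        # Estimate with every cache and bandwidth unit allocated to this application.
        return self.est[-1][-1]
```

Keeping one such table per application lets a decision engine compare candidate allocations with lookups instead of trial runs.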

Shadow Prefetcher

To determine the shadow prefetcher statistics, we run the prefetching algorithm without actually prefetching data.

Flattened Partial LRU Vector (FPLV)

Shadow caching for DSP requires tracking the LRU orders for different cache sizes. We efficiently maintain all these LRU orders using a topologically sorted vector.
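The FPLV hardware itself is not detailed here. As a stand-in, the sketch below shows the classic LRU stack-distance technique, which recovers hit counts for every LRU cache size from a single recency-ordered list. Tracking several such orders at once, as DSP requires when the number of sets changes, is the part FPLV addresses and is not modeled; `lru_hits_for_all_sizes` is a hypothetical name.

```python
from collections import Counter
from typing import List


def lru_hits_for_all_sizes(trace: List[int], max_blocks: int) -> List[int]:
    """hits[k - 1] = number of hits a fully associative LRU cache of k blocks would
    see on `trace`, for k = 1..max_blocks, computed in a single pass."""
    stack: List[int] = []    # recency order, most recently used first
    dist_counts = Counter()  # stack distance (1-based) -> number of accesses
    for block in trace:
        if block in stack:
            d = stack.index(block) + 1
            dist_counts[d] += 1
            stack.pop(d - 1)
        stack.insert(0, block)
        del stack[max_blocks:]  # deeper entries would miss in every tracked size
    # An access at stack distance d hits in every LRU cache holding at least d blocks.
    hits, running = [], 0
    for k in range(1, max_blocks + 1):
        running += dist_counts[k]
        hits.append(running)
    return hits


if __name__ == "__main__":
    print(lru_hits_for_all_sizes([1, 2, 1, 3, 2, 1], 4))  # [0, 1, 3, 3]
```

This works because LRU has the inclusion property: the contents of a k-block LRU cache are always a subset of the contents of a (k+1)-block one.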

CacheBlocs

Our cache uses Dynamic Set Partitioning (DSP) and Tag-Indirect Cache Addressing (TICA) for scalable cache partitioning.
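To show the isolation that set-based partitioning provides, the sketch below models a toy direct-mapped cache in which each application owns a contiguous range of sets, so one application's accesses can never evict another's lines. The contiguous-range layout, the 64-byte line size, and all names are assumptions; TICA's tag indirection and the cost of repartitioning are not modeled.

```python
from typing import Dict, List, Optional, Tuple

LINE_BYTES = 64  # assumed cache line size


class SetPartitionedCache:
    """Toy direct-mapped cache in which each application owns a contiguous
    range of sets (a simplified model of set-based partitioning)."""

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.lines: List[Optional[Tuple[int, int]]] = [None] * num_sets
        self.parts: Dict[int, Tuple[int, int]] = {}  # app -> (first set, number of sets)

    def assign(self, app: int, first: int, count: int) -> None:
        # The caller is assumed to hand out non-overlapping ranges.
        if first < 0 or count <= 0 or first + count > self.num_sets:
            raise ValueError("partition outside the cache")
        self.parts[app] = (first, count)

    def set_index(self, app: int, addr: int) -> int:
        first, count = self.parts[app]
        # Only sets inside the application's own range are ever indexed.
        return first + (addr // LINE_BYTES) % count

    def access(self, app: int, addr: int) -> bool:
        idx = self.set_index(app, addr)
        # Owner plus full line address stands in for a tag (no TICA indirection).
        tag = (app, addr // LINE_BYTES)
        hit = self.lines[idx] == tag
        self.lines[idx] = tag
        return hit
```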

Prefetcher Throttling

With dynamic bandwidth allocation, we adjust prefetcher aggressiveness to maximize bandwidth utilization.
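A simple way to connect shadow prefetcher statistics to throttling is to pick, among the aggressiveness levels the shadow run evaluated, the one with the most useful prefetches whose bandwidth demand fits the application's current allocation. The sketch below assumes the shadow statistics are available per prefetch degree as a useful-prefetch count and an estimated bandwidth demand; `ShadowStats`, `choose_degree`, and this selection rule are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShadowStats:
    degree: int            # prefetch degree (aggressiveness level)
    useful: int            # prefetches the shadow run found would have been used
    bandwidth_gbps: float  # estimated memory bandwidth demand at this degree


def choose_degree(stats: List[ShadowStats], allocated_gbps: float) -> int:
    """Pick the aggressiveness level with the most useful prefetches that fits the
    application's bandwidth allocation; fall back to 0 (prefetching off)."""
    best_degree, best_useful = 0, -1
    for s in stats:
        if s.bandwidth_gbps <= allocated_gbps and s.useful > best_useful:
            best_degree, best_useful = s.degree, s.useful
    return best_degree


if __name__ == "__main__":
    stats = [ShadowStats(1, 100, 2.0), ShadowStats(2, 180, 3.5), ShadowStats(4, 220, 6.0)]
    print(choose_degree(stats, 4.0))  # 2
```

Because the shadow run never issues real prefetches, every level can be evaluated at once without consuming the bandwidth being allocated.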

Tapestry is a distributed manycore architecture with dynamically partitioned last-level cache and memory bandwidth, shared between many applications.

II. Slowdowns are inevitable, but Tapestry improves worst-case performance by as much as 1.6x.

Power – High performance per watt

Space – High performance per rack

Hardware – Low cost per processing core

How much should consumers be charged (also called metering)?
How can the slowdowns of concurrent applications be minimized?
How can the throughput of co-located applications be maximized?
Can a single processor handle a higher application load?

Dynamically partition the critical shared resources.
Estimate application performance for all resource allocations.
Use the performance estimates to make higher-order decisions.

To calculate pTables, we use an online analytical performance model that takes cache and prefetcher statistics for all configurations as input.
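The model's equations are not given here. As a hypothetical stand-in, the sketch below uses a simple latency-plus-bandwidth bound: misses per kilo-instruction (MPKI) at each cache size stand for shadow cache statistics, a coverage fraction stands for shadow prefetcher statistics, and each pTable entry is the smaller of a latency-bound and a bandwidth-bound instruction rate. All parameter names and default values (1 GHz clock, 100 ns memory latency, 64-byte lines) are assumptions.

```python
from typing import List


def estimate_ips(mpki: float, coverage: float, bw_bytes_per_s: float,
                 base_cpi: float = 1.0, freq_hz: float = 1e9,
                 mem_latency_s: float = 100e-9, line_bytes: int = 64) -> float:
    """Hypothetical instructions-per-second estimate for one allocation.

    mpki: misses per kilo-instruction at this cache size (shadow cache statistics)
    coverage: fraction of those misses hidden by prefetching (shadow prefetcher statistics)
    """
    misses_per_instr = mpki / 1000.0
    # Latency bound: core time plus stalls for misses the prefetcher does not hide.
    secs_per_instr = base_cpi / freq_hz + misses_per_instr * (1.0 - coverage) * mem_latency_s
    latency_bound = 1.0 / secs_per_instr
    # Bandwidth bound: every miss, prefetched or not, still moves one line from memory.
    bytes_per_instr = misses_per_instr * line_bytes
    if bytes_per_instr == 0:
        return latency_bound
    return min(latency_bound, bw_bytes_per_s / bytes_per_instr)


def build_ptable(mpki_by_cache: List[float], coverage: float,
                 bw_by_unit: List[float]) -> List[List[float]]:
    """Row c-1, column b-1: estimate with c cache units and b bandwidth units."""
    return [[estimate_ips(m, coverage, bw) for bw in bw_by_unit] for m in mpki_by_cache]
```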

We use a dynamic algorithm in hardware to find the resource distribution that will maximize the Fair Slowdown Metric.
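The hardware search is described only as a dynamic algorithm. As a software stand-in, the sketch below uses dynamic programming over a single resource (cache units) to maximize the geometric mean, equivalently the sum of logarithms, of each application's normalized performance read from its pTable. Bandwidth as a second dimension is omitted for brevity; `best_allocation` and the `perf` layout are assumptions, and every application is assumed to need at least one unit.

```python
import math
from typing import List


def best_allocation(perf: List[List[float]], total_units: int) -> List[int]:
    """perf[i][c - 1]: normalized performance of app i with c cache units (c >= 1).
    Returns units per app maximizing the geometric mean (equivalently, sum of logs)."""
    n = len(perf)
    neg = float("-inf")
    # best[i][u]: max sum of logs for the first i apps using exactly u units.
    best = [[neg] * (total_units + 1) for _ in range(n + 1)]
    choice = [[0] * (total_units + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        row = perf[i - 1]
        for u in range(total_units + 1):
            for c in range(1, min(len(row), u) + 1):
                prev = best[i - 1][u - c]
                if prev == neg or row[c - 1] <= 0:
                    continue
                val = prev + math.log(row[c - 1])
                if val > best[i][u]:
                    best[i][u], choice[i][u] = val, c
    # Allow leftover units; pick the best total actually used.
    u = max(range(total_units + 1), key=lambda k: best[n][k])
    if best[n][u] == neg:
        raise ValueError("no feasible allocation")
    alloc = []
    for i in range(n, 0, -1):  # walk the recorded choices backwards
        c = choice[i][u]
        alloc.append(c)
        u -= c
    return alloc[::-1]


if __name__ == "__main__":
    perf = [[0.2, 0.5, 0.6, 0.65], [0.4, 0.45, 0.5, 0.52]]
    print(best_allocation(perf, 4))  # [3, 1]
```

Working in logarithms turns the product inside the geometric mean into a sum, which is what makes the per-application decomposition possible.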

We charge the consumer for using the entire chip for this estimated virtual time.
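A minimal sketch of this billing rule, assuming a hypothetical price per chip-second and a full-chip instruction-rate estimate: the bill covers the whole chip, but only for the virtual time. For contrast, the sketch also includes a hypothetical baseline that bills the occupied share of cores for wall-clock time, which passes any interference-induced slowdown on to the consumer. All names and numbers are illustrative, not taken from the poster's evaluation.

```python
def vtm_charge(instructions: int, full_chip_ips: float,
               price_per_chip_second: float) -> float:
    """Charge for the whole chip, but only for the estimated virtual time."""
    if full_chip_ips <= 0:
        raise ValueError("full-chip IPS estimate must be positive")
    virtual_seconds = instructions / full_chip_ips
    return virtual_seconds * price_per_chip_second


def core_share_charge(wall_seconds: float, cores_used: int, total_cores: int,
                      price_per_chip_second: float) -> float:
    """Hypothetical baseline: bill the occupied share of cores for wall-clock
    time, so any slowdown caused by interference is billed to the consumer."""
    return wall_seconds * (cores_used / total_cores) * price_per_chip_second


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    fair = vtm_charge(4_000_000_000, 8e9, 1.6)
    baseline = core_share_charge(4.0, 4, 16, 1.6)
    print(round(fair, 2), round(baseline, 2))  # 0.8 1.6
```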

Dynamic resource sharing leads to unpredictable slowdowns for applications.

Interference gets worse with an increasing number of concurrent applications.

Interference leads to Higher-Order Problems for IaaS Clouds

We were able to estimate the pTables with an error of about 1%.
Using CacheBlocs, we were able to reduce power consumption in partitioned caches by 67%.
With our throttling, we were able to approximately track the Pareto-optimal curve for prefetcher performance.

Additional Results