Protecting Host Systems from Imperfect Hardware Accelerators

Uploaded On 2018-02-26



Presentation Transcript

Slide1

Protecting Host Systems from Imperfect Hardware Accelerators

Lena E. Olson

PhD Final Defense

August 17th, 2016

Slide2

Accelerators are increasingly popular…

Good for performance, energy-efficiency, programmability, exciting new applications…

What can we do if they’re imperfect?

Slide3

Executive Summary

Motivation
  What are accelerators?
  Why are they popular?
  What does imperfect mean?

Accelerator Security Taxonomy
  Define threat landscape
  Anticipate threats rather than fixing one by one

Slide4

Executive Summary

Border Control
  Protects host memory from stray reads/writes

Crossing Guard
  Protects host from coherence protocol violations
  Eases accelerator development

Slide5

Overview

Motivation

I. Accelerator Security Taxonomy

II. Border Control

III. Crossing Guard

Slide6

What is an accelerator?

Broadly: specialized hardware that can perform a subset of computation tasks with higher performance and/or lower energy than a CPU

Slide7

Types of Accelerators

[Image: A9, from www.chipworks.com]

SoCs, soft-IP accelerators
FPGA accelerators
IBM CAPI
CCIX

Slide8

Example Accelerators

Lots of (GP)GPU papers!

Slide9

However…

What if accelerator hardware is imperfect?
  Due to bugs?
  Due to malicious design?

Slide10

Overview

Motivation

I. Accelerator Security Taxonomy

II. Border Control

III. Crossing Guard

Slide11

I. Accelerator Security Taxonomy

Security Implications of Third-Party Accelerators*

Lena E. Olson, Simha Sethumadhavan, Mark D. Hill

*CAL, June 2016

Slide12

Motivating Example: GPU leaks

Guess which website left this data in the GPU texture memory?

“Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities”, Lee et al. (Oakland ’14)

Slide13

Why a taxonomy?

Could discover and fix threats one by one
  Hard to patch existing hardware
  Doesn’t fix root problem

Taxonomy provides a framework
  What classes of threats are there?
  Where are they coming from?
  How to prevent them?

Slide14

Threat Scope

Accelerator Scope
  Only affects processes running on accelerator
  Example: GPU leaks data between processes
  Challenge: Cannot fix accelerator internals
  Defense: Don’t run sensitive process on untrusted accelerator

System Scope
  Can affect processes not running on accelerator
  Example: Bad access to system memory
  Challenge: Affects unrelated processes!
  Defense: Good system / interface design

Slide15

Security: CIA model

3 considerations for security
  Confidentiality: Can someone steal data?
  Integrity: Do we get the right answer?
  Availability: Can we use the resource?

Integrity & Availability also important for reliability!

Slide16

Accelerator Risk Categories

Configuration, Computation, Termination

Access to {accelerator, host} memory

Microarchitectural commands, Exceptions/interrupts

Power

Slide17

Threat Matrix: Accelerator Scope

Legend: Known exploit

Risk category | Confidentiality | Integrity | Availability
Configuration | Side-channel, kleptography | Kleptography, wrong output | Lock up accelerator
Computation | Side-channel, kleptography | Kleptography, wrong output | Lock up accelerator
Termination | Failure to clear registers / memory / cache | Stale data in registers / memory / cache | Fail to release resources
Accel. Memory | Bad access | Bad access | Evict others
System Memory | Side-channel | |
µarch Commands | | Inconsistent (stale) data |
Exceptions | Side-channel | |
Power | Power analysis attacks | Excessive heat → Unreliability | Excessive heat → damage

Slide18

Threat Matrix: System Scope

Risk category | Confidentiality | Integrity | Availability
Configuration | Incorrect registers (e.g. CR3) | Incorrect registers |
Computation | | |
Termination | Stale translations | Stale translations | Fail to release resources
Accel. Memory | | |
System Memory | Bad access | Bad access | Saturate bandwidth, cause swapping
µarch Commands | Snoop on coherence traffic; ignored invalidations | Ignore invalidations | Excessive / ignored coherence requests
Exceptions | | | Spurious exceptions / interrupts
Power | | Excessive heat | Excessive heat

Slide19

Example Defenses

Reset accelerator upon termination
  Limits performance; non-volatile memory?

ARM TrustZone
  Coarse-grained: trusted vs. untrusted

Protection at interfaces

Slide20

Our Focus

Accelerators that
  Share unified virtual memory with host
  Share unified physical memory with host
  May participate in coherence with host
…but, which are less trusted than the CPU
Or, which don’t need full access to everything!

If compromised, can affect the host memory, not just processes running on accelerator!

Slide21

Two Memory Access Threats

Accesses to invalid addresses
  Wild writes
  Reads of sensitive data
  Effectively, allow full access to host system!
  Our solution: II. Border Control

Incorrect accelerator coherence protocols
  Incorrect messages
  Deadlocks
  Denial of service attacks
  Our solution: III. Crossing Guard

Slide22

Overview

Motivation

I. Accelerator Security Taxonomy

II. Border Control

III. Crossing Guard

Slide23

II. Border Control

Border Control: Sandboxing Accelerators*

Lena E. Olson, Jason Power, Mark D. Hill, David A. Wood

*MICRO, December 2015

Slide24

Threat Model

Protect host from incorrect or malicious accelerators that could perform
  stray reads, violating confidentiality
  stray writes, violating integrity
of host processes that do and do NOT run on the accelerator

Question: Which accesses are stray?

Slide25

Principle of Least Privilege

"Every program and every user of the system should operate using the least set of privileges necessary to complete the job. Primarily, this principle limits the damage that can result from an accident or error."
Jerome Saltzer

Slide annotations: hardware component; Border Control Authors

Slide26

Accelerator Access Permissions

What permissions should an accelerator have?
  NOT to OS data
  NOT to sensitive data from other processes

Principle of Least Privilege: to what it needs
  Access to addresses corresponding to process it is currently running
  These can be found in the page table
  We will use page permissions (like prior work)

Slide27

Example System

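[Diagram: a CPU and two accelerators, with caches ($$), attached to memory or a shared LLC; the CPU side has an MMU and TLB. Legend: trusted data path, untrusted data path, address translation path, translation update path. Open questions marked on the diagram: Address translation? Security?]

Slide28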

Full IOMMU

[Diagram: the example system with a Full IOMMU between the two accelerators and memory or the shared LLC; CPU with caches ($$), MMU and TLB. Legend: trusted data path, untrusted data path, address translation path, translation update path.]

Slide29

Full IOMMU Challenges

IOMMU’s Address Translation Service (ATS) translates every memory reference to host
  + Protection
  - Translation latency
  - Bandwidth
  - Synonyms in virtual caches?
  - Coherence?

Can add (physical) caches and TLB…

Slide30

Bypassable IOMMU (Baseline)

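[Diagram: CPU (caches, MMU, TLB) and two accelerators, each with its own TLB and caches ($$), attached to memory or a shared LLC, plus an IOMMU. Memory holds OS memory (Q) and process memory (P). Example request: Mem req with virtual addr = V becomes Mem req with phys. addr = P. Legend: trusted data path, untrusted data path, address translation path, translation update path.]

Slide31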

Bypassable IOMMU (Baseline)

[Diagram: the same bypassable-IOMMU system. Besides the translated request (virtual addr = V, phys. addr = P), a memory request with phys. addr = Q is shown, targeting OS memory (Q).]

Slide32

So… caches are the problem?

We can’t remove the caches and TLBs
  Too slow!

Why not use trusted design for caches and TLBs?

Slide33

CAPI-like

[Diagram: CAPI-like organization of the same system: CPU (caches, MMU, TLB), two accelerators, TLBs and caches ($$), IOMMU, and memory or shared LLC holding OS memory (Q) and process memory (P). Callout: Cache access latency?]

Slide34

Summary Comparison

Property | Full IOMMU | Bypassable IOMMU | CAPI-like
TLB + Caches? | No | Yes | Slow
Customizable Caches? | No | Yes | No
Safe? | Yes | No | Yes

Slide35

Border Control

[Diagram: the bypassable-IOMMU system with a Border Control unit added for each accelerator on the untrusted data path to memory or the shared LLC. Accelerators keep their own TLBs and caches ($$). Memory holds OS memory (Q) and process memory (P).]

Slide36

Border Control

[Diagram: the same Border Control system. A memory request with virtual addr = V is translated to phys. addr = P, and the request with phys. addr = P crosses Border Control toward process memory (P).]

Slide37

Border Control

[Diagram: the same Border Control system. A memory request with phys. addr = Q, targeting OS memory (Q), arrives at Border Control.]

Slide38

Border Control: Implementation

One Border Control instance per accelerator

Protection Table
  In system memory
  Contains all needed permissions by PPN
  Sufficient for correct design
  0.006% physical memory overhead

Border Control Cache (BCC)
  Caches recent permissions
  A 64 byte entry covers 512 4KB pages

Slide39

Protection Table Design

Flat physically indexed table in memory
  2 bits (R/W) per physical page
  Initialized to 0 (no permission)
  Lazily updated on IOMMU translation
  Checked on all accelerator memory requests

PPN | R | W
0 | 0 | 0
1 | 1 | 1
2 | 1 | 0
3 | 0 | 0
… | … | …
N-4 | 0 | 0
N-3 | 1 | 0
N-2 | 1 | 0
N-1 | 0 | 0
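
The table design above can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware implementation: the names `ProtectionTable`, `grant`, and `check` and the bit encoding of R and W are hypothetical; the flat 2-bits-per-page layout, zero initialization, and 4KB page size come from the slide.

```python
PAGE_SHIFT = 12   # 4KB pages, as on the slide
READ = 0b01       # assumed encoding of the R bit
WRITE = 0b10      # assumed encoding of the W bit


class ProtectionTable:
    """Flat, physically indexed table: 2 bits (R/W) per physical page."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # Four 2-bit entries per byte, all initialized to 0 (no permission).
        self.bits = bytearray((num_pages + 3) // 4)

    def get(self, ppn):
        return (self.bits[ppn // 4] >> ((ppn % 4) * 2)) & 0b11

    def grant(self, ppn, perms):
        # Hypothetical hook: called lazily when the IOMMU returns a translation.
        self.bits[ppn // 4] |= (perms & 0b11) << ((ppn % 4) * 2)

    def check(self, phys_addr, is_write):
        # Checked on every accelerator memory request; out-of-range is denied.
        ppn = phys_addr >> PAGE_SHIFT
        if not 0 <= ppn < self.num_pages:
            return False
        needed = WRITE if is_write else READ
        return bool(self.get(ppn) & needed)


def space_overhead_percent(page_bytes=4096, bits_per_page=2):
    # 2 bits of metadata for every 4096 * 8 bits of physical memory.
    return 100.0 * bits_per_page / (page_bytes * 8)


table = ProtectionTable(num_pages=16)
table.grant(1, READ | WRITE)   # PPN 1: read/write
table.grant(2, READ)           # PPN 2: read-only
print(table.check(0x1000, is_write=True))    # True
print(table.check(0x2000, is_write=True))    # False: write to a read-only page
print(table.check(0x3000, is_write=False))   # False: no permission recorded
print(round(space_overhead_percent(), 3))    # 0.006
```

The last line reproduces the 0.006% space overhead quoted in the talk: 2 bits per 4KB page is 2 / 32768 of physical memory.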

What about execute permission?

Slide40

Summary Comparison

Property | Full IOMMU | Bypassable IOMMU | CAPI-like | Border Control
TLB + Caches? | No | Yes | Slow | Yes
Customizable Caches? | No | Yes | No | Yes
Safe? | Yes | No | Yes | Yes

EVALUATION
  GPGPU → accelerator safety stress-test
  gem5-gpu
  Rodinia Benchmarks

Slide41

Border Control Overheads

Moderately-Threaded GPU

Takeaway: On average 0.48% performance overhead vs. unsafe

Slide42

II. Border Control Summary

Bad addresses blocked: check!

2 bits / (4KB) page = 0.006% space overhead
  Could be optimized further

On average, 0.48% (moderately threaded) performance overhead

What about bad coherence messages?

Slide43

Overview

Motivation

I. Accelerator Security Taxonomy

II. Border Control

III. Crossing Guard

Slide44

III. Crossing Guard

Mediating Host-Accelerator Coherence Interactions*

Lena E. Olson, Mark D. Hill, David A. Wood

*Currently under submission

Slide45

Threat Model

Protect host from incorrect or malicious accelerators that could perform
  stray reads, violating confidentiality
  stray writes, violating integrity
  incorrect coherence activity, violating availability
of host processes that do and do NOT run on the accelerator

Slide46

Crossing Guard Goals

Allow accelerators customized caches

Simple, standardized coherence interface
  Work with many diverse host protocols

Provide safety for the host system
  No unexpected messages
  No deadlocks

Slide47

1. Why Customize Caches?

CPU caches have to work with all workloads

Accelerators may only run some workloads!
  Streaming? More prefetching.
  GPGPUs? Relax coherence between GPU cores.
  Etc.…

Slide48

2. Why Simple Interface?

Redesigning for each host is too much work
  Intel, AMD, ARM, IBM, Oracle…
  CCIX shows companies care!

Host protocols may be proprietary

Host protocols are complex!

Slide49

2. Why Simple Interface?

(Transition table in style of Sorin et al.)

Slide50

3. Why Host Safety?

[Diagram: coherence state for block A. Directory entry (Addr, State, Owner/Sharers, Req): A, SS, sharers 1 and 2, no pending request. Three caches: accelerator cache (#0), CPU cache #1, CPU cache #2; per-cache states shown for A: S, I, I.]

Slide51

3. Why Host Safety?

[Diagram: the same directory and caches for block A (directory state SS, sharers 1 and 2). An Ack message for A is shown in flight, and the outcome is marked "? ? ?".]

Slide52

3. Why Host Safety?

[Diagram: block A is held in state M, with the directory entry in state MT and owner 0 (the accelerator cache). The directory sends Inv (Req: dir) and its entry moves to the transient state MT_I.]

Slide53

Crossing Guard Overview

Hardware implemented in trusted host

Implements simple, standard interface
  complex enough to allow hierarchical protocol
  works with range of host protocols
  safe for host
  maintains Border Control protections

Moves protocol complexity into XG hardware
  Only implemented once per host system
  By experts!

Slide54

1. Customize Caches

Designed + implemented two sample systems

Private Per-Core L1 at Accelerator

[Diagram: three accelerator L1 caches, each with its own Crossing Guard (XG), and two CPU L1 caches, connected to the host directory / L2.]

Slide55

1. Customize Caches

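Designed + implemented two sample systems

Private L1s + Shared L2 at Accelerator

[Diagram: three accelerator L1 caches above a shared accelerator L2, which connects through a single Crossing Guard (XG) to the host directory / L2 alongside two CPU L1 caches.]

Slide56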

2. Simple Interface

Accelerator → Host Requests
  GetS, GetM
  PutS, PutE, PutM

Host → Accelerator Responses
  DataS, DataE, DataM
  Writeback Ack

Host → Accelerator Requests
  Invalidate

Accelerator → Host Responses
  InvAck, Clean Writeback, Dirty Writeback

Slide57

2. Simple Interface

Single-level Accelerator Cache using Crossing Guard Interface

Slide58

2. Simple Interface

Implemented Crossing Guard interface to two host protocols
  AMD Hammer-like Exclusive MOESI
  MESI Inclusive

Modularity: Host and Accelerator protocol choice is independent
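
One way to picture this independence is that the accelerator only ever speaks the small message vocabulary listed on the earlier "Simple Interface" slide, whatever the host protocol is. The sketch below writes that vocabulary down as Python enums; the message names are taken from the slide, while the enum layout and the helper `parse_from_accelerator` are hypothetical and only illustrate rejecting messages the accelerator is not allowed to send.

```python
from enum import Enum


class AccelRequest(Enum):       # accelerator -> host requests
    GET_S = "GetS"
    GET_M = "GetM"
    PUT_S = "PutS"
    PUT_E = "PutE"
    PUT_M = "PutM"


class HostResponse(Enum):       # host -> accelerator responses
    DATA_S = "DataS"
    DATA_E = "DataE"
    DATA_M = "DataM"
    WRITEBACK_ACK = "Writeback Ack"


class HostRequest(Enum):        # host -> accelerator requests
    INVALIDATE = "Invalidate"


class AccelResponse(Enum):      # accelerator -> host responses
    INV_ACK = "InvAck"
    CLEAN_WRITEBACK = "Clean Writeback"
    DIRTY_WRITEBACK = "Dirty Writeback"


def parse_from_accelerator(name):
    """Return the message if the accelerator may send it, else None."""
    for vocabulary in (AccelRequest, AccelResponse):
        try:
            return vocabulary(name)
        except ValueError:
            continue
    return None


print(parse_from_accelerator("GetM"))        # AccelRequest.GET_M
print(parse_from_accelerator("Invalidate"))  # None: only the host sends this
print(parse_from_accelerator("Fwd-GetM"))    # None: not part of the interface
```

The whole interface is 13 message types, which is the sense in which it is simple compared with a full host protocol.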

Slide59

2. Simple Interface

[Animated diagram: a GetM from the accelerator cache for block A, which the directory holds in state SS with sharers 1 and 2. Messages shown: GetM, GetM, Inv (Req: 0), Ack, Data (Acks: -2), Ack, DataM, UnblockM. A tracking entry (Addr, State, Acks, Reqs, Timer) for A steps through I, IM, SM (Acks -2), SM (Acks -1), M. The directory entry goes from SS to SM_MB to M with owner 0. The accelerator cache goes from I through B to M. Labels: Directory, Accel Cache, Cache #0, Cache #1, Cache #2.]

Slide60

2. Simple Interface

[Animated diagram: the same GetM transaction for block A, with the accelerator cache shown in transient state IM while the request is outstanding. Messages shown: GetM, GetM, Ack, Data (Acks: -2), Ack, DataM, UnblockM. The directory entry goes from SS to SM_MB to M with owner 0.]

Slide61

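3. Host Safety

[Diagram: the Ack scenario again, with a tracking entry (Addr, State, Acks, Reqs, Timer) for block A in state I, Acks 0, no request, timer 0. The directory holds A in SS with sharers 1 and 2; an Ack message is shown. Labels: Directory, Accel Cache, Cache #0, Cache #1, Cache #2.]

Slide62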

3. Host Safety

[Animated diagram: the directory (A in state MT, owner 0) sends Inv (Req: dir) and moves to MT_I. The tracking entry for A goes from M to MI with Reqs = dir and Timer = 1210, and an Inv is sent on. Timeline labels: 200, 210, 500, 1000, 1500. Data is returned, the directory entry moves to WB, and the tracking entry ends in I.]

Slide63

3. Host Safety

Crossing Guard Guarantees to Host:

Accelerator requests must be correct
  Consistent with block stable state
  Consistent with block transient state

Accelerator responses must be correct
  Consistent with block stable state
  Consistent with block transient state
  Within a reasonable time
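
The guarantees above can be illustrated with a toy checker. This is a heavily simplified sketch under assumptions, not the Crossing Guard design: the class `GuardSketch`, the table `LEGAL_REQUESTS` (which requests are allowed from which stable state), the timeout value, and the policy of refusing requests while an invalidation is outstanding are all hypothetical choices made to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed mapping: requests an accelerator may issue from each stable state.
LEGAL_REQUESTS = {
    "I": {"GetS", "GetM"},
    "S": {"GetM", "PutS"},
    "E": {"PutE", "PutM"},
    "M": {"PutM"},
}


@dataclass
class Block:
    state: str = "I"                     # stable state granted to the accelerator
    request_pending: bool = False        # transient: a request is in flight
    inv_deadline: Optional[int] = None   # transient: Invalidate awaiting a response


class GuardSketch:
    def __init__(self, timeout=1000):
        self.timeout = timeout
        self.blocks = {}

    def _block(self, addr):
        return self.blocks.setdefault(addr, Block())

    def accept_request(self, addr, request):
        """Forward a request only if it fits the stable and transient state."""
        block = self._block(addr)
        if block.request_pending or block.inv_deadline is not None:
            return False
        if request not in LEGAL_REQUESTS[block.state]:
            return False
        block.request_pending = True
        return True

    def host_replied(self, addr, new_state):
        block = self._block(addr)
        block.state, block.request_pending = new_state, False

    def host_invalidate(self, addr, now):
        self._block(addr).inv_deadline = now + self.timeout

    def accept_response(self, addr, now):
        """Accept an invalidation response only if one is owed and on time."""
        block = self._block(addr)
        if block.inv_deadline is None or now > block.inv_deadline:
            return False
        block.state, block.inv_deadline = "I", None
        return True

    def expire(self, now):
        """Addresses whose response is overdue; the guard must answer for them."""
        late = [a for a, b in self.blocks.items()
                if b.inv_deadline is not None and now > b.inv_deadline]
        for addr in late:
            self.blocks[addr] = Block()   # treat the block as invalid from now on
        return late


guard = GuardSketch(timeout=1000)
print(guard.accept_request(0xA, "GetS"))     # True
print(guard.accept_request(0xA, "GetM"))     # False: a request is already pending
guard.host_replied(0xA, "S")
print(guard.accept_request(0xA, "PutM"))     # False: block is only in S
guard.host_invalidate(0xA, now=210)
print(guard.expire(now=1211))                # [10]
print(guard.accept_response(0xA, now=1500))  # False: no response is owed any more
```

The point of the sketch is that every check uses only state held on the trusted side, so a misbehaving accelerator cannot make the host wait forever or see a message that contradicts the block's recorded state.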

(+ Border Control Protections!)

Slide64

Crossing Guard Variants

Full State Crossing Guard
  Inclusive directory of accelerator state
  + Places few restrictions on host protocol
  + Can hide all errors
  - Requires tag + metadata storage for all blocks

Transactional Crossing Guard
  Stores only data for in-flight transactions
  + Small storage
  + Provides most safety properties
  - Requires some host tolerance

Slide65

Evaluation

Does it provide coherence to correct accelerator?

Does it provide safety to host?

Does it allow high performance?

Slide66

Correctness Testing

Are coherence invariants maintained when accelerator is acting correctly?

How? Random tester
  Store-Load pairs to random addresses
  Check integrity of data

Local coverage: > 99%
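
The idea of a random store-load tester can be sketched in a few lines. This is not the tester used in the evaluation: `GoodMemory` and `LossyMemory` are toy stand-ins for a system under test, and `random_test` with its parameters is a hypothetical helper that only shows the store, load back, compare pattern.

```python
import random


class GoodMemory:
    """Toy stand-in for a correct memory system."""

    def __init__(self):
        self.data = {}

    def store(self, addr, value):
        self.data[addr] = value

    def load(self, addr):
        return self.data.get(addr, 0)


class LossyMemory(GoodMemory):
    """Toy buggy system that silently drops every store."""

    def store(self, addr, value):
        pass


def random_test(memory, num_ops=1000, num_addrs=64, seed=0):
    """Issue random stores, load them back, and compare with a shadow copy."""
    rng = random.Random(seed)
    shadow = {}
    for _ in range(num_ops):
        addr = rng.randrange(num_addrs)
        value = rng.randrange(1, 256)      # never 0, so dropped stores are visible
        memory.store(addr, value)
        shadow[addr] = value
        check_addr = rng.choice(sorted(shadow))   # any address written so far
        if memory.load(check_addr) != shadow[check_addr]:
            return False
    return True


print(random_test(GoodMemory()))    # True
print(random_test(LossyMemory()))   # False
```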

Slide67

Fuzz Testing

Is host safety maintained when accelerator misbehaves?

How? Replace accelerator cache with evil controller
  Generates random coherence messages to random addresses
  Desired outcome: No deadlocks / crashes

Local Coverage: > 99.3%

Slide68

Performance Testing

Tertiary concern, but cannot degrade performance too much

gem5-gpu
Rodinia workloads

CAVEATS:
  Immaturity of workloads / infrastructure
  Directly comparing coherence protocols hard
  General trends only!

Slide69

Performance (Hammer-like)

Slide70

Performance: MESI Inclusive

Slide71

III. Crossing Guard Summary

Provides simple, standardized interface to ease accelerator development

Correctness when accelerator is correct

Host safety when accelerator is incorrect

Low performance overhead

Slide72

Overview

Motivation

I. Accelerator Security Taxonomy

II. Border Control

III. Crossing Guard

Slide73

Publications

“Crossing Guard: Mediating Host-Accelerator Coherence Interactions”, Olson, Hill, Wood (under submission)

“Border Control: Sandboxing Accelerators”, Olson, Power, Hill, Wood (MICRO 2015)

“Security Implications of Third-Party Accelerators”, Olson, Sethumadhavan, Hill (CAL 2016)

“Probabilistic Directed Writebacks for Exclusive Caches”, Olson, Hill (TR 2016)

“Revisiting Stack Caches for Energy Efficiency”, Olson, Eckert, Manne, Hill (TR 2014)

Slide74

Conclusion

Accelerators raise new security questions

We can design secure interfaces
  To prevent bad memory accesses
  To prevent coherence bugs
  To ease accelerator development
  At low overhead, so people might use them!

Slide75

Questions?

[Photo caption: Investigating Border Control at the Canada-USA Border. Labels: CANADA; No passport]

Slide76

Backup Follows

Slide77

Why now?

Breakdown of Dennard Scaling

3D Die Stacking

Cool new programming models like HSA, CAPI allow unified memory address space
  Less copying data
  Great for programmability!
  Tight integration with host

Slide78

Company Reputations

“Companies would never produce malicious hardware, their reputation would be ruined!”

Slide79

Border Control Operation

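[Diagram: Border Control, with its BC Cache, sits between the accelerator (with TLB and caches, $$) and memory, which holds the Protection Table. The IOMMU serves address translations, and a Border Control update path is shown. Legend: trusted data path, untrusted data path, address translation path, translation update path.]

Slide80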

To summarize…

Full IOMMU
  Safe, but no caches (slow)

Bypassable IOMMU
  Has caches, TLB – very fast!
  Totally unsafe

CAPI-like
  Safe, and has caches and TLB…
  But longer access latency, less designer control

Can we do better?

Slide81

To summarize again…

Full IOMMU
  Safe, but no caches (slow)

Bypassable IOMMU
  Has caches, TLB – very fast!
  Totally unsafe

CAPI-like
  Safe, and has caches and TLB…
  But longer access latency, less designer control

Border Control
  Safe, physical caches+TLB, AND fast

EVALUATION
  GPGPU → accelerator safety stress-test
  gem5-gpu
  Rodinia Benchmarks

Slide82

Simulation Parameters

Slide83

Comparison of Configurations

Slide84

Border Control Overheads

Highly-Threaded GPU

Takeaway: On average 0.15% performance overhead vs. unsafe

Slide85

Border Control Cache

Takeaway: A small (1KB) BCC is sufficient for our workloads

Slide86

TLB Shootdown Steps

If page was read-only:
  Update entry in Protection Table and BCC

If page was read-write:
  Invalidate entry in TLB
  Flush dirty blocks from page in accelerator cache
  Update entry in Protection Table and BCC
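
The two cases above can be sketched as one function. This is a toy model under assumptions: the `AcceleratorSide` container, the `downgrade` function, and the permission encoding are hypothetical, and "flushing" is modeled by simply dropping the record of dirty blocks rather than writing data back.

```python
from dataclasses import dataclass, field

READ, WRITE = 0b01, 0b10   # assumed permission encoding


@dataclass
class AcceleratorSide:
    tlb: set = field(default_factory=set)                  # PPNs with cached translations
    dirty: dict = field(default_factory=dict)              # PPN -> dirty block offsets
    protection_table: dict = field(default_factory=dict)   # PPN -> permission bits
    bcc: dict = field(default_factory=dict)                # cached copy of recent entries


def downgrade(side, ppn, new_perms):
    """Apply a permission downgrade for one page and return the steps taken."""
    steps = []
    if side.protection_table.get(ppn, 0) & WRITE:
        # Read-write page: the accelerator may hold dirty data for it.
        side.tlb.discard(ppn)
        steps.append("invalidate TLB entry")
        side.dirty.pop(ppn, None)      # stands in for writing dirty blocks back
        steps.append("flush dirty blocks")
    side.protection_table[ppn] = new_perms
    if ppn in side.bcc:
        side.bcc[ppn] = new_perms
    steps.append("update Protection Table and BCC")
    return steps


side = AcceleratorSide(tlb={5, 7}, dirty={5: {0, 64}},
                       protection_table={5: READ | WRITE, 7: READ},
                       bcc={5: READ | WRITE})
print(downgrade(side, 7, 0))      # ['update Protection Table and BCC']
print(downgrade(side, 5, READ))   # all three steps, in order
```

In this model the flush comes before the table update on purpose: once write permission is removed, a later writeback from the accelerator for that page would fail the Border Control check.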

Slide87

Border Control Flush Overhead

Takeaway: Permission downgrades affect performance, but not much

Slide88

Information Flow Tracking

Goal: track untrusted information, prevent it from modifying sensitive data / control
  e.g., prevent buffer overflow in software

Hardware-assisted techniques: prevent threats from bugs in software (same address space) – different threat than Border Control

Hardware (e.g. Tiwari et al., ISCA 2011) – very powerful technique, but high area/runtime overhead and not transparent to software

Slide89

Mondriaan

Replacement for traditional page table + TLB
  Allows fine-grained permissions

Border Control is independent of the policy for deciding permissions
  But permission granularity might mean alternate Protection Table organizations are better

Slide90

Single-Level Cache

Slide91

Simulation Parameters

Slide92

Time Spent Simulating (Random)

Configuration | Time
XG Full + Hammer + 1 Level | 5.28 years
XG Full + Hammer + 2 Level | 2.51 years
XG Full + MESI Inc + 1 Level | 133 days
XG Full + MESI Inc + 2 Level | 223 days
XG Trans. + Hammer + 1 Level | 3.17 years
XG Trans. + Hammer + 2 Level | 1.38 years
XG Trans. + MESI Inc + 1 Level | 90 days
XG Trans. + MESI Inc + 2 Level | 103 days
TOTAL | 13.9 years

Slide93

Full Coverage %s (Random)

Full State XG | Single-level | Two-level
Hammer-like | 99 | 99.8
MESI Inclusive | 100 | 99.4

Transactional XG | Single-level | Two-level
Hammer-like | 99.3 | 99.5
MESI Inclusive | 100 | 99.7

Slide94

Time Spent Simulating (Fuzz)

Configuration | Time
XG Full + Hammer-like | 1.62 years
XG Full + MESI Inclusive | 287 days
XG Transactional + Hammer-like | 5.3 years
XG Transactional + MESI Inclusive | 41 days
Total | 7.82 years

Slide95

Full Coverage %s (Fuzz)

Full State Crossing Guard | Fuzz Tester
Hammer-like | 99.3
MESI Inclusive | 99.7

Transactional Crossing Guard | Fuzz Tester
Hammer-like | 99.7
MESI Inclusive | 100

Slide96

Performance: Hammer-like

Slide97

Performance: MESI Inclusive

Slide98

Template

[Diagram template: a tracking entry (Addr, State, Acks, Reqs, Timer) for block A in state I, a directory entry for A (SS, sharers 1 and 2), and per-cache state boxes, with GetM and Ack messages. Labels: Directory, Accel Cache, Cache #0, Cache #1, Cache #2.]

Slide99

Old Slides

Slide100

3. Why Host Safety?

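[Diagram: an Ack for Addr A is sent between the accelerator cache and the directory. Labels: accelerator cache, Addr A: ?; Addr A: RW; directory, Addr A: Not Present in caches. The outcome is marked "? ? ?".]

Slide101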

3. Why Host Safety?

[Diagram: accelerator cache with Addr A: M; Addr A: RW; directory with Addr A: M, owned by accelerator. Message shown: Fwd-GetM, Addr: A.]

Slide102

Crossing Guard Example

[Diagram: accelerator cache with Addr A: M; Addr A: RW; directory with Addr A: M, owned by accelerator; an entry "A: waiting for WB". Messages shown for Addr A: Fwd-GetM, Invalidate, Writeback.]

Slide103

Crossing Guard Example

[Diagram: the same scenario as the previous slide: accelerator cache with Addr A: M; Addr A: RW; directory with Addr A: M, owned by accelerator; an entry "A: waiting for WB". Messages shown for Addr A: Fwd-GetM, Invalidate, Writeback.]

Slide104

Where to next?

Slide105

What I’ve Learned

Anticipate questions, make backup slides =)

Talk to colleagues! They’re really smart.

If you can’t explain why your idea is exciting, no one will care about it.

Be confident!