
Containment Domains - PowerPoint Presentation

Uploaded On 2016-02-27



Presentation Transcript

Slide 1

Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems

Jinsuk Chung, Ikhwan Lee, Michael Sullivan, Jee Ho Ryoo, Dong Wan Kim, Doe Hyun Yoon+, Larry Kaplan*, and Mattan Erez
UT Austin, + now at HP Labs, * Cray

Slide 2

Containment Domains: A Scalable, Efficient, and Flexible Resilience Scheme for Exascale Systems

Slide 3

Motivation and goals

Resilience bounds performance
- Resilience is a major obstacle to exascale

Containment domains: scalable, efficient resilience
- Hierarchical: preserve data where most efficient and effective
- Proportional: tunable redundancy and recovery; different errors/faults handled differently
- Abstract: portable; amenable to auto-tuning and analysis

CDs elevate resilience to a first-order application concern

Containment Domain [SC'12] (c) Jinsuk Chung

Slide 4

Containment domains

Single consistent abstraction
- Encapsulates resilience techniques
- Spans levels: programming, system, and analysis

Components
- Preserve data on domain start
- Compute (domain body)
- Detect faults before domain commits
- Recover from detected errors

Semantics
- Erroneous data never communicated
- Each CD provides recovery mechanism

Hierarchy
- Escalation
- Match CD and machine hierarchies
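The four components and the escalation rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CD runtime: `run_cd`, `DetectedFault`, `check_ok`, and `max_retries` are hypothetical names, the state is a plain dictionary, and recovery is simple re-execution from the preserved copy.

```python
import copy


class DetectedFault(Exception):
    """Raised when a domain cannot recover locally (hypothetical name)."""


def run_cd(state, body, check_ok, max_retries=2):
    """Run `body` on `state` inside one containment domain.

    Preserve -> compute -> detect -> recover by re-execution. If the
    retries are exhausted, the fault escalates to the caller, which
    plays the role of the parent CD.
    """
    preserved = copy.deepcopy(state)            # Preserve data on domain start
    for _ in range(max_retries + 1):
        body(state)                             # Compute (domain body)
        if check_ok(state):                     # Detect before the domain commits
            return state                        # Commit: only checked data leaves
        state.clear()                           # Recover: restore preserved inputs
        state.update(copy.deepcopy(preserved))
    raise DetectedFault("escalate to parent CD")


# Usage: the first attempt is corrupted on purpose, the second succeeds.
attempts = {"n": 0}


def flaky_double(s):
    attempts["n"] += 1
    s["y"] = s["x"] * 2 + (1 if attempts["n"] == 1 else 0)


result = run_cd({"x": 21}, flaky_double, check_ok=lambda s: s["y"] == s["x"] * 2)
```

Because the result is returned only after the check passes, the sketch also mirrors the "erroneous data never communicated" semantics in a very small way.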

[Figure: root CD and child CD]

Slide 5

Mapping example: SpMV

    void task<inner> SpMV(in M, in Vi, out Ri) {
      forall(…) reduce(…)
        SpMV(M[…], Vi[…], Ri[…]);
    }

    void task<leaf> SpMV(…) {
      for r=0..N
        for c=rowS[r]..rowS[r+1] {
          resi[r] += data[c]*Vi[cIdx[c]];
          prevC = c;
        }
    }
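For readers who want to run the idea, here is a minimal Python sketch of the same inner/leaf structure over a CSR matrix. The names `row_start`, `col_idx`, and `data` stand in for the slide's `rowS`, `cIdx`, and `data`. The row-wise halving in `spmv_inner` and the `leaf_rows` threshold are assumptions made for illustration; the slide elides how the inner task actually partitions M and V.

```python
def spmv_leaf(row_start, col_idx, data, v, rows):
    """CSR sparse matrix-vector product restricted to the given rows."""
    out = []
    for r in rows:
        acc = 0.0
        for c in range(row_start[r], row_start[r + 1]):
            acc += data[c] * v[col_idx[c]]
        out.append(acc)
    return out


def spmv_inner(row_start, col_idx, data, v, rows, leaf_rows=2):
    """Halve the row range recursively until a leaf can handle it."""
    if len(rows) <= leaf_rows:
        return spmv_leaf(row_start, col_idx, data, v, rows)
    mid = len(rows) // 2
    return (spmv_inner(row_start, col_idx, data, v, rows[:mid], leaf_rows)
            + spmv_inner(row_start, col_idx, data, v, rows[mid:], leaf_rows))


# 4x4 example matrix:
# [[1, 0, 2, 0],
#  [0, 3, 0, 0],
#  [0, 0, 4, 5],
#  [6, 0, 0, 7]]
row_start = [0, 2, 3, 5, 7]
col_idx = [0, 2, 1, 2, 3, 0, 3]
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
v = [1.0, 2.0, 3.0, 4.0]

result = spmv_inner(row_start, col_idx, data, v, range(4))
```

Each recursive call is a natural place to open a child CD, which is how the task hierarchy and the CD hierarchy line up in the slides that follow.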

[Figure: matrix M and vector V]

Slide 6

Mapping example: SpMV

[Figure: matrix M and vector V. The inner and leaf SpMV task code from Slide 5 is repeated unchanged.]

Slide 7

Mapping example: SpMV

[Figure: matrix M and vector V, distributed to 4 nodes. The inner and leaf SpMV task code from Slide 5 is repeated unchanged.]

Slide 8

Mapping example: SpMV

[Figure: matrix M and vector V, distributed to 4 nodes. The inner and leaf SpMV task code from Slide 5 is repeated unchanged.]

Slide 9

Mapping example: SpMV

[Figure: parent CD and child CDs over M and V. Labels on the parent: Preserve (Parent), Detect (Parent), Recover (Parent). Labels on the CDs in the figure: Preserve, Detect, Recover; the four children are also shown with Detect and Recover only.]

Slide 10

Initial CD preservation API and prototype

    void task<inner> SpMV(in M, in Vi, out Ri) {
      cd = create_CD(parentCD);
      preserve_via_copy(cd, matrix, …);
      forall(…) reduce(…)
        SpMV(M[…], Vi[…], Ri[…]);
      commit_CD(cd);
    }

    void task<leaf> SpMV(…) {
      cd = create_CD(parentCD);
      preserve_via_copy(cd, matrix, …);
      preserve_via_parent(cd, veci, …);
      for r=0..N
        for c=rowS[r]..rowS[r+1] {
          resi[r] += data[c]*Vi[cIdx[c]];
          check {fault<fail>(c > prevC);}
          prevC = c;
        }
      commit_CD(cd);
    }
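The difference between the two preservation calls can be made concrete with a toy Python model. Everything below is an assumption for illustration and not the prototype's real API: the class `CD`, the helpers `create_cd`, `preserve_via_copy`, `preserve_via_parent`, `restore`, and `commit_cd` only borrow the slide's names, and the `check` is read here as asserting that the column cursor `c` always moves forward past `prevC`.

```python
class Fault(Exception):
    """Signals a failed check inside a CD (hypothetical name)."""


class CD:
    """Toy containment domain that only tracks preserved data."""

    def __init__(self, parent=None):
        self.parent = parent
        self.copies = {}          # name -> private copy held by this CD
        self.via_parent = set()   # names restorable from the parent instead


def create_cd(parent=None):
    return CD(parent)


def preserve_via_copy(cd, name, values):
    cd.copies[name] = list(values)      # pay for a copy now


def preserve_via_parent(cd, name):
    cd.via_parent.add(name)             # no copy: the parent already holds one


def restore(cd, name):
    if name in cd.copies:
        return list(cd.copies[name])
    if name in cd.via_parent and cd.parent is not None:
        return restore(cd.parent, name)
    raise KeyError(name)


def commit_cd(cd):
    cd.copies.clear()                   # preserved state is no longer needed
    cd.via_parent.clear()


def spmv_leaf(parent_cd, row_start, col_idx, data, vec):
    cd = create_cd(parent_cd)
    preserve_via_copy(cd, "matrix", data)
    preserve_via_parent(cd, "vec")
    res = [0.0] * (len(row_start) - 1)
    prev_c = -1
    for r in range(len(res)):
        for c in range(row_start[r], row_start[r + 1]):
            res[r] += data[c] * vec[col_idx[c]]
            if not c > prev_c:          # assumed reading of check{fault<fail>(c > prevC)}
                raise Fault("column cursor did not advance")
            prev_c = c
    commit_cd(cd)
    return res


# Usage: the root CD holds the only copy of the vector.
root = create_cd()
preserve_via_copy(root, "vec", [1.0, 2.0])
child = create_cd(root)
preserve_via_parent(child, "vec")
restored_vec = restore(child, "vec")
res = spmv_leaf(root, [0, 1, 2], [0, 1], [3.0, 4.0], [1.0, 2.0])
```

The design point the sketch tries to show is that `preserve_via_parent` costs nothing at the leaf: restoration simply walks up the tree to whichever ancestor holds a copy.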

Preservation components prototype on Cray XK7: http://lph.ece.utexas.edu/public/CDs

API: create_CD, preserve_via_copy, preserve_via_parent, check, commit_CD

Slide 11

Containment domains long-term design

[Figure: layered design. Labels: Machine; Hardware Abstraction Layer; Runtime Library Interface; efficiency-oriented programming model; CD annotations (resilience model); CD API (resilience interface); Error Reporting Architecture; ECC, status; CD control and persistence. Aspects: language integration, compiler support, runtime components, hardware aspects. Research prototype by Cray for XK7 (Titan).]

Code shown on the slide:

    int main(int argc, char **argv)
    {
      main_task here = phalanx::initialize(argc, argv);

      … Create test arrays here …

      // Launch kernel on default CPU (“host”)
      openmp_event e1 = async(here, here.processor(), n)
                             (host_saxpy, 2.0f, host_x, host_y);

      // Launch kernel on default GPU (“device”)
      cuda_event e2 = async(here, here.cuda_gpu(0), n)
                           (device_saxpy, 2.0f, device_x, device_y);

      wait(here, e1&e2);
      return 0;
    }

Slide 12

Outline
- Motivation and Goals
- Semantics of Containment Domains
- What do CDs do? When and why are they good?
- Differentiated error handling
- Analyzability
- Evaluation

Slide 13

Differentiated Error Handling

Slide 14

State preservation and restoration

Abstract
- Optimized preservation and restoration
- Analyzed, auto-tuned
- Allows explicit application control

Hierarchical
- Match storage hierarchy
- Maximize locality and minimize overhead

Partial
- Preserve only when worth it
- Exploit natural redundancy
- Exploit hierarchy
- Enable regeneration

Slide 15

SpMV partial preservation tuning

    void task<leaf> SpMV(…) {
      cd = create_CD(parentCD);
      preserve_via_copy(cd, matrix, …);
      preserve_via_parent(cd, veci, …);
      for r=0..N
        for c=rowS[r]..rowS[r+1] {
          resi[r] += data[c]*Vi[cIdx[c]];
          check {fault<fail>(c > prevC);}
          prevC = c;
        }
      commit_CD(cd);
    }

[Figure: matrix M and vector V, annotated "Natural redundancy" and "Hierarchy"]

Slide 16

Concise abstraction for complex behavior

[Figure: the leaf SpMV task from Slide 10, repeated unchanged and annotated with preservation sources: local copy or regen; sibling; parent (unchanged)]

Slide 17

Detection

Abstract
- Utilize most efficient detection mechanism
- Low overhead detection: e.g., algorithm specific detection

Customized
- Replicate in time, replicate in space, algorithm specific

Heterogeneous
- Per-CD routines
- E.g., selective multi-granularity DMR

Slide 18

Recovery

Abstract
- Utilize most efficient recovery mechanism
- Maximize local recovery
- Low overhead recovery, e.g., re-materialization or regeneration

Customized
- Re-execute, ignore, re-materialize, DMR, TMR

Heterogeneous
- Per-CD routines
- E.g., selective multi-granularity DMR
- App/system specific
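Two of the options named above, DMR and TMR, are easy to illustrate with replication in time. The Python sketch below assumes the computation can simply be called again and that its result is hashable; the function names `dmr` and `tmr` are hypothetical and do not come from the CD API.

```python
from collections import Counter


def dmr(compute, *args):
    """Dual modular redundancy: run twice; a mismatch signals a fault.

    Returns (value_of_first_run, runs_agreed).
    """
    first = compute(*args)
    second = compute(*args)
    return first, first == second


def tmr(compute, *args):
    """Triple modular redundancy: run three times and take the majority."""
    results = [compute(*args) for _ in range(3)]
    value, votes = Counter(results).most_common(1)[0]
    if votes < 2:
        raise RuntimeError("no majority among replicas; escalate")
    return value


# Usage: a stand-in computation whose second run is corrupted.
tmr_runs = iter([10, 99, 10])
voted = tmr(lambda: next(tmr_runs))

dmr_runs = iter([10, 99])
value, agreed = dmr(lambda: next(dmr_runs))
```

DMR only detects, so it still needs a recovery step such as re-execution, while TMR masks a single faulty replica at three times the compute cost. Choosing between them per CD is the kind of heterogeneity the slide describes.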

[Figure: timeline of a CD. Labels: Preserve, Compute, Detect, Re-execution overhead; axis: Time]

Slide 19

Analyzability

Slide 20

Leverage hierarchy and CD semantics
- Uncoordinated “local” actions
- Solve in → out

Application abstracted to CDs
- CD tree
- Volumes of preservation, computation, and communication
- Preservation and recovery options per CD

Machine model
- Storage hierarchy
- Communication hierarchy
- Bandwidths and capacities
- Error processes and rates
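To make "solve in → out" concrete, here is a deliberately simplified Python sketch that evaluates a CD tree from the leaves toward the root. It is not the model from the paper. It assumes that children run sequentially, that each attempt of a CD fails independently with probability `p_fail`, and that a failed attempt is repeated in full, so the expected number of attempts is 1 / (1 - p_fail). `CDNode` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CDNode:
    preserve: float                 # time to preserve this CD's inputs
    compute: float                  # time of this CD's own work
    p_fail: float                   # probability that one attempt hits an error
    children: list = field(default_factory=list)


def expected_time(cd):
    """Expected execution time of a CD, solved from the inside out."""
    per_attempt = cd.preserve + cd.compute + sum(
        expected_time(child) for child in cd.children)
    # Geometric retries: expected attempts = 1 / (1 - p_fail).
    return per_attempt / (1.0 - cd.p_fail)


# Usage: two error-prone leaves under a reliable root.
leaf = CDNode(preserve=1.0, compute=9.0, p_fail=0.5)
root = CDNode(preserve=2.0, compute=0.0, p_fail=0.0, children=[leaf, leaf])
total = expected_time(root)
```

Even this toy version shows why the hierarchy matters: errors in a leaf only inflate that leaf's term, so the root's cost grows by the local re-execution time and not by a restart of the whole tree.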

[Figure: analytical model of execution time]

Slide 21

Power model
- CDs that are not re-executing may remain idle
- Actively executing a CD has a relative power of 1
- A node that is idling consumes a relative power of …
- In our experiments …
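The accounting behind this power model can be sketched as a short Python function. The scenario is an assumption chosen for illustration: `n_domains` parallel domains each execute for `exec_time`, then one of them re-executes for `reexec_time` while the others idle. The idle power is left as a parameter, and the 0.25 in the usage lines is an arbitrary placeholder, not the value used in the experiments.

```python
def relative_energy(n_domains, exec_time, reexec_time, idle_power):
    """Energy in units of (relative power 1) x time.

    All domains execute once at relative power 1; during re-execution
    one domain stays active and the remaining ones idle at `idle_power`.
    """
    active = n_domains * exec_time + reexec_time
    idle = (n_domains - 1) * reexec_time * idle_power
    return active + idle


# Usage with placeholder numbers (idle_power=0.25 is arbitrary).
energy = relative_energy(n_domains=4, exec_time=10.0, reexec_time=2.0,
                         idle_power=0.25)
baseline = relative_energy(n_domains=4, exec_time=10.0, reexec_time=0.0,
                           idle_power=0.25)
overhead = energy / baseline - 1.0
```

With these placeholder inputs the energy overhead of one local re-execution is under 9 percent, and it shrinks further as the idle power approaches zero.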

[Figure: parallel domains over time. Labels: Execution, Re-execution, Re-execution time, Idle (shown on the domains that are not re-executing)]

Slide 22

Evaluation

What we evaluated
- Performance efficiency
- Energy overhead

Baseline resiliency approaches
- g-CPR: global checkpoint restart
- h-CPR: hierarchical checkpoint restart (e.g., SCR)
- Optimum interval used for each

CD advantages
- Preserve only what is needed
- Hierarchical uncoordinated

Assumptions
- Detection overhead is assumed to be zero
- Capacity of storage for preservation is infinite
- Infinite spares (quick repair)

Slide 23

Machine and error models

Component | “Performance”        | Error                   | Error scaling
Core      | 10GFLOP/core         | Soft error              | ∝ #cores
Memory    | 1GB/core             | ECC fail                | ∝ #DRAM chips
Socket    | 200GB/s /socket      | Hard/OS crash           | ∝ #sockets
System    | Hierarchical network | Power module or network | ∝ #modules and #cabinets

Slide 24

Workloads

Monte Carlo NT
- Embarrassingly parallel
- Infrequent communication
- Small fraction of read/write data

Iterative hierarchical SpMV
- Recursive decomposition
- Natural redundancy
- Frequent global communication

Mantevo HPCCG
- Requires little storage
- Conjugate-gradient based linear system solver
- Frequent global communication

Slide 25

Evaluation tools

Simulator
- Executes at granularity of containment domains
- Re-executes when error is detected
- Used to validate the analytical model

Analytical model
- Simulation is too slow for evaluating exascale systems
- Inputs to the model: extracted from each application
- Volume of preservation, restoration, computation and communication
- Error rates
- Shape of CD structure

Validation
- Simulator and analytical model
- Prototype of preservation/restoration on Cray XK7

Slide 26

Autotuned CDs perform well

[Figure: Peak System Performance; panels: NT, SpMV, HPCCG]

Slide 27

Autotuned CDs perform well

[Figure: Peak System Performance; panels: NT, SpMV, HPCCG]

Slide 28

SpMV, HPCCG: local recovery and partial preservation

[Figure: storage hierarchy levels: Disk, Remote NVM, Local NVM, DRAM]

Partial preservation via sibling or parent where appropriate

Slide 29

NT: hierarchical local recovery and partial preservation

[Figure: storage hierarchy levels: Disk, Remote NVM, Local NVM, DRAM]

Partial preservation via sibling, parent, or regeneration where appropriate

Slide 30

Autotuned CDs perform well

[Figure: Peak System Performance; panels: NT, SpMV, HPCCG]

Slide 31

CDs improve energy efficiency at scale

[Figure: Peak System Performance; panels: NT, SpMV, HPCCG]

Slide 32

CDs improve energy efficiency at scale

[Figure: Peak System Performance; panels: NT, SpMV, HPCCG]

Slide 33

10X failure rate emphasizes CD benefits

[Figure: panels: Peak Performance, Energy Overhead]

Slide 34

More in the paper
- Strict vs. relaxed containment domains
- Analytical model details
- Error and machine model details
- Additional sensitivity studies
- Related work discussion

Slide 35

Conclusion

Containment domains
- Abstract constructs for resilience concerns & techniques
- Proportional and application/machine tuned resilience
- Hierarchical & distributed preservation, restoration, and recovery
- Analyzable and amenable to automatic optimization
- Scalable to large systems with high relative energy efficiency
- Heterogeneous to match emerging architecture

Good start and exciting work ahead
- Preservation concept prototyped on Cray XK7
- Fine-grained CDs for high error rates
- Compiler optimizations and support
- Application-specific detection/elision
- PGAS support and interactions with system
- Interaction with other models (tasking, DSLs, …)

http://lph.ece.utexas.edu/public/CDs

Slide 36

Questions?

Thank you