Procedia Computer Science 9 (2012) 166–175
International Conference on Computational Science, ICCS 2012
Available online at www.sciencedirect.com

AdFT: An Adaptive Framework for Fault Tolerance on Large Scale Systems using Application Malleability

Cijo George and Sathish S. Vadhiyar

Exascale systems of the future are predicted to have mean time between failures (MTBF) of less than one hour.

... applications as malleable applications, and the action of changing the number of processors of a malleable application during execution as rescheduling.

With the development of these different fault tolerance strategies, the selection of a strategy for application execution has to be carefully made to maximize the application performance in the presence of failures. Depending on failure predictions and the cost of different strategies, a runtime system may have to dynamically select the most effective fault tolerance strategy at a given instance of application execution.

In this work, we have developed AdFT, an adaptive fault tolerance framework for long running malleable applications to maximize application performance in the presence of failures. We have developed cost models that consider different factors like accuracy of failure predictions and application scalability, for evaluating the benefits of various fault tolerance actions. Our adaptive framework uses the cost models to make runtime decisions for dynamically selecting fault tolerance actions at different points of application execution to maximize performance. The primary focus of our work is to evaluate the benefits of malleability for real scientific applications on very large scale systems and to develop an effective strategy for fault tolerance in future systems. While AdFT makes use of malleability for better fault tolerance and performance for malleable applications, it can also be used for non-malleable applications for adaptive fault tolerance.

Using simulations, we evaluate AdFT in terms of work done per unit time by the application in the presence of failures. Our results show that AdFT involving malleability outperforms the popular periodic checkpointing approach by at least 21%, and also yields up to 23% higher amount of work than a dynamic strategy, called FT-Pro, that does not involve malleability. Our results also show that our adaptive strategy yields high performance even for petascale systems and beyond, and that application malleability will be highly essential for future exascale systems.

In Section 2, we present related efforts in fault tolerance strategies. Section 3 gives the overall methodology of the AdFT framework. In Section 4, we describe in detail the cost model for evaluating the benefits of various fault tolerance strategies. Section 5 explains the fault tolerance simulator used for evaluations. In Section 6, we describe our evaluation methodology, experiments with real and synthetic traces and applications on large scale systems, and give salient observations. Section 7 gives a summary of our work and presents scope for future work.

2. Related Work

Most of the fault tolerance mechanisms are based on checkpointing [7, 8]. Recently, there has been increasing interest in live process migration [9] due to its lower overhead in transferring the process images when compared to the high cost of checkpointing. To help a runtime system use these fault tolerance mechanisms, failure predictors have been developed to predict failures based on event logs of systems, using data mining techniques [13, 14].

Cappello et al. [15] have also analyzed fault tolerance for post-petascale systems. The work compares proactive migration with proactive checkpointing based on analytical performance models. The analysis is based on the assumption of having a perfect failure predictor with 100% accuracy. While their work gives overall statistics using the assumption, our work performs actual simulations of application progress in realistic scenarios with prediction errors.

FT-Pro: The work by Lan and Li [12] has also developed an adaptive fault management framework similar to the focus of our work. Their FT-Pro framework provides fault tolerance for applications by performing proactive migration or checkpointing based on a cost model. However, their work is confined to non-malleable applications that execute on a fixed number of processors throughout application execution. The cost model of FT-Pro is not capable of taking advantage of malleability of applications to provide better fault tolerance. Considering malleability involves the following significant challenges to developing a cost model and framework.

1. Malleable applications can recover instantly from a failure by changing the number of processors. Hence there can be multiple failure-rollback cycles in the same time interval. FT-Pro assumes a single application failure in an interval, which makes it unsuitable for malleable applications.
2. For malleable applications, the time required to complete a given amount of work depends on the number of processors used. FT-Pro assumes this time to be a constant for a given amount of work.
3. Since malleable applications execute on different numbers of processors during execution, the application scalability on varying numbers of processors has to be considered in the cost model. FT-Pro does not support the use of application scalability.
AdFT uses an entirely new cost model that addresses all the above challenges. Moreover, FT-Pro is evaluated for short running applications using stochastic modeling for a maximum of 192 nodes and trace based simulations for a maximum of 64 nodes. Our evaluation of AdFT is much more comprehensive, using real traces from LANL for 512 and 1024 nodes and also using synthetic traces for very large scale systems up to exascale. We use synthetic scalability curves as well as the scalability curves of real long running applications for the simulations.

3. AdFT Framework

We assume the presence of a failure predictor [13, 14] that can periodically estimate expected node failures in the system in the near future. Precision ($P$) of such a predictor is defined as the ratio of the number of correct predictions to the total number of predictions made, and recall ($R$) is defined as the ratio of the number of correct predictions to the total number of failures. The higher the values of $P$ and $R$, the better the predictor.

The overall working of our framework is illustrated in Figure 1. AdFT takes runtime fault tolerance actions at decision making points, denoted as adaptation points (APs), such that the application performs a constant amount of work, $W$, between two consecutive APs. Following are the possible actions that can be taken.

1. SKIP, where no action is taken.
2. CHECKPOINT, where the application takes a proactive checkpoint. We assume coordinated checkpointing, where the processes synchronize to perform checkpointing.
3. MIGRATE, where the processes on failure-prone nodes are migrated to healthy nodes. We assume that the live-migration method [9], which does not involve checkpointing, is used for the purpose.
4. RESCHEDULE, where the application is rescheduled to a different set of nodes, which does not include any failure prone node (proactive rescheduling). This action can be taken only if the application is malleable.

Figure 1: AdFT Framework

4. A Cost Model for Application Execution between Adaptation Points

AdFT uses a cost model that takes into account failure prediction accuracy metrics (precision and recall), operation costs of the fault tolerance actions, number of available nodes and application scalability data to select the best fault tolerance action at each AP. Application scalability is expressed in the form of work done per unit time on various numbers of processors. AdFT uses this data to compute the following two variables in our cost model.

1. The number of nodes corresponding to maximum work done by the application per unit time.
2. $T(W, N)$: The time taken by the application to perform $W$ units of work on $N$ nodes. This is obtained by dividing $W$ by the work done per unit time for $N$ nodes obtained from the application scalability data.

At each AP, the failure predictor forecasts the expected node failures for the next time interval, whose length is the estimated time to complete $W$ amount of work using the current working set of nodes, $N_w$, in a failure-free environment (given by $T(W, N_w)$). AdFT uses the cost model to compute $T_{next}$, the expected time to complete the next $W$ amount of work and thus reach the next AP, for each possible fault tolerance action. The action with the minimum value of $T_{next}$ is selected.

4.1. Illustration: Cost Model for 3 Nodes

We assume that malleable applications can be recovered instantly from node failures by rescheduling to a different set of nodes (reactive rescheduling). Suppose at the current AP, the predictor predicts that nodes A, B and C are prone to failures in the next time interval. The worst case in which these nodes can fail is as follows.

1. A fails when the application is about to reach the next AP.
2. B fails after the application recovers from the failure of node A and is about to reach the next AP.
3. C fails after the application recovers from the failure of node B and is about to reach the next AP.

If a SKIP action was taken at the current AP, the total time taken to reach the next AP in this example can be expressed as follows.

\[
T_{next} = T(W, N_w) + \left[ T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right] + \left[ T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right] + \left[ T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right] \tag{1}
\]

The application takes $T(W, N_w)$ time to perform $W$ amount of work on the current working set of nodes, $N_w$, to reach the next AP. At this point one of the 3 nodes fails and the application spends $T_{resch}$ time for reactive rescheduling and $T_{recover}$ time for recovering the application on the new set of nodes, $N_w'$. The new set of nodes is obtained by excluding the node that has failed and including a number of spare nodes. The application then spends $T(W_{lost} + W, N_w')$ time to reach the next AP using the new set of nodes. Here, $W_{lost}$ is the work done between the last checkpoint and the current AP, calculated as $(current - ckp) \times W$, where $current$ is the index of the current AP and $ckp$ is the index of the latest AP where a checkpoint was taken. When the application almost reaches the next AP, the second node fails and similar costs are involved to reach the next AP again. Then the third node fails and the process repeats. Hence the total cost is as given in Equation (1), which can be simplified and expressed as follows.

\[
T_{next} = T(W, N_w) + 3 \left( T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right) \tag{2}
\]

Equation (2) gives the time taken to reach the next AP if a SKIP decision was taken at the current AP and all 3 nodes predicted to fail actually fail, in the worst possible way, which is just one possible scenario. Table 1 shows all the possible scenarios and their corresponding probabilities if 3 nodes are predicted to fail. Note that the probability that a given node which is predicted to fail will actually fail is equal to the precision, $P$, of the predictor. In general, the probability that $i$ nodes out of the 3 nodes which are predicted to fail will actually fail is given by $\binom{3}{i} P^{i} (1 - P)^{3 - i}$.

Table 1: Failure Scenarios (Example)

Scenario                                   Probability
All three of the nodes A, B and C fail     P^3
Any two of A, B and C nodes fail           3C2 P^2 (1 - P)
Any one of A, B and C nodes fails          3C1 P (1 - P)^2
None of the nodes fails                    (1 - P)^3

Now the estimated cost of the SKIP decision in the given example considering all scenarios can be expressed as given below.

\[
T_{next} = \sum_{i=1}^{3} \binom{3}{i} P^{i} (1 - P)^{3 - i} \left[ T(W, N_w) + i \left( T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right) \right] + (1 - P)^{3} \left[ T(W, N_w) \right] \tag{3}
\]

Similarly, the cost model is developed for different actions for any number of predicted node failures.

4.2. A General Cost Model

At each AP, AdFT computes the estimated cost of each of the possible actions using the cost model given below and takes the action that has the least estimated value of $T_{next}$.
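The three-node SKIP illustration of Section 4.1 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `expected_skip_time` and its parameters are hypothetical, and the redo time after a failure (the time to repeat the lost work plus W on the rescheduled nodes) is passed in as a single number that is assumed to be the same for every failure. The sample costs (3 minutes for rescheduling, 5 minutes for recovery, a 30-minute work interval, precision 0.7) are the values the paper uses in its experiments.

```python
from math import comb


def expected_skip_time(p, n_pred, t_work, t_resch, t_recover, t_redo):
    """Expected time to reach the next adaptation point under SKIP.

    p         -- predictor precision: chance a predicted node really fails
    n_pred    -- number of nodes predicted to fail in the next interval
    t_work    -- failure-free time to finish the interval's work
    t_resch   -- cost of one reactive rescheduling
    t_recover -- cost of one recovery
    t_redo    -- time to redo lost work plus W after a failure (assumed
                 identical for every failure, as in the worst-case example)
    """
    per_failure = t_resch + t_recover + t_redo
    total = 0.0
    for i in range(n_pred + 1):
        # Probability that exactly i of the predicted nodes actually fail.
        prob = comb(n_pred, i) * p**i * (1 - p) ** (n_pred - i)
        # Each failure adds one reschedule/recover/redo cycle.
        total += prob * (t_work + i * per_failure)
    return total


if __name__ == "__main__":
    # Times in minutes; t_redo = 30 assumes no work was lost before this AP.
    t = expected_skip_time(p=0.7, n_pred=3, t_work=30.0,
                           t_resch=3.0, t_recover=5.0, t_redo=30.0)
    print(f"expected time under SKIP: {t:.1f} minutes")
```

Because the per-failure cost is the same in every scenario, the weighted sum collapses by linearity of expectation to `t_work + n_pred * p * per_failure`, which for these inputs is 30 + 3 × 0.7 × 38 = 109.8 minutes. A selector for the best action would evaluate one such function per action and pick the smallest result.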
In the equations of this section, $N_f$ denotes the number of nodes predicted to fail, $N_s$ the number of spare nodes, $N_w$ the current working set of nodes and $N_w'$ the working set after rescheduling.

SKIP: Depending on the number of nodes that fail, the application may have to perform rollback recovery several times. If none of the nodes fail, no extra costs are incurred and the time taken to reach the next AP will be the time taken to complete $W$ amount of work using $N_w$ nodes. Hence, $T_{next}$ is computed as follows.

\[
T_{next} = \sum_{i=1}^{N_f} \binom{N_f}{i} P^{i} (1 - P)^{N_f - i} \left[ T(W, N_w) + i \left( T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right) \right] + (1 - P)^{N_f} \left[ T(W, N_w) \right] \tag{4}
\]

Equation (4) is a simple adaptation of Equation (3) for $N_f$ number of nodes. $1 - (1 - P)^{N_f}$ is the probability that the application will fail and $(1 - P)^{N_f}$ is the probability that the application will not fail.

CHECKPOINT: The application spends some time for checkpointing at the beginning of the next interval. $T_{next}$ is computed as follows.

\[
T_{next} = \sum_{i=1}^{N_f} \binom{N_f}{i} P^{i} (1 - P)^{N_f - i} \left[ T_{ckp} + T(W, N_w) + i \left( T_{resch} + T_{recover} + T(W, N_w') \right) \right] + (1 - P)^{N_f} \left[ T_{ckp} + T(W, N_w) \right] \tag{5}
\]

If the application fails, the cost involved will be the sum of the time for checkpointing, $T_{ckp}$, the time to reach the next adaptation point, $T(W, N_w)$, the cost of rescheduling ($T_{resch}$) and recovery ($T_{recover}$) for each of the node failures and the time taken to redo the work to reach the next AP for each of the node failures, $T(W, N_w')$. If the application does not fail, the cost will be the sum of $T_{ckp}$ and the time to reach the next AP.

MIGRATE: In this case, live-migration is performed at the beginning of the next interval. There are two possible cases.

1. If $N_f \le N_s$, i.e. the number of nodes predicted to fail does not exceed the number of spare nodes, all failure prone nodes can be migrated to healthy spare nodes and hence the failure probability will be zero.
2. If $N_f > N_s$, only $N_s$ failure prone nodes can be migrated to healthy nodes. Hence there is still a failure probability involving the remaining $N_f - N_s$ nodes.

The above two conditions are taken care of by defining a variable, $N_f'$, such that if $N_f \le N_s$, $N_f' = 0$, else $N_f' = N_f - N_s$. $T_{next}$ is then computed as follows.

\[
T_{next} = \sum_{i=1}^{N_f'} \binom{N_f'}{i} P^{i} (1 - P)^{N_f' - i} \left[ T_{mig} + T(W, N_w) + i \left( T_{resch} + T_{recover} + T(W_{lost} + W, N_w') \right) \right] + (1 - P)^{N_f'} \left[ T_{mig} + T(W, N_w) \right] \tag{6}
\]

If the application fails, the cost involved will be the sum of the time for migration, $T_{mig}$, the time to reach the next adaptation point, $T(W, N_w)$, the cost of rescheduling ($T_{resch}$) and recovery ($T_{recover}$) for each of the node failures and the time taken to redo the work to reach the next AP for each of the node failures, $T(W_{lost} + W, N_w')$. The latter time includes $W_{lost}$ since no checkpoint is taken for live-migration at the current AP. If the application does not fail, the cost will be the sum of $T_{mig}$ and the time to reach the next AP.

RESCHEDULE: Here, the probability of application failure is zero since the application is rescheduled, avoiding all failure prone nodes. Hence, the cost involves only the overhead for rescheduling and the time taken to complete $W$ amount of work using the new set of nodes. $T_{next}$ is computed as follows.

\[
T_{next} = T_{ckp} + T_{resch} + T_{recover} + T(W, N_w') \tag{7}
\]

The above cost model relies on the precision, $P$, of the predictor. But, for a predictor with a recall, $R$, of less than 1, there can also be failures which are not predicted. To tolerate such unforeseen failures, a precautionary checkpoint is taken when the time since the last checkpoint reaches a threshold. For a given $R$ value, the time interval between such unpredicted failures can be estimated as $MTBF / (1 - R)$. This value is taken as the threshold for precautionary checkpointing.

5. Failure Simulator

For evaluating AdFT, we have developed a robust discrete-event failure simulator that can simulate application execution in the presence of failures. It takes as input the node failure-recovery trace of a system, accuracy metrics of the failure predictor, type of fault tolerance to be adopted, application scalability data and other data including the cost of each of the fault tolerance operations and the estimated MTBF of the system. It simulates application execution based on the given inputs and considering application malleability. The simulator outputs the work done per unit time by the application at the end of the simulation, along with other details of the application behavior in the presence of failures.

In the absence of real traces, the trace generation component of the simulator can generate synthetic traces with failure times of different distributions, including Weibull and Exponential distributions, and repair times of Log-normal distribution for the simulation. The failure prediction component in the simulator can take expected predictor accuracy metrics as input and simulate the behavior of a failure predictor that estimates, at regular intervals of time, the list of nodes that might fail in the next interval, with the given accuracy metrics.

6. Experiments and Results

AdFT is evaluated against FT-Pro and periodic checkpointing with the checkpointing interval that gives maximum performance, based on simulations of application execution using our failure simulator. The node failure-recovery trace of the system considered, application scalability data and accuracy metrics of the failure predictor are given as input to the simulator.

For fair comparison, we have extended FT-Pro [12] to consider the scalability of applications to decide what number of nodes out of the available nodes should be used for execution for best performance. FT-Pro takes a precautionary checkpoint when the number of consecutive SKIP decisions reaches a threshold. This is based on the assumption that the SKIP decision is the only one that does not involve checkpointing. Since we consider that MIGRATE does not involve checkpointing, we have also modified FT-Pro such that it takes a precautionary checkpoint whenever the time since the last checkpoint reaches a threshold. This makes the precautionary checkpointing strategy of FT-Pro similar to that of AdFT.

FT-Pro requires allocation of a constant number of spare nodes so that the application can be executed on the constant remaining number of nodes in the system till completion. In our analysis, we have found that the optimal spare node allocation for maximum performance can vary based on various factors including number of simultaneous node failures, scalability curve of the

application etc. For the purpose of our evaluation, we allocate the number of spare nodes equal to the average of the number of failed nodes (or nodes that were down) at any point of time in the failure history before the time when the simulated application starts execution. This is to make sure that there are enough spare nodes to exercise the option of process migration while avoiding high spare node allocation to reduce the amount of idling in the system.

For our experiments, we set $W$, which is the constant work to be completed between each AP, as the work done by the application in 30 minutes in a failure free environment. This is based on the results in previous efforts on failure predictors [13] that report the best accuracy metrics for a time window between 15 minutes and 1 hour depending on the system. MTBF of the system, which is used for precautionary checkpointing, is taken as the observed MTBF from the trace history. We also assume the following costs for the various fault tolerance actions: $T_{ckp}$: 5 minutes, $T_{mig}$: 0.33 minutes, $T_{resch}$: 3 minutes, $T_{recover}$: 5 minutes. These are in accordance with the values given for the 2011 cost scenario in [15].

Real traces from LANL [16] corresponding to system 20, which is a 512-node system, and system 18, which is a 1024-node system, are used for simulations of small and medium scale systems. For simulations of very large scale systems for which real failure traces are not available, we generate synthetic traces using our simulator for different numbers of nodes based on the observation in [17] that the times to failure of nodes in a system follow a Weibull probability distribution and the times to recover follow a Log-normal probability distribution.

Simulations are done for a period of 30 days and evaluation is based on the work done by the application in unit time. For the LANL traces, a random year is chosen from the trace of a system for simulation. Application execution is simulated for the last month of the one year trace. Observed MTBF of the system and the number of spare nodes to be allotted for FT-Pro are obtained using the trace history of the previous eleven months. A similar strategy is also adopted for synthetic traces, where a trace is generated for one year and simulations are performed for the last month of the year.

6.1. Performance on Small and Medium Scale Systems

Simulations on 512 and 1024 nodes are done using real traces from LANL. A synthetic trace is generated for 16384 nodes with an MTBF of approximately 10 hours. We assume a synthetic application with linear and perfect scalability, such that the work done per unit time by the application on $N$ nodes in a failure free environment is $N$ units. Precision, $P$, and recall, $R$, of the predictor are assumed to be 0.7. Table 2 shows the work done per second by AdFT and the percentage gain over FT-Pro and periodic checkpointing (%FTPro and %ckp, respectively). The result shows that AdFT gives 8-23% improvement over FT-Pro and more than 87% improvement over periodic checkpointing.

Table 2: AdFT vs other methods (Linear Scalability, P = R = 0.7)

nodes               AdFT        %FTPro    %ckp
512 (LANL)          481.96      23.22     156.38
1024 (LANL)         642.33      8.70      77.57
16384 (Synthetic)   10038.62    15.16     87.27

Table 3 shows the number of various fault tolerance actions taken by AdFT during the application execution period. In the table, resch_p, resch_r and preckp correspond to proactive rescheduling, reactive rescheduling and precautionary checkpointing respectively. We find that most of the fault tolerance actions are migrations since, unlike other actions, live-migration does not involve checkpointing and incurs much lower cost (0.33 minutes in our experiments) than the others. A significant percentage of fault tolerance actions are related to rescheduling (resch_p and resch_r). It is also observed that the number of rescheduling decisions increases as MTBF decreases, which is due to the increased number of proactive reschedulings to avoid failures. The number of reactive rescheduling decisions also increases due to the increase in the number of unanticipated failures. The results show that rescheduling plays an important role in AdFT in adapting to a large scale failure environment with high failure dynamics or low MTBFs.

Table 3: Fault Tolerance Actions taken by AdFT (Linear Scalability, P = R = 0.7)

nodes               MTBF (hrs)   skip   ckp   mig   resch_p   preckp   resch_r
512 (LANL)          23.95        0      0     16    3         5        4
1024 (LANL)         5.31         0      0     101   24        17       24
16384 (Synthetic)   10.86        0      0     64    7         7        11

6.2. Spares vs Failures

We also found that rescheduling contributes to an increase in application performance in an indirect way. It helps in adaptively maintaining a sufficient number of spare nodes most of the time during application execution. Figure 2 shows the variations in the number of spare nodes and predicted failures with AdFT for 1024 nodes (LANL) at each AP where failures are predicted. It can be seen that in the majority of the cases, the number of spare nodes in the system is greater than or equal to the number of predicted failures. This helps increase the number of low-cost migration decisions to avoid failures, hence improving performance significantly.

Figure 2: Spares vs failures for AdFT

6.3. Resource Utilization

Rescheduling also helps in utilizing spare nodes in the system that become available after recovering from a failure, for application execution. Whenever proactive or reactive rescheduling is performed, AdFT tries to accommodate the healthy spare nodes in the system. Figure 3 shows the percentages of time of application execution observed for different idle node numbers for 1024 nodes (LANL). We can observe that for up to 40% of the time there are no idle nodes and for about 94% of the time the number of idle nodes is less than or equal to 2. A similar analysis on FT-Pro showed that 99.99% of the time, the number of idle nodes in the system is 3, which is the allotted number of spare nodes. This shows the effectiveness of AdFT in dynamically adapting to failures while keeping the number of idle nodes to a minimum.

Figure 3: Analysis of number of idle nodes for AdFT during the entire simulation period (LANL-1024)

6.4. Accuracy of Failure Predictions

Table 4 shows the performance of AdFT for different precision ($P$) and recall ($R$) values of the predictor for 1024 nodes (LANL). It shows that AdFT outperforms periodic checkpointing by a huge margin and also outperforms FT-Pro for the given values. However, we have observed that FT-Pro performs better by a small margin for $R$ values below 0.2. This is due to the large number of unforeseen failures, which results in a large number of reactive reschedulings that incur a huge cost, so that FT-Pro, which does not perform rescheduling, gives slightly better performance. It is observed that AdFT gives the most improvement over FT-Pro for $R$ values between 0.6 and 0.9. Since failure predictors today have an $R$ value of more than 0.6, it can be concluded that AdFT performs better than FT-Pro for all practical purposes.

Table 4: Varying Precision and Recall for AdFT

P     R     AdFT      %FTPro   %ckp
1.0   1.0   1017.82   0.31     181.376
0.8   1.0   1017.55   0.32     181.301
0.6   1.0   1016.62   0.35     181.044
0.4   1.0   1015.69   0.31     180.787
1.0   0.8   664.04    0.56     83.5734
0.8   0.8   673.17    6.19     86.0974
0.6   0.8   595.35    17.31    64.5841
0.4   0.8   586.65    4.94     62.179
1.0   0.6   639.00    5.20     76.6511
0.8   0.6   648.26    3.24     79.211
0.6   0.6   652.80    4.27     88.5108
0.4   0.6   651.55    6.60     80.1205
1.0   0.4   701.71    1.21     96.3675
0.8   0.4   680.60    0.11     88.1514
0.6   0.4   681.91    2.06     88.5135
0.4   0.4   633.51    1.56     77.8647

6.5. Real Applications

Simulations were also done using scalability data of four real life applications as observed in real large scale systems. The simulations correspond to application execution on 16384 nodes using a synthetic failure trace. The application scalability curves used are those of NAMD [18], CCSM [19], WRF [20] and ChaNGa [21]. Figure 4 shows the scalability curves of the applications as observed in BlueGene/L and their corresponding units of work done.

Figure 4: Scalability of different applications: (a) NAMD, (b) CCSM, (c) WRF, (d) ChaNGa

As shown in Table 5, AdFT performs much better than periodic checkpointing for all the applications and gives about 15% better performance than FT-Pro for NAMD, CCSM and WRF. But FT-Pro shows slightly better performance in the case of ChaNGa. This can be attributed to the modification done to FT-Pro that allows it to start the application on the number of nodes which gives maximum performance. It can be observed from the scalability curve of ChaNGa in Figure 4 that the application performance decreases after 8192 nodes. Hence, the modified FT-Pro technique will start the application only on 8192 nodes, leaving a large number of spare nodes and allowing it to perform low-cost live migrations at all APs, resulting in performance improvement over AdFT by a small margin.

Table 5: AdFT vs other methods for different Applications

Application   AdFT     %FTPro   %ckp
NAMD          286.32   15.15    87.28
CCSM          0.34     15.13    86.37
WRF           48.42    15.26    87.47
ChaNGa        0.461    -0.21    21.22

6.6. Petascale and Exascale Systems

Simulation of NAMD is also done for a hypothetical petascale system with $2^{17}$ nodes and a hypothetical exascale system with $2^{23}$ nodes. For the petascale system, each node is assumed to have approximately 7.6 GFlops/s peak performance. For the exascale system, we assume that each node is quad-core, so that the total number of processors is $2^{25}$. Each processor is assumed to have a peak performance of approximately 29.8 GFlops/s, which is approximated assuming that the ratio of the average processor peak performance of an exascale system to that of a petascale system will be approximately equal to a similar ratio between a petascale system and a terascale system.

The scalability curve for NAMD was obtained from a study on the BlueGene/L system [18] with each node having approximately 2.7 GFlops/s peak performance. The approximate scalability curve for the petascale system was generated based on the assumption that the work done by a node in the hypothetical petascale system will be approximately equal to the work done by 3 nodes of BlueGene/L. The scalability curve for the exascale system is generated in a similar way. We generated synthetic traces with MTBF of approximately 4 hours for the petascale system as reported in [22, 6] and 35 minutes for the exascale system, as reported in [6].

Table 6: AdFT vs other methods for a Petascale system

nodes   AdFT     %FTPro   %ckp
2^17    802.96   11.21    145.79

Table 7: AdFT vs other methods for an Exascale system

nodes   AdFT      %FTPro   %ckp
2^23    3065.11   12.5     21

Results in Tables 6 and 7 show that AdFT outperforms periodic checkpointing by about 145% and about 21% for petascale and exascale systems, respectively. It also outperforms FT-Pro by about 11% and about 12.5% for petascale and exascale systems, respectively. It is observed that in the exascale system, AdFT performs significantly more migrations than FT-Pro (406 against 300), which again shows its effectiveness in dynamically maintaining enough spare nodes to maximize low-cost migrations. Also, the number of reschedulings increases drastically from 51 to 364 when moving from petascale to exascale, showing that application malleability plays an important role in the performance of AdFT and that with increasing size of the systems and decreasing MTBF, application malleability and rescheduling can play a very important role in developing better fault tolerance strategies.

7. Conclusions and Future Work

In this work, we have developed AdFT, an adaptive framework that makes runtime decisions on fault tolerance techniques at different points of application execution. Our framework considers application malleability and exploits the benefits of rescheduling for fault tolerance in such applications. Evaluations based on simulations showed that our strategy involving malleability outperforms the popular but static periodic checkpointing approach by at least 21%, and also yields up to 23% higher amount of work than the dynamic FT-Pro strategy that does not involve malleability. Our results also show that our adaptive strategy yields high performance even for petascale systems and beyond. We also showed that application malleability will have to be considered strongly for future exascale systems.

In future, we plan to develop a fault management software suite that will consist of the fault management framework discussed in this paper, tools for performing various fault tolerance actions, and techniques that give failure predictions. We also
plan to enhance our failure simulator to study alternate fault tolerance options for future systems.

References

[1] Top500 Supercomputing Sites,
[2] L. Oliker, A. Canning, J. Carter, C. Iancu, M. Lijewski, S. Kamil, J. Shalf, H. Shan, E. Strohmaier, S. Ethier, T. Goodale, Scientific Application Performance on Candidate PetaScale Platforms, in: IPDPS '07: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, 2007, pp. 1–12.
[3] A. Bhatele, P. Jetley, H. Gahvari, L. Wesolowski, W. D. Gropp, L. V. Kale, Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes, in: IPDPS '11: Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, 2011.
[4] F. Petrini, K. Davis, J. Sancho, System-Level Fault-Tolerance in Large-Scale Parallel Machines with Buffered Coscheduling, in: IPDPS '04: Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium, 2004, pp. 209–.
[5] N. R. Adiga, et al., An Overview of the BlueGene/L Supercomputer, in: Supercomputing '02: Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002.
[6] P. Kogge, K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, K. Hill, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, R. S. Williams, K. Yelick, ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems, (P. Kogge, Editor and Study Lead) (2008).
[7] J. S. Plank, An Overview of Checkpointing in Uniprocessor and Distributed Systems, Focusing on Implementation and Performance, Technical Report, University of Tennessee, Knoxville, TN, USA (1997).
[8] J. Ansel, K. Arya, G. Cooperman, DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop, in: IPDPS '09: Proceedings of the 23rd IEEE International Parallel and Distributed Processing Symposium, 2009.
[9] C. Wang, F. Mueller, C. Engelmann, S. L. Scott, Proactive process-level live migration in HPC environments, in: SC '08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 2008.
[10] S. Vadhiyar, J. Dongarra, SRS - A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems, Parallel Processing Letters 13 (2) (2003) 291–312.
[11] G. Zheng, L. Shi, L. V. Kale, FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, in: CLUSTER '04: Proceedings of the 2004 IEEE International Conference on Cluster Computing, 2004, pp. 93–103.
[12] Z. Lan, Y. Li, Adaptive Fault Management of Parallel Applications for High-Performance Computing, IEEE Transactions on Computers 57 (12).
[13] P. Gujrati, Y. Li, Z. Lan, R. Thakur, J. White, A Meta-Learning Failure Predictor for BlueGene/L Systems, in: ICPP '07: Proceedings of the 2007 International Conference on Parallel Processing, 2007.
[14] N. Nakka, A. Agrawal, A. Choudhary, Predicting Node Failure in High Performance Computing Systems from Failure and Usage Logs, in: IPDPS '11: Proceedings of the 25th IEEE International Parallel and Distributed Processing Symposium, 2011.
[15] F. Cappello, H. Casanova, Y. Robert, Checkpointing vs. Migration for Post-Petascale Supercomputers, in: ICPP '10: Proceedings of the 2010 39th International Conference on Parallel Processing, 2010.
[16] Failure Trace Archive,
[17] B. Schroeder, G. Gibson, A Large-scale Study of Failures in High-Performance Computing Systems, in: Proceedings of the International Conference on Dependable Systems and Networks (DSN 2006), 2006.
[18] A. Bhatele, S. Kumar, C. Mei, J. C. Phillips, G. Zheng, L. V. Kale, Overcoming Scaling Challenges in Biomolecular Simulations across Multiple Platforms, in: IPDPS '08: Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium, 2008.
[19] J. M. Dennis, R. Jacob, M. Vertenstein, T. Craig, R. Loy, Toward an Ultra-High Resolution Community Climate System Model for the BlueGene Platform, Journal of Physics: Conference Series 78.
[20] J. Michalakes, J. Hacker, R. Loft, M. O. McCracken, A. Snavely, N. J. Wright, T. Spelce, R. Walkup, B. Gorda, WRF Nature Run, in: Supercomputing '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 2007.
[21] F. Gioachin, P. Jetley, C. L. Mendes, L. V. Kale, T. Quinn, Towards Petascale Cosmological Simulations with ChaNGa, Technical Report, Parallel Programming Laboratory, University of Illinois (2007).
[22] F. Cappello, Resilience: One of the main challenges for Exascale Computing, INRIA Illinois Joint-Laboratory on Petascale com