Efficient, Precise-Restartable Program Execution on Future Multicores

Gagan Gupta, Srinath Sridharan, and Gurindar S. Sohi
Department of Computer Sciences, University of Wisconsin-Madison
gagang, sridhara, sohi

1. INTRODUCTION

Multicore processors are becoming ubiquitous, placing new demands on hardware and software designers. No longer does a small set of experts develop a few software applications for a small number of parallel machines. Already standard in servers, desktops, and laptops, multicores are today used in handheld devices as well, expanding the spectrum of their use from mobile computing at the low end to cloud computing at the high end. Consequently, a dramatically increased number of software developers are creating hundreds of thousands of applications to run on a plethora of diverse platforms. Thus the ease of writing parallel programs that achieve energy and/or performance efficiency continues to gain importance.

At the same time, programmers have to account for the changing characteristics of emerging technologies. Processors are transitioning from homogeneous cores to heterogeneous cores with disparate performance/energy characteristics. As future computing hardware pushes the limits of semiconductor technology, it will become increasingly unreliable. Simultaneously, emerging uses of computing systems will require them to host multiple applications concurrently, even on mobile devices. Unreliability, resource (computing and energy) management, and service-level agreements will lead to imprecise knowledge of the resources available during a program's execution. Hence, unlike in canonical parallel programming, programmers can no longer assume that given (or constant) resources are available to process an application.

The confluence of the above factors poses daunting challenges to programmers in writing ubiquitous programs and achieving their reliable, energy-efficient, parallel execution, while remaining agnostic of the unpredictable, dynamically (and potentially continuously) changing computing conditions.

We propose a model that seamlessly addresses this range of challenges. It relies on expressing parallel algorithms as sequential programs and performing their controlled, dynamic parallel execution while honoring their sequential semantics. Although at first glance the approach may appear antithetical to parallelism, we show that it affords several advantages. Its intuitive interface and sequentially determinate execution (which ensures that in any execution of a program with the same inputs, a variable is assigned the same sequence of values) allow programmers to easily reason about the program execution, simplifying programming. The model utilizes the implied order in a statically-sequential program to achieve a dataflow schedule of parallel execution (§2.2), potentially exploiting all available parallelism. Further, the order permits the adaptability needed to achieve efficient execution in dynamically changing (§2.3) and unreliable (§2.4) computing environments. We provide an overview of these aspects and present results from our efforts to develop several benchmark applications using the model, implemented as a fully functional runtime system, on stock multicore systems.

2. DYNAMIC PARALLELIZATION OF SEQUENTIAL PROGRAMS

Our approach strives to minimize the burden on programmers. It allows programs to be authored in established imperative programming languages, such as C++, and automates their parallel execution. The model extracts a program's computations, establishes the dynamic dataflow between them, and schedules their ordered execution as the prevailing resources permit. It can also roll back the execution to a desired point and later resume it. We highlight the model's principles by describing the programming interface and the mechanisms as implemented in the runtime (a C++ library).

2.1 Composing Programs

Programmers today follow modern software engineering and object-oriented (OO) design principles by composing programs from reusable functions that manipulate encapsulated data and communicate with each other through well-defined interfaces. Often such well-composed functions avoid side effects by manipulating only the data communicated through their interfaces. We seek to exploit the properties of such OO programs and the natural insights programmers have into their algorithms.

Programs written using the runtime library closely resemble their sequential versions intended to run on a uniprocessor, apart from a few user annotations. Users compose programs from computations and data structures amenable to concurrent execution, as they would conventional parallel programs. In addition, they annotate the code to identify concurrent functions and the data potentially shared between them. They further organize the shared data (in the form of objects) that each function reads and writes, available from the function's interface, into read and write sets, respectively. Beyond these annotations, the onus is not on the user to schedule execution of the computations or to ensure independence of concurrent computations, in contrast to conventional parallel programming.

2.2 Executing Programs

To execute a program on processing cores, the runtime raises the granularity of computations to functions. It steps through the program in sequential order but seeks to execute the functions concurrently. Before executing a function, the runtime establishes its dependence on already-executing functions using the objects in the function's read and write sets. Since the objects in the read and write sets may be unknown statically, their identity is established dynamically, at run time, by dereferencing pointers. The runtime employs dataflow execution since it naturally exposes the innate parallelism between computations. Functions found to be independent are submitted for execution, while those that are dependent are shelved until their dependences have resolved. The runtime continues to seek work beyond stalled computations, resources permitting, and thus dynamically exploits any available parallelism. Moreover, it ensures that the execution proceeds according to the implied semantics that programmers have come to expect from sequential programs. The runtime also makes provisions to handle functions (identified by the user) that do not follow OO principles (e.g., those with unknown side effects) by executing them sequentially.

Statically-sequential applications (blackscholes, barneshut, bzip2, dedup, histogram, and reverseindex) from standard benchmark suites were developed using the runtime and evaluated on three stock multicore systems: an 8-thread Intel Nehalem-based machine and 16-core and 32-core AMD Opteron-based machines. They achieved speedups (harmonic mean) similar to their Pthreads versions on the Nehalem machine and over 20% better on the AMD Opteron machines [1].

2.3 Time- and Energy-Efficient Execution

Utilizing resources efficiently in dynamically changing environments will be a key challenge going forward. Doing so will require exposing the application parallelism that best fits the capabilities of the resources in the execution environment. While exposing too little parallelism can underutilize the resources, exposing excessive parallelism can lead to contention for resources, potentially degrading time- and energy-efficiency. Dynamically matching the exposed parallelism to the changing capabilities of the execution environment requires the ability to suspend already-executing computations, reintroduce them later, and introduce new computations into the environment, as appropriate. The runtime exploits the implied ordering in statically-sequential programs to choose computations judiciously when regulating the parallelism, while ensuring forward progress. It uses a Goodness of Parallelism (GoP) metric, computed periodically as the execution unfolds, to correlate the instantaneous efficiency of the program with the instantaneous degree of parallelism. A drop in efficiency causes it to throttle the parallelism to ease contention, while an improvement in efficiency causes it to increase the parallelism to exploit available resources. Experimental results on a stock 4-core (8-thread) Intel Core i7-2600 (Sandy Bridge) workstation show that our approach achieves up to 50% higher time- and energy-efficiency than state-of-the-art parallel execution systems across a variety of dynamic operating conditions.

2.4 Precise-Restartable Execution

Future computer systems will present unreliable resources to applications due to exception events, e.g., hardware faults or timing errors caused by aggressive energy management, or due to resource management. To be efficient, it will still be desirable to continue executing the interrupted program, possibly at a different time and/or on another system, without discarding all of the completed work. Hence, to resume execution in such scenarios, the runtime supports precise-restartability of parallel programs, analogous to precise-interruptible execution of sequential programs. The runtime exploits the implied ordering to precisely identify the excepting computation in the statically-sequential program and restores the program state to reflect the sequential execution of the program up to that computation. To do so, it tracks the invocation and completion of computations in the implied program order. Further, before a computation executes, it checkpoints the state the computation may modify, i.e., its modset (a user-provided set similar to the computation's write set and processed similarly). Once the excepting condition is mitigated, the program may resume from the excepting computation. The runtime also incrementally checkpoints the program state after each computation successfully completes, using its modset. This state can be used to spatially or temporally migrate a halted program.

Experiments on a stock 12-core (24-thread) Intel Xeon E5-2420 (Sandy Bridge) workstation show that the runtime can tolerate significantly more exceptions (proportional to the thread count) than conventional approaches. Depending on the application, the support to tolerate aggressive exception rates (e.g., up to 2 every second) incurs performance overheads ranging from 0% to 135% (at 0 faults).

3. CONCLUSION

Parallel programming for multicore-based systems and their dynamically changing operating environments poses significant challenges to everyday programmers in the effort to improve productivity and to achieve error-free, efficient execution of their programs. We presented a model that meets these challenges better than other approaches by using statically-sequential programs and performing their dynamically controlled dataflow execution.

References

[1] G. Gupta and G. S. Sohi. Dataflow execution of sequential imperative programs on multicore architectures. In MICRO-44, December 2011.
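The following minimal C++ sketch illustrates the dependence tracking described in §2.1 and §2.2 in a simplified, single-threaded setting. The names `Invocation`, `DataflowSketch`, `submit`, `wave_of`, and `run` are hypothetical and are not the runtime library's interface. Each call carries read and write sets of objects identified by address; in program order, `submit` derives read-after-write, write-after-write, and write-after-read dependences on earlier calls and assigns each call the earliest "wave" after all of its producers. Unlike the runtime described above, which dispatches independent functions while it keeps sequencing through the program, this sketch records every call first and then executes the calls wave by wave, a coarser stand-in for true dataflow execution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the runtime library's API.
// An invocation pairs a function body with its read and write sets. Objects are
// identified by address, i.e., dynamically, once pointers have been dereferenced.
struct Invocation {
    std::function<void()> body;
    std::vector<const void*> read_set;
    std::vector<const void*> write_set;
};

class DataflowSketch {
public:
    // Called in program order. Records dependences on earlier invocations that
    // touch the same objects and returns the new invocation's index.
    std::size_t submit(Invocation inv) {
        const std::size_t id = calls_.size();
        std::size_t wave = 0;  // earliest wave in which this call may run
        auto after = [&](std::size_t producer) {
            wave = std::max(wave, waves_[producer] + 1);
        };
        for (const void* obj : inv.read_set) {
            auto w = last_writer_.find(obj);
            if (w != last_writer_.end()) after(w->second);  // read-after-write
        }
        for (const void* obj : inv.write_set) {
            auto w = last_writer_.find(obj);
            if (w != last_writer_.end()) after(w->second);  // write-after-write
            for (std::size_t r : readers_[obj]) after(r);   // write-after-read
        }
        // Update per-object bookkeeping only after this call's dependences are known.
        for (const void* obj : inv.read_set) readers_[obj].push_back(id);
        for (const void* obj : inv.write_set) {
            last_writer_[obj] = id;
            readers_[obj].clear();
        }
        waves_.push_back(wave);
        calls_.push_back(std::move(inv));
        return id;
    }

    std::size_t wave_of(std::size_t id) const {
        assert(id < waves_.size());
        return waves_[id];
    }

    // Runs every submitted call once, wave by wave. Calls in the same wave have
    // no conflicting reads/writes, so a real runtime could hand them to worker
    // threads; this sketch simply runs them on the calling thread.
    void run() {
        std::size_t last = 0;
        for (std::size_t w : waves_) last = std::max(last, w);
        for (std::size_t w = 0; w <= last && !calls_.empty(); ++w)
            for (std::size_t i = 0; i < calls_.size(); ++i)
                if (waves_[i] == w) calls_[i].body();
    }

private:
    std::vector<Invocation> calls_;
    std::vector<std::size_t> waves_;
    std::map<const void*, std::size_t> last_writer_;
    std::map<const void*, std::vector<std::size_t>> readers_;
};
```

For example, submitting `a = 1` and `b = 2` (write sets `{&a}` and `{&b}`), then `c = a + b` (read set `{&a, &b}`), places the first two calls in wave 0 and the third in wave 1; a later write to `a` lands in wave 2, because it must wait for `c = a + b` to read the old value. Running the calls wave by wave yields the same values as sequential execution.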
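The throttling policy of §2.3 can be sketched as a simple feedback loop. The paper does not give the GoP formula, so this C++ sketch substitutes a caller-supplied per-period `efficiency` value (an assumption) and applies the stated rule: a drop in efficiency lowers the degree of parallelism, and an improvement raises it. The helper `to_suspend`, which keeps the oldest computations in program order running and suspends the youngest, is likewise a hypothetical policy chosen to show how the implied order can preserve forward progress; neither `ParallelismThrottle` nor `to_suspend` comes from the runtime.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the paper's GoP formula: `efficiency` is whatever
// per-period measure (e.g., useful work per second or per joule) the caller
// supplies, sampled once per period.
class ParallelismThrottle {
public:
    ParallelismThrottle(int initial, int max_degree)
        : degree_(initial), max_degree_(max_degree) {
        assert(initial >= 1 && initial <= max_degree);
    }

    // Returns the degree of parallelism to use for the next period.
    int update(double efficiency) {
        if (has_prev_) {
            if (efficiency < prev_)
                degree_ = std::max(1, degree_ - 1);            // drop: throttle to ease contention
            else if (efficiency > prev_)
                degree_ = std::min(max_degree_, degree_ + 1);  // gain: exploit idle resources
        }
        prev_ = efficiency;
        has_prev_ = true;
        return degree_;
    }

private:
    int degree_;
    int max_degree_;
    double prev_ = 0.0;
    bool has_prev_ = false;
};

// Hypothetical policy: given in-flight computations identified by their
// program-order position, keep the `degree` oldest running and return the
// younger ones to suspend. The oldest computation is never suspended.
std::vector<long> to_suspend(std::vector<long> in_flight, int degree) {
    assert(degree >= 1);
    std::sort(in_flight.begin(), in_flight.end());
    if (in_flight.size() <= static_cast<std::size_t>(degree)) return {};
    return std::vector<long>(in_flight.begin() + degree, in_flight.end());
}
```

Starting at a degree of 4 (at most 8), samples of 1.0, 1.2, and 0.9 move the degree from 4 to 5 and back to 4, and the degree never falls below 1.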
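The precise-restart bookkeeping of §2.4 can be illustrated with the following C++ sketch under simplifying assumptions: program state is a flat vector of ints, each computation's modset lists the slots it may modify, an exception event is modeled as a C++ exception thrown by the computation, and out-of-order parallel completion is simulated by a caller-supplied order. The names `Computation` and `run_precise` are hypothetical. Before each computation runs, its modset is checkpointed; if computation k raises an exception, its partial effects and those of already-completed younger computations are undone, older computations are allowed to finish, and k is returned as the restart point. The incremental post-completion checkpoints used for migration are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical sketch. Program state is a flat vector of ints; each computation
// names the slots it may modify (its modset).
struct Computation {
    std::function<void(std::vector<int>&)> body;
    std::vector<std::size_t> modset;
};

// Executes computations in `completion_order`, a permutation of program-order
// indices standing in for out-of-order parallel completion. The order must
// respect dependences, as a dataflow schedule would. If computation k throws,
// the state is restored to that of sequentially executing 0..k-1 and k is
// returned as the restart point; otherwise comps.size() is returned.
std::size_t run_precise(const std::vector<Computation>& comps,
                        const std::vector<std::size_t>& completion_order,
                        std::vector<int>& state) {
    assert(completion_order.size() == comps.size());
    std::vector<std::map<std::size_t, int>> saved(comps.size());  // per-computation checkpoint
    std::vector<std::size_t> completed;                            // in completion order
    std::vector<bool> done(comps.size(), false);

    for (std::size_t k : completion_order) {
        for (std::size_t slot : comps[k].modset) saved[k][slot] = state[slot];  // checkpoint modset
        try {
            comps[k].body(state);
        } catch (const std::exception&) {
            auto restore = [&](std::size_t c) {
                for (const auto& kv : saved[c]) state[kv.first] = kv.second;
            };
            restore(k);  // discard the excepting computation's partial effects
            for (auto it = completed.rbegin(); it != completed.rend(); ++it)
                if (*it > k) restore(*it);  // undo younger completed work, newest first
            for (std::size_t i = 0; i < k; ++i)
                if (!done[i]) comps[i].body(state);  // let older work finish
            return k;  // resume here once the exceptional condition is mitigated
        }
        done[k] = true;
        completed.push_back(k);
    }
    return comps.size();
}
```

With computations 0 to 3 completing in the order 0, 3, 2, 1 and computation 2 faulting, the sketch undoes computation 3, finishes computation 1, and returns 2, leaving exactly the state produced by sequentially executing computations 0 and 1.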