Uploaded On 2015-10-11

Efficient, Precise-Restartable Program Execution on Future Multicores

Gagan Gupta, Srinath Sridharan, and Gurindar S. Sohi
Department of Computer Sciences, University of Wisconsin-Madison
gagang, sridhara, sohi

1. INTRODUCTION

Multicore processors are becoming ubiquitous, placing new demands on hardware and software designers. No longer does a small set of experts develop a few software applications for a small number of parallel machines. Already standard in servers, desktops, and laptops, multicores are now used in handheld devices, expanding the spectrum of their use from mobile computing at the low end to cloud computing at the high end. Consequently, a dramatically increased number of software developers are creating hundreds of thousands of applications to run on a plethora of diverse platforms. Thus ease of writing parallel programs, to achieve energy and/or performance efficiency, continues to gain importance.

At the same time, programmers have to account for the changing characteristics of emerging technologies. Processors are transitioning from homogeneous cores to heterogeneous cores with disparate performance/energy characteristics. As future computing hardware pushes the limits of semiconductor technology, it will become increasingly unreliable. Simultaneously, emerging uses of computing systems will require them to host multiple applications concurrently, even on mobile devices. Unreliability, resource (computing and energy) management, and service-level agreements will lead to imprecise knowledge of available resources during a program's execution. Hence programmers can no longer assume the availability of given (or constant) resources to process an application, unlike in canonical parallel programming.

The confluence of the above factors poses daunting challenges to programmers in writing ubiquitous programs and achieving their reliable, energy-efficient, parallel execution, while remaining agnostic of the unpredictable, dynamically (and potentially continuously) changing computing conditions.

We propose a model that seamlessly addresses this range of challenges. It relies on expressing parallel algorithms as sequential programs, i.e., statically-sequential programs, and performing their controlled, dynamic parallel execution while honoring their sequential semantics. Although at first glance the approach may appear antithetical to parallelism, we show that it affords several advantages. Its intuitive interface and sequentially determinate execution (which ensures that in any execution of a program with the same inputs, a variable is assigned the same sequence of values) allow programmers to easily reason about the program execution, simplifying programming. The model utilizes the implied order in a statically-sequential program to achieve a dataflow schedule of parallel execution (§2.2), potentially exploiting all available parallelism. Further, the order permits the adaptability needed to achieve efficient execution in dynamically changing (§2.3), unreliable (§2.4) computing environments.

We provide an overview of these aspects and present results from our efforts to develop several benchmark applications using the model, implemented as a fully functional runtime system, on stock multicore systems.

2. DYNAMIC PARALLELIZATION OF SEQUENTIAL PROGRAMS

Our approach strives to minimize the burden on programmers. It allows programs to be authored in established imperative programming languages, such as C++, and automates their parallel execution. The model extracts a program's computations, establishes the dynamic dataflow between them, and schedules their ordered execution as the prevailing resources permit. It can also roll back the execution, up to a desired point, and resume it, if desired. We highlight the model's principles by describing the programming interface and the mechanisms as implemented in the runtime (a C++ library).

2.1 Composing Programs

Programmers today follow modern software engineering and object-oriented (OO) design principles by composing programs from reusable functions that manipulate encapsulated data and communicate with each other using well-defined interfaces. Often such "well-composed" functions avoid side effects by only manipulating data communicated through the interfaces. We seek to exploit the properties of such OO programs and the natural insights programmers have in their algorithms.

Programs written using the runtime library closely resemble their sequential versions intended to run on a uniprocessor, but for a few user annotations. Users compose programs from computations and data structures amenable to concurrent execution, as they would conventional parallel programs. In addition, they annotate the code to identify concurrent functions and the data potentially shared between them. They further formulate the shared data read and written (in the form of objects) by the functions, available from the function's interface, into read and write sets, respectively. Beyond these annotations, the onus is not on the user to schedule execution of the computations or to ensure independence of concurrent computations, in contrast to conventional parallel programming.

2.2 Executing Programs

To execute a program on processing cores, the runtime raises the granularity of computations to functions. It sequences through the program sequentially but seeks to execute the functions concurrently. Before executing a function, the runtime establishes its dependence on already executing functions using the objects in the function's read and write sets. Since objects in the read and write sets may be unknown statically, their identity is established dynamically, at run time, by dereferencing pointers.

The runtime employs dataflow execution since it naturally exposes the innate parallelism between computations. Functions found to be independent are submitted for execution, while those that are dependent are "shelved" until their dependences have resolved. The runtime continues to seek work beyond stalled computations, resources permitting, and thus dynamically exploits any available parallelism. Moreover, it ensures that the execution proceeds as per the implied semantics that programmers have come to expect from sequential programs.

The runtime also makes provision for functions (identified by the user) that do not follow OO principles (e.g., with unknown side effects) by executing them sequentially.

Statically-sequential applications (blackscholes, barneshut, bzip2, dedup, histogram, and reverse index) from standard benchmark suites, developed using the runtime on three stock multicore systems (an 8-thread Intel Nehalem-based machine and 16-core and 32-core AMD Opteron-based machines), achieved speedups (harmonic mean) similar to their Pthread versions on the Nehalem machine and over 20% better on the AMD Opteron machines [1].

2.3 Time- and Energy-Efficient Execution

Utilizing resources efficiently in dynamically changing environments will be a key challenge going forward. Doing so will require exposing application parallelism that best fits the capabilities of resources in the execution environment. While exposing too little parallelism can underutilize the resources, exposing excessive parallelism can lead to contention for resources, potentially degrading the execution's time- and energy-efficiency. Dynamically matching the exposed parallelism with the changing capabilities of the execution environment requires the ability to suspend already executing computations, reintroduce them later, and introduce new computations into the environment, as appropriate.

The runtime exploits the implied ordering in statically-sequential programs to choose computations judiciously when regulating the parallelism, while ensuring forward progress. It uses a Goodness of Parallelism (GoP) metric, computed periodically as the execution unfolds, to correlate the instantaneous efficiency of the program to the instantaneous degree of parallelism. A drop in efficiency causes it to throttle the parallelism to ease contention, while an improvement in efficiency causes it to increase the parallelism to exploit available resources.

Experimental results on a stock 4-core (8-thread) Intel Core i7-2600 (Sandy Bridge) workstation show that our approach achieves up to 50% higher time- and energy-efficiency over the state-of-the-art parallel execution systems across a variety of dynamic operating conditions.

2.4 Precise-Restartable Execution

Future computer systems will present unreliable resources to applications due to exception events, e.g., hardware faults, timing errors caused by aggressive energy management, or due to resource management. To be efficient, it will still be desirable to continue executing the interrupted program, possibly at a different time and/or on another system, without discarding all of the completed work. Hence, to resume execution in such scenarios, the runtime supports precise-restartability of parallel programs, analogous to precise-interruptible execution of sequential programs.

The runtime exploits the implied ordering to precisely identify the excepted computation in the statically-sequential program and restores the program state to reflect the sequential execution of the program up to the computation. To do so, it tracks the invocation and completion of computations in the implied program order. Further, it checkpoints the state a computation may modify, i.e., its mod set (a user-provided set similar to the computation's write set and processed similarly), before its execution. Once the excepting condition is mitigated, the program may resume from the excepting computation. The runtime also incrementally checkpoints the program state after each computation successfully completes, using its mod set. This state can be used to spatially or temporally migrate a halted program.

Experiments on a stock 12-core (24-thread) Intel Xeon E5-2420 (Sandy Bridge) workstation show that the runtime can tolerate significantly higher (proportional to thread count) exception rates than the conventional approaches. Depending on the application, the support to tolerate aggressive exception rates (e.g., up to 2 every second) incurs performance overheads ranging from 0% to 135% (at 0 faults).

3. CONCLUSION

Parallel programming for multicore-based systems and their dynamically changing operating environments pose significant challenges to everyday programmers in the effort to improve productivity and to achieve error-free, efficient execution of their programs. We presented a model that meets these challenges better than other approaches by using statically-sequential programs and performing their dynamically controlled dataflow execution.

References

[1] G. Gupta and G. S. Sohi. Dataflow execution of sequential imperative programs on multicore architectures. In MICRO-44, December 2011.
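The dependence check described in Section 2.2 can be illustrated with a minimal sketch. It is not the runtime's actual interface: `Task`, `ObjectId`, `depends_on`, and `dataflow_levels` are hypothetical names, objects are identified by their run-time addresses, and the sketch compares each function against all earlier functions in program order rather than only those still executing. Functions assigned the same level have no dependence path between them and could run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical names throughout; this is not the paper's runtime API.
using ObjectId = const void*;  // identity of a shared object, resolved at run time

struct Task {
    std::set<ObjectId> read_set;   // objects the function may read
    std::set<ObjectId> write_set;  // objects the function may write
};

// True if the two sets share at least one object.
inline bool overlaps(const std::set<ObjectId>& a, const std::set<ObjectId>& b) {
    for (ObjectId x : a) {
        if (b.count(x) != 0) return true;
    }
    return false;
}

// A later task depends on an earlier one if they conflict on any object.
inline bool depends_on(const Task& later, const Task& earlier) {
    return overlaps(later.read_set, earlier.write_set)     // read after write
        || overlaps(later.write_set, earlier.read_set)     // write after read
        || overlaps(later.write_set, earlier.write_set);   // write after write
}

// Walk the tasks in program order and assign each a dataflow level:
// level 0 tasks depend on nothing earlier; otherwise the level is one more
// than the highest level among the earlier tasks it depends on.
inline std::vector<std::size_t> dataflow_levels(const std::vector<Task>& program) {
    std::vector<std::size_t> level(program.size(), 0);
    for (std::size_t i = 0; i < program.size(); ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            if (depends_on(program[i], program[j]) && level[j] + 1 > level[i]) {
                level[i] = level[j] + 1;
            }
        }
    }
    return level;
}
```

For example, two functions that write distinct objects both land on level 0, while a third function that reads both objects is "shelved" to level 1 until they complete.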
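The throttling behavior described in Section 2.3 can be sketched as a simple feedback step. The paper does not give the formula for the GoP metric here, so this sketch takes the measured efficiency as an input; `adjust_parallelism`, its parameters, and the step size of one are assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical controller step, not the runtime's actual policy.
// prev_eff and cur_eff are efficiency samples from consecutive periods;
// how they are measured (the GoP metric) is outside this sketch.
inline int adjust_parallelism(double prev_eff, double cur_eff,
                              int degree, int max_degree) {
    if (cur_eff < prev_eff) {
        // Efficiency dropped: throttle, but keep at least one computation
        // running so the program makes forward progress.
        return std::max(1, degree - 1);
    }
    if (cur_eff > prev_eff) {
        // Efficiency improved: expose more parallelism, up to the limit.
        return std::min(max_degree, degree + 1);
    }
    return degree;  // no change in efficiency: hold the current degree
}
```

Calling this once per measurement period with the latest two efficiency samples yields the next degree of parallelism to expose.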
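The mod-set checkpointing described in Section 2.4 can be illustrated with a minimal sketch, under loud assumptions: `run_restartable` is a hypothetical helper, the mod set is a list of pointers to objects of one copyable type, and a C++ exception stands in for an exception event such as a hardware fault. The state in the mod set is saved before the computation runs and restored if the computation is interrupted, so the program state again reflects sequential execution up to that computation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not the paper's runtime API.
// Returns true if the computation completed, false if it was interrupted
// and the objects in its mod set were rolled back.
template <typename T>
bool run_restartable(const std::vector<T*>& mod_set,
                     const std::function<void()>& computation) {
    // Checkpoint: copy every object the computation may modify.
    std::vector<T> saved;
    saved.reserve(mod_set.size());
    for (T* obj : mod_set) {
        saved.push_back(*obj);
    }
    try {
        computation();
        return true;
    } catch (...) {
        // Restore: undo any partial updates so the computation can be
        // re-executed later from a precise state.
        for (std::size_t i = 0; i < mod_set.size(); ++i) {
            *mod_set[i] = saved[i];
        }
        return false;
    }
}
```

A caller can retry the computation after a `false` return, once the excepting condition has been mitigated.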