Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections

Yuan Zhang1, Vugranam C. Sreedhar2, Weirong Zhu3**, Vivek Sarkar4, and Guang R. Gao1

1 University of Delaware, Newark, DE, {zhangy,ggao}@capsl.udel.edu
2 IBM T.J. Watson Research Center, Hawthorne, NY, vugranam@us.ibm.com
3 Microsoft Corporation, Seattle, WA, weirong.zhu@microsoft.com
4 Rice University, Houston, TX, vsarkar@rice.edu

** The author participated in this work when he was a graduate student at the University of Delaware.

Abstract. In this paper we propose a lock assignment technique to simplify the mutual exclusion enforcement in multithreaded programs. Programmers are allowed to annotate the regions of code that are expected to be mutually exclusive as critical sections, without using explicit locks. The compiler then automatically infers an assignment of the minimum number of locks to critical sections by solving the Minimum Lock Assignment (MLA) problem so as to enforce mutual exclusion without any loss of concurrency. We show that the MLA problem is NP-hard. We have proposed a heuristic to solve the MLA problem, and tested the optimality of the heuristic with the Integer Linear Programming (ILP) solver. We have also tested the efficiency of the heuristic using scientific applications, from which we obtain up to 30% performance gain with respect to the programs in which all critical sections are controlled by a single lock.

1 Introduction

Given that the processors in current and future computer systems are becoming multi- or many-core by default, it is important to address the performance and productivity issues in multithreaded programming. One of the major performance and productivity issues in multithreaded programming arises from enforcing the mutual exclusion (mutex for short) using lock/unlock operations. Programmers explicitly assign lock variables to control mutex regions, and the lock variables are acquired by the executing thread before the mutex region is executed, and are released after the execution of the mutex region completes. Explicitly managing multiple locks is error prone since it is easy for programmers to introduce data races and create deadlocks. Alternatively, programmers can use a single lock to control all mutex regions to avoid deadlocks and data races. However, they lose concurrency among mutex regions by unnecessarily serializing them.

In this paper, we propose a lock assignment technique to simplify the enforcement of mutual exclusion in multithreaded programs. We allow the programmers to annotate regions of code that are expected to be executed mutually exclusively as critical sections, without managing explicit locks. The compiler then automatically infers an assignment of multiple compiler-managed locks to critical sections (possibly multiple locks for one critical section) to preserve the mutual exclusion and also exploit the concurrency among critical sections.

A naive lock assignment approach associates one lock to each shared memory location, and the lock set of a critical section is the set of locks assigned to memory locations it accesses. This approach, however, may use more locks than necessary, and introduce excessive overhead on lock acquisition and release. To control the locking overhead, we would use the minimum number of locks which is necessary to preserve the mutual exclusion and fully exploit the concurrency between critical sections. We formulate this lock assignment task as the Minimum Lock Assignment (MLA) problem:

Problem 1 (Minimum Lock Assignment). Given a multithreaded program with a set of critical sections,
find the minimum number of distinct locks that are needed for controlling the critical sections such that
(a) Two critical sections are assigned disjoint sets of locks if (1) they are concurrent and (2) they do not access any common location, or if they access a common location then none of them writes to the common location.
(b) Two critical sections are assigned at least one common lock if (1) they are concurrent and (2) they access some common location and at least one of them writes to the common location.

Note that a critical section can be assigned a set of locks. The semantics of a lock set follows the strict two-phase locking policy [1].

The solution of the MLA problem consists of two main phases: the analysis phase and the lock assignment phase. In the analysis phase, the compiler reads the multithreaded program and statically determines whether a pair of critical sections are interfering. Two critical sections are interfering if they are concurrent and they access some common shared memory location(s), with at least one of them writing to the common location(s). In the lock assignment phase the compiler calculates the minimum number of locks to control critical sections according to the analysis result, and assigns one or more locks to each critical section. In addition, the runtime system guarantees that an execution is deadlock free by acquiring and releasing locks in a predetermined order.

The analysis phase is solved by concurrency analysis, data set analysis and pointer analysis. However, due to the space limitation, in this paper we simply assume the analysis result is already calculated, and we only focus on the lock assignment phase. Readers can refer to [2, 1] for more details on the analysis phase and deadlock avoidance options. In the following discussion we refer to the MLA problem as the lock assignment phase exclusively, without any further clarification.

The rest of the paper is organized as follows. In Section 2 we introduce the concurrency graph as the main data structure to solve the MLA problem. In Section 3 we prove that the MLA problem is related to the graph coloring problem and that it is NP-hard. We then present a heuristic to solve the MLA problem. We also formulate the MLA problem as an Integer Linear Programming (ILP) problem. In Section 4 we evaluate our heuristic by comparing it with optimal solutions produced by the commercial ILP solver CPLEX. In 300 randomly generated test cases we observe that our MLA heuristic is optimal for 83.3% of them. We also test the performance of our heuristic using a 10-way Sun Fire machine on a set of Splash2 [3] benchmarks, and obtain up to 30.17% performance speedup with respect to programs in which all critical sections are controlled by a single lock. Related work is presented in Section 5, and finally we conclude in Section 6.

2 Concurrency Graph and Critical Sections

In this section, we introduce the concurrency graph to model the potential concurrency and interference among critical sections in a multithreaded program.

2.1 Concurrency Graph

Fig. 1. (a) Example program (b) Concurrency graph

Definition 1. A Concurrency Graph is an undirected graph $G = (V, E)$, in which: a vertex $v \in V$ denotes a textual critical section, and there is an edge $(u, v) \in E$ if instances of critical sections $u$ and $v$ may be concurrent.

In the above definition, if two instances of the critical section $u$ are concurrent, we do not introduce a self-loop on $u$, since we will assign at least one lock to each critical section, and the mutual exclusion of $u$ with respect to itself is self preserved. As an example, Figure 1(b) illustrates the concurrency graph for the program shown in Figure 1(a). The sets of shared memory locations that are accessed within critical sections are also listed within curly braces in Figure 1(b).

Two concurrent critical sections are said to be non-interfering if either they do not access a common location or if they access a common location then none of them writes to the common location. Two concurrent critical sections are interfering if they access some common location and at least one of them writes to the common location. We extend the concurrency graph defined in Definition 1 by labeling an edge $(u, v)$ with label I when critical sections $u$ and $v$ are interfering, and with label N when $u$ and $v$ are non-interfering.

Note that a general concurrency graph may be a forest of connected graphs, and we analyze each connected component independently. In the following discussion, we simply assume that a concurrency graph $G$ is a connected graph.

2.2 Non-interfering Concurrency Graphs

Consider a class of multithreaded programs $P_n$ whose corresponding concurrency graph contains only non-interfering edges. Since all incident edges of a critical section are non-interfering, it cannot share any lock with its neighbors. This implies that whenever two critical sections are connected (concurrent), they require different locks. We can now rephrase the MLA problem (Problem 1) for non-interfering concurrency graphs as follows:

Problem 2. Given a program with a set $V_n$ of non-interfering critical sections, find the minimum number of locks that can be assigned to critical sections such that if two different critical sections in $V_n$ are concurrent then they get different locks.

The above problem is equivalent to the classical graph coloring problem: color the vertices (critical sections) of a graph using the minimum number of colors (locks) such that no two adjacent (concurrent) vertices (critical sections) are given the same color (lock). The MLA problem for this special class of programs is NP-hard (footnote 5).

Footnote 5: For certain classes of graphs, such as the interval graphs, the graph coloring problem can be solved in polynomial time. However, general concurrency graphs are not necessarily interval graphs.

2.3 Interfering Concurrency Graphs

Consider a class of programs $P_i$, for which the concurrency graph contains only interfering edges. In this case, two critical sections are either concurrent and interfering, or are not concurrent (not connected). If they are concurrent and interfering, they should share at least one common lock to preserve the mutual exclusion, which implies that they must be serialized. If they are not concurrent, they are already serialized. Therefore, in this interfering special case, there is no inherent concurrency, so we can use a single lock to control all critical sections without introducing any performance penalty.

2.4 Concurrency Graph Partition

In general cases, a concurrency graph contains both non-interfering edges and interfering edges. Given a concurrency graph $G = (V, E)$, let $E_n$ denote the set of non-interfering edges and $E_i$ denote the set of interfering edges in $G$, such that $E = E_n \cup E_i$ and $E_n \cap E_i = \emptyset$. Let $G_n = (V_n, E_n)$ be the non-interfering subgraph induced by $E_n$, where $V_n \subseteq V$ such that a vertex $v_n \in V_n$ has at least one non-interfering edge incident on it. Figure 2(b) illustrates the non-interfering subgraph of Figure 2(a). Let $G_i = (V_i, E'_i)$ be the interfering subgraph induced by vertices $V_i$, where $V_i = V - V_n$ and $E'_i \subseteq E_i$ is the set of interfering edges $(u_i, v_i)$ such that $u_i, v_i \in V_i$. Figure 2(c) illustrates the interfering subgraph for Figure 2(a). Finally, let $E''_i = E_i - E'_i$ be the set of interfering edges that are not in $G_i$. Some of the interfering edges in $E''_i$ connect vertices of the non-interfering subgraph, for example, edges (CS1, CS3) and (CS3, CS4), illustrated as bold dashed lines in Figure 2(d). We call such interfering edges that occur inside a non-interfering subgraph serializing interfering edges $E_s$, because they could "serialize" the inherent concurrency that exists within the non-interfering subgraph. The remaining interfering edges $E_{ci} = E''_i - E_s$ are crossing edges between vertices in $G_n$ and $G_i$. In the example shown in Figure 2(a), $E_{ci} = \{(CS3, CS6), (CS4, CS6)\}$, illustrated as double solid lines in (e).

Fig. 2. (a) A general concurrency graph (b) The non-interfering subgraph $G_n$ (c) The interfering subgraph $G_i$ (d) The SNIG $G_{sn}$ (e) The crossing edges (double lines), serializing interfering edges (dotted lines), and the interfering subgraph (in dotted box) (f) An unsafe borrowing from CS3 to CS4 (g) A safe borrowing from CS4 to CS3 (h) Final lock assignment result

Besides the non-interfering subgraph $G_n$ and the interfering subgraph $G_i$, we introduce the notion of the serializing non-interference graph (SNIG) as the non-interfering subgraph with serializing edges, $G_{sn} = (V_n, E_n \cup E_s)$. Figure 2(d) illustrates an example of a SNIG. SNIGs have some interesting properties that will influence the lock assignment.

2.5 Serializing Non-Interference Graph

Let us consider a class of concurrency graphs called Serializing Non-Interfering Graphs (SNIGs). A SNIG consists of only non-interfering edges and serializing interfering edges (as defined in the previous section). Serializing interfering edges constrain the inherent concurrency in a non-interfering concurrency graph. They also constrain the minimum number of locks required to color a SNIG. The following observation states that sometimes it is impossible to color a SNIG if a vertex can be assigned only one color.

Observation 1. It is impossible to color an arbitrary SNIG with the following conflicting constraints:
1. Each vertex gets only one color,
2. If vertices $u$ and $v$ are connected by a non-interfering edge then they are given two different colors,
3. If two vertices $u$ and $v$ are connected by a serializing interfering edge then they are given the same color.

Consider the SNIG in Figure 3. Assume we satisfy all the above constraints; then all critical sections get the same lock, because they are connected by serializing interfering edges (CS1, CS3), (CS3, CS4) and (CS4, CS2). However, constraint (2) requires that CS1 and CS2 are given two different colors, a contradiction. Therefore Figure 3 cannot satisfy all three constraints.
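The contradiction in Observation 1 can be checked mechanically. The following Python sketch is illustrative only and is not part of the original method: the function name `single_color_feasible` and the edge lists are assumptions made for this example. The serializing edges are the ones named in the discussion of Figure 3, and the non-interfering edge (CS1, CS2) is inferred from the statement that constraint (2) applies to CS1 and CS2. The sketch merges vertices joined by serializing edges (constraint 3) and then tests whether any non-interfering edge falls inside one merged group (constraint 2).

```python
def single_color_feasible(vertices, n_edges, s_edges):
    """Return True if every vertex can receive exactly one color such that
    endpoints of non-interfering (N) edges differ and endpoints of
    serializing interfering (S) edges agree."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Constraint 3: both endpoints of an S edge must share one color,
    # so they belong to the same group.
    for u, v in s_edges:
        parent[find(u)] = find(v)

    # Constraint 2: an N edge inside a single group is a contradiction.
    return all(find(u) != find(v) for u, v in n_edges)


# Edge lists assumed from the text's discussion of Figure 3.
vertices = ["CS1", "CS2", "CS3", "CS4"]
s_edges = [("CS1", "CS3"), ("CS3", "CS4"), ("CS4", "CS2")]
n_edges = [("CS1", "CS2")]

fig3_feasible = single_color_feasible(vertices, n_edges, s_edges)
relaxed_feasible = single_color_feasible(vertices, n_edges, s_edges[:2])
print(fig3_feasible, relaxed_feasible)
```

With the three serializing edges all four vertices fall into one group, so the check fails; dropping the hypothetical edge (CS4, CS2) separates CS2 from the others and the check succeeds. This is only a feasibility test for the one-color-per-vertex setting, which is exactly the setting the multi-lock approach abandons.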
Fig. 3. Example SNIG for Observation 1

There are two ways to handle the above impossibility: relax constraint (1) in the above observation, or relax constraint (2). By relaxing constraint (1) we are allowed to assign multiple colors to each vertex. By relaxing constraint (2) we will reduce the concurrency. Constraint (3) must be satisfied since otherwise the mutual exclusion will be violated. In the MLA solution we take the approach of assigning multiple locks so as to maximize the concurrency. Let $C(u)$ be the set of colors that are assigned to a vertex $u$. The coloring problem on a SNIG is stated as follows:

Problem 3. Given a SNIG $G_{sn} = (V_n, E_n \cup E_s)$, find the minimum number of colors to color $G_{sn}$ such that:
(a) If two vertices $u$ and $v$ are connected by a non-interfering edge then $C(u) \cap C(v) = \emptyset$, and
(b) If two vertices $u$ and $v$ are connected by a serializing edge then $C(u) \cap C(v) \neq \emptyset$.

Let $G$ be an arbitrary concurrency graph, and let $G_{sn}$ be the SNIG of $G$. We will show in Section 3.2 that the minimum number of locks required by $G$ equals the minimum number of locks required by $G_{sn}$.

3 Minimum Lock Assignment Solution

The MLA problem for arbitrary concurrency graphs is NP-hard because one special case, the MLA problem for non-interfering concurrency graphs, is NP-hard. In this section we present a heuristic approach for solving MLA. We also formulate the MLA problem as an Integer Linear Programming (ILP) problem, and in Section 4 we will use this ILP formulation to quantitatively evaluate our heuristic.

3.1 A Naive Solution

Assume all shared memory locations that a critical section accesses can be statically identified by compiler analysis. Then a simple solution to the MLA problem is to assign a distinct lock to each shared memory location, and the lock set of a critical section is the set of locks assigned to memory locations it accesses. However, this approach may use more locks than necessary, and introduce more overhead of lock acquisition and release. We say the number of locks required in this simple solution, i.e., the total number of memory locations accessed in a program, denoted as $|M|$, is the upper bound (UB) of the optimal MLA solution.

3.2 MLA Heuristic

Our MLA heuristic consists of three main steps (see Figure 4):
Step 1: Assign locks to the non-interfering subgraph $G_n$ using a graph coloring heuristic (Line 6).
Step 2: Ensure that the serializing interfering edges in the SNIG are correctly handled (Line 7).
Step 3: Finally propagate the locks to the interfering subgraph $G_i$ (Line 8).

The
first step is straightforward. We use a heuristic graph coloring algorithm [4] to color $G_n$, and one possible solution for our example is shown in Figure 2(b). Next, we must ensure that critical sections connected by serializing interfering edges in SNIGs are correctly serialized. The details of this step are given by the function HandleSerializingEdges in Figure 4. In Figure 2(d), CS1, CS3 and CS4 are in $G_n$ and each of them has obtained a lock from the graph coloring. Interfering critical sections CS1 and CS3 are automatically serialized by sharing lock 1, but CS3 and CS4 are not. A straightforward method to solve this is to let one of them "borrow" the lock from the other. For a serializing interfering edge $(u, v)$, we say vertex $u$ borrows the lock from $v$, denoted as $borrow(u \leftarrow v)$, if $u$ adds $v$'s lock to its lock set, $Lock(u) = Lock(u) \cup Lock(v)$. Denote the set of locks from $u$'s non-interfering neighbors as $NIN(u)$, $NIN(u) = \bigcup_{(u,w) \in G_n} Lock(w)$. Before the borrowing, $u$ has a disjoint set of locks with all its non-interfering neighbors, i.e., $Lock(u) \cap NIN(u) = \emptyset$. This implies that the concurrency between $u$ and its non-interfering neighbors is maximized. After the borrowing, we also require that $u$ not share any lock with its non-interfering neighbors. This is satisfied if $Lock(v) \cap NIN(u) = \emptyset$, that is, none of $u$'s non-interfering neighbors holds the lock that $u$ borrows from $v$. In this case we say the borrowing is "safe", which means it does not reduce concurrency among non-interfering critical sections.

LockAssignment($G$)
 1. Initialize $Lock(u)$ for all $u \in V$ as empty
 2. Partition the graph $G$
 3. if $G_n = \emptyset$
 4.   assign a global lock to each critical section
 5. else
 6.   HLB = GraphColoring($G_n$)
 7.   HandleSerializingEdges($G_{sn}$)
 8.   LockPropagation($E_{ci}$, $G_i$)
 9. endif
10. if HLB $> |M|$ then
11.   for each $v \in V$
12.     $Lock(v) = \bigcup_{i \in LS(v)} Lock(i)$
13.   endfor
14. endif

HandleSerializingEdges($G_{sn}$)
15. for each serializing interfering edge $(u, v)$
16.   if $Lock(u) \cap Lock(v) = \emptyset$
17.     if $borrow(u \leftarrow v)$ is safe
18.       $Lock(u) = Lock(u) \cup Lock(v)$
19.     else if $borrow(v \leftarrow u)$ is safe
20.       $Lock(v) = Lock(v) \cup Lock(u)$
21.     else
22.       HLB = HLB + 1
23.       add a new lock to $u$ and $v$'s lock sets
24.     endif
25.   endif
26. endfor

LockPropagation($E_{ci}$, $G_i$)
27. for each $(v_n, v_i) \in E_{ci}$
28.   sequence = BreadthFirstSearch($G_i$, $v_i$)
29.   Arbitrarily pick one lock $l$ from $v_n$'s lock set
30.   for each $v$ in sequence
31.     $Lock(v) = Lock(v) \cup \{l\}$
32.   endfor
33. endfor

Fig. 4. Lock Assignment Heuristic
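The safety test and the borrowing loop of HandleSerializingEdges can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `nin` and `handle_serializing_edges`, the dictionary-based graph encoding, and the sample data are all hypothetical. The initial locks of CS1, CS3 and CS4 follow the text's discussion of Figure 2; that CS2 holds lock 2 and is CS3's only non-interfering neighbor, and that CS1 is CS4's only non-interfering neighbor, are assumptions inferred from that discussion, since the figure itself is not reproduced here.

```python
def nin(u, locks, n_adj):
    """NIN(u): union of the lock sets of u's non-interfering neighbors."""
    result = set()
    for w in n_adj.get(u, ()):
        result |= locks[w]
    return result


def handle_serializing_edges(locks, n_adj, s_edges, next_lock):
    """Make every serializing edge share a lock, borrowing when it is safe.

    locks     : dict vertex -> set of lock ids (updated in place)
    n_adj     : dict vertex -> list of non-interfering neighbors
    s_edges   : list of serializing interfering edges (u, v)
    next_lock : first unused lock id (an assumption of this sketch)
    Returns the next unused lock id after processing.
    """
    for u, v in s_edges:
        if locks[u] & locks[v]:
            continue  # already serialized by a common lock
        if not (locks[v] & nin(u, locks, n_adj)):
            locks[u] |= locks[v]      # borrow(u <- v) is safe
        elif not (locks[u] & nin(v, locks, n_adj)):
            locks[v] |= locks[u]      # borrow(v <- u) is safe
        else:
            locks[u].add(next_lock)   # neither direction is safe:
            locks[v].add(next_lock)   # introduce a fresh lock
            next_lock += 1
    return next_lock


# Hypothetical data modeled on the discussion of Figure 2 (see assumptions above).
locks = {"CS1": {1}, "CS2": {2}, "CS3": {1}, "CS4": {3}}
n_adj = {"CS1": ["CS4"], "CS4": ["CS1"], "CS2": ["CS3"], "CS3": ["CS2"]}
s_edges = [("CS1", "CS3"), ("CS4", "CS3")]

next_free = handle_serializing_edges(locks, n_adj, s_edges, next_lock=4)
print(locks, next_free)
```

In this run the edge (CS1, CS3) needs no work because both ends already hold lock 1. For (CS4, CS3) the first direction is rejected because CS1, a non-interfering neighbor of CS4, holds lock 1, and the reverse direction is accepted, so CS3 ends with locks {1, 3} and no new lock is created.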
In our example in Figure 2, in order to enforce the mutual exclusion between CS3 and CS4, we first let CS4 borrow the lock from CS3, then $Lock(CS4) = \{1, 3\}$. This is shown in Figure 2(f). However, this borrowing is not safe, because one of CS4's non-interfering neighbors, CS1, would share lock 1 with it. Then we try the alternative way. We let CS3 borrow the lock from CS4.

This is illustrated in Figure 2(g). This borrowing is safe because $Lock(CS4) \cap NIN(CS3) = \emptyset$, where $NIN(CS3) = \{2\}$. Note that if neither borrowing is safe, we will introduce a new lock and add it to both end vertices' lock sets. The procedure of lock borrowing is summarized in Figure 4.

The first two steps together color the SNIG $G_{sn}$. Finally, in function LockPropagation, we propagate the SNIG lock assignment result to the interfering subgraph $G_i$. The interfering subgraph $G_i$ is connected to the non-interfering subgraph $G_n$ through a set of crossing edges $(v_n, v_i)$, where $v_n \in G_n$ and $v_i \in G_i$. Each $(v_n, v_i)$ is an interfering edge, which means $v_i$ should share at least one of $v_n$'s locks obtained from the graph coloring. We say $v_n$ "propagates" a lock to $v_i$. If $v_i$ has more than one incident crossing edge, then it should inherit locks from all its neighbors in $G_n$. Subsequently, $v_i$ propagates its lock set to its neighbors in $G_i$. This propagation continues until every vertex in $G_i$ inherits locks from its neighbors. This procedure can be simply implemented as a set of breadth-first searches, with each $v_i$ at a crossing edge as the source vertex. The algorithm is shown in Figure 4. One propagation result of our example is shown in Figure 2(h). An important property of this lock propagation is that it does not introduce any new lock, therefore the number of locks required to color $G_i$ cannot exceed the number of locks required to color the SNIG $G_{sn}$.

The
final lock assignment result is shown in Figure 2(h). We refer to the number of locks required to color $G$ as the Heuristic Lock Bound (HLB). We have mentioned in the naive solution that the upper bound UB of the required locks is the number of shared memory locations accessed in the concurrency graph $G$. In some cases HLB might exceed UB, and we need to choose the smaller one from HLB and UB for lock assignment. The MLA heuristic algorithm is summarized in Figure 4.

The following theorems show that our MLA heuristic can preserve the mutual exclusion between critical sections without any loss of concurrency. They also show that lock assignment on an arbitrary concurrency graph $G$ is optimal if the lock assignment on the SNIG of $G$ is optimal. Detailed proofs can be found in [5].

Theorem 1. When the algorithm LockAssignment($G$) terminates, any pair of interfering critical sections in $G$ share at least one common lock.

Theorem 2. When the algorithm LockAssignment($G$) terminates,

any pair of non-interfering critical sections do not share any lock.

Theorem 3. Lock assignment on a concurrency graph $G$ is optimal if and only if the lock assignment on its SNIG $G_{sn}$ is optimal.

The concurrency graph partitioning runs in $O(V + E)$ time, and the graph coloring runs in $O(V^2)$ time. In the worst case, the time complexities of HandleSerializingEdges and LockPropagation are $O(EV)$ and $O(E^2 + VE)$, respectively. Therefore, in the worst case the total time complexity of LockAssignment is $O(E^2 + VE)$.

3.3 ILP Formulation

In this section, we formulate the MLA problem as an ILP problem. Given a concurrency graph $G = (V, E)$, we introduce 0-1 variables $f_{u,i}$ to indicate whether lock $i$ is assigned to node $u$ in $G$, $1 \le u \le |V|$ and $1 \le i \le |M|$, where $M$ is the set of shared memory locations that are accessed in all critical sections. Recall that the number of locks given by an optimal solution cannot exceed $|M|$.

Since each critical section must be assigned at least one lock, we have the following constraint:

$f_{u,1} + f_{u,2} + \cdots + f_{u,|M|} \ge 1$ for all $u \in G$   (1)

We use 0-1 variables $l_i$ to indicate whether lock $i$ is assigned to any critical section, $l_i = f_{1,i} \lor f_{2,i} \lor \cdots \lor f_{|V|,i}$. This condition is represented by the following constraints:

$f_{1,i} + \cdots + f_{|V|,i} \ge l_i$   (2)
$f_{1,i} + \cdots + f_{|V|,i} \le |V| \, l_i$   (3)

Next we derive conditions that ensure the lock assignment is correct and maximizes the parallelism. Recall that a lock assignment solution is correct if interfering critical sections $u$ and $v$ share some lock, and parallelism is maximized if non-interfering critical sections are assigned two disjoint sets of locks. Let 0-1 variable $s_{u,v,i}$ indicate whether $u$ and $v$ share lock $i$, then $s_{u,v,i} = f_{u,i} \land f_{v,i}$. This condition is imposed by the following constraints:

$f_{u,i} + f_{v,i} \ge 2 s_{u,v,i}$   (4)
$f_{u,i} + f_{v,i} \le 2 s_{u,v,i} + 1$   (5)

We use 0-1 variable $s_{u,v}$ to indicate whether $u$ and $v$ share any lock. Then $s_{u,v} = s_{u,v,1} \lor \cdots \lor s_{u,v,|M|}$. The following two constraints represent this condition:

$s_{u,v,1} + \cdots + s_{u,v,|M|} \ge s_{u,v}$   (6)
$s_{u,v,1} + \cdots + s_{u,v,|M|} \le |M| \, s_{u,v}$   (7)

Then

$s_{u,v} = 1$ for interfering edge $(u, v)$   (8)
$s_{u,v} = 0$ for non-interfering edge $(u, v)$   (9)

The total number of locks used is:

$N = l_1 + \cdots + l_{|M|}$   (10)

Therefore, the MLA problem is to minimize $N$ subject to inequalities (1) to (9).

4 Experimental Results

In this section, we present two sets of experiments to evaluate our lock assignment algorithm. In the first set of experiments, we compare the results produced by our MLA heuristic with the optimal solutions based on the ILP formulation on a set of 300 random concurrency graphs. In the second set of experiments we evaluate the effectiveness of the MLA heuristic using Splash2 [3] benchmarks.

4.1 Precision Evaluation

                                 Avg     Min    Max
Vertices (V)                     8.63    2      16
Edges (E)                        16.73   1      53
Edge density E/V^2               0.19    0.09   0.28
Non-interfering edges (E_n)      3.37    0      20
E_n/E                            0.22    0      1
Serializing interfering edges    2.85    0      27

Table 1. Features of random concurrency graphs

To study the precision of our MLA heuristic we implemented our ILP formulation in the commercial ILP solver CPLEX, and tested the heuristic and the ILP formulation on a set of 300 randomly generated concurrency graphs with characteristics shown in Table 1. We limited our random concurrency graphs to contain at most 16 nodes due to time constraints in the ILP solver. The results show that our heuristic solution is optimal for 83.3% of tested graphs. For the remaining 16.7% of graphs our heuristic assigns more locks than the optimal solutions, and in the worst case at most two more locks than the optimal solution are assigned.

We also evaluated the influence of the non-interfering subgraph $G_n$ and the serializing interfering edges $E_s$ on lock assignment. For this purpose, we listed the precision of the MLA heuristic with the increase of the relative size of the non-interfering subgraph, given by $V_n/V$, and with the increase of the relative number of serializing interfering edges, given by $E_s/E$, in Figure 5(a) and (b), respectively. As an example, Figure 5(a) shows that our MLA heuristic gives optimal solutions to about 70% of test cases that have $V_n/V = 0.6$ and sub-optimal solutions (i.e., assigns extra locks) for the remaining 30%. Figure 5(a) and (b) illustrate that the precision of our heuristic depends on the non-interfering subgraph size and the relative number of serializing interfering edges.

Fig. 5. Precision of the MLA heuristic. Panels: (a) MLA vs. $V_n/V$, (b) MLA vs. $E_s/E$; series: Optimal, Un-optimal.

4.2 Performance Study on Sun Fire

Next we study the performance of the MLA heuristic using a set of Splash2 [3] benchmarks listed in Table 2. Splash2 benchmarks call the Pthreads library (footnote 6), and mutual exclusion is enforced by pthread_mutex_lock(<lock_var>) and pthread_mutex_unlock(<lock_var>) functions with explicit lock variables.

Application                     Barnes          Cholesky         Ocean-cont       Radiosity         Water-nsq
Description                     N-body          Matrix           Hydro-           3-D               Water
                                                factoring        dynamics         rendering         molecules
Problem size                    262144 bodies   tk29.O B8C256    514×514          largeroom batch   512 molecules
CSs                             6               7                4                37                9
CS time (1 proc)                6.29%           32.37%           0.11%            9.93%             11.54%
Lines of code in CS (avg/max)   17.17/68        10.86/37         1.75/3           12.79/85          2.89/6
Funcs in CS                     1               1                0                10                1
Locks assigned                  3               4                4                8                 7
Locks for each CS (max)         1               1                1                4                 1

Table 2. Benchmarks and lock assignment results

Footnote 6: The original Splash2 benchmarks utilize the Argonne National Laboratories (ANL) parmacs macros for parallel constructs. We have re-configured them to call the Pthreads library.

For the purpose of our performance study, we manually transformed each lock/unlock region into a critical section. We constructed the concurrency graph for each benchmark manually, and applied the MLA algorithm to calculate the lock assignment. The number of locks assigned to each benchmark is shown in Table 2. We then ran the set of benchmarks on a Sun Fire 10-processor 750 MHz machine, and collected two sets of data for each benchmark to evaluate our heuristic: (1) $T_s$: the execution time of the benchmark when all critical sections are controlled by a single lock, and (2) $T_{MLA}$: the execution time of the benchmark with lock assignment using our MLA heuristic. Figure 6 shows the performance improvement of our lock assignment with respect to the single global lock, i.e., $(T_s - T_{MLA})/T_s$, running on different numbers of threads.

Fig. 6. Performance improvement with respect to single lock

Cholesky and Radiosity have shown a performance improvement of 30.17% and 14.76%, respectively, due to the decrease of lock contention and serialization. On the other hand, Barnes, Ocean-cont and Water-nsq show a much lower performance improvement for two main reasons. First, the amount of time spent on critical sections is a small portion of the total execution time. For instance, as shown in Table 2, for Ocean-cont, the time spent on critical sections takes only 0.11% of the total execution time. Second, in Barnes and Water-nsq data is often organized as arrays or complicated user-defined data structures, and is accessed in a dynamic pattern that cannot be predicted at compilation time. When we constructed the concurrency graphs we conservatively treated such arrays and user-defined data structures as scalar units. This conservative approach may introduce "spurious" interference among critical sections, which results in unnecessary serialization. The unnecessary serialization will then increase lock contention among threads during the execution time. More sophisticated analysis techniques such as shape analysis [6], or dynamic conflict resolving techniques such as transactional memory and synchronization state buffer (SSB) [7], are needed to exploit further concurrency among critical sections in these benchmarks.

5 Related Work

Recently there has been some work on compiler based lock inference techniques. Emmi et al. [8] propose a lock allocation problem that takes a multithreaded program annotated with atomic sections and infers a lock assignment to atomic sections to preserve its atomicity and deadlock freedom. They formulate the lock allocation problem as an ILP problem which minimizes the conflict cost between atomic sections and minimizes the number of locks. No heuristic solution is presented in their work. Our lock assignment differs from lock allocation in the following two aspects. First, our lock assignment problem maximizes the parallelism among critical sections using the minimum number of locks, while the lock allocation problem uses the minimum number of locks to minimize the conflict cost, a metric that is not clearly related to the parallelism. Second, we present both the heuristic solution and the ILP formulation for the lock assignment problem. We use the ILP formulation to evaluate the optimality of the lock assignment heuristic. We also use scientific applications to evaluate the lock assignment heuristic and present performance improvement.

Hicks et al. [9] have proposed a lock inference technique for atomic sections, which first determines a set of shared memory locations in the program, then uses a "mutex inference" algorithm to infer a set of locks for each atomic section to preserve its atomicity. The basic idea of their mutex inference algorithm is to
find the dependence relation among shared memory locations, and partition the shared memory locations into sets according to this dependence relation. Locks are then assigned to each memory location set. Since the mutex inference algorithm is not optimization based, it may infer more locks than our lock assignment algorithm.

Autolocker [10] takes programs annotated with pessimistic atomic sections and a programmer controlled lock assignment, and infers a compiler controlled lock assignment that is free of deadlocks and data races.

Vaziri et al. [11] proposed a data-centric synchronization approach for writing concurrent programs using atomic sets, which are sets of shared memory locations that have "similar" data consistency properties. Accesses to fields in an atomic set are assumed to take place atomically in "units of work". Given a program with annotated atomic sets, the compiler infers units of work automatically and translates them into synchronized blocks. Our work complements Vaziri et al.'s work in that we can analyze and determine the atomic sets and units of work using concurrency analysis and the lock assignment algorithm.

Some other optimization techniques on locks have been reported. Diniz and Rinard [12] present data lock coarsening and computation lock coarsening techniques to reduce the overhead of fine-grain locks in Java programs. Choi et al. [13] and Aldrich et al. [14] remove unnecessary synchronization from Java programs.

6 Conclusions

In this paper we proposed a lock assignment technique to simplify the mutual exclusion in multithreaded programs. It takes programs annotated with critical sections and finds the minimum number of locks needed to enforce mutual exclusion among interfering critical sections without any loss of concurrency. Experimental results are very encouraging and show that our method can be used to improve the performance of multithreaded programs with mutual exclusion by exploiting concurrency among multiple critical sections. An extension of this work to support read/write locks is a subject for future work.
References

1. J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
2. V. Sreedhar, Y. Zhang, and G. Gao. A new framework for analysis and optimization of shared memory parallel programs. Technical Report CAPSL-TM-063, University of Delaware, Newark, DE, 2005.
3. The Stanford FLASH Project. Stanford parallel applications for shared memory (SPLASH) benchmark. In http://www-flash.stanford.edu/apps/SPLASH/.
4. P. Briggs. Register Allocation via Graph Coloring. PhD thesis, Rice University, 1992.
5. Y. Zhang, V. Sreedhar, W. Zhu, V. Sarkar, and G. Gao. Optimized lock assignment and allocation: A method for exploiting concurrency among critical sections. Technical Report CAPSL-TM-065-revised, University of Delaware, Newark, DE, 2007.
6. Rakesh Ghiya and Laurie J. Hendren. Is it a tree, a dag, or a cyclic graph? A shape analysis for heap-directed pointers in C. In POPL '96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 1-15, 1996.
7. Weirong Zhu, Vugranam C Sreedhar, Ziang Hu, and Guang R. Gao. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures. In ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture, pages 35-45, 2007.
8. Michael Emmi, Jeffrey S. Fischer, Ranjit Jhala, and Rupak Majumdar. Lock allocation. In POPL '07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 291-296, 2007.
9. M. Hicks, J. Foster, and P. Pratikakis. Lock inference for atomic sections. In TRANSACT '06: Proceedings of the 1st ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing, 2006.
10. Bill McCloskey, Feng Zhou, David Gay, and Eric Brewer. Autolocker: synchronization inference for atomic sections. In POPL '06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 346-358, 2006.
11. M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL '06, pages 334-345. ACM, 2006.
12. P. Diniz and M. Rinard. Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs. In LCPC '96, pages 284-299, 1996.
13. Jong-Deok Choi, Manish Gupta, Mauricio J. Serrano, Vugranam C. Sreedhar, and Samuel P. Midkiff. Stack allocation and synchronization optimizations for Java using escape analysis. ACM Trans. Program. Lang. Syst., 25(6):876-910, 2003.
14. J. Aldrich, E. Sirer, C. Chambers, and S. Eggers. Comprehensive synchronization elimination for Java. Science of Computer Programming, 47(2-3):91-120, 2003.
