Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections

Yuan Zhang¹, Vugranam C. Sreedhar², Weirong Zhu³*, Vivek Sarkar⁴, and Guang R. Gao¹

¹ University of Delaware, Newark, DE, {zhangy,ggao}@capsl.udel.edu
² IBM T.J. Watson Research Center, Hawthorne, NY, vugranam@us.ibm.com
³ Microsoft Corporation, Seattle, WA, weirong.zhu@microsoft.com
⁴ Rice University, Houston, TX, vsarkar@rice.edu

Abstract. In this paper we propose a lock assignment technique to simplify the mutual exclusion enforcement in multithreaded programs. Programmers are allowed to annotate the regions of code that are expected to be mutually exclusive as critical sections, without using explicit locks. The compiler then automatically infers an assignment of the minimum number of locks to critical sections by solving the Minimum Lock Assignment (MLA) problem so as to enforce mutual exclusion without any loss of concurrency. We show that the MLA problem is NP-hard. We have proposed a heuristic to solve the MLA problem, and tested the optimality of the heuristic with the Integer Linear Programming (ILP) solver. We have also tested the efficiency of the heuristic using scientific applications, from which we obtain up to 30% performance gain with respect to the programs in which all critical sections are controlled by a single lock.

1 Introduction

Given that the processors in current and future computer systems are becoming multi- or many-core by default, it is important to address the performance and productivity issues in multithreaded programming. One of the major performance and productivity issues in multithreaded programming arises from enforcing mutual exclusion (mutex for short) using lock/unlock operations. Programmers explicitly assign lock variables to control mutex regions, and the lock variables are acquired by the executing thread before the mutex region is executed, and are released after the execution of the mutex region completes. Explicitly managing multiple locks is error prone since it is easy for programmers to introduce data races and create deadlocks. Alternatively, programmers can use a single lock to control all mutex regions to avoid deadlocks and data races. However, they lose concurrency among mutex regions by unnecessarily serializing them.
* The author participated in this work when he was a graduate student at the University of Delaware.

In this paper, we propose a lock assignment technique to simplify the enforcement of mutual exclusion in multithreaded programs. We allow programmers to annotate regions of code that are expected to execute mutually exclusively as critical sections, without managing explicit locks. The compiler then automatically infers an assignment of multiple compiler-managed locks to critical sections (possibly multiple locks for one critical section) to preserve mutual exclusion and also exploit the concurrency among critical sections.

A naive lock assignment approach associates one lock with each shared memory location, and the lock set of a critical section is the set of locks assigned to the memory locations it accesses. This approach, however, may use more locks than necessary and introduce excessive overhead on lock acquisition and release. To control the locking overhead, we use the minimum number of locks necessary to preserve mutual exclusion and fully exploit the concurrency between critical sections. We formulate this lock assignment task as the Minimum Lock Assignment (MLA) problem:

Problem 1 (Minimum Lock Assignment). Given a multithreaded program with a set of critical sections, find the minimum number of distinct locks that are needed for controlling the critical sections such that
(a) Two critical sections are assigned disjoint sets of locks if (1) they are concurrent and (2) they do not access any common location, or if they access a common location then none of them writes to the common location.
(b) Two critical sections are assigned at least one common lock if (1) they are concurrent and (2) they access some common location and at least one of them writes to the common location.

Note that a critical section can be assigned a set of locks. The semantics of a lock set follows the strict two-phase locking policy [1].

The solution of the MLA problem consists of two main phases: the analysis phase and the lock assignment phase. In the analysis phase, the compiler reads the multithreaded program and statically determines whether a pair of critical sections are interfering. Two critical sections are interfering if they are concurrent and they access some common shared memory location(s), with at least one of them writing to the common location(s). In the lock assignment phase the compiler calculates the minimum number of locks to control critical sections according to the analysis result, and assigns one or more locks to each critical section. In addition, the runtime system guarantees that an execution is deadlock free by acquiring and releasing locks in a predetermined order.

The analysis phase is solved by concurrency analysis, data set analysis and pointer analysis. However, due to space limitations, in this paper we simply assume the analysis result is already calculated, and we focus only on the lock assignment phase. Readers can refer to [2,1] for more details on the analysis phase and deadlock avoidance options. In the following discussion, "the MLA problem" refers exclusively to the lock assignment phase, without any further clarification.

The rest of the paper is organized as follows. In Section 2 we introduce the concurrency graph as the main data structure to solve the MLA problem. In Section 3 we prove that the MLA problem is related to the graph coloring problem and that it is NP-hard. We then present a heuristic to solve the MLA problem. We also formulate the MLA problem as an Integer Linear Programming (ILP) problem. In Section 4 we evaluate our heuristic by comparing it with optimal solutions produced by the commercial ILP solver CPLEX. In 300 randomly generated test cases we observe that our MLA heuristic is optimal for 83.3% of them. We also test the performance of our heuristic using a 10-way Sun Fire machine on a set of Splash2 [3] benchmarks, and obtain up to 30.17% performance speedup with respect to programs in which all critical sections are controlled by a single lock. Related work is presented in Section 5, and finally we conclude in Section 6.

2 Concurrency Graph and Critical Sections

In this section, we introduce the concurrency graph to model the potential concurrency and interference among critical sections in a multithreaded program.

2.1 Concurrency Graph
Fig. 1. (a) Example program (b) Concurrency graph

Definition 1. A Concurrency Graph is an undirected graph $G = (V, E)$, in which a vertex $v \in V$ denotes a textual critical section, and there is an edge $(u, v) \in E$ if instances of critical sections $u$ and $v$ may be concurrent.

In the above definition, if two instances of the critical section $u$ are concurrent, we do not introduce a self-loop on $u$: since we will assign at least one lock to each critical section, the mutual exclusion of $u$ with respect to itself is preserved automatically.

As an example, Figure 1(b) illustrates the concurrency graph for the program shown in Figure 1(a). The sets of shared memory locations that are accessed within critical sections are also listed within curly braces in Figure 1(b).

Two concurrent critical sections are said to be non-interfering if either they do not access a common location, or they access a common location but none of them writes to it. Two concurrent critical sections are interfering if they access some common location and at least one of them writes to the common location. We extend the concurrency graph defined in Definition 1 by labeling an edge $(u, v)$ with label I when critical sections $u$ and $v$ are interfering, and with label N when $u$ and $v$ are non-interfering.

Note that a general concurrency graph may consist of several connected components, and we analyze each connected component independently. In the following discussion, we simply assume that a concurrency graph $G$ is a connected graph.

2.2 Non-interfering Concurrency Graphs

Consider a class of multithreaded programs $P_n$ whose corresponding concurrency graph contains only non-interfering edges. Since all incident edges of a critical section are non-interfering, it cannot share any lock with its neighbors. This implies that whenever two critical sections are connected (concurrent), they require different locks. We can now rephrase the MLA problem (Problem 1) for non-interfering concurrency graphs as follows:

Problem 2. Given a program with a set $V_n$ of non-interfering critical sections, find the minimum number of locks that can be assigned to critical sections such that if two different critical sections in $V_n$ are concurrent then they get different locks.

The above problem is equivalent to the classical graph coloring problem: color the vertices (critical sections) of a graph using the minimum number of colors (locks) such that no two adjacent (concurrent) vertices (critical sections) are given the same color (lock). The MLA problem for this special class of programs is NP-hard.⁵

⁵ For certain classes of graphs, such as the interval graphs, the graph coloring problem can be solved in polynomial time. However, general concurrency graphs are not necessarily interval graphs.

2.3 Interfering Concurrency Graphs

Consider a class of programs $P_i$ for which the concurrency graph contains only interfering edges. In this case, two critical sections are either concurrent and interfering, or are not concurrent (not connected). If they are concurrent and interfering, they should share at least one common lock to preserve the mutual exclusion, which implies that they must be serialized. If they are not concurrent, they are already serialized. Therefore, in this interfering special case, there is no inherent concurrency, so we can use a single lock to control all critical sections without introducing any performance penalty.

2.4 Concurrency Graph Partition

In general, a concurrency graph contains both non-interfering edges and interfering edges. Given a concurrency graph $G = (V, E)$, let $E_n$ denote the set of non-interfering edges and $E_i$ denote the set of interfering edges in $G$, such that $E = E_n \cup E_i$ and $E_n \cap E_i = \emptyset$. Let $G_n = (V_n, E_n)$ be the non-interfering subgraph induced by $E_n$, where $V_n \subseteq V$ such that a vertex $v_n \in V_n$ has at least one non-interfering edge incident on it. Figure 2(b) illustrates the non-interfering subgraph of Figure 2(a). Let $G_i = (V_i, E'_i)$ be the interfering subgraph induced by vertices $V_i$, where $V_i = V - V_n$ and $E'_i \subseteq E_i$ is the set of interfering edges $(u_i, v_i)$ such that $u_i, v_i \in V_i$. Figure 2(c) illustrates the interfering subgraph for Figure 2(a). Finally, let $E''_i = E_i - E'_i$ be the set of interfering edges that are not in $G_i$. Some of the interfering edges in $E''_i$ connect vertices of the non-interfering subgraph, for example, edges (CS1, CS3) and (CS3, CS4), illustrated as bold dashed lines in Figure 2(d). We call such interfering edges that occur inside a non-interfering subgraph serializing interfering edges $E_s$, because they could "serialize" the inherent concurrency that exists within the non-interfering subgraph. The remaining interfering edges $E^c_i = E''_i - E_s$ are crossing edges between vertices in $G_n$ and $G_i$. In the example shown in Figure 2(a), $E^c_i = \{(CS3, CS6), (CS4, CS6)\}$, illustrated as double solid lines in (e).

Fig. 2. (a) A general concurrency graph (b) The non-interfering subgraph $G_n$ (c) The interfering subgraph $G_i$ (d) The SNIG $G_{sn}$ (e) The crossing edges (double lines), serializing interfering edges (dotted lines), and the interfering subgraph (in dotted box) (f) An unsafe borrowing from CS3 to CS4 (g) A safe borrowing from CS4 to CS3 (h) Final lock assignment result

Besides the non-interfering subgraph $G_n$ and the interfering subgraph $G_i$, we introduce the notion of the serializing non-interference graph (SNIG) as the non-interfering subgraph with serializing edges, $G_{sn} = (V_n, E_n \cup E_s)$. Figure 2(d) illustrates an example of a SNIG. SNIGs have some interesting properties that will influence the lock assignment.

2.5 Serializing Non-Interference Graph

Let us consider a class of concurrency graphs called Serializing Non-Interfering Graphs (SNIGs). A SNIG consists of only non-interfering edges and serializing interfering edges (as defined in the previous section). Serializing interfering edges constrain the inherent concurrency in a non-interfering concurrency graph. They also constrain the minimum number of locks required to color a SNIG. The following observation states that sometimes it is impossible to color a SNIG if a vertex can be assigned only one color.

Observation 1. It is impossible to color an arbitrary SNIG with the following conflicting constraints:
1. Each vertex gets only one color,
2. If vertices $u$ and $v$ are connected by a non-interfering edge then they are given two different colors,
3. If two vertices $u$ and $v$ are connected by a serializing interfering edge then they are given the same color.

Consider the SNIG in Figure 3. Assume we satisfy all of the above constraints. Then all critical sections get the same lock, because they are connected by serializing interfering edges (CS1, CS3), (CS3, CS4) and (CS4, CS2). However, constraint (2) requires that CS1 and CS2 are given two different colors, a contradiction. Therefore Figure 3 cannot satisfy all three constraints.
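The contradiction in Observation 1 can be checked mechanically. The following is a minimal sketch, not part of the original paper: it encodes only the edges of Figure 3 that the text names (the serializing edges and the non-interfering edge between CS1 and CS2; the figure's second non-interfering edge is not recoverable from the text and is omitted), and the function names are hypothetical. It brute-forces single-color assignments, then checks a hand-picked assignment of color sets in which serializing neighbors share a color and non-interfering neighbors share none.

```python
from itertools import product

VERTICES = ["CS1", "CS2", "CS3", "CS4"]
# Edges taken from the prose; the figure itself is an assumption-free gap.
N_EDGES = [("CS1", "CS2")]                                  # non-interfering
S_EDGES = [("CS1", "CS3"), ("CS3", "CS4"), ("CS4", "CS2")]  # serializing

def single_color_feasible(max_colors):
    """Try every one-color-per-vertex assignment against constraints 2 and 3."""
    for colors in product(range(max_colors), repeat=len(VERTICES)):
        c = dict(zip(VERTICES, colors))
        differ = all(c[u] != c[v] for u, v in N_EDGES)
        same = all(c[u] == c[v] for u, v in S_EDGES)
        if differ and same:
            return True
    return False

def color_sets_valid(colors):
    """Set-valued variant: disjoint on N edges, overlapping on S edges."""
    disjoint = all(not (colors[u] & colors[v]) for u, v in N_EDGES)
    overlap = all(colors[u] & colors[v] for u, v in S_EDGES)
    return disjoint and overlap

# One possible multi-color assignment (chosen by hand for illustration).
multi = {"CS1": {1}, "CS3": {1}, "CS4": {1, 2}, "CS2": {2}}
print(single_color_feasible(4))   # no single-color assignment exists
print(color_sets_valid(multi))    # the set-valued assignment is consistent
```

Allowing a vertex to hold more than one color is what makes the second check succeed: CS4 holds two colors so that it overlaps with both CS3 and CS2 while CS1 and CS2 stay disjoint.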
Fig. 3. Example SNIG for Observation 1

There are two ways to handle the above impossibility: relax constraint (1) in the above observation, or relax constraint (2). By relaxing constraint (1) we are allowed to assign multiple colors to each vertex. By relaxing constraint (2) we reduce the concurrency. Constraint (3) must be satisfied, since otherwise mutual exclusion will be violated. In the MLA solution we take the approach of assigning multiple locks so as to maximize the concurrency. Let $C(u)$ be the set of colors that are assigned to a vertex $u$. The coloring problem on a SNIG is stated as follows:

Problem 3. Given a SNIG $G_{sn} = (V_n, E_n \cup E_s)$, find the minimum number of colors to color $G_{sn}$ such that:
(a) If two vertices $u$ and $v$ are connected by a non-interfering edge then $C(u) \cap C(v) = \emptyset$, and
(b) If two vertices $u$ and $v$ are connected by a serializing edge then $C(u) \cap C(v) \neq \emptyset$.

Let $G$ be an arbitrary concurrency graph, and let $G_{sn}$ be the SNIG of $G$. We will show in Section 3.2 that the minimum number of locks required by $G$ equals the minimum number of locks required by $G_{sn}$.

3 Minimum Lock Assignment Solution

The MLA problem for arbitrary concurrency graphs is NP-hard because one special case, the MLA problem for non-interfering concurrency graphs, is NP-hard. In this section we present a heuristic approach for solving MLA. We also formulate the MLA problem as an Integer Linear Programming (ILP) problem, and in Section 4 we will use this ILP formulation to quantitatively evaluate our heuristic.

3.1 A Naive Solution

Assume all shared memory locations that a critical section accesses can be statically identified by compiler analysis. Then a simple solution to the MLA problem is to assign a distinct lock to each shared memory location, and the lock set of a critical section is the set of locks assigned to the memory locations it accesses. However, this approach may use more locks than necessary, and introduce more overhead of lock acquisition and release. The number of locks required in this simple solution, i.e., the total number of memory locations accessed in a program, denoted as $|M|$, is the upper bound (UB) of the optimal MLA solution.

3.2 MLA Heuristic

Our MLA heuristic consists of three main steps (see Figure 4):
Step 1: Assign locks to the non-interfering subgraph $G_n$ using a graph coloring heuristic (Line 6).
Step 2: Ensure that the serializing interfering edges in the SNIG are correctly handled (Line 7).
Step 3: Finally, propagate the locks to the interfering subgraph $G_i$ (Line 8).

The first step is straightforward. We use a heuristic graph coloring algorithm [4] to color $G_n$, and one possible solution for our example is shown in Figure 2(b). Next, we must ensure that critical sections connected by serializing interfering edges in SNIGs are correctly serialized. The details of this step are given by the function HandleSerializingEdges in Figure 4.

    LockAssignment(G)
     1. Initialize Lock(u) for all u ∈ V as empty
     2. Partition the graph G
     3. if Gn = ∅
     4.     assign a global lock to each critical section
     5. else
     6.     HLB = GraphColoring(Gn)
     7.     HandleSerializingEdges(Gsn)
     8.     LockPropagation(Eci, Gi)
     9. endif
    10. if HLB > |M| then
    11.     for each v ∈ V
    12.         Lock(v) = ∪ { Lock(i) : i ∈ LS(v) }
    13.     endfor
    14. endif

    HandleSerializingEdges(Gsn)
    15. for each serializing interfering edge (u, v)
    16.     if Lock(u) ∩ Lock(v) = ∅
    17.         if borrow(u ← v) is safe
    18.             Lock(u) = Lock(u) ∪ Lock(v)
    19.         else if borrow(v ← u) is safe
    20.             Lock(v) = Lock(v) ∪ Lock(u)
    21.         else
    22.             HLB = HLB + 1
    23.             add a new lock to u and v's lock sets
    24.         endif
    25.     endif
    26. endfor

    LockPropagation(Eci, Gi)
    27. for each (vn, vi) ∈ Eci
    28.     sequence = BreadthFirstSearch(Gi, vi)
    29.     Arbitrarily pick one lock l from vn's lock set
    30.     for each v in sequence
    31.         Lock(v) = Lock(v) ∪ {l}
    32.     endfor
    33. endfor

Fig. 4. Lock Assignment Heuristic

In Figure 2(d), CS1, CS3 and CS4 are in $G_n$ and each of them has obtained a lock from the graph coloring. Interfering critical sections CS1 and CS3 are automatically serialized by sharing lock 1, but CS3 and CS4 are not. A straightforward way to solve this is to let one of them "borrow" the lock from the other. For a serializing interfering edge $(u, v)$, we say vertex $u$ borrows the lock from $v$, denoted as borrow($u \leftarrow v$), if $u$ adds $v$'s lock to its lock set, $Lock(u) = Lock(u) \cup Lock(v)$. Denote the set of locks from $u$'s non-interfering neighbors as $NIN(u)$, $NIN(u) = \bigcup_{(u,w) \in G_n} Lock(w)$. Before the borrowing, $u$ has a disjoint set of locks with all its non-interfering neighbors, i.e., $Lock(u) \cap NIN(u) = \emptyset$. This implies that the concurrency between $u$ and its non-interfering neighbors is maximized. After the borrowing, we also require that $u$ not share any lock with its non-interfering neighbors. This is satisfied if $Lock(v) \cap NIN(u) = \emptyset$, that is, none of $u$'s non-interfering neighbors holds a lock that $u$ borrows from $v$. In this case we say the borrowing is "safe", which means it does not reduce concurrency among non-interfering critical sections.
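The safety test and the borrowing loop can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, and the lock sets and non-interfering adjacency below are inferred from the paper's prose about Figure 2 (CS1 and CS3 hold lock 1, CS4 holds lock 3, CS1 is a non-interfering neighbor of CS4, and CS3's non-interfering neighbor holds lock 2, taken here to be CS2), since the figure itself is not available.

```python
def nin(u, lock, n_adj):
    """NIN(u): union of the lock sets of u's non-interfering neighbors."""
    result = set()
    for w in n_adj.get(u, ()):
        result |= lock[w]
    return result

def borrow_is_safe(u, v, lock, n_adj):
    """borrow(u <- v) is safe when Lock(v) and NIN(u) are disjoint."""
    return not (lock[v] & nin(u, lock, n_adj))

def handle_serializing_edges(s_edges, lock, n_adj, next_lock):
    """Follows lines 15-26 of Figure 4; returns the next unused lock id."""
    for u, v in s_edges:
        if lock[u] & lock[v]:
            continue                      # already serialized by a shared lock
        if borrow_is_safe(u, v, lock, n_adj):
            lock[u] |= lock[v]
        elif borrow_is_safe(v, u, lock, n_adj):
            lock[v] |= lock[u]
        else:
            lock[u].add(next_lock)        # neither direction is safe:
            lock[v].add(next_lock)        # introduce a fresh lock
            next_lock += 1
    return next_lock

# Assumed state after graph coloring (inferred from the text, see lead-in).
lock = {"CS1": {1}, "CS2": {2}, "CS3": {1}, "CS4": {3}}
n_adj = {"CS1": ["CS4"], "CS4": ["CS1"], "CS2": ["CS3"], "CS3": ["CS2"]}

cs4_from_cs3 = borrow_is_safe("CS4", "CS3", lock, n_adj)  # lock 1 clashes with CS1
cs3_from_cs4 = borrow_is_safe("CS3", "CS4", lock, n_adj)  # lock 3 is unused near CS3
next_lock = handle_serializing_edges(
    [("CS1", "CS3"), ("CS4", "CS3")], lock, n_adj, next_lock=4)
print(lock)
```

The sketch processes edges sequentially and mutates the lock sets in place, so a later edge sees the borrowings made for earlier ones, which matches the loop structure of Figure 4.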
In our example in Figure 2, in order to enforce the mutual exclusion between CS3 and CS4, we first let CS4 borrow the lock from CS3, so that $Lock(CS4) = \{1, 3\}$. This is shown in Figure 2(f). However, this borrowing is not safe, because one of CS4's non-interfering neighbors, CS1, would share lock 1 with it. We then try the alternative: we let CS3 borrow the lock from CS4.
This is illustrated in Figure 2(g). This borrowing is safe because $Lock(CS4) \cap NIN(CS3) = \emptyset$, where $NIN(CS3) = \{2\}$. Note that if neither borrowing is safe, we introduce a new lock and add it to both end vertices' lock sets. The procedure of lock borrowing is summarized in Figure 4.

The first two steps together color the SNIG $G_{sn}$. Finally, in function LockPropagation, we propagate the SNIG lock assignment result to the interfering subgraph $G_i$. The interfering subgraph $G_i$ is connected to the non-interfering subgraph $G_n$ through a set of crossing edges $(v_n, v_i)$, where $v_n \in G_n$ and $v_i \in G_i$. Each $(v_n, v_i)$ is an interfering edge, which means $v_i$ should share at least one of the locks that $v_n$ obtained from the graph coloring. We say $v_n$ "propagates" a lock to $v_i$. If $v_i$ has more than one incident crossing edge, then it should inherit locks from all its neighbors in $G_n$. Subsequently, $v_i$ propagates its lock set to its neighbors in $G_i$. This propagation continues until every vertex in $G_i$ inherits locks from its neighbors. This procedure can be implemented simply as a set of breadth-first searches, with each $v_i$ at a crossing edge as the source vertex. The algorithm is shown in Figure 4. One propagation result of our example is shown in Figure 2(h). An important property of this lock propagation is that it does not introduce any new lock; therefore the number of locks required to color $G_i$ cannot exceed the number of locks required to color the SNIG $G_{sn}$. The final lock assignment result is shown in Figure 2(h).

We refer to the number of locks required to color $G$ as the Heuristic Lock Bound (HLB). We have mentioned in the naive solution that the upper bound UB of the required locks is the number of shared memory locations accessed in the concurrency graph $G$. In some cases HLB might exceed UB, and we need to choose the smaller of HLB and UB for lock assignment. The MLA heuristic algorithm is summarized in Figure 4.

The following theorems show that our MLA heuristic preserves the mutual exclusion between critical sections without any loss of concurrency. They also show that lock assignment on an arbitrary concurrency graph $G$ is optimal if the lock assignment on the SNIG of $G$ is optimal. Detailed proofs can be found in [5].

Theorem 1. When the algorithm LockAssignment(G) terminates, any pair of interfering critical sections in $G$ share at least one common lock.

Theorem 2. When the algorithm LockAssignment(G) terminates, any pair of non-interfering critical sections do not share any lock.

Theorem 3. Lock assignment on a concurrency graph $G$ is optimal if and only if the lock assignment on its SNIG $G_{sn}$ is optimal.

The concurrency graph partitioning runs in $O(V + E)$ time, and the graph coloring runs in $O(V^2)$ time. In the worst case, the time complexities of HandleSerializingEdges and LockPropagation are $O(EV)$ and $O(E^2 + VE)$, respectively. Therefore, in the worst case the total time complexity of LockAssignment is $O(E^2 + VE)$.

3.3 ILP Formulation

In this section, we formulate the MLA problem as an ILP problem. Given a concurrency graph $G = (V, E)$, we introduce 0-1 variables $f_{u,i}$ to indicate whether lock $i$ is assigned to node $u$ in $G$, $1 \le u \le |V|$ and $1 \le i \le |M|$, where $M$ is the set of shared memory locations that are accessed in all critical sections. Recall that the number of locks given by an optimal solution cannot exceed $|M|$. Since each critical section must be assigned at least one lock, we have the following constraint:

$$f_{u,1} + f_{u,2} + \cdots + f_{u,|M|} \ge 1 \quad \text{for all } u \in G \tag{1}$$

We use 0-1 variables $l_i$ to indicate whether lock $i$ is assigned to any critical section, $l_i = f_{1,i} \lor f_{2,i} \lor \cdots \lor f_{|V|,i}$. This condition is represented by the following constraints:

$$f_{1,i} + \cdots + f_{|V|,i} \ge l_i \tag{2}$$
$$f_{1,i} + \cdots + f_{|V|,i} \le |V| \, l_i \tag{3}$$

Next we derive conditions that ensure the lock assignment is correct and maximizes the parallelism. Recall that a lock assignment solution is correct if interfering critical sections $u$ and $v$ share some lock, and parallelism is maximized if non-interfering critical sections are assigned two disjoint sets of locks. Let 0-1 variable $s_{u,v,i}$ indicate whether $u$ and $v$ share lock $i$; then $s_{u,v,i} = f_{u,i} \land f_{v,i}$. This condition is imposed by the following constraints:

$$f_{u,i} + f_{v,i} \ge 2 \, s_{u,v,i} \tag{4}$$
$$f_{u,i} + f_{v,i} \le s_{u,v,i} + 1 \tag{5}$$

We use 0-1 variable $s_{u,v}$ to indicate whether $u$ and $v$ share any lock. Then $s_{u,v} = s_{u,v,1} \lor \cdots \lor s_{u,v,|M|}$. The following two constraints represent this condition:

$$s_{u,v,1} + \cdots + s_{u,v,|M|} \ge s_{u,v} \tag{6}$$
$$s_{u,v,1} + \cdots + s_{u,v,|M|} \le |M| \, s_{u,v} \tag{7}$$

Then

$$s_{u,v} = 1 \quad \text{for interfering edge } (u, v) \tag{8}$$
$$s_{u,v} = 0 \quad \text{for non-interfering edge } (u, v) \tag{9}$$

The total number of locks used is:

$$N = l_1 + \cdots + l_{|M|} \tag{10}$$

Therefore, the MLA problem is to minimize $N$ subject to constraints (1) to (9).

4 Experimental Results

In this section, we present two sets of experiments to evaluate our lock assignment algorithm. In the first set of experiments, we compare the results produced by our MLA heuristic with the optimal solutions based on the ILP formulation on a set of 300 random concurrency graphs. In the second set of experiments we evaluate the effectiveness of the MLA heuristic using Splash2 [3] benchmarks.

4.1 Precision Evaluation

                                     Avg     Min    Max
    Vertices (V)                     8.63    2      16
    Edges (E)                        16.73   1      53
    Edge density E/V^2               0.19    0.09   0.28
    Non-interfering edges (En)       3.37    0      20
    En/E                             0.22    0      1
    Serializing interfering edges    2.85    0      27

Table 1. Features of random concurrency graphs

To study the precision of our MLA heuristic we implemented our ILP formulation in the commercial ILP solver CPLEX, and tested the heuristic and the ILP formulation on a set of 300 randomly generated concurrency graphs with the characteristics shown in Table 1. We limited our random concurrency graphs to contain at most 16 nodes due to time constraints in the ILP solver. The results show that our heuristic solution is optimal for 83.3% of the tested graphs. For the remaining 16.7% of graphs our heuristic assigns more locks than the optimal solutions, and in the worst case it assigns two more locks than the optimal solution.

We also evaluated the influence of the non-interfering subgraph $G_n$ and the serializing interfering edges $E_s$ on lock assignment. For this purpose, we plot the precision of the MLA heuristic against the relative size of the non-interfering subgraph, given by $V_n/V$, and against the relative number of serializing interfering edges, given by $E_s/E$, in Figure 5(a) and (b), respectively. As an example, Figure 5(a) shows that our MLA heuristic gives optimal solutions to about 70% of test cases that have $V_n/V = 0.6$ and sub-optimal solutions (i.e., assigns extra locks) for the remaining 30%. Figure 5(a) and (b) illustrate that the precision of our heuristic depends on the non-interfering subgraph size and the relative number of serializing interfering edges.

Fig. 5. Precision of the MLA heuristic: fraction of optimal and non-optimal results against (a) $V_n/V$ and (b) $E_s/E$

4.2 Performance Study on Sun-Fire

Next we study the performance of the MLA heuristic using a set of Splash2 [3] benchmarks listed in Table 2. Splash2 benchmarks call the Pthreads library,⁶ and mutual exclusion is enforced by pthread_mutex_lock(lock_var) and pthread_mutex_unlock(lock_var) functions with explicit lock variables. For the purpose of our performance study, we manually transformed each lock/unlock region into a critical section. We constructed the concurrency graph for each benchmark manually, and applied the MLA algorithm to calculate the lock assignment. The number of locks assigned to each benchmark is shown in Table 2.

⁶ The original Splash2 benchmarks utilize the Argonne National Laboratories (ANL) parmacs macros for parallel constructs. We have re-configured them to call the Pthreads library.

    Application                     Barnes          Cholesky           Ocean-cont      Radiosity         Water-nsq
    Description                     N-body          Matrix factoring   Hydrodynamics   3-D rendering     Water molecules
    Problem size                    262144 bodies   tk29.O B8C256      514×514         largeroom batch   512 molecules
    CSs                             6               7                  4               37                9
    CS time (1 proc)                6.29%           32.37%             0.11%           9.93%             11.54%
    Lines of code in CS (avg/max)   17.17/68        10.86/37           1.75/3          12.79/85          2.89/6
    Funcs in CS                     1               1                  0               10                1
    Locks assigned                  3               4                  4               8                 7
    Locks for each CS (max)         1               1                  1               4                 1

Table 2. Benchmarks and lock assignment results

We then ran the set of benchmarks on a Sun Fire 10-processor 750 MHz machine, and collected two sets of data for each benchmark to evaluate our heuristic: (1) $T_s$: the execution time of the benchmark when all critical sections are controlled by a single lock, and (2) $T_{MLA}$: the execution time of the benchmark with lock assignment using our MLA heuristic. Figure 6 shows the performance improvement of our lock assignment with respect to the single global lock, i.e., $(T_s - T_{MLA})/T_s$, running on different numbers of threads.

Fig. 6. Performance improvement with respect to single lock

Cholesky and Radiosity show a performance improvement of 30.17% and 14.76%, respectively, due to the decrease of lock contention and serialization. On the other hand, Barnes, Ocean-cont and Water-nsq show a much lower performance improvement, for two main reasons. First, the amount of time spent in critical sections is a small portion of the total execution time. For instance, as shown in Table 2, for Ocean-cont the time spent in critical sections is only 0.11% of the total execution time. Second, in Barnes and Water-nsq data is often organized as arrays or complicated user-defined data structures, and is accessed in a dynamic pattern that cannot be predicted at compilation time. When we constructed the concurrency graphs we conservatively treated such arrays and user-defined data structures as scalar units. This conservative approach may introduce "spurious" interference among critical sections, which results in unnecessary serialization. The unnecessary serialization then increases lock contention among threads at execution time. More sophisticated analysis techniques such as shape analysis [6], or dynamic conflict resolution techniques such as transactional memory and the synchronization state buffer (SSB) [7], are needed to exploit further concurrency among critical sections in these benchmarks.

5 Related Work

Recently there has been some work on compiler-based lock inference techniques. Emmi et al. [8] propose a lock allocation problem that takes a multithreaded program annotated with atomic sections and infers a lock assignment to atomic sections to preserve atomicity and deadlock freedom. They formulate the lock allocation problem as an ILP problem which minimizes the conflict cost between atomic sections and minimizes the number of locks. No heuristic solution is presented in their work. Our lock assignment differs from lock allocation in the following two aspects. First, our lock assignment problem maximizes the parallelism among critical sections using the minimum number of locks, while the lock allocation problem uses the minimum number of locks to minimize the conflict cost, a metric that is not clearly related to the parallelism. Second, we present both the heuristic solution and the ILP formulation for the lock assignment problem. We use the ILP formulation to evaluate the optimality of the lock assignment heuristic. We also use scientific applications to evaluate the lock assignment heuristic and present performance improvement.

Hicks et al. [9] proposed a lock inference technique for atomic sections, which first determines a set of shared memory locations in the program, then uses a "mutex inference" algorithm to infer a set of locks for each atomic section to preserve its atomicity. The basic idea of their mutex inference algorithm is to find the dependence relation among shared memory locations, and partition the shared memory locations into sets according to this dependence relation. Locks are then assigned to each memory location set. Since the mutex inference algorithm is not optimization based, it may infer more locks than our lock assignment algorithm.

Autolocker [10] takes programs annotated with pessimistic atomic sections and a programmer-controlled lock assignment, and infers a compiler-controlled lock assignment that is free of deadlocks and data races.

Vaziri et al. [11] proposed a data-centric synchronization approach for writing concurrent programs using atomic sets, which are sets of shared memory locations that have "similar" data consistency properties. Accesses to fields in an atomic set are assumed to take place atomically in "units of work". Given a program with annotated atomic sets, the compiler infers units of work automatically and translates them into synchronized blocks. Our work complements that of Vaziri et al. in that we can analyze and determine the atomic sets and units of work using concurrency analysis and the lock assignment algorithm.

Some other optimization techniques on locks have been reported. Diniz and Rinard [12] present data lock coarsening and computation lock coarsening techniques to reduce the overhead of fine-grain locks in Java programs. Choi et al. [13] and Aldrich et al. [14] remove unnecessary synchronization from Java programs.

6 Conclusions

In this paper we proposed a lock assignment technique to simplify mutual exclusion in multithreaded programs. It takes programs annotated with critical sections and finds the minimum number of locks needed to enforce mutual exclusion among interfering critical sections without any loss of concurrency. Experimental results are very encouraging and show that our method can be used to improve the performance of multithreaded programs with mutual exclusion by exploiting concurrency among multiple critical sections. An extension of this work to support read/write locks is a subject for future work.
References

1. J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
2. V. Sreedhar, Y. Zhang, and G. Gao. A new framework for analysis and optimization of shared memory parallel programs. Technical Report CAPSL-TM-063, University of Delaware, Newark, DE, 2005.
3. The Stanford FLASH Project. Stanford parallel applications for shared memory (SPLASH) benchmark. In http://www-flash.stanford.edu/apps/SPLASH/.
4. P. Briggs. Register Allocation via Graph Coloring. PhD thesis, Rice University, 1992.
5. Y. Zhang, V. Sreedhar, W. Zhu, V. Sarkar, and G. Gao. Optimized lock assignment and allocation: A method for exploiting concurrency among critical sections. Technical Report CAPSL-TM-065-revised, University of Delaware, Newark, DE, 2007.
6. Rakesh Ghiya and Laurie J. Hendren. Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C. In POPL '96: Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 1–15, 1996.
7. Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao. Synchronization state buffer: supporting efficient fine-grain synchronization on many-core architectures. In ISCA '07: Proceedings of the 34th annual international symposium on Computer architecture, pages 35–45, 2007.
8. Michael Emmi, Jeffrey S. Fischer, Ranjit Jhala, and Rupak Majumdar. Lock allocation. In POPL '07: Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 291–296, 2007.
9. M. Hicks, J. Foster, and P. Pratikakis. Lock inference for atomic sections. In TRANSACT '06: Proceedings of the 1st ACM SIGPLAN Workshop on Languages, Compilers, and Hardware Support for Transactional Computing, 2006.
10. Bill McCloskey, Feng Zhou, David Gay, and Eric Brewer. Autolocker: synchronization inference for atomic sections. In POPL '06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 346–358, 2006.
11. M. Vaziri, F. Tip, and J. Dolby. Associating synchronization constraints with data in an object-oriented language. In POPL '06, pages 334–345. ACM, 2006.
12. P. Diniz and M. Rinard. Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs. In LCPC '96, pages 284–299, 1996.
13. Jong-Deok Choi, Manish Gupta, Mauricio J. Serrano, Vugranam C. Sreedhar, and Samuel P. Midkiff. Stack allocation and synchronization optimizations for Java using escape analysis. ACM Trans. Program. Lang. Syst., 25(6):876–910, 2003.
14. J. Aldrich, E. Sirer, C. Chambers, and S. Eggers. Comprehensive synchronization elimination for Java. Science of Computer Programming, 47(2-3):91–120, 2003.