A Cache Design for Probabilistically Analysable Realti

Uploaded On 2015-04-28


Cazorla. Universitat Politècnica de Catalunya; Barcelona Supercomputing Center; Spanish National Research Council (IIIA-CSIC).

Abstract: Caches provide significant performance improvements, though their use in real-time industry is low because current WC [...]

Presentation Transcript

Measurement-based analysis techniques perform extensive testing on the real system under analysis using stressful, high-coverage input data, recording the longest observed execution time and adding to it an engineering margin to make safety allowances for the unknown. However, determining the engineering margin is extremely difficult, if at all possible, especially when the system may exhibit discontinuous changes in timing due to unanticipated timing behaviour.

PTA has emerged as an alternative to current timing analysis techniques. Both SPTA [6] and MBPTA [7] provide a cumulative distribution function, or pWCET function, that upper-bounds the execution time of the program under analysis, guaranteeing that the execution time of a program only exceeds the corresponding execution time bound with a probability lower than a given target probability (e.g., $10^{-15}$). The probabilistic timing behaviour of a program (or an instruction) can be represented with Execution Time Profiles (ETPs). An ETP defines the different execution times of a program (or latencies of an instruction) and their associated probabilities. That is, the timing behaviour of a program/instruction can be defined by the pair of vectors $(\vec{l}, \vec{p}) = \{l_1, l_2, \ldots, l_k\}\{p_1, p_2, \ldots, p_k\}$, where $p_i$ is the probability of the program/instruction taking latency $l_i$. The ETP for a program (or instruction) may differ for different input sets leading to different execution paths. Each PTA technique has its own methods to combine results from different execution paths. We refer the reader to those methods for further details [6][7].

A. Requirements of SPTA and MBPTA on Cache Design

PTA techniques require that the events under analysis, program execution times for MBPTA and instruction latencies for SPTA, can be modelled with i.i.d. random variables [6]: two random variables are said to be independent if they describe two events such that the occurrence of one event does not have any impact on the occurrence of the other event. Two random variables are said to be identically distributed if they have the same probability distribution. The existence of an ETP ensures that each potential execution time of the program (for MBPTA) or instruction (for SPTA) has an actual probability of occurrence, which is a sufficient and necessary condition to achieve the desired probabilistic i.i.d. execution time behaviour [6]. A difference between SPTA and MBPTA, besides the level of abstraction at which ETPs are to be constructed, is that while SPTA requires ETPs for each instruction to be determined, MBPTA simply needs those ETPs for the program to exist, but not to be known.

Regardless of whether ETPs are obtained for instructions or full programs, they cannot be derived with current deterministic architectures, since events affecting execution time on those architectures, e.g. cache hits/misses, cannot be attached a probability of occurrence. At the cache level, the problem resides in the deterministic behaviour of the placement and replacement policies: (1) they lead to cache layouts for which the corresponding execution times cannot be modelled with i.i.d. random variables, preventing the use of MBPTA, and (2) each memory request does not have an actual probability of hit/miss, preventing its use with SPTA. Overall, a SPTA- and MBPTA-analysable cache must provide the following properties:

a) SPTA: SPTA requires the i.i.d. hypothesis to strictly hold at the granularity level at which ETPs are built, i.e. instructions. If the timing probability distribution captured by the ETP of the instruction is fully independent of the execution history, the ETP of the instruction would hold constant across all executions of the instruction. However, this is unaffordable at hardware level [6]. Instead, SPTA [6] also works with a SPTA-imperfect approach. In such an approach the timing vector of the ETP is insensitive to execution history but the probability vector is not, and therefore there is a need for bounding this dependence probabilistically. This PTA-imperfect approach provides safe pWCET estimates and is the one used in this paper. Hence SPTA requires that: 1) each memory access has a hit-miss probability, and 2) in case memory instructions are dependent, that dependence must be probabilistically modelable.

b) MBPTA: The observed execution times fulfil the i.i.d. property if observations are independent across different runs and a probability can be attached to each potential execution time. To that end, it is enough if we make the events that may affect the execution time of a program random events. Hence, taking measurements from a program is equivalent to rolling a die, with each face having a probability of appearance. Making enough rolls is enough to apply MBPTA, which derives upper-bounds of the execution time distribution by means of Extreme Value Theory (EVT) [11][7]. Note that the existence of the ETPs for each instruction ensures that the execution times are probabilistic and therefore MBPTA can be applied. As for SPTA, memory instructions may have dependences, but it is enough that those dependences are probabilistic, so that the measurements (execution times) obtained by running the program probabilistically capture the effect of such dependence.

III. TIMING BEHAVIOUR OF RANDOM CACHES

This paper shows that randomising the replacement and placement policies allows constructing ETPs for memory instructions: $(\vec{l}, \vec{p}) = \{l_{hit}, l_{miss}\}\{p_{hit}, p_{miss}\}$, where $l_{hit}$ and $l_{miss}$ are the latency of hit and miss respectively and $p_{hit}$ and $p_{miss}$ the associated probability in each case. In particular, in this section, we show that $p_{hit}$ and $p_{miss}$ can be computed analytically based on the properties of RR and our random placement (RP) policy. As pointed out in Section II, the existence of the ETPs ensures that the execution times are probabilistic and therefore the system fulfils the i.i.d. properties.

A. Random Replacement (RR)

The RR policy ensures that every time a memory request misses in cache, a way in its corresponding cache set is randomly selected and evicted to make room for the new cache line. This ensures that (1) there is independence across evictions and (2) the probability of a cache line being evicted is the same across evictions, i.e. for a W-way associative cache, the probability for any particular cache line to be evicted is $1/W$ for each set. In the particular case of a fully-associative (FA) cache, such probability holds for the only cache set.

Given a sequence of cache accesses, the ETP for each of them (i.e. its hit/miss probabilities) can be determined by computing how likely previous accesses are to evict the corresponding cache line. For instance, in the sequence <A, B, C, A>, B and C can evict A with a given probability that depends on the number of cache ways and whether B and C were fetched before or not. The fact that those probabilities exist and can be computed is enough for PTA techniques. Since cache lines evicted are chosen randomly, whether an access hits or misses depends solely on random events for a given sequence of accesses regardless of their absolute addresses, and thus hit/miss outcome is truly [...]

Fig. 2. Parametric hash function proposed for the RP cache.

$$P_{miss_A}(SA[S,W]) = P_{miss_A}(DM[S]) \cdot P_{miss_A}(FA[W]) \qquad (3)$$

where $P_{miss_A}(SA[S,W])$ stands for the miss probability of A in a SA cache with S sets and W ways. Analogously, $P_{miss_A}(DM[S])$ and $P_{miss_A}(FA[W])$ stand for the miss probabilities in DM and FA caches with S sets and W ways respectively. Hit probabilities are obtained as $P_{hit} = 1 - P_{miss}$.

In summary, hit/miss probabilities exist for all accesses, and so do their ETPs. As a consequence, execution times will be i.i.d. and PTA can be safely applied on top of a SA cache.

IV. HARDWARE DESIGN OF A RANDOM CACHE

This section describes how to implement both random placement and replacement policies.

A. Random Replacement

Random replacement policies have been extensively used in various processor architectures, both in the high-performance and embedded markets. Examples for the latter market are the Aeroflex Gaisler NGMP [2] or some processors of the ARM family [3]. The most relevant element of a random replacement policy is the hardware generating the random numbers that select the way to be evicted on a miss. In general, pseudo-random number generators (PRNG) are implemented. The particular PRNG we have used in this paper is the Multiply-With-Carry (MWC) [14] PRNG, since we have tested that (i) it generates numbers with a sufficiently high level of randomness, (ii) its period is huge, and (iii) it can be efficiently implemented in hardware. Given that efficient implementations of a PRNG exist and space is limited, we omit the details of our implementation of the MWC PRNG.

B. Random Placement Policy

In this section, we propose an implementation of a random placement policy. The key components of this design are (1) a low-cost PRNG if the RII is produced by hardware and (2) a parametric hash function.

In order to keep cache latency and energy low, the implementation of both components must be kept simple. Moreover, both components are placed 'in front' of the cache, so the cache design is not changed per se, see Figure 1(b), but some extra logic is added before accessing cache. As for random replacement, we use the MWC PRNG if the RII is produced by hardware.

The Parametric Hash Function is used to randomise the cache placement. Figure 2 shows our implementation of the parametric placement function. The hash function has two inputs: the bits of the address used to access the set (index bits), '@' in the figure, and a RII. In the configuration of the particular example, 32 bytes per cache line and 32-bit addresses are assumed. Therefore, the 5 lowermost bits (offset bits) are discarded and only 27 bits are used. The hash function rotates the address bits based on some bits of the RII, as shown in the two rightmost rotate blocks of the figure. By doing this, we ensure that when a different RII is used, the mapping of that address changes. Analogously, the address bits are rotated based on some bits of the address itself. This operation, which is performed by the two leftmost rotate blocks, changes the way that the addresses are shifted. Note that addresses are padded with zeros to obtain a power-of-two number of bits, so address bits can be rotated without any constraint. Otherwise, rotation values between 27 and 31 would require special treatment. Finally, all bits of the rotated addresses, the original address and the RII (187 bits in the example) are XORed successively, until we obtain the desired number of bits for indexing the cache sets. For example, a 16KB cache with 32 bytes per line would need 9 index bits for a direct-mapped organisation, 8 bits for a 2-way set-associative one, and so on. Hence, 5 XOR gate levels are enough to produce the index.

As shown in Figure 2, the hardware implementation of the hash function consists of 4 rotate blocks and 5 levels of 2-input XOR gates. Each rotate block can be implemented with a 5-level multiplexer [19]. Since the latency and the energy per access of a fully-associative cache are much larger than those of direct-mapped or set-associative caches, the relative overhead of the hash function is small. We have corroborated this observation by integrating our parametric placement function into the CACTI tool [15]. Results for several cache configurations show that energy per access grows around 3% and delay grows by 40% (it is still less than half the delay of a fully-associative cache). Note that hit latency has low impact on WCETs since it is typically some orders of magnitude lower than miss latency. Nevertheless, we assume the same hit latency for our DM and SA configurations and for the FA one, which plays against our proposal.

V. RESULTS

A. Experimental Setup

We use a cycle-accurate execution-driven simulator based on the SoCLib simulation framework [21], with PowerPC binaries [23]. The simulator models a 4-stage pipelined processor with a memory hierarchy composed of first level separated instruction and data caches, and main memory. Both instruction and data caches are 4 KB in size with 16-byte lines. Associativities considered are 1-way (direct-mapped), 8-way (set-associative), 32-way (set-associative) and 256-way (fully-associative). Both caches implement random replacement and our random placement policy. The latency of the fetch stage depends on whether the access hits or misses in the instruction cache: a hit has 1-cycle latency and a miss has 100-cycle latency. After the decode stage, memory operations access the data cache, so they can last 1 or 100 cycles depending on whether they hit or miss. The remaining operations have a fixed execution latency (e.g. integer additions take 1 cycle).

We use the EEMBC Autobench benchmark suite [16], which reflects the current real-world demand of some automotive critical real-time embedded systems.

B. Fulfilling the i.i.d. properties

The use of random replacement and placement policies guarantees that observed execution times fulfil the properties required by MBPTA. However, we further verify this [...]

Fig. 3. EVT projection for aifftr

TABLE III
PWCET INCREMENT OF THE SA AND DM CACHES WITH RESPECT TO THE FA ONE, CONSIDERING AN EXCEEDANCE PROBABILITY OF $10^{-13}$

Benchmarks   32w-8s (SA)   8w-32s (SA)   1w-256s (DM)
a2time       452%          511%          1758%
aifftr       9%            11%           468%
aifirf       61%           65%           1418%
aiifft       9%            12%           653%
cacheb       9%            12%           904%
canrdr       8%            9%            2126%
iirflt       370%          478%          1448%
puwmod       29%           31%           855%
rspeed       23%           25%           12691%
tblook       167%          185%          1995%
ttsprk       13%           14%           3386%
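The parametric hash function of Section IV.B can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the circuit of Figure 2: the text does not say which RII or address bits drive each rotate block, so the 5-bit fields chosen below are hypothetical, the RII is assumed to be 32 bits wide, and the successive XOR tree is modelled as a word-wise XOR followed by folding. The names `rotl32`, `xor_fold` and `rp_index` are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, amount):
    """Rotate a 32-bit value left by amount (taken modulo 32)."""
    amount &= 31
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def xor_fold(value, out_bits):
    """XOR successive out_bits-wide chunks of value into one chunk."""
    mask = (1 << out_bits) - 1
    acc = 0
    while value:
        acc ^= value & mask
        value >>= out_bits
    return acc


def rp_index(addr, rii, index_bits, offset_bits=5):
    """Return a set index for addr under the placement seed rii.

    offset_bits=5 matches the 32-byte lines of the example in the text.
    The bit fields used as rotation amounts are assumptions.
    """
    # Discard the offset bits; the remaining 27 bits are implicitly
    # zero-padded to 32 so that any rotation amount is valid.
    line = (addr & MASK32) >> offset_bits
    rii &= MASK32

    rotated = [
        rotl32(line, rii & 31),          # rotate by RII bits (assumed field)
        rotl32(line, (rii >> 5) & 31),   # rotate by RII bits (assumed field)
        rotl32(line, line & 31),         # rotate by address bits (assumed field)
        rotl32(line, (line >> 5) & 31),  # rotate by address bits (assumed field)
    ]

    # XOR the rotated addresses, the original address and the RII together,
    # then fold the result down to the requested number of index bits.
    mixed = line ^ rii
    for word in rotated:
        mixed ^= word
    return xor_fold(mixed, index_bits)
```

For the 16KB direct-mapped example in the text, `rp_index(addr, rii, 9)` yields an index in the range 0 to 511. Two addresses within the same 32-byte line always map to the same set for a given RII, while changing the RII changes the mapping, which is the property the design relies on.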
[...] on set-associative caches [3][2]. Randomised caches in high-performance processors have been proposed to remove cache conflicts by using pseudo-random hash functions [22][10][20]. However, the behaviour of all those cache designs is fully deterministic, and therefore, whenever a given input set produces a pathological access pattern, it will happen systematically for such input set. Therefore, although the frequency of pathological cases is reduced, they can still appear systematically because there is no way to prove that their probability is bounded.

Some work on PTA has been done based on the assumption that execution times are truly i.i.d. and that frequencies for execution paths provided by the user match actual probabilities of those paths [8]. Later work has shown how to perform PTA with no assumption on the probabilities of execution paths and how to use random caches in PTA systems [6][7]. Concretely, the authors showed that randomised replacement effectively avoids pathological behaviour of deterministic replacement policies while achieving reasonable performance. Some authors have tried to perform PTA on top of conventional cache designs [13]. Unfortunately, this can only be done if the user is able to provide the true probability (not the frequency) of each cache layout and each execution path to occur for all instances of the system deployed, which is, in general, unattainable.

To the best of our knowledge, our paper is the first enabling the use of the most common and efficient cache designs, i.e. set-associative and direct-mapped caches, in probabilistically analysable hard real-time systems while preserving the properties needed by sound PTA techniques [6][7].

VII. CONCLUSIONS AND FUTURE WORK

PTA enables affordable analysis of complex hardware in safety-critical real-time systems by reducing the amount of information about the hardware and software state required to provide trustworthy WCET estimates. Yet, PTA relies on some properties that existing hardware fails to provide. In particular, PTA requires that the execution times of the program on the target platform can be modelled with i.i.d. random variables. In the case of the cache, the deterministic behaviour of placement and replacement policies makes it impossible to assign a true probability to different execution times. Only unaffordable fully-associative caches with random replacement would allow deriving true probabilities.

This paper presents the first random placement policy based on a parametric hash function so that i.i.d. execution times are achieved, thus enabling the use of efficient set-associative and direct-mapped caches in the context of probabilistic timing analysis. We further show that our cache design can be implemented with little overhead in terms of complexity, energy and performance.

While in this paper we have focused on devising random placement and replacement policies and implementations for first level caches, we plan to extend random placement policies to other components such as second level caches and translation look-aside buffers (TLBs).

ACKNOWLEDGMENTS

This work has been supported by the PROARTIS FP7 European Project under grant agreement number 249100, the Spanish Ministry of Science and Innovation under grant TIN2012-34557 and the HiPEAC Network of Excellence. Leonidas Kosmidis is also funded by the Spanish Ministry of Education under the FPU grant AP2010-4208. Eduardo Quiñones is also funded by the Spanish Ministry of Science and Innovation under the Juan de la Cierva grant JCI2009-05455.

REFERENCES

[1] Guidelines and methods for conducting the safety assessment process on civil airborne systems and equipment. ARP4761, 2001.
[2] Aeroflex Gaisler. Quad Core LEON4 SPARC V8 Processor - LEON4-NGMP-DRAFT - Data Sheet and Users Manual, 2011.
[3] ARM. Cortex-R4 and Cortex-R4F Technical Reference Manual, 2006.
[4] Sarah Boslaugh and Paul Andrew Watters. Statistics in a nutshell. O'Reilly Media, Inc., 2008.
[5] J.V. Bradley. Distribution-Free Statistical Tests. Prentice-Hall, 1968.
[6] F.J. Cazorla, E. Quinones, T. Vardanega, L. Cucu, B. Triquet, G. Bernat, E. Berger, J. Abella, F. Wartel, M. Houston, L. Santinelli, L. Kosmidis, C. Lo, and D. Maxim. Proartis: Probabilistically analysable real-time systems. Technical Report 7869 (http://hal.inria.fr/hal-00663329), INRIA, to appear in ACM TECS, 2012.
[7] L. Cucu, L. Santinelli, M. Houston, C. Lo, T. Vardanega, L. Kosmidis, J. Abella, E. Mezzeti, E. Quinones, and F.J. Cazorla. Measurement-based probabilistic timing analysis for multi-path programs. In ECRTS, 2012.
[8] Laurent David and Isabelle Puaut. Static determination of probabilistic execution times. In ECRTS, 2004.
[9] R. Wilhelm et al. The worst-case execution time problem: overview of methods and survey of tools. Trans. on Embedded Computing Systems, 7(3):1–53, 2008.
[10] A. González et al. Eliminating cache conflict misses through XOR-based placement functions. In ICS, 1997.
[11] Samuel Kotz and Saralees Nadarajah. Extreme value distributions: theory and applications. World Scientific, 2000.
[12] Benjamin Lesage, Damien Hardy, and Isabelle Puaut. Wcet analysis of multi-level set-associative data caches. WCET Workshop, 2009.
[13] Yun Liang and Tulika Mitra. Cache modeling in probabilistic execution time analysis. In DAC, 2008.
[14] G. Marsaglia and A. Zaman. A new class of random number generators. Annals of Applied Probability, 1(3):462–480, 1991.
[15] N. Muralimanohar, R. Balasubramonian, and N.P. Jouppi. CACTI 6.0: A tool to understand large caches. HP Tech Report HPL-2009-85, 2009.
[16] Jason Poovey. Characterization of the EEMBC Benchmark Suite. North Carolina State University, 2007.
[17] I. Puaut and D. Decotigny. Low-complexity algorithms for static cache locking in multitasking hard real-time systems. In RTSS, 2002.
[18] J. Reineke et al. Timing predictability of cache replacement policies. Real-Time Systems, 37:99–122, November 2007.
[19] S. Huntzicker et al. Energy-delay tradeoffs in 32-bit static shifter designs. In ICCD, 2008.
[20] A. Seznec and F. Bodin. Skewed-associative caches. In PARLE, 1993.
[21] SoCLib. -, 2003-2012. http://www.soclib.fr/trac/dev.
[22] Nigel Topham and Antonio González. Randomized cache placement for eliminating conflicts. IEEE Trans. Comput., 48, February 1999.
[23] J. Wetzel, E. Silha, C. May, B. Frey, J. Furukawa, and G. Frazier. PowerPC User Instruction Set Architecture. IBM Corporation, 2005.