A Coldness Metric for Cache Optimization

Raj Parihar, Chen Ding, Michael C. Huang
Dept. of Electrical and Computer Engineering, Dept. of Computer Science
University of Rochester, Rochester, NY 14627, USA
{parihar@ece, cding@cs, michael.huang@}rochester.edu

Abstract
A hot concept in program optimization is hotness. For example, program optimiz…

As the optimization level increases, the coldness again decreases. For the 4MB cache, we must optimize the accesses to at least 344KB, 2.4MB, and 5.4MB of data to reduce the miss ratio by 10%, 50%, and 90%, respectively. In the last case, the coldness metric shows that it is necessary to optimize for a data size larger than the cache size to obtain the needed reduction.

Next we describe the experiments that produced these coldness data. Our simulation framework is based on SimpleScalar, and we model a POWER7-like microarchitecture. On average, we fast-forward each application by about 5 billion instructions before we begin collecting statistics for a 200-million-instruction window. Data caches in this study are fully associative with an LRU replacement policy. This ensures that all misses are either compulsory misses or capacity misses of data whose reuse distance is larger than the cache size; there are no conflict misses.

In Figure 2, we present the minimal number of distinct addresses that account for a given percentage of cache misses. The coldness metric is the negation of this number. As the cache size increases, the average number of most-missed addresses grows by about 100x for both the top 10% and the top 50% of misses.

Figure 2: Distinct addresses accounting for the top 10% and 50% of misses corresponding to various reuse distances. (a) Distinct addresses accounting for the top 10% of misses. (b) Distinct addresses accounting for the top 50% of misses.

Based on the individual results, we classify applications into two groups. Applications that are consistently colder than the median are called below median cold, and applications that are consistently less cold than the median are called above median cold. The next table shows the two coldness categories.
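The measurement described above (simulate a fully associative LRU data cache, then find the fewest distinct addresses covering a target share of misses, and negate that count) can be illustrated with a small Python sketch. It assumes a trace of integer block addresses, where each integer stands for one cache block. The functions `lru_misses` and `coldness` are hypothetical names; this is not the authors' SimpleScalar-based framework. The greedy step is sound: taking the most-missed addresses first reaches any miss-coverage target with the fewest distinct addresses, since any other set of the same size covers no more misses.

```python
from collections import Counter, OrderedDict
from typing import Iterable, List


def lru_misses(trace: Iterable[int], cache_blocks: int) -> List[int]:
    """Return the addresses that miss in a fully associative LRU cache.

    An access misses only if it is the first access to its address
    (compulsory miss) or if at least `cache_blocks` distinct other
    addresses were touched since its previous access (capacity miss).
    Full associativity means there are no conflict misses.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses: List[int] = []
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses.append(addr)
            cache[addr] = None
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def coldness(misses: List[int], percent: float) -> int:
    """Negated minimal number of distinct addresses covering `percent`% of misses."""
    total = len(misses)
    if total == 0 or percent <= 0:
        return 0
    covered = 0
    # Greedy: visit addresses from most to fewest misses until the
    # covered misses reach the requested share of all misses.
    for n, (_, count) in enumerate(Counter(misses).most_common(), start=1):
        covered += count
        if covered * 100 >= percent * total:
            return -n
    return -len(set(misses))  # only reached when percent > 100


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 3, 4, 4, 4, 5, 1]
    misses = lru_misses(trace, cache_blocks=2)
    print(len(misses), coldness(misses, 50), coldness(misses, 90))  # 9 -2 -5
```

With a 2-block cache, 9 of the 11 accesses in this toy trace miss. Address 1 misses three times, so it plus one two-miss address cover 5 of the 9 misses (coldness -2 at 50%), while 90% coverage needs all five missed addresses (coldness -5). Repeating the computation for several cache sizes gives the kind of per-size comparison the study reports, at toy scale.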
Temperature zone                        SPEC2006 applications
Above median (less widespread misses)   h264ref, sphinx3, astar, xalancbmk, gobmk, hmmer, dealII, namd
Below median (more widespread misses)   lbm, bwaves, libquantum, perlbench, zeusmp, gromacs, mcf, soplex, sjeng

3. Discussion and Future Work

The new metric augments the growing set of data-centric metrics. Reuse distance has been used to show the temporal and spatial locality of individual data or collections of data. Hot data streams have shown the regularity in consecutive data loads [5]. A recent tool, HPCToolkit, shows the most-missed data to help program tuning [8]. Unlike coldness, these metrics do not quantify the minimal number of distinct data addresses that an optimization must target.

As future work, we plan to calibrate the coldness metric more thoroughly by using the full program trace and by measuring the effect of program input and cache associativity.

We will study hardware solutions. From Figure 2, it is evident that numerous distinct addresses account for the top misses. The number of these addresses increases rapidly (shown in log scale) as the cache size increases. From a similar study, we also observed that these misses are incurred by a large number of distinct static instructions, not just a few delinquent instructions. A possible solution is more effective prefetching. A specific implementation of look-ahead, which we call decoupled look-ahead [4], reduces primary misses by 88x and secondary misses by 38x for a 4MB cache that incurs about 100,000 distinct miss addresses in the top 90% of misses.

We will study the new metric as a guide to program optimization. The program mcf in Figure 2 shows two distinguishing characteristics. First, it is one of the programs that are below median cold, which means that its misses are more widespread. Second, for a 50% miss-ratio reduction, its coldness is among the least varying across cache sizes. An effective optimization has been found for mcf: structure splitting improves it by up to 35% [7]. The coldness characteristics suggest that structure splitting is effective in removing widespread misses but may be applicable only to certain types of programs. We will also use the coldness metric in program tuning to estimate the difficulty and suggest data targets.

In summary, we hope to identify effective techniques to deal with cold data, and to generalize and improve these techniques. As we face increasingly severe memory problems, we must understand and expand the ways to optimize for extremely cold data.

References
[1] D. Callahan, J. Cocke, and K. Kennedy. Estimating interlock and improving balance for pipelined architectures. JPDC, 5(4), 1988.
[2] D. Callahan and J. Gray. Design considerations for parallel programming, 2008. http://msdn.microsoft.com/en-us/magazine/cc872852.aspx.
[3] J. F. Cantin and M. D. Hill. Cache performance for SPEC CPU2000 benchmarks. http://www.cs.wisc.edu/multifacet/misc/spec2000cache-data.
[4] A. Garg and M. Huang. A performance-correctness explicitly decoupled architecture. In Proc. Int'l Symp. on Microarch., pages 306–317, November 2008.
[5] T. M. Chilimbi and M. Hirzel. Dynamic hot data stream prefetching for general-purpose programs. In PLDI, pages 199–209, 2002.
[6] C. Ding and K. Kennedy. The memory bandwidth bottleneck and its amelioration by a compiler. In IPDPS, pages 181–190, 2000.
[7] G. Chakrabarti and F. Chow. Structure layout optimizations in the Open64 compiler. In Open64 Workshop, 2008.
[8] X. Liu and J. M. Mellor-Crummey. Pinpointing data locality problems using data-centric analysis. In CGO, pages 171–180, 2011.