Working Set-based Physical Memory Ballooning
Jui-Hao Chiang, Stony Brook University
Han-Lin Li and Tzi-cker Chiueh, Industrial Technology Research Institute
USENIX Association, 10th International Conference on Autonomic Computing (ICAC '13)
... pages that have been accessed at least once in the observation window. However, this scheme is infeasible because the overhead of trapping every memory read/write is simply too high to be acceptable in practice. To get around this problem, VMware's ESX uses a sampling approach to estimate the working set size of a VM. Periodically, it marks a randomly sampled subset of the VM's guest physical pages as invalid, counts how many pages in the subset are accessed, as detected by the protection faults those accesses trigger, and uses the resulting count to infer the VM's working set size.

Another way to estimate a VM's working set size, used by the self-ballooning mechanism [15] in the Xen hypervisor, is to directly use the Committed_AS statistic maintained by the Linux kernel, which corresponds to the total number of memory pages consumed by all processes on a VM.

For page reclamation, Linux maintains two LRU (Least Recently Used) lists, Active and Inactive, for each of the following two types of memory pages: (1) anonymous memory, which corresponds to the heaps and stacks of user processes, and (2) page cache, which corresponds to the kernel's memory used to buffer and cache the payloads of disk reads and writes. Using the hardware reference bit, Linux puts pages that are accessed more frequently into the Active list and leaves pages that are accessed less frequently in the Inactive list. The page reclamation mechanism traverses the Inactive list to free its pages and possibly re-allocate them. If a reclaimed page belongs to anonymous memory, the kernel marks the page's page table entry as non-present and swaps the page's content out to the swap disk. When the page is later accessed, a swap-in event occurs and the page is swapped back in. If a reclaimed page belongs to the page cache, the kernel flushes its content to disk if it has been dirtied. If the page is later accessed, a refault event occurs and the page is brought back in. When a VM's physical memory allocation is larger than or equal to its working set size, the number of swap-in and refault events should be close to zero.

This observation inspires the third way to estimate a VM's working set size: gradually decrease the balloon target of the balloon driver in the VM until the VM's swap-in and refault counts start to become non-zero. The amount of physical memory allocated to the VM at that instant is the VM's working set size.

More concretely, a 3-state finite-state machine, shown in Figure 1, is used to adaptively track a VM's working set size (WSS). Any time the WSS changes, we adjust the VM's balloon target accordingly. The finite-state machine starts in the FAST state and initializes the VM's WSS to the VM's Committed_AS. While in the FAST state, the finite-state machine iteratively lowers the VM's WSS by 5% of the Committed_AS value at the end of every epoch (the epoch size is currently set to 1 second), until swap-in or refault events occur within the current epoch, which suggests that the finite-state machine may have overshot the WSS adjustment. As soon as swap-in/refault events arise in an epoch, the finite-state machine raises the VM's current WSS estimate by the sum of the observed swap-in and refault event counts and enters the COOL_DOWN state, regardless of whether it was originally in the FAST or the SLOW state. While in the COOL_DOWN state, the finite-state machine initializes a cool-down counter to a default time-out value (currently set at 8 seconds), waits for it to expire, and resets the cool-down counter to the same default value if additional swap-in/refault events arise. In the SLOW state, the finite-state machine applies the same logic as in the FAST state, except that the VM's WSS is iteratively lowered by 1% of the current Committed_AS value in each epoch. Whenever the tracked VM's Committed_AS changes, the finite-state machine considers that the VM's working set size has changed significantly, and resets itself by entering the FAST state and re-initializing the VM's WSS to the new Committed_AS value.

Figure 1: The finite-state machine used to track a VM's working set size (states FAST, SLOW, and COOL_DOWN; transitions labeled "swap-in/refault detected", "Committed_AS changes", and cool-down counter expiry).
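The state machine described above can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the authors' kernel code: the class and method names are ours, all sizes are in pages, and step() is assumed to be called once per 1-second epoch with the swap-in and refault counts observed during that epoch. Two details are assumptions not spelled out in the text: the COOL_DOWN state moves to SLOW when its counter expires (inferred from the transition labels of Figure 1), and swap-in/refault events observed during COOL_DOWN also raise the WSS estimate, in addition to resetting the counter.

```python
from enum import Enum


class State(Enum):
    FAST = "FAST"
    SLOW = "SLOW"
    COOL_DOWN = "COOL_DOWN"


class WssTracker:
    """Hypothetical sketch of the 3-state WSS tracker; all sizes are in pages.

    step() is meant to be called once per 1-second epoch with the VM's
    current Committed_AS and the swap-in/refault counts seen in that epoch.
    """

    FAST_PERCENT = 5        # FAST: lower WSS by 5% of Committed_AS per epoch
    SLOW_PERCENT = 1        # SLOW: lower WSS by 1% of Committed_AS per epoch
    COOL_DOWN_EPOCHS = 8    # 8-second time-out expressed in 1-second epochs

    def __init__(self, committed_as: int) -> None:
        self._reset(committed_as)

    def _reset(self, committed_as: int) -> None:
        # (Re)start in FAST with the WSS initialized to Committed_AS.
        self.committed_as = committed_as
        self.state = State.FAST
        self.wss = committed_as
        self.cool_down = 0

    def step(self, committed_as: int, swapins: int, refaults: int) -> int:
        """Advance one epoch and return the new WSS estimate (balloon target)."""
        if committed_as != self.committed_as:
            # Committed_AS changed: treat it as a working set change and restart.
            self._reset(committed_as)
            return self.wss
        events = swapins + refaults
        if events > 0:
            # Overshoot detected: add back one page per event, then cool down.
            # Assumption: this also applies while already in COOL_DOWN.
            self.wss += events
            self.state = State.COOL_DOWN
            self.cool_down = self.COOL_DOWN_EPOCHS
        elif self.state is State.COOL_DOWN:
            self.cool_down -= 1
            if self.cool_down == 0:
                self.state = State.SLOW  # assumption inferred from Figure 1
        else:
            percent = self.FAST_PERCENT if self.state is State.FAST else self.SLOW_PERCENT
            self.wss = max(0, self.wss - self.committed_as * percent // 100)
        return self.wss
```

For example, starting from a Committed_AS of 1000 pages, two quiet epochs lower the estimate to 950 and then 900 pages; an epoch with 3 swap-ins and 2 refaults then raises it to 905 pages and enters COOL_DOWN. Integer arithmetic is used so that the per-epoch decrement is an exact page count.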
3 TWS-based Memory Ballooning

Memory ballooning [21, 8] is a technique that reclaims physical memory from a VM by installing inside the VM a balloon driver that allocates memory pages from the VM's kernel via the standard APIs, pins them down, and returns them to the hypervisor. The balloon target of a balloon driver is the difference between the VM's configured memory requirement and the amount of memory the driver allocates from the VM, i.e., the amount of memory left to the VM.

How to correctly set a VM's balloon target is an important issue. When a balloon driver allocates more than the host VM's free memory pool, the VM OS's page reclamation mechanism is triggered to evict cold pages. The upper bound on a VM's balloon target is the VM's configured memory requirement, and the lower bound is the VM's minimum memory requirement that prevents Out-of-Memory exceptions. The optimal way to set a VM's balloon target is to set it to the VM's working set size, because this allows the hypervisor to reclaim the maximum amount of physical memory from a VM while keeping the performance impact on the VM to a minimum.

The self-ballooning mechanism in the Xen hypervisor sets a Linux VM's balloon target to its current Committed_AS value. This approach ensures that applications consuming anonymous memory do not suffer from any swap-in delay, because all their stacks and heaps are likely to be memory-resident. However, compared with the working set-based approach to setting the balloon target, this approach has two deficiencies.
First, Committed_AS does not factor the page cache into a VM's physical memory demand, and thus may cause substantial performance degradation for applications with intensive disk I/O activities, which could benefit significantly from the page cache. In contrast, the working set approach keeps a counter for refault events and incorporates this counter into the calculation of a VM's working set size, and thus of its balloon target.

Second, Committed_AS counts pages that are allocated, regardless of whether they have actually been used recently. More specifically, Committed_AS is incremented upon the first access to each newly allocated anonymous memory page and is decremented only when the owner process explicitly frees the page. For example, if a program allocates and accesses a memory page only once when it starts but leaves the page untouched until it exits, the Linux kernel cannot exclude this cold page from the VM's Committed_AS, even though it is clearly outside the VM's working set. In contrast, the working set approach actively forces the VM OS to invoke its page reclamation mechanism to pinpoint and evict cold pages.

4 Performance Evaluation

In this section, we report the results of a performance evaluation study of TWS-based memory ballooning. The test machine used in this study contains an Intel Core i7 quad-core processor with VT and EPT enabled and 16GB of physical memory, and runs Xen-4.1 with 64-bit vanilla Linux 3.2.6 as the kernel. All the VMs in this study are configured with 1 virtual CPU and 2GB of memory, and run a 64-bit Linux 3.2.6 kernel with our kernel module for memory ballooning. The module runs a kernel thread that wakes up every second to collect relevant information, such as Committed_AS, the swap-in count, and the refault count, and adjusts the balloon target accordingly.

To verify the effectiveness of the TWS-based ballooning algorithm, we first compared it with the self-ballooning mechanism in the Xen hypervisor. Then we compared it with the latest VMware ESXi 5.0 server.
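The once-per-second collection step can be approximated from user space. The Python sketch below is a hypothetical stand-in, not the authors' module: their kernel thread ran inside Linux 3.2.6, which predates the workingset_refault counters in /proc/vmstat (added in Linux 3.15), so using those counters as the refault source is our assumption for newer kernels. The sketch parses Committed_AS from /proc/meminfo and the cumulative swap-in (pswpin) and page-cache refault counters from /proc/vmstat, and clamps a WSS estimate to the balloon target bounds given in Section 3 (the VM's minimum memory and its configured memory). The function names and the 4 KiB page size are illustrative assumptions.

```python
from __future__ import annotations

PAGE_KB = 4  # assumption: 4 KiB guest pages


def committed_as_pages(meminfo_text: str) -> int:
    """Extract Committed_AS from /proc/meminfo text, converting kB to pages."""
    for line in meminfo_text.splitlines():
        if line.startswith("Committed_AS:"):
            return int(line.split()[1]) // PAGE_KB  # field is reported in kB
    raise KeyError("Committed_AS not found")


def swapin_refault_counts(vmstat_text: str) -> tuple[int, int]:
    """Return cumulative (swap-in, page-cache refault) counters from /proc/vmstat."""
    counters: dict[str, int] = {}
    for line in vmstat_text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            counters[fields[0]] = int(fields[1])
    swapins = counters.get("pswpin", 0)
    # Linux 5.9+ reports workingset_refault_file (page cache) and _anon;
    # 3.15-5.8 report a single workingset_refault counter for file pages.
    # The _anon counter is skipped so that swap-ins are not counted twice.
    refaults = counters.get("workingset_refault_file",
                            counters.get("workingset_refault", 0))
    return swapins, refaults


def balloon_target(wss: int, min_mem: int, configured_mem: int) -> int:
    """Clamp a WSS estimate to [min_mem, configured_mem] (all in the same unit)."""
    if not 0 < min_mem <= configured_mem:
        raise ValueError("need 0 < min_mem <= configured_mem")
    return min(max(wss, min_mem), configured_mem)


def read_sample() -> tuple[int, int, int]:
    """Take one (Committed_AS pages, swap-ins, refaults) sample on a live system."""
    with open("/proc/meminfo") as f:
        committed = committed_as_pages(f.read())
    with open("/proc/vmstat") as f:
        swapins, refaults = swapin_refault_counts(f.read())
    return committed, swapins, refaults
```

Because the /proc counters are cumulative, per-epoch swap-in and refault counts are the differences between two read_sample() calls taken one second apart. The amount the balloon driver must hold is then the configured memory minus the value returned by balloon_target(); for a 2 GiB VM (524288 pages) with a working set of 76800 pages, that is 447488 pages.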
In this comparison, we used two identical test machines, where one runs the Xen hypervisor with the TWS-based memory virtualization optimizations and the other runs the ESXi server (VMware ESXi 5.0.0 build-623860). The memory given to each VM does not include anything owned by the hypervisor.

4.1 Effectiveness of TWS-based Ballooning

We evaluate the effectiveness of TWS-based ballooning by comparing the performance degradation and balloon target of a VM running a set of benchmark programs when TWS-based ballooning is used with those obtained when Xen's self-ballooning is used. The balloon target of a VM is the amount of physical memory that a memory ballooning scheme allocates to the VM. The performance degradation of a memory ballooning scheme is the performance difference between a benchmark program running in a VM whose physical memory allocation is controlled by the ballooning scheme in question and the same benchmark program running in a VM that is configured with, and indeed given, 2GB of memory, i.e., the Baseline configuration. The following three benchmark programs are used: SPECweb Banking [3] running against Apache [1], SPEC CPU, and OLTP from the Sysbench suite [4] running against MySQL [2].

Table 1 shows the performance degradation and balloon target of TWS-based ballooning and self-ballooning for the three benchmark programs.

| Benchmark used | TWS ballooning: degradation | TWS ballooning: target | Self-ballooning: degradation | Self-ballooning: target |
|---|---|---|---|---|
| SPECweb | 0% | 263.3 MB | 0% | 263.3 MB |
| SPEC CPU 401 | 3.08% | 783.6 MB | 4.11% | 922.6 MB |
| OLTP | 3.31% | 350.8 MB | 17.99% | 328.8 MB |

Table 1: Comparison between TWS-based ballooning and self-ballooning in terms of performance degradation and balloon target for the three benchmarks, SPECweb Banking, SPEC CPU 401, and OLTP. The performance degradation is calculated relative to the performance of the same VM configured with 2GB of memory.

The memory requirement of the SPECweb Banking benchmark is smaller than the minimum physical memory allocation to the test VM, 263.3MB. As a result, TWS-based ballooning and self-ballooning produce the same balloon target, equal to the minimum physical memory allocation, and the benchmark program experiences no performance degradation under either scheme when compared with the Baseline configuration.

For the SPEC CPU 401 benchmark, the average balloon target of TWS-based ballooning is 15.07% smaller than that of self-ballooning (783.6MB vs. 922.6MB), and yet the performance degradation of TWS-based ballooning is also smaller (3.08% vs. 4.11%). The smaller balloon target comes from the fact that the working set size TWS-based ballooning produces effectively excludes pages that are allocated but unused, as shown by the gap between the two balloon target curves in Figure 2. Despite allocating less physical memory to the test VM, TWS-based ballooning still degrades performance less than self-ballooning, because it reacts faster to sudden changes in the VM's demand, e.g., at the time points 320 seconds, 460 seconds, and 630 seconds in Figure 2. During these transitions, TWS-based ballooning is able to allocate more physical memory than Committed_AS, and thus cuts down unnecessary swap-in and refault events.

Figure 2: The balloon targets produced by TWS-based ballooning and self-ballooning over time, and the resulting combined swap-in and refault count over time under TWS-based ballooning, when the SPEC CPU 401 benchmark is used as the test workload.

Figure 3: The balloon targets produced by TWS-based ballooning and self-ballooning over time, and the resulting combined swap-in and refault count over time under TWS-based ballooning, when the Sysbench OLTP benchmark is used as the test workload.

Because the OLTP benchmark performs intensive disk I/O and thus requires a larger page cache, Committed_AS is not an accurate estimate of the benchmark's working set, as it does not take the page cache into account. As a result, the average balloon target produced by TWS-based ballooning is 6.70% higher than that of self-ballooning, and justifiably so, because the performance degradation of TWS-based ballooning is only 3.31%, significantly smaller than the 17.99% of self-ballooning. As shown in Figure 3, TWS-based ballooning detects refault events and increases the test VM's balloon target accordingly, and as a result produces a balloon target that is more in line with the VM's working set size and better at keeping the performance overhead of memory ballooning to a minimum.

We also ran two VMs, one with a constant working set size of 300MB and the other with a constant working set size of 1200MB, both on the Xen hypervisor with TWS-based ballooning and on VMware's ESXi 5.0. Each VM is configured with 2GB of memory but given only 263.3MB at start-up time. After the two VMs start to run, TWS-based ballooning takes 10 seconds to reach the ideal physical memory allocation, i.e., giving 300MB to the 300MB VM and 1200MB to the 1200MB VM. In contrast, for the same set-up, VMware ESXi takes 136 seconds to reach the same ideal physical memory allocation. VMware ESXi takes longer because it uses a sampling approach to probe a VM's working set size.

5 Related Work

Standard operating systems estimate the active portion of the buffer cache or page cache by maintaining LRU-like statistics [19, 12, 5] to implement page replacement logic. Lu et al. [14] proposed allocating a small portion of memory to each VM while leaving the remaining memory as an exclusive cache managed by the hypervisor. The memory accesses of VMs can thus be intercepted within the exclusive cache, and the LRU miss ratio curve [5] is derived to measure the working set size. Zhao et al. [24, 23] track the memory accesses of VMs by setting the user/supervisor privilege bit of guest page table entries to supervisor mode, so that all memory accesses of a VM are trapped, because the VM runs in user mode. Similarly, the LRU miss ratio curve is derived for working set size prediction.

To reduce the overhead of trapping memory accesses, the VMware ESX server [21] uses a sampling-based mechanism to predict the working set size of VMs. To perform the sampling, the ESX server periodically chooses a small random set of memory pages; e.g., the default setting is to choose 100 pages every 60 seconds for each VM. However, this mechanism gives only a rough estimate of a VM's working set size, and it cannot reflect a working set size that exceeds the currently allocated memory.

When it comes to the reclamation mechanism, the Clock algorithm [9] is commonly used in guest OSes, and several research efforts [17, 22, 7, 11] aimed to estimate the working set size by monitoring changes of the access bit in the hardware page table. This approach requires modifications to the guest OS. In contrast, our approach leverages the guest OS's page reclamation mechanism and does not require any guest OS modifications.

6 Conclusion

Efficiently utilizing the physical memory available on a virtualized server is a key technical challenge for modern hypervisors. Possible solutions include memory de-duplication, which allows different VMs to share common pages, and memory ballooning, which reclaims unused pages from a VM when its physical memory allocation is larger than its working set size. This paper describes and evaluates techniques that exploit knowledge of each VM's working set to deliver more efficient memory ballooning. More concretely, the specific research contributions of this work are:

- A low-overhead active probing mechanism that can accurately sense the working set of each VM and track it dynamically,
- An intelligent memory ballooning algorithm that can detect allocated but unused pages and reclaim them, and
- Compared with VMware's ESXi, a state-of-the-art hypervisor, the proposed working set estimation scheme is more accurate and more responsive to working set changes, though it incurs a slight probing overhead, and the proposed memory ballooning algorithm is able to quickly reclaim more memory pages without incurring additional performance penalty.

References

[1] Apache HTTP server project. http://httpd.apache.org/.
[2] MySQL: open source database server. http://www.mysql.com/.
[3] SPECweb2009. http://www.spec.org/web2009/.
[4] Sysbench: a system performance benchmark. http://sysbench.sourceforge.net/.
[5] ALMASI, G., CASCAVAL, C., AND PADUA, D. A. Calculating stack distances efficiently. MSP '02, ACM, pp. 37-43.
[6] ARCANGELI, A., EIDUS, I., AND WRIGHT, C. Increasing memory density by using KSM. Linux Symposium, 2009, pp. 19-28.
[7] BANSAL, S., AND MODHA, D. S. CAR: Clock with adaptive replacement. FAST '04, USENIX Association, pp. 187-200.
[8] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization, vol. 37. ACM, 2003, pp. 164-177.
[9] CORBATO, F. J. A paging experiment with the Multics system. In Honor of P. M. Morse (1969), MIT Press, pp. 217-228.
[10] GUPTA, D., LEE, S., VRABLE, M., SAVAGE, S., SNOEREN, A. C., VARGHESE, G., VOELKER, G. M., AND VAHDAT, A. Difference engine: Harnessing memory redundancy in virtual machines. OSDI '08.
[11] JIANG, S., CHEN, F., AND ZHANG, X. CLOCK-Pro: an effective improvement of the CLOCK replacement. ATEC '05, USENIX Association, pp. 35-35.
[12] JIANG, S., AND ZHANG, X. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. SIGMETRICS '02, ACM, pp. 31-42.
[13] CHIANG, J.-H., LI, H.-L., AND CHIUEH, T.-C. Introspection-based memory de-duplication and migration. VEE '13.
[14] LU, P., AND SHEN, K. Virtual machine memory access tracing with hypervisor exclusive cache. USENIX ATC '07, USENIX Association, pp. 3:1-3:15.
[15] MAGENHEIMER, D. Add self-ballooning to balloon driver. Discussion on the Xen development mailing list and personal communication, April 2008.
[16] MAGENHEIMER, D. Transcendent Memory on Xen. Xen Summit, February 2009, p. 3.
[17] MAUERER, W. Professional Linux Kernel Architecture. Wrox Press Ltd., Birmingham, UK, 2008.
[18] MURRAY, D. G., HAND, S., AND FETTERMAN, M. A. Satori: Enlightened page sharing. ATEC '09.
[19] O'NEIL, E. J., O'NEIL, P. E., AND WEIKUM, G. The LRU-K page replacement algorithm for database disk buffering. SIGMOD Rec. 22, 2 (June 1993), 297-306.
[20] SCHOPP, J. H., FRASER, K., AND SILBERMANN, M. J. Resizing memory with balloons and hotplug. Linux Symposium 2 (2006), 313-319.
[21] WALDSPURGER, C. A. Memory resource management in VMware ESX server. SIGOPS Oper. Syst. Rev. 36 (2002), 181-194.
[22] ZHANG, I., GARTHWAITE, A., BASKAKOV, Y., AND BARR, K. C. Fast restore of checkpointed memory using working set estimation. SIGPLAN Not. 46, 7 (Mar. 2011), 87-98.
[23] ZHAO, W., JIN, X., WANG, Z., WANG, X., LUO, Y., AND LI, X. Low cost working set size tracking. USENIX ATC '11, USENIX Association, pp. 17-17.
[24] ZHAO, W., AND WANG, Z. Dynamic memory balancing for virtual machines. In VEE '09.