RTP: Robust Tenant Placement for Elastic In-Memory Database Clusters

Jan Schaffner, Tim Januschowski, Megan Kercher, Tim Kraska, Hasso Plattner, Michael J. Franklin, Dean Jacobs

Hasso Plattner Institute, Potsdam, Germany; SAP AG, Walldorf, Germany; Brown University; AMPLab, UC Berkeley

ABSTRACT

In the cloud services industry, a key issue for cloud operators is to minimize operational costs. In this paper, we consider algorithms that elastically contract and expand a cluster of [...]

Footnote: http://www.gartner.com/id=2151315

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. June 22–27, 2013, New York, New York, USA. Copyright 2013 ACM 978-1-4503-2037-5/13/06...$15.00.

[...] as well as configuring, operating, and maintaining application servers and databases. The SaaS provider, on the other hand, can leverage economies of scale by automating common maintenance tasks as well as by consolidating multiple [...]

Footnote: http://www.biondemand.com/businessintelligence
Figure 1: Using interleaved replication for minimizing the number of servers

[...] load. Individual tenant replicas are migrated while the tenant remains on-line [8, 21]. This allows us to make frequent, incremental changes to the tenant placement with the goal of running with the minimal number of servers at each point in time. As our experiments show, incremental placement can decrease server cost for an average business day by up to a factor of ten (measured in Amazon EC2 server hours). More drastic savings can be realized during longer periods of low activity, such as weekends or holidays.

At the core of our approach is the concept of interleaving replicas across nodes, which has been studied in the context of fault-tolerance for parallel databases [14]. We are also interested in tolerating server failures, but our problem is different: we try to minimize the number of servers required at each point in time, whereas the existing work on parallel databases assumes a fixed cluster size [16, 14, 19]. Also, a typical SaaS tenant is small and hence there is not much benefit to horizontal partitioning, which is a prerequisite for the existing interleaving strategies [19, 24].

Figure 1 shows an instance of RTP with five tenants and two replicas each, across which all load is shared. A common approach for assigning these tenants to servers is to use a standard first-fit algorithm and mirror the resulting placement (Figure 1a). Note that in this case servers must run at or below half their capacity limit, since the load on a server doubles when its mirror server fails. In contrast, Figure 1b shows an interleaved placement for the same situation. In this case, a failure of server 1 would distribute the load of its tenants across server 2 and server 3. As a result, the layout requires only three instead of four servers while guaranteeing that no server is overloaded upon the failure of any one other server.

In summary, this paper makes the following contributions:

1. We introduce and formalize the Robust Tenant Placement Problem (RTP) including continuous migration, interleaving of replicas, and coping with hardware failure.
2. We present several novel algorithms that solve RTP for both static and incremental scenarios.
3. We provide an extensive experimental evaluation using real-world load traces from a production cloud service run by SAP and benchmark our algorithms in terms of cost, running time, and their ability to maintain response time guarantees in spite of server failures.
4. We develop and evaluate generic over-provisioning strategies for avoiding excess load coming from unexpected load spikes; it turns out that these strategies also help to cope with multiple simultaneous server failures.

Figure 2: Aggregate request rate across five calendar weeks, including Christmas (x-axis: time in days, 11-27-2010 to 01-01-2011; y-axis: request rate in %)

Apart from the cost savings possible by using our approach, our experiments may provide guidance to administrators seeking to balance overall operating cost and temporarily overloaded servers after unexpected load spikes. We provide practical advice on the question of how many replicas each tenant should have so that overall cost and robustness towards unexpected load spikes are balanced.

The remainder of this paper is organized as follows. In Section 2, we analyze real-world load traces we received from SAP, motivating the need for placement algorithms that make incremental changes to existing placements at frequent intervals. Section 3 provides a rigorous formalization of RTP. Section 4 contains details on our algorithmic approach. In Section 5, we evaluate our algorithms experimentally. Section 6 surveys related work. Section 7 concludes the paper.

2. ENTERPRISE SaaS OVERVIEW

In order to design more efficient data placement algorithms for multitenancy, we analyzed real-world traces obtained from a production multi-tenant, on-demand application that runs on an in-memory database. These traces were the anonymized application server logs of 100 randomly selected tenants in Europe over a four-month period. We also have additional statistics such as the tenants' database size.

Figure 2 shows a normalized view of the aggregate number of requests across all tenants over a five-week period. In the trace, one can clearly see that tenant behavior follows seasonalities: work days (with a drop at lunch time), working weeks, and annual calendar events. One example for the latter is Christmas Eve, which was a Friday in 2010. On this day, the load is considerably lower than for a regular Friday.

Investigating the log data revealed another interesting pattern: a non-negligible number of tenants suddenly appear, use the system actively for 2–3 weeks, then become inactive for a considerable amount of time (say, two weeks), and suddenly become active again for six weeks (Figure 3). Analysis revealed that these were mainly demo and training systems. Capacity planning often neglects those systems. The behavior of tenants in trial periods is particularly hard to predict, which, in part, motivates the incremental placement algorithms presented in this paper.

A more detailed analysis of the load traces can be found in [22]. The following sections leverage these insights for our placement algorithms. Furthermore, we used the real-world traces for evaluating our placement techniques in Section 5.

Footnote: For privacy reasons, we are not allowed to publish absolute request rates, and the figure only shows relative values.
Figure 3: Demo system used only at beginning and end of trial (x-axis: time in calendar weeks, cw 48 to cw 13; y-axis: request rate in %)

3. THE ROBUST TENANT PLACEMENT AND MIGRATION PROBLEM

In this section, we formally introduce the problem of assigning tenants to servers such that response time SLOs are met and costs are minimized. We call this problem the Robust Tenant Placement and Migration Problem (RTP).

A valid tenant placement is an assignment of at least two copies of a given number of tenants to a number of (cloud) servers such that no server is overloaded in terms of any of its resources, no server contains more than one copy per tenant, and the failure of a single server does not cause overloading any other server. A tenant $t$ is characterized by its size $\sigma(t)$ (i.e. the amount of space each replica consumes in memory) and its load $\ell(t)$.

Our formalization of RTP requires that a workload management technique (i.e. challenge (i) in the introduction) is in place to provide $\ell(t)$ as an input. In particular, an estimation of the combined resource consumption of multiple tenants on a machine is required to provide RTP with an indicator for the "fill level" of a server. Therefore, we begin with discussing our assumptions on the underlying resource modeling approach before formalizing RTP.

3.1 Modeling Tenant Resource Requirements

As stated in Section 1, tenant placement requires estimating the resource consumption and the impact on other clients when a new tenant is placed on a server. This problem has been studied for both disk-based [5, 18, 7] and in-memory databases [21]. In principle, any of these load metrics can be used in RTP. We build on [21], since our focus is also on in-memory databases. This focus considerably simplifies resource modeling in contrast to the disk-based approaches.

Schaffner et al. [21] propose a regression model for estimating the combined CPU and CPU-memory bandwidth utilization. The inputs to their regression model are the database sizes and the current request rate of the tenants. All tenants run the same workload. The output is a scalar value representing the combined CPU and CPU-memory bandwidth utilization of the server. This is similar to the approach in [7], where a logical I/O metric is introduced to characterize the dominant resource bottleneck, although their focus is on disk-based systems. The load metric from [21] is additive across multiple tenants. Curino et al. [5] also observe additivity of CPU and memory consumption in disk-based systems. In RTP, we use the load metric from [21] as an input parameter indicating the load of a tenant. In principle, our heuristics (Section 4.1) could also handle non-linear resource models for disk access, making our techniques applicable to a wider range of systems.

Service Level Objectives (SLOs). As stated above, tenant consolidation is constrained by compliance with response time SLOs. Here, we require the underlying resource model to incorporate SLOs so that it enables the placement algorithms to guarantee SLO compliance as long as the placement is valid. For providing such SLOs, we may again build on the resource modeling approach in [21], where the load metric is correlated to the query response times of a server in the 99-th percentile (across all tenants and for the given workload). Normalizing the maximum load prior to an SLO violation to 1.0 allows us to express SLO compliance as $\sum_{t \in T_i} \ell(t) \le 1.0$, where $\ell(t)$ is the load of a tenant $t$ in the set $T_i$ of all tenants on a server $i$. Thus, as long as the placement algorithm keeps the total load of a server below 1.0, the response time goals will be met (similar to [18]).

3.2 Formalization

Based on the above considerations, RTP can be described as follows. Given an initial placement (potentially containing overloaded servers), find a valid placement by migrating not more than a limited amount of data (called the migration budget) such that the number of active servers is minimal. We call a server active if it contains at least one tenant. We consider a server overloaded when one of its resources is used beyond its capacity limit. This includes additional load redirected to a server when another server has failed.

A valid instance of RTP has
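the following data as input:

- $T \subseteq \mathbb{N}$, the set of tenants.
- $N \subseteq \mathbb{N}$, the set of available servers.
- $R = \{1, \ldots, r(t)\}$, the replicas per tenant. $r(t)$ is the (fixed) number of replicas per tenant $t$; Section 3.4 contains details on how to obtain $r(t)$.
- $\sigma : T \to \mathbb{N}$, a function returning the DRAM requirement of a given tenant.
- $\mathrm{cap}_\sigma : N \to \mathbb{N}$, a function returning the DRAM capacity of a given server.
- $\ell$, a function returning the current load of a given tenant. As discussed in Section 3.1, a single metric is sufficient given our focus on in-memory databases.
- $\mathrm{cap}_\ell$, a function returning the request processing capacity of a given server.
- An existing tenant placement.
- A migration budget $\delta \in \mathbb{N}$. This parameter depends on the length of a reorganization interval, i.e. the time after which a placement is reconsidered.
- $\mu$, a function on $N$ with values of at most 1, returning the capacity loss when a server is a migration target.
- $\nu$, a function on $N$ with values of at most 1, returning the capacity loss when a server is a migration source.

We will discuss how to obtain $\mu$ and $\nu$ later in this section.

In an assignment formulation of RTP, a valid solution must assign appropriate values to the following decision variables:

- $y_{t,i}^{(k)} \in \{0, 1\}$ as a binary decision variable with $y_{t,i}^{(k)} = 1$ if and only if copy $k$ of tenant $t$ is on server $i$.
- $s_i \in \{0, 1\}$, where $s_i = 1$ denotes that server $i$ is active.
- $p_i$, where $p_i$ denotes the capacity of server $i$ that must be left unused such that additional load due to a single server failure does not cause an SLO violation. We call $p_i$ the penalty that must be reserved on server $i$.

As stated in Section 3.1, an SLO violation occurs when the total load of any non-failed server exceeds 1.0 as a consequence of another server failing.

The objective of RTP is to minimize the number of active servers, i.e.

$$\min \sum_{i \in N} s_i$$

A solution of RTP must obey the following constraints.

$$\sum_{i \in N} y_{t,i}^{(k)} = 1 \qquad \forall t \in T,\ \forall k \in R \qquad (1)$$

Constraint (1) ensures that each replica $1 \le k \le r(t)$ of a tenant $t$ is assigned to a server exactly once.

$$\sum_{k \in R} y_{t,i}^{(k)} \le 1 \qquad \forall t \in T,\ \forall i \in N \qquad (2)$$

Constraint (2) ensures that no two copies of the same tenant are placed on the same server.

$$\sum_{t \in T} \sum_{k \in R} \sigma(t) \cdot y_{t,i}^{(k)} \le \mathrm{cap}_\sigma(i) \cdot s_i \qquad \forall i \in N \qquad (3)$$

Constraint (3) ensures that the total size of all tenants on a server does not exceed the server's DRAM capacity. If at least one tenant is assigned to the server, $s_i$ is set to one.

Similarly, the following constraint ensures that the total load of all tenants on a server does not exceed the processing capabilities of the server. We assume that load can be shared across replicas: each server holding a replica of tenant $t$ receives only $1/r(t)$-th of $\ell(t)$. We will justify this assumption in Section 3.3.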
$$\sum_{t \in T} \sum_{k \in R} \frac{\ell(t)}{r(t)} \cdot y_{t,i}^{(k)} + p_i \le \mu(i) \cdot \mathrm{cap}_\ell(i) \qquad \forall i \in N \qquad (4)$$

Each server must be capable of potentially handling additional load in case another server fails. The spare capacity reserved for this excess load is captured by a penalty $p_i$ in Constraint (4). The following constraint defines the penalty.

$$p_i = \max_{j \in N,\, j \ne i} \sum_{t \in T} \sum_{k, k' \in R} \frac{\ell(t)}{r(t)^2 - r(t)} \cdot y_{t,i}^{(k)} \cdot y_{t,j}^{(k')} \qquad \forall i \in N \qquad (5)$$

What fraction of a tenant's load must be added to $p_i$ depends on the number of remaining replicas. If server $j$ handled a fraction $\ell(t)/r(t)$ of the load of tenant $t$ prior to the failure, then the remaining $r(t) - 1$ replicas of tenant $t$ must share the load after the failure. Hence, the extra load that server $i$ must support is

$$\frac{\ell(t)}{r(t)} \cdot \frac{1}{r(t) - 1} = \frac{\ell(t)}{r(t)^2 - r(t)}$$

Constraint (5) ensures that $p_i$ is set large enough to guarantee that even the failure of the "worst case" other server would not result in overloading server $i$. The constraint renders heuristics for bin-packing unusable for RTP: given three servers $U$, $V$, and $W$, moving a tenant from $V$ to $W$ may increase $p_U$ and thus render server $U$ unable to sustain the extra load coming from another server failing.

RTP guarantees that performance SLOs are met while tenants are being migrated. The intuition that migration affects query latencies is quantified in [21] for those servers participating in migrations. During a migration, a source server with a total load of $\nu = 0.85$ (or a destination server with a total load of $\mu = 0.82$) produces a response time of one second in the 99-th percentile. We build on this in the following constraints. Note that Constraint (4) implicitly takes the given placement into account via the parameter $\mu(i)$: whenever server $i$ is a migration target, i.e. a tenant is assigned to server $i$ that was not assigned to this server previously, the load capacity of the server drops by a factor of $1 - \mu(i)$. If $i$ is not a migration target, we have $\mu(i) = 1$. For notational convenience, we define $\mathit{mig}$ as the set of tenants $t \in T$ of which a copy was moved.

Footnote: This assumes that all migrations are executed sequentially. Degradation factors depend on the workload, the hardware, and on how migrations are implemented in the DBMS.

$$\forall t \in \mathit{mig}\ \ \exists i \in N:\ \sum_{k \in R} y_{t,i}^{(k)} = 1 \ \wedge\ \sum_{t' \in T} \sum_{k \in R} \frac{\ell(t')}{r(t')} \cdot y_{t',i}^{(k)} + p_i \le \nu(i) \cdot \mathrm{cap}_\ell(i) \qquad (7)$$

Constraint (7) ensures that for every tenant being migrated, a server exists that has enough spare capacity to act as a migration source. A stronger version of (7), in which we replace $\mathit{mig}$ by $T$, guarantees that every tenant has a replica on a server that may act as a migration source. Constraint (8) enforces the migration budget $\delta$.

$$\sum_{t \in \mathit{mig}} \sigma(t) \le \delta \qquad (8)$$

Constraints (7) and (8) may render RTP infeasible in case of extreme load change in comparison to the given placement. In such cases, it may occur that (i) no server can act as a safe migration source for a tenant or (ii) the migration budget is not large enough for repairing all overloaded servers. When an infeasibility occurs, it becomes necessary to tolerate SLO violations while restoring a valid and flexible placement. Besides temporarily dropping constraints, a change in the objective function becomes necessary to minimize SLO violations. Instead of minimizing the number of active servers, a placement should be found with the lowest number of overloaded servers, which can be formalized as follows. We introduce a variable $e_i$ which measures the excess load on a server. For $i \in N$, we define

$$e_i := \sum_{t \in T} \sum_{k \in R} \frac{\ell(t)}{r(t)} \cdot y_{t,i}^{(k)} + p_i - \mu(i) \cdot \mathrm{cap}_\ell(i) \qquad (9)$$

and alternative objective functions are

$$\min \sum_{i \in N} e_i \quad \text{or} \quad \min \max_{i \in N} e_i \qquad (10)$$

Computational Complexity. We introduce a special case of incremental RTP, which will be useful in our experiments as well as for discussing the (computational) complexity of RTP. We call the subclass of incremental RTP where (i) no initial placement is given, (ii) both $\mu(i) = \nu(i) = 1$ for all $i \in N$, and (iii) $\delta = \infty$ static RTP. Note that an optimal solution of static RTP is a lower bound for optimal solutions of incremental RTP. A reduction from a known problem [11] shows the (weak) NP-hardness of static RTP. Consequently, for an arbitrary migration budget, incremental RTP is also NP-hard. We omit proofs due to space restrictions.

3.3 Load Distribution Across Replicas

Our formulation of RTP assumes load to be distributed equally among a tenant's replicas (e.g. in Constraint (4)). This allows serving more requests per tenant. However, we can only obtain this benefit when the workload has read-mostly characteristics. This applies to real-time analytical database applications: an analysis of several enterprise database workloads showed that write queries account for less than 10% of the total workload [17].

To simplify the presentation of our formal model, Constraint (4) assumes a read-only workload. Splitting load into weighted read and write components, (4) could be modified such that our formulation of RTP is independent of the workload characteristics. For a write-mostly workload, however, the load cannot be split among replicas; instead, all replicas are exposed to the full load (assuming that writes go to all replicas). Also, no load redistribution occurs in case of a failure. All other aspects of the problem formulation remain intact with write-intensive workloads. However, an exhaustive study of the impact of writes on replicated tenant placement is beyond the scope of this paper.

Figure 4: Required number of servers dependent on $r(t)$; (a) $r(t) = 2$, (b) $r(t) = 3$

3.4 Choosing the Number of Replicas

In the previous section, the number of replicas per tenant $r(t)$ was treated as an input parameter to our optimization problem. In the following, we discuss how to obtain $r(t)$. Intuition suggests setting $r(t)$ as low as possible, since (i) more replicas require more space, which could lead to a higher number of active servers, and (ii) the problem becomes more constrained. However, increasing the number of replicas beyond $r(t) = 2$ becomes necessary when the load of a tenant is so high that a single server cannot handle half of it.

The number of copies $r(t)$ of a tenant $t$ must be chosen such that $\ell(t)/r(t) \le \mathrm{cap}_\ell(i)$. In addition, server $i$ must be able to handle the extra load coming from another server failing that also holds a copy of $t$. Hence, we must choose $r(t)$ in such a way that the following inequality applies.

$$\frac{\ell(t)}{r(t)} + \frac{\ell(t)}{r(t)^2 - r(t)} \le \mathrm{cap}_\ell(i) \qquad (11)$$

The left-hand side simplifies to $\ell(t)/(r(t) - 1)$. We rearrange Inequality (11) for $r(t)$ to obtain:

$$r(t) := \max\left(2, \left\lceil \frac{\ell(t)}{\mathrm{cap}_\ell(i)} + 1 \right\rceil\right) \qquad (12)$$

In contrast to our intuition, increasing the number of replicas beyond the lower bound as defined in (12) can lead to placements with fewer servers, as shown in Example 1.

Example 1. Consider four tenants A to D, each with a load $\ell(t) = 1.0$, and servers with capacity $\mathrm{cap}_\ell(i) = 1.0$. For two replicas per tenant, as shown in Figure 4a, eight servers are necessary to place all tenants. The load on all servers including spare capacity reserved to accommodate potential server failures ($p_i$) is 1.0. If we allow three replicas per tenant, as shown in Figure 4b, then a total of six servers are sufficient. Also in this case, the load on all servers including $p_i$ is 1.0.

4. ALGORITHMS FOR RTP

In this section, we present algorithms for solving RTP. We start by discussing heuristics for static RTP because they form the foundation for our incremental algorithms. We also tackle RTP with exact algorithms, in particular, with mixed integer programming (MIP) solvers. The challenge lies in linearizing the non-linear constraints to obtain a MIP formulation. Powerful solvers like Cplex can then be used to provide solutions and bounds on the optimal solution.
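As a rough numerical illustration of the replica lower bound (12) and of Example 1 in Section 3.4, the following Python sketch computes the minimum replica count and the load plus penalty of every server for a given placement. It is not taken from the paper: the function names `min_replicas` and `load_plus_penalty`, the dictionary-based placement representation, and the concrete three-replica layout (servers 1 to 3 holding A and B, servers 4 to 6 holding C and D) are assumptions chosen for illustration. The sketch also assumes a tenant load of 1.0, a server capacity of 1.0, and that no migration is in progress.

```python
from fractions import Fraction
from math import ceil


def min_replicas(load, cap):
    """Lower bound on the replica count: max(2, ceil(load / cap + 1))."""
    return max(2, ceil(Fraction(load) / Fraction(cap) + 1))


def load_plus_penalty(placement, load, replicas):
    """Return {server: own load + penalty} for a placement.

    placement maps a server id to the set of tenants it holds;
    load and replicas map a tenant to its load and replica count.
    """
    def own_load(i):
        # Each replica of tenant t serves a 1/r(t) share of its load.
        return sum(load[t] / replicas[t] for t in placement[i])

    def extra_load(i, j):
        # Load shifted onto server i if server j fails: for every shared
        # tenant, j's share is split among the r(t) - 1 surviving replicas.
        shared = placement[i] & placement[j]
        return sum(load[t] / (replicas[t] * (replicas[t] - 1)) for t in shared)

    totals = {}
    for i in placement:
        # The penalty is the extra load caused by the worst-case other server.
        penalty = max((extra_load(i, j) for j in placement if j != i),
                      default=Fraction(0))
        totals[i] = own_load(i) + penalty
    return totals


tenants = ["A", "B", "C", "D"]
load = {t: Fraction(1) for t in tenants}

# Two replicas per tenant: eight servers, one replica each (mirrored pairs).
two_replicas = {1: {"A"}, 2: {"A"}, 3: {"B"}, 4: {"B"},
                5: {"C"}, 6: {"C"}, 7: {"D"}, 8: {"D"}}
# Three replicas per tenant: six servers, two replicas each (assumed layout).
three_replicas = {1: {"A", "B"}, 2: {"A", "B"}, 3: {"A", "B"},
                  4: {"C", "D"}, 5: {"C", "D"}, 6: {"C", "D"}}

totals_two = load_plus_penalty(two_replicas, load, {t: 2 for t in tenants})
totals_three = load_plus_penalty(three_replicas, load, {t: 3 for t in tenants})
print(len(two_replicas), set(totals_two.values()))
print(len(three_replicas), set(totals_three.values()))
```

Exact fractions are used instead of floats so that the comparison against the capacity limit of 1.0 is not affected by rounding. Under these assumptions, both layouts fill every server exactly to its capacity once the penalty is included, while the three-replica layout uses six servers instead of eight.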
Footnote: http://www.ilog.com/products/cplex

Due to space restrictions, we omit the linearizations and focus on the heuristics.

4.1 Heuristics for Static RTP

Greedy Heuristics. For the related bin-packing problem, greedy heuristics deliver good results [11]. Another reason for considering greedy variants is their speed. Even for short migration intervals, a greedy heuristic can be used when more complex algorithms are prohibitively slow.

Our greedy algorithms are loosely based on the well-known best-fit algorithm [4]. When placing a single replica of a tenant, for each server its total load including its penalty (Section 3.2) is computed. The servers are then ordered according to load plus penalty in decreasing order. Similar to best-fit, the first server that has enough free capacity is selected. If no active server has enough capacity, then the tenant is placed on a new server. Apart from load plus penalty on the servers, we consider Constraints (1)–(7).

This basic mechanism for placing a single replica of a tenant is called robustfit-single-replica. It is the basis for the algorithms robustfit-s-mirror and robustfit-s-interl., which will now be discussed.

robustfit-s-mirror first sorts all tenants by load in descending order and places the first replica of each tenant. Since there is no penalty when there is only one copy, the algorithm assumes a reduced server capacity in this step. Then, all servers are mirrored. Finally, the algorithm places additional replicas individually for tenants that require more than two replicas (see Section 3.4).

robustfit-s-interl. also sorts all tenants and then, tenant after tenant, places all replicas of each tenant. For the first replica of each tenant, the algorithm assumes a different server capacity than for all other replicas. This results in a placement where each tenant has a safe source server. Also, tenant replicas are naturally interleaved across servers. Both algorithms have polynomial complexity and run fast for the problem sizes we consider in this paper.

Metaheuristic: Tabu Search. Having considered fast greedy heuristics, we consider a computationally more expensive heuristic next. We propose an adaptation of Tabu search [12], used as an improvement heuristic. Given a starting solution (e.g. obtained by one of the greedy heuristics above), tabu-static tries to remove an active server $S$ by traversing the search space as follows. Every valid solution of RTP is a point in the search space. We move from one valid solution to another valid solution by migrating a tenant $t$ from $S$ to a different server $T$, even if this move leads to an invalid placement. Next, we fix possible conflicts (if possible without placing a tenant on $S$). In order to avoid both cycling and stalling in a local optimum, a so-called Tabu list stores each move $(t, S, T)$. We only allow a move if it is not in the Tabu list. When the list reaches a certain length, the oldest element is removed. The search aborts if, after a certain number of iterations, no placement was found that does not use $S$. If a solution without $S$ was found, the search continues from the new solution with the goal of removing another server.

The performance of tabu-static relies on the careful adjustment of its parameters (e.g. Tabu list length, choice of server to be cleared out, order in which tenants are moved). We identified good settings using careful experimentation.

4.2 Algorithms for Incremental RTP

The static algorithms discussed above are the basis for our incremental placement heuristics. In order to leverage the different heuristics, we use a metaheuristic, which acts as a framework for all incremental placement strategies. Its main benefit is a significant reduction of the solution search space, leading to lower overall algorithm execution times.

4.2.1 A Framework for Incremental RTP

The framework consists of six phases. They are executed at the beginning of each reorganization interval, independent of the algorithm that is currently run. Individual algorithms must plug in a method for placing a single replica of a tenant or replace entire phases. Such a method is, for example, the robustfit-single-replica method described above. An incremental algorithm can also provide its own implementation for individual phases of the framework. The six phases of this framework are as follows.

1. Delete unnecessary replicas. When the load of a tenant has decreased in comparison to the previous interval, it might be the case that removing a replica of the tenant is possible (see also the discussion on the lower bound on the number of replicas in Section 3.4). Therefore, in this phase, a heuristically selected replica is deleted for every tenant meeting this condition. Note that deleting a replica does not count towards the migration budget.

2. Ensure migration flexibility. This phase ensures that all tenants have at least one replica on a server that has enough spare capacity to participate in a migration as a source server (Constraint (7)). For determining this server, the plugged-in algorithm is used. This results in the ability to migrate tenants without causing SLO violations.

3. Create missing replicas. This phase handles the opposite case of phase (1), where the lower bound on a tenant's replicas has increased as a result of increasing load. The plugged-in algorithm is used to place as many extra replicas as necessary to match the new lower bound.

4. Fix overloaded servers. This phase repairs overloaded servers by moving tenants away from them until they are no longer overloaded. The plugged-in algorithm is used to determine the target servers for replicas that are moved.

5. Reduce number of active servers. All servers are ordered by total load plus penalty. Then, all tenants on the most lightly loaded server are moved to other servers using the plugged-in algorithm. This phase is repeated with the next server up to the point where the server cannot be emptied without creating a new server.

6. Minimize maximum load. When a reduction of the number of servers is no longer possible, this phase flattens out the variance in load plus penalty across all servers. The goal is to avoid having servers in the placement that have a much higher penalty than other servers. Again, the plugged-in heuristic is used. This phase terminates when the migration budget is exhausted or further migrations would have too small an effect on the variance.

The execution of the framework is immediately aborted when the migration budget is exhausted. When too low a value for the migration budget is chosen, the placement may be invalid (i.e. it does not satisfy all the constraints of RTP) after premature termination. A placement is always valid after completion of phase (4).

Note that the execution order of the above framework is itself a heuristic. Experimentation has revealed that executing phase (4) after phase (2) results in fewer servers than the inverse order, because some overloaded servers are repaired as a side product of finding a safe migration source for the tenants. Note further that the question of deciding how many replicas a tenant should have is orthogonal to this framework. Similar to plug-in methods for placing individual replicas, different strategies for determining the replication factor can be plugged in. The standard method is to use exactly as many replicas as suggested by the lower bound. Another method is to increase the lower bound by a fixed offset. A more sophisticated method is to set the number of replicas across all tenants in a way that all replicas receive more or less the same load. A last method is to repair overloaded servers in phase (4) by creating additional replicas elsewhere, thus decreasing the load of the tenant on the overloaded server.

In the following, we discuss the plug-in algorithms that we have developed for this framework.

4.2.2 Greedy Heuristics

The simplest and fastest algorithm is called robustfit-inc. and merely entails the method for placing a single replica using robustfit-single-replica (described in Section 4.1). This method is plugged into the above framework as is.

Since the space of possible actions when transforming a given placement into a new placement is very large, we created splitmerge-inc. This algorithm acts exactly as robustfit-inc. but provides its own implementation of phases (4) and (5) in the framework above. In phase (4), the only allowed operation is splitting each overloaded server into two servers. In phase (5), conversely, merging two servers into one is the only legal operation, although multiple server pairs can be merged in one step. Since the underlying robustfit-single-replica method is very fast, we use a more complex procedure for deciding what servers to merge: splitmerge-inc. builds up its list of merge pairs by checking, for all candidate pairs, whether the two servers can be merged.

The method in splitmerge-inc. for removing servers is effective but computationally intensive. Its approach for fixing overloaded servers is rather simple: overloaded servers cannot be fixed without creating one new server per overloaded server, which seems too drastic. We therefore replaced splitmerge-inc.'s implementation of phase (4) with the standard one again and used robustfit-single-replica as the plug-in heuristic, referring to this as robustfit-merge.

4.2.3 Metaheuristic: Tabu Search

We also use our Tabu search for incremental RTP: tabu-inc. replaces phase (5) with the Tabu search from Section 4.1. Note that tabu-inc. does not use a solution obtained by a greedy heuristic as the starting solution; it simply starts with the given placement. tabu-inc. simply omits phase (6), which saves some of the migration budget and thereby allows the Tabu search to run longer. The next heuristic, tabu-inc.-long, works exactly as tabu-inc., except that the parameters of the Tabu search are set in such a way that it runs significantly longer (and thus visits more solutions).

Finally, we combine robustfit-inc. with tabu-inc. into tabu-robustfit. This algorithm runs robustfit-inc. as a preprocessing step, thereby omitting phase (6) so that the remaining migration budget can be used to improve the solution using Tabu search. Another variant of this algorithm is tabu-robustfit-l., where the Tabu component runs longer.

4.2.4 Portfolio Approach

The portfolio approach combines all heuristics for incremental RTP. We simply run all heuristics starting from the
same, best-known solution. We then pick the best solution among all algorithms as the next solution. Choosing the best solution as the next solution is itself a heuristic approach.

5. EXPERIMENTS

In this section, we evaluate our algorithms for RTP. Our evaluation is based on real-world load traces, which we obtained from a production cluster of the aforementioned SAP application. To preserve customer privacy, an anonymized random sample of 100 tenants was given to us. While this is only a fraction of the customer base, the sample provides a realistic profile of the whole customer base. Unfortunately, the sample is not large enough for experiments at scale. We thus used the technique presented in [22] to bootstrap new tenants. The newly created tenants have load traces similar to the original tenants and follow the same periodical patterns. Our final testing data set contains 435 tenants.

The evaluation is structured as follows: Section 5.1 discusses the performance of our algorithms w.r.t. their balance among (i) the number of active servers, (ii) computation time, and (iii) their robustness towards load changes. We will see that robustfit-inc. achieves a good balance between all three measures. Consequently, Section 5.2 explores robustfit-inc. further. We investigate lower bounds for server cost. We also study the effects of increasing the number of replicas per tenant beyond the minimum, which has interesting effects on the number of active servers and the stability of a placement over the day. In Section 5.3 we study generic over-provisioning strategies to reduce the impact of temporarily overloaded servers until it becomes negligible.

In all experiments, we assume that servers have a DRAM capacity of $\mathrm{cap}_\sigma = 32$ GB. We further assume a homogeneous server load capacity of $\mathrm{cap}_\ell = 1.0$. While our algorithms also work for heterogeneous servers, homogeneity simplifies the presentation of our experiments. Results from the literature suggest that approximability results for homogeneous servers will carry over to heterogeneous servers [2, 15]. In the experiments, we set the migration budget to $\delta = 27$ GB because such an amount can safely be migrated in a ten minute interval using SAP's in-memory database and a 10 Gbit Ethernet interconnect [21].

Our experiments were conducted on an Intel Xeon X7560 server with 2.27 GHz running Linux. We implemented our heuristics in Scala and used Cplex as a MIP solver. We have not yet parallelized our heuristics (in contrast to Cplex). We would expect a significant speed-up from a multi-threaded implementation, especially for Tabu search and portfolio.

5.1 Comparison of Heuristics for RTP

In order to evaluate our heuristics for solving (incremental) RTP we consider the following three measures:

1. the cost associated with the resulting placements,
2. the computation times required by the algorithms, and
3. robustness of the placement towards unexpected increases in tenant load.

Not all can be optimized for at the same time; consequently, a trade-off between these measures must be found. A particularly inexpensive placement may require an unrealistic amount of computing time and then, at the same time, the tenants might be packed so tightly that servers are prone to temporary overloads when load changes.

Footnote: We measure operational cost as accrued when using a varying number of "high memory" instances on Amazon EC2. See http://aws.amazon.com/ec2/pricing/.

Experiment (i): Incremental Placement vs. State of the Art. We compare our incremental heuristics against two baselines published in [25] and [5]. Table 1 summarizes the benefits of our incremental algorithms over these two static approaches measured on a typical working day. To allow a fair comparison, we modified both baseline approaches such that they also encompass the replication, load balancing and failure-robustness properties of RTP.

The first static approach, modeled after [25], entails monitoring all tenants for one week and observing the peak load of each tenant within that period. Afterwards, one provisions for this peak load. We used the week directly preceding the Wednesday chosen for our experiments to estimate the maximum load for each tenant. We then solved static RTP for the observed peak loads using greedy heuristics. Table 1 shows that robustfit-s-mirror, the simplest static algorithm, requires 320 servers, whereas robustfit-s-interl., the best static algorithm in this case, requires 192 servers.

The second static approach, kairos-MIP, modeled after [5], also entails monitoring all tenants for a period of time and then computing a static placement. In contrast to [25], where this placement is computed based on the maximum load requirements observed for all tenants during the observation period, kairos-MIP tries to consolidate more aggressively by requiring its (static) placement to be valid across all ten minute intervals in the observation. We picked the Wednesday of the week preceding our exemplary Wednesday and tried to compute a placement with kairos-MIP, our implementation of [5] in Cplex. Note that given our focus on in-memory databases and the absence of shared disk access, we can use a MIP formulation, which has computational advantages. However, the corresponding MIP formulation becomes so large for our trace data that kairos-MIP is computationally unsolvable within one week. We therefore picked a subset of high-load ten minute intervals from the Wednesday and ran kairos-MIP on this subset. Hence, we only obtain a lower bound on the actual cost of the kairos-MIP placement; we can expect the kairos-MIP placement over all ten minute intervals (if it was computable) to be significantly more costly. Our experimentation with smaller sets of tenants and servers suggests that working on a subset of ten minute intervals results in placements which are approx. 60% cheaper than including all ten minute intervals. Table 1 shows that kairos-MIP requires (at least) 45 servers. Note that the running time for kairos-MIP even when run on a subset of ten minute intervals is much higher than reported in [5], although we use Cplex. This may be partially due to the additional constraints of RTP that we added for comparability (e.g. Constraint (5)). However, the main reason is probably the much higher number of tenants and servers in our experimental data compared to [5].

In contrast to both static baselines, our incremental algorithms, which alter the placement in ten minute intervals, require between 33 and 40 servers during times of peak load and much fewer servers during the night and times of low load (e.g. weekends). Table 1 shows the cost for server rent for all incremental algorithms. On average, the cost for server rent is an order of magnitude lower when using an incremental algorithm as opposed to using static provisioning based on peak load while being computationally comparable. Incremental placement is still a factor of 2.2 cheaper than the lower bound on the Kairos approach, while being far superior in terms of computational times.

Table 1: Server cost and running time of heuristics for RTP

Algorithm             Cost        Servers (max)   Running time (avg)   Running time (max)
Static:
robustfit-s-mirror    $3456.00    320             66.4 s
robustfit-s-interl.   $2073.60    192             481.3 s
kairos-MIP            $432.00     45              3 days
Incremental:
tabu-inc.             $273.83     40              2.5 s                5.9 s
tabu-inc.-long        $208.20     34              26.8 s               87.0 s
tabu-robustfit        $202.95     33              3.0 s                10.8 s
robustfit-inc.        $201.45     39              1.7 s                3.7 s
splitmerge-inc.       $200.18     38              95.5 s               321.6 s
robustfit-merge       $198.08     32              84.2 s               256.4 s
tabu-robustfit-l.     $193.05     33              19.8 s               60.5 s
portfolio             $191.55     33              182.1 s              565.3 s

Note that for incremental placement there might be overheads associated with shutting down and powering up different nodes dynamically in a cluster. Prior to shutting down a node, all tenants must be replicated away from the node. RTP ensures that this is done without SLO violations. Also, copying tenants to other nodes counts towards the migration budget. Powering up a node incurs some time for provisioning, which is not modeled in RTP. A reasonably-priced workaround is to maintain a pool of 2–5 spare nodes, which can instantly be filled with tenants when required.

Experiment (ii): Incremental Algorithms: Cost vs. Running time. The time in a ten minute interval is split into the time for algorithmic computation and the remainder, which is used to physically carry out the migrations. The shorter the running time of an algorithm, the more time is available for performing migrations. A short running time also indicates good scalability of an algorithm towards larger problem sizes.

Among the fast algorithms with an average running time below 10 s (see Table 1), robustfit-inc. finds the solutions with the lowest cost. It is also the fastest algorithm overall, and thus the best option for short reorganization intervals. Among the longer-running heuristics, portfolio naturally delivers the best results because it combines all other incremental heuristics and selects the placement with the fewest servers in each ten minute interval. portfolio is also by far the slowest (incremental) algorithm. tabu-robustfit-l. is almost as good as portfolio w.r.t. server cost. However, on average, tabu-robustfit-l. is more than ten times faster than portfolio. tabu-robustfit-l. is the best choice if one can allow investing up to one minute of computation per ten m

inuteinterval.Atcertaintimesduringthedayportfo-lioproducesplacementsrequiringmoreserversthansomeoftheotherincrementalheuristics(e.g.robust\ft-merge).Thisbehaviorresultsinportfoliorequiringahighermax-imumnumberofserversthanrobust\ft-merge(seeTable1).Thisphenomenon|counter-intuitiveat\frstsinceportfolioissupposedlythebestincrementalheuristic|highlightsthestrongin\ruencethatthegivenplacementfromthepreviousintervalhasontheabilityofanyincrementalalgorithmtominimizethenumberofactiveservers.Experiment(iii):RobustnessTowardsLoadSpikesWhenusinganincrementalplacementstrategy,onetriesto\fndaplacementusingtheminimalnumberofserverswhilestillprovidingjustenoughresourcestohandletheloadofalltenantswithoutviolatingresponsetimeSLOs.Thisre-sultsinsituationswhereservershavelittlesparecapacity.Whenchangesintenantloadareobserved,anewplacementiscomputedandtenantsaremigratedawayfromoverloadedservers.Whenusingastaticplacementstrategy,incontrast,allserversmusthaveenoughsparecapacitytohandleanes-timatedpeakloadoveralongertimeperiod.Inthefollow-ing,westudyhowmanyserversaretemporarilyoverloadedineachtenminuteintervalwhenanincrementalplacementstrategyisused.Here,atemporarilyoverloadedserverhasaloadbeyonditsloadcapacitylimitatthebeginningofatenminuteinterval,i.e.afternewvaluesfortheloadofthetenantshavebeenobservedandbeforeanewincrementalplacementiscomputedandputinplace.Thismetricisanindicatorforaplacement'srobustnesstowardsunexpectedloadspikes.Thefactthatserversaretemporarilyoverloadedwhiletheplacementisbeingre-organizedinresponsetoaloadspikeisperhapsthemostimportantdownsideofincre-mentalplacement.Managingthetrade-o betweentempo-raryoverloadsandcostforserverrentisakeychallenge.Figure5aprovidestwomaininsights.Firstly,temporaryoverloadsoccurmostlyinthemorningwhenpeoplecomeintowork.Thisisthetimewhenloadincreasesdrasticallybetweenadjacenttenminuteintervals.Secondly,tempo-raryoverloadsa 
ectalargefractionofallactiveservers.Thelatterisactuallypositive:duetointerleaving,excessloadisdistributedacrossmanyservers,whichavoidslocalhotspotsinthecluster.AscanbeseeninFigure5b,theaverageexcessloadonallserversismoderateformosttenminuteintervals.Thelargestcomponentoftheaverageex-cessloadontheserversisduetoheadroomthatisreservedforserverfailures(i.e.ourpenalty).Figure5cshowsthenetaverageoverloadacrossallservers,withoutincludingthepenaltyintheloadontheservers.Forourexemplarybusinessday,aslongasnoserverfailureoccursexactlyat7:20a.m.,temporarilyoverloadedserversactuallyneverex-ceedtheircapacitylimitbymorethan10%.Surprisingly,placementscomputedwiththesplitmerge-inc.algorithmareextremelyrobust:only30%oftheserversarebeyondtheirloadcapacitywhenconsideringpenalty,andnoserveratallisoverloadedwhenconsideringonlytheactualloadwithoutpenalty.Inthelattercasethereisnotasingletenminuteintervalwithanoverloadedserver.Al-thoughsplitmerge-inc.isclearlysuperiortoourotherheuris-ticsinthisregard,itshighrunningtimes(between1.5and5.5minutespertenminuteinterval)makeitsusemostlyimpracticable.Notealsothatsplitmerge-inc.ishardertoparallelizethanforexampletabu-inc.orportfolioduetoitscomplexmergephase.Basedonthisexperiment,itbecomesclearthatourheuristicsmustbeextendedtominimizetheimpactoftemporarilyoverloadedservers.WedevelopandevaluateappropriatetechniquesinSection5.3.Weconcludethatrobust\ft-inc.providesthebestbalancebetweenservercost,runningtime,androbustnesstowardstemporaryloadspikes.Theoverallmostrobustalgorithm,splitmerge-inc.,hasprohibitivelylongrunningtimes.There-fore,weevaluaterobust\ft-inc.inmoredetailinthefollow-ing.Duetospacerestrictions,weomitexperimentswithavaryingmigrationbudgetinthispaper.Itturnsoutthattabu-inc.hassomecharacteristicsthatmakeitfavor-ableoverrobust\ft-inc.forsmallmigrationbudgetsandthusshorterreorganizationintervals.However,reorganizationin- 781 0 20 40 60 80 100 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 % servers overloadedTime (10 Minute 
Ticks)(a) % of overloaded servers (including penalty)robustfit splitmerge tabu-robustfit-l portfolio 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 avg excess loadTime (10 Minute Ticks)(b) avg excess load per server (including penalty)robustfit splitmerge tabu-robustfit-l portfolio 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 07:00 08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 avg excess loadTime (10 Minute Ticks)(c) excess load per server (without penalty)robustfit splitmerge tabu-robustfit-l portfolio Figure5:Overloadedserversduringworkinghoursforselectedalgorithtervalsshorterthantenminutescouldresultin\thrashing"inthesenseofoverreactingtoshort-livedloadbursts.5.2AdvancedExperimentswithRobusttHavingestablishedthatimmensecostsavingscanbereal-izedusingincrementalplacementstrategies,anaturalnextquestionisbyhowmuchouralgorithmsdeviatefromplace-mentswhichareoptimalintermsofthenumberofrequiredservers.Wethereforeinvestigatelowerboundsonservercostinthissection.Wealsoconsidervaryingthenumberofreplic

aspertenantsbeyondtheminimum.Experiment(iv):LowerBoundsonOperationalCostInthefollowing,wecomparerobust\ft-inc.withtwosetsofcostbaselines.The\frstcomesfromsolvingthestaticvari-antofRTPwithourheuristics.ThesecondcomesfromrunningCplexonourMIPformulationforRTP,withthegoalof\fndingoptimalsolutions.Bothlowerboundsareonlyoftheoreticalinterestbecause(i)runningastatical-gorithmineachtenminuteintervalignoresmigrationcostsand(ii)weallowatimelimitofthreehourspertenminuteintervalforCplex.Table2containsdetails.Table2:Gapbetweenincrementalsolutionsandlowerbounds IncrementalLowerBoundGapin%avgmax robust\ft-inc.tabu-static-long863 robust\ft-inc.RTP-MIP1975TP-MIPRTP-MIP-lower-bound1742 Surprisingly,robust\ft-inc.performsalmostgoodastabu-tatic-long(thebeststaticheuristic)onaverage,eventhoughtheincrementalplacementproblemintuitivelyseemsmorechallengingthanthestaticone.Thelargemaximumgapisinfactduetoanoutlier:thesecondlargestgapis40%.Formorethan30outof144tenminuteintervals,robust\ft-inc.evenrequiresfewerserversthantabu-static-longorotherstaticalgorithms.Onereasonforthegoodperformanceofrobust\ft-inc.mightbethatitoftenhastheopportunitytostartfromagoodsolutionobtainedintheprevioustenminuteinterval.Incrementalimprovementsofgoodsolu-tionsarecarriedforwardbyrobust\ft-inc.HeuristicsforstaticRTPprovideanempiricallowerboundontherequirednumberofserversattainableinatimespanproportionaltotheproblemsize.However,exploringallcombinatorialoptionssystematicallymayleadtosolutionswithfewerservers.WethereforereportonsolvingMIPformulationsofRTPwithCplexnext.WeareinterestedinstudyingtherelativegapbetweenthesolutionsobtainedbyCplexandrobust\ft-inc.Unfortunately,thestandardproblemsizeusedinourexperimentsistoolargeforex-perimentationwithCplex.Wethereforeuseasmallersetoftenants(136tenants)onwhichwerunbothCplexandrobust\ft-inc.plexcanoftenimproveourheuristicsolutions,some-timesbyaconsiderableamount.Thelowaveragegap(seelowerpartofTable2)howeverclearlyspeaksforrobust\ft-inc.Whiletheresultsherehavebeenobtainedbasedo
ntheMIPformulationforRTP,weobservedsimilarresultswhenrunningCplexonthestaticvariantwitha24hourtimelimitpertenminuteinterval.Basedontheseexperiments,weconjecturetherelativegapbetweenexactsolutionsandheuristicallyobtainedsolutionstobesimilarforlargerprob-lemsizes(asisthecasefortherelatedbin-packingprob-lem[11]).Cplexalsocomputeslowerboundsontheop-timalsolution(seelastlineofTable2).ThegapbetweenthebestCplexsolutionandthislowerbounddoesnotnec-essarilyindicatethatthebestCplexsolutioncanactuallybefurtherimproved,butratherthatthelowerboundsareweak.Thisisanothersimilaritytothebin-packingproblemwhereanassignmentformulationleadstolowerboundsrel-ativelyfarbelowtheactualoptimum[23].Weconcludethattheplacementsobtainedbyrobust\ft-inc.arecloseenoughtothetheoreticaloptimum,especiallyconsideringitsspeed.Experiment(v):VaryingtheNumberofTenantReplicasInallpreviousexperimentsthenumberofreplicasperten-antwassettotheminimumwith(12).Section4.2.1listedseveralapproachesfordynamicallycomputingthenumberofreplicas.Inthisexperiment,weevaluatethesimplestone:varyingthenumberofreplicaspertenantbyaddingano setbetweenoneand\fvetotheminimumnumberofreplicas.Figure6showsthatahigherreplicationfactorde-creasesthevarianceintheactivenumberofserversovertheday.Tenantsizebecomesthedominantresourcedimensionasthenumberofreplicasincreases,uptoapointwhereten-antloadisnolongerthelimitingfactor.Tooursurprisewefoundthatthemaximumnumberofserversrequiredduringpeakloaddecreasesdrasticallyastheo setincreases.Con-versely,duringtimesoflowload,ahigho setincreasesthenumberofactiveservers.ForourWednesday,ano setoffourisbestduringpeakloadandano setofzeroisbestduringtheperiodwhereloadisatitslowestlevel.Therearestagesinbetweenwhereo setsoftwoandthreedobest.Atpeakload,ano setof\fveresultsinthesmallestnumberofservers.Table3showsthatcostforserverrentdoesnotin-creasemonotonicallywithahigherreplicationo set.Infact,increasingtheo setfromzerotoonedecreasescostfrom 782 5 10 15 20 25 30 35 40 01:00 02:00 03:00 04:00 05:00 06:00 07:00 
08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 18:00 19:00 20:00 21:00 22:00 23:00 00:00 01:00 # ServersTime (10 Minute Ticks)offset = 0 offset = 1 offset = 2 offset = 3 offset = 4 offset = 5 Figure6:Numberofactiveserversonatypicaldayithavaryingreplicationfactor 0 500 1000 1500 2000 2500 0 10 20 30 40 50 60 70 80 90 Cost in USDNumber of Ten MinuteIntervals where Overloads Occur(a) inc. tenant load dec. server capacity replica offset 0--10 robustfit-s-interl. kairos-MIP 0 500 1000 1500 2000 2500 0 1 2 3 4 5 6 7 8 9 Cost in USDTotal Excess Load Across AllServers for the Worst Ten Minute Interval(b) inc. tenant load dec. server capacity replica offset 0--10 robustfit-s-interl kairos-MIP Figure7:Performanceofdi erentgenericover-provisioningrategiesforavoidingoverloadedservers$201to$187.Asapointofcomparison,runningportfolio|supposedlythebestincrementalheuristic|witha

no setofzeroaccountsforadailycostof$192(seeTable1).Table3:Dailyservercostwithvaryingo set O set12345Cost($)201186201237257289Servers(max)392827272831 Dynamicallyvaryingthenumberofreplicasoverthedaysapromisingavenueforfuturework.5.3GenericOver-ProvisioningStrategiesAsstatedinExperiment5.1,incrementalplacementre-quirestradingo costandrobustnesstowardsloadspikes.Inthefollowing,weconsidergenericmeasuresforreduc-ingthenumberoftemporarilyoverloadedservers,whichin-creasesrobustness.Wealsoinvestigatescenariosinwhichmorethanoneserverfails.Itturnsoutthatthebeststrat-egytoavoidoverloadedserversalsohelpswhendealingwithmultipleserverfailures.Twostrategiesimmediatelycometomindforreducingthenumberofoverloadedservers:(i)vir-tuallyincreasingtheloadofeachtenant,and(ii)increasingtheheadroomleftunusedoneachserver.Experiment5.2inspiresathirdstrategy:increasingthenumberofreplicas.Theintuitionbehindthisstrategyisthatahigherreplica-tionfactorcouldhelpsmoothingoutharshloadchanges.Experiment(vi):AvoidingOverloadedServersFigure7showshowthethreeover-provisioningschemesde-scribedabovein\ruencethetrade-o betweenoperatingcostandtheoccurrenceofoverloadedservers.Welimitourselvestoevaluatetheover-provisioningstrategieswithrobust\ft-inc.Forbothgraphs,wevarythestrengthoftherespectivestrategiesfromlefttorightbyincreasingtheheadroomontheserversinstepsof0.05andbyincreasingtheloadofthetenantsby5%ineachstep.Also,weincreasethenum-berofreplicaspertenantbyanincreasingo 
setwhengoingfromlefttoright.Increasinganyofthesethreeparametersresultsinmoreactiveserversandtheresultingplacementbecomesmoreexpensiveinturn.ThecostofplacementforourWednesdayisshownontheverticalaxis.Aspointsofreference,bothgraphsinFigure7alsoshowthestate-of-theartapproachesfromExperiment5.1asabaseline.robust\ft-s-interl.(darkbluearrows)producesnotemporaryoverloadsbecauseitstronglyover-provisions.kairos-MIP(lightbluearrows)producesmoretemporaryoverloadsbutislessexpensive.Anotherpointofreferenceinbothchartsisrobust\ft-inc.inthestandardcon\fgurationwithoutanyover-provisioningstrategy(pinkarrows).Figure7ashowsthenumberofoccurrencesofoverloadedserversacrossalltenminuteintervals.Figure7bshowsthesumofallexcessloadacrossallserversfortheworsttenminuteintervalofourWednesday.Thelattermet-ricisparticularlysensitive.Itsminimizationreducestheseverenessofoverloadsituationstoanegligiblelevel.Forbothgraphs,whenmovingfromlefttorightalongthex-axis,theresultingplacementsobviouslybecomemoreandmoreexpensive.Whenmerelycountinghowoftenserversareoverloadedacrossalltenminuteintervals,thestrategytodecreaseservercapacitiesconvergestowardsavalueofzerooverloadedserversfaster(andthusmoreinexpensively)thanthestrategythatvirtuallyincreasestenantload.Whenaddingupbyhowmuchtheserversareoverloaded,theop-positeisthecaseandthestrategythatvirtuallyincreasestenantloadconvergestowardszeroexcessloadfaster.How-ever,forbothmetrics,thestrategytoincreasethereplica-tiono 
setisclearlysuperiortotheothertwostrategies.Experiment(vii):MultipleServerFailures.RTPguaran-teesthatnoserverisoverloadedwhenanyoneotherserverintheclusterfails.However,sincetheover-provisioningstrategiesintroducedinthepreviousexperimentresultinplacementswhereservershavemore\headroom,"wearein-terestedinwhethersuchplacementscanalsohandlemul-tiplesimultaneousserverfailures.Westudytwometrics,(i)theamountbywhichotherserversareoverloadedasaconsequenceofoneormultiplesimultaneousfailures,and(ii)howmanytenantsarerenderedcompletelyunavailablewhenmultipleserversfailatthesametime.Wecollectthe\frstmetric,theexcessload,afterloadchangeshavebeenob-served,theplacementalgorithmhasrun,andallmigrationshavebeenperformed.ThisisincontrasttoExperiments5.1and5.3,whereexcessloadhasbeenmeasuredbeforetheplacementalgorithmruns(thefocuswasontherobustnesstowardsunanticipatedloadchanges).Also,weconsideronlyactualloadontheserverswithoutpenalty,sinceweinvesti-gatefailuresituationsinwhichtheserversaresupposedtouseuptheheadroomallottedintheformofpenalty.Weinjectfailuresintotheclustertwiceduringtheday(markedinFigure8usingarrows).Atthosepointsintime,wefaila\fxednumberofserversbetweenzeroandfour.Thefailingserversarechosenatrandom.Wecomparethestandardcasewherenomeasuresforover-provisioninghavebeenapplied(Figure8a)toanover-provisionedplace-mentusingthestrategythatvirtuallydecreasestheca-pacityofaserver(showninFigure8b).Weparameter- 783 Table4:Averagenumberofunavailabletenants Simultaneousfailures1234 Noover-provisioning0.000.001.755.5911.50 Over-Prov.Strategy:scaledbyfactor1.850.000.000.681.802.78capreducedto0.450.000.000.100.601.76)increasedby50.000.000.000.000.00 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 11:00 12:00 13:00 14:00 15:00 16:00 17:00 Excess Load (Total)Time (10 Minute Ticks)Failure 1Failure 2 1 simultaneous failure 2 simultaneous failures (a)standardcon\fguration 0 0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018 0.02 11:00 12:00 13:00 14:00 15:00 16:00 17:00 Excess Load (Total)Time (10 
Minute Ticks)Failu

re 1Failure 2 3 simultaneous failures 4 simultaneous failures (b)over-provisionedigure8:Impactofmultiplesimultaneousfailuresizedthisover-provisioningstrategysuchthatitisinthesweet-spotbetweenthenumberofoverloadedserversandcost.Thecon\fgurationwepickedismarkedwithacircleinFigure7a.Weobservethatserversareoverloadedbyupto37%inthestandardcon\fgurationwithfoursimultaneousserverfailures(Figure8a).Fortheover-provisionedplace-ment(Figure8b),ameasurableimpactisonlyvisibleforthreeandfoursimultaneousfailures.Theseverityinthesecasesisapproximately20timeslowerthanwithoutover-provisioning.Table4showsthenumberoftenantsthataretemporarilyrenderedunavailablewheninjectingmultiplesimultaneousfailuresintothecluster.Wevarytherandomseedforchoos-ingtheserversthatfailandreportaveragevalues.Fromanavailabilitypoint-of-view,thereplication-basedstrategyistheclearwinner.Itisalsothecheapestamongthethreeover-provisionedcon\fgurations.WeconcludethatalthoughRTPguaranteesthatSLOsaremetforonlyasinglefailure,inpractice,multiplesimultaneousserverfailuresareoftennotproblematic.6.RELATEDWORKPreviousprojectshaveaddressedvariantsandsubprob-lemsofthetenantplacementproblem,buttoourknowl-edgenoneconsideredinterleavedtenantplacementtomini-mizetherequiredservers,whilstguaranteeingresponsetimeSLOsandtakingreal-worldworkloadtracesintoaccount.ResourceModeling.Determiningatenant'sresourcerequirementsentailscharacterizingthedominantresourcesandquantifyinghowmucheachtenantutilizesthem.ForMySQL,[5]presentsatechniqueforestimatingthecom-bineddiskI/Operformanceinthepresenceofmultitenancy.Thetenants'CPUandmemoryrequirementsadduplinearlyintheirmodel.In[18],tenantsaregroupedintoseveralSLOclasses.Alltenantsinaclasshavethesameresponsetimeguaranteesandthesamesize.Aserveris\flledwithamixoftenantsfromdi 
erentSLOclasses.Abinaryfunctionde-termineswhethertheservercanmeettheresponsetimere-quirementsofitstenants.[7]introducesalogicalI/OmetrictocharacterizethedominantresourcebottleneckforOLAPworkloadsinPostgreSQL.ThisI/Ometricaggregateslowerlevelmetricssuchasbu erpoolhitsandhitratesintheoperatingsystem's\flesystemcache,whicharedependentontheshareddiskaccessbehaviorofconcurrentqueries.In[21],thefocusisonin-memorydatabases,wherediskI/Oisnotthedominantresource.Instead,themainresourcesbeingconsumedareCPU,memory,andbandwidthbetweenCPUandmemory.Thissituationconsiderablysimpli\fesresourcemodelingincontrasttothedisk-basedapproaches.Notethatbesidesresourcemodeling,both[5]and[18]alsoprovidenon-linearprogramsfortenantplacement,buttheyonlyconsiderstaticplacementandthereisnonotionofin-terleavingorloadredistributionincaseoffailure.DeclusteringAlgorithms.Signi\fcantresearchhasbeendevotedtodeclusteringstrategiesforincreasingtheavail-abilityofparalleldatabase[16,14,19,24,26].Teradata'sinterleaveddeclusteringstrategyusesinterleaveddataplace-mentforfastrecovery,whereaschaineddeclustering[16]andadaptiveoverlappeddeclustering[24]aimatequallyredis-tributingworkintheclusterinthecaseofaserverfailure.Thisredistributionisdonebyupdatingtheloadbalancingpolicywhenanodefails,whichrequirescontrollingtheloadbalancingmechanismonpartitiongranularity.Alldeclus-teringstrategiesassumethatapartitioncanbesplitfurtherintosub-partitionsand,hence,distributedacrossservers.Thisassumptiondoesnotholdinourscenario,whereaten-antisconsideredanatomicunitandtenantsaresosmallthatthereisnobene\fttopartitioning.Furthermore,allthesestrategiesassumea\fxednumberofserversandrepli-cas,whereasourgoalistominimizethenumberofactiveserversateachpointintime.Toourknowledge,onlyMi-crosoftSQLAzure[1]usesinterleavedtenantplacement,butdoesnotdisclosedetailsonalgorithmdesignore 
ectiveness.GreedyAlgorithms.Variousgreedyplacementstrate-gieshavebeenproposed,noneofthemconsideringinter-leavedplacement.Forexample[25],usedasabaselineinExperiment5.1,usesagreedy\frst-\ftalgorithmafterob-servingthetenants'loadrequirements.In[20],agreedyin-crementalplacementalgorithmisproposedforadaptivedis-tributedmiddleware,whereas[3]providesagreedyheuristictoautomaticallyadjustthenumberofmachines.Neitheroftheseapproachesconsidersfailuresand/ormultiplereplicas.Notably,placementstrategiesforvirtualmachines[9,13]sharemanyaspectswithincrementaltenantplacement.Forinstance,AutoGlobe[13]usesatrace-basedapproachthatassessespermutationsandcombinationsofworkloadstode-termineanear-optimalworkloadplacementprovidingspe-ci\fcSLOs.Similarly,[9]usesalinearprogramandheuristicstocontrolVMmigration.Bothapproachesdonotconsiderreplicationtoincreaseavailabilityandperformance.MigrationTechniques.Varioussystemsstudiedproto-colsonhowtomostecientlymigratetenants:[6]presentslivemigrationforadecoupledstoragedatabaseapproach;[8]doesthesameinamoretraditionalmulti-tenantsetupaspresentedhere.Ouralgorithmscanbeusedwiththesetechniquesbyadaptingthemigrationoverheadfactors.Optimization.Finally,theoptimizationcommunityhasconsideredthebin-packingproblemfordecades[11,23].Manyvariations(andtheRTPisone)oftheproblemhavebeenstudiedovertheyears,e.g.[2,10,15

],however,wearenotawareofapproachesthattakerobustnesstowardsindi-vidualserverfailures,asweconsiderit,intoaccount.Itis 784 therobustness(orpenalty,cf.Constraint(5))thatrendersxistingbin-packingalgorithmsunusableforRTP.7.CONCLUSIONInthispaper,weintroducedRTPandpresentedavarietyofincrementaldataplacementalgorithmsformulti-tenantSaaS.Anevaluationwithreal-worlddatarevealedthatourapproachleadstosigni\fcantcostsavingsincomparisontothestateoftheart,whileadheringtoresponsetimeSLOscapturedinresourcemodels.Weextendedouralgorithmswithgenericstrategiesforover-provisioning,sothatadmin-istratorswhowishtoruntheirclusterwithmoreheadroomcan,atthesametime,bene\ftfromthecostsavingsthatcomewithincrementalplacement.Ourmostimportant\fnd-ingsarethat(i)robust\ft-inc.andtabu-robust\ft-l.\fndnearcost-optimalsolutionsinshortrunningtimes,(ii)ourover-provisioningstrategiesreducetheimpactofloadspikestoanegligiblelevelwhilemaskingmultiplesimultaneousserverfailuresfromtheperspectiveofresponsetimeSLOs;and(iii)theover-provisioningstrategybasedonincreasingthereplicationfactoristhewinneramongthepresentedap-proaches,fromacost,availabilityandclustersizingper-spective.Infuturework,wewillstudymechanismsfordynamicallyadjustingtheover-provisioningstrategiesovertheday(e.g.usingavaryingratherthana\fxedo setforthenumberofreplicas).Giventhesuccessofmachine-learningtechniquesinrelatedareas,anotheravenueforfutureworkisgoingfromare-activetoapro-activeplacementapproach,forexamplevialoadforecastingasapreprocessingstepforRTP.8.ACKNOWLEDGMENTSTheresearchofJanScha 
nerwassupportedbySAP.TheworkofFranklinandKraskawassupportedinpartbyNSFCISEExpeditionsawardCCF-1139158andDARPAXDataAwardFA8750-12-2-0331,andgiftsfromSAP,Ama-zonWebServices,Google,BlueGoji,Cisco,ClearstoryData,Cloudera,Ericsson,Facebook,GeneralElectric,Hor-tonworks,Huawei,Intel,Microsoft,NetApp,Oracle,Quan-ta,Samsung,Splunk,VMwareandYahoo!.TheauthorswouldliketothankMarcusKrug,ReanGrithandMarcPfetschforhelpfuldiscussions.TheauthorsfurtherthankSAPforprovidingthelogdatausedfortheexperimentsinthispaper.InmemoryofDeanJacobs.Ourco-author,colleague,mentorandfriendDr.DeanBernardJacobspassedawayonJanuary14th,2013followingashortsevereillness.AlinefromoneofhisfavoritegospelsongsbyRalphCarmichaelreads\Thereisaquietplace;farfromtherapidpace;whereGodcansoothemytroubledmind."WedeeplymissDean,hisenthusiasticspirit,andhistechnicalgenius,whileherestsinthisquietplace.9.REFERENCES[1]P.A.Bernsteinetal.,AdaptingmicrosoftSQLserverforcloudcomputing.ICDE2011[2]C.Chekurietal.,Onmulti-dimensionalpackingproblems.ACMSODA,1999[3]J.Chenetal.,AutonomicProvisioningofBackendDatabasesinDynamicContentWebServers.IEEEICAC2006[4]J.Csiriketal.,BoundedSpaceOn-LineBinPacking:BestisBetterthanFirst.ACMSODA1991[5]C.Curinoetal.,Workload-awaredatabasemonitoringandconsolidation.SIGMOD,313{324,ACM,2011[6]S.Dasetal.,Albatross:LightweightElasticityinSharedStorageDatabasesfortheCloudusingLiveDataMigration.PVLDB,4(8):494,2011[7]J.Dugganetal.,Performancepredictionforconcurrentdatabaseworkloads.SIGMOD2011[8]A.J.Elmoreetal.,Zephyr:livemigrationinsharednothingdatabasesforelasticcloudplatforms.SIGMOD,301{312,ACM,2011[9]T.C.Ferretoetal.,Serverconsolidationwithmigrationcontrolforvirtualizeddatacenters.FutureGenerationComp.Syst.,27(8):1027,2011[10]FrenchOperationalResearchandDecisionSupportSociety,ROADEFChallenge2012.http://challenge.roadef.org[11]M.R.Gareyetal.,ComputersandIntractability:AGuidetotheTheoryofNP-Completeness.1979[12]F.Glover,TabuSearch-PartI.INFORMSJournalonComputing,1(3):190,1989[13]D.Gmachetal.,Anintegratedappr
oachtoresourcepoolmanagement:Policies,eciencyandqualitymetrics.DSN2008[14]L.Hedegardetal.,Thebene\ftsofenablingfallbackintheactivedatawarehouse,Teradata.2007[15]D.S.Hochbaumetal.,APolynomialApproximationSchemeforSchedulingonUniformProcessors:UsingtheDualApproximationApproach.SIAMJ.Comput.,17(3):539,1988[16]H.-I.Hsiaoetal.,ChainedDeclustering:ANewAvailabilityStrategyforMultiprocessorDatabaseMachines.ICDE1990[17]J.Krugeretal.,FastUpdatesonRead-OptimizedDatabasesUsingMulti-CoreCPUs.PVLDB,5(1):61[18]W.Langetal.,TowardsMulti-tenantPerformanceSLOs.ICDE2012[19]M.Mehtaetal.,DataPlacementinShared-NothingParallelDatabaseSystems.VLDBJ.,6(1):53,1997[20]J.M.Milan-Francoetal.,AdaptiveMiddlewareforDataReplication.ACM/IFIP/USENIXInternationalMiddlewareConference2004[21]J.Scha neretal.,Predictingin-memorydatabaseperformanceforautomatingclustermanagementtasks.ICDE2011[22]J.Scha neretal.,RealisticTenantTracesforEnterpriseDBaaS.SMDB,ICDEWorkshops,2013[23]J.M.ValeriodeCarvalho,Exactsolutionofbin-packingproblemsusingcolumngenerationandbranch-and-bound.AnnalsofOper.Research,1999[24]A.Watanabeetal.,AdaptiveOverlappedDeclustering:AHighlyAvailableData-PlacementMethodBalancingAccessLoadandSpaceUtilization.ICDE2005[25]F.Yangetal.,AScalableDataPlatformforaLargeNumberofSmallApplications.CIDR,2009[26]H.Zhuetal.,Shifteddeclustering:aplacement-ideallayoutschemeformulti-wayreplicationstoragearchitecture.
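
Illustration for Experiment (i) and Table 1. The server rent reported in Table 1 can be related to the number of active servers per ten minute interval with a few lines of Python. This is only a sketch under stated assumptions: the price per server-hour is not given in this section and is inferred here from the static rows of Table 1 (320 servers for a full day correspond to $3456.00), and the incremental trace below is made up for illustration rather than taken from the experiments.

```python
# Sketch only. The price per server-hour is an ASSUMPTION inferred from the
# static rows of Table 1 (320 servers all day -> $3456.00); it is not stated
# in the text.
ASSUMED_PRICE_PER_SERVER_HOUR = 0.45
INTERVAL_MINUTES = 10


def daily_server_rent(active_servers_per_interval,
                      price=ASSUMED_PRICE_PER_SERVER_HOUR):
    """Return the rent for one day, rounded to cents.

    active_servers_per_interval holds one server count per ten minute
    interval (144 entries for a full day).
    """
    server_minutes = sum(active_servers_per_interval) * INTERVAL_MINUTES
    return round(server_minutes * price / 60, 2)


# A static placement keeps the same number of servers active all day.
static_mirror = daily_server_rent([320] * 144)
static_interl = daily_server_rent([192] * 144)

# Hypothetical incremental trace: few servers at night, more in working hours.
hypothetical_trace = [12] * 42 + [36] * 60 + [12] * 42
incremental = daily_server_rent(hypothetical_trace)

print(static_mirror, static_interl, incremental)
```

With the assumed price, the two static server counts reproduce the corresponding cost entries of Table 1, while the hypothetical trace shows why a placement that shrinks at night ends up roughly an order of magnitude cheaper than peak-load provisioning.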

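
Illustration for Experiment (iii). The overload metrics of Figure 5 (fraction of temporarily overloaded servers and average excess load, with and without penalty) can be sketched as follows. The sketch rests on assumptions that this section does not spell out: a tenant's load is split evenly across its replicas, and when one server fails the surviving replicas share that tenant's load evenly, so the penalty of a server is the largest extra load it would pick up from any single other failure. All function and variable names are hypothetical.

```python
from collections import defaultdict


def server_loads(placement, load):
    """placement: tenant -> servers holding a replica; load: tenant -> load.

    ASSUMPTION: each replica serves an equal share of the tenant's load.
    """
    loads = defaultdict(float)
    for tenant, servers in placement.items():
        share = load[tenant] / len(servers)
        for server in servers:
            loads[server] += share
    return dict(loads)


def failure_penalty(placement, load):
    """Worst extra load per server if any ONE other server fails (assumed model)."""
    extra = defaultdict(lambda: defaultdict(float))  # extra[i][j]: gain of i if j fails
    for tenant, servers in placement.items():
        r = len(servers)
        if r < 2:
            continue  # a single replica cannot be taken over by another server
        gain = load[tenant] / (r - 1) - load[tenant] / r
        for i in servers:
            for j in servers:
                if i != j:
                    extra[i][j] += gain
    return {i: max(by_failed.values()) for i, by_failed in extra.items()}


def overload_report(placement, load, capacity=1.0, with_penalty=True):
    """Return (fraction of overloaded servers, average excess load per server)."""
    loads = server_loads(placement, load)
    penalty = failure_penalty(placement, load) if with_penalty else {}
    excess = {s: max(0.0, l + penalty.get(s, 0.0) - capacity)
              for s, l in loads.items()}
    overloaded = sum(1 for e in excess.values() if e > 1e-12)
    return overloaded / len(loads), sum(excess.values()) / len(loads)


# Toy placement evaluated against newly observed loads, before re-placement.
placement = {"A": ["s1", "s2"], "B": ["s2", "s3"], "C": ["s1", "s3"]}
new_load = {"A": 1.0, "B": 0.6, "C": 0.2}

with_pen = overload_report(placement, new_load, with_penalty=True)
without_pen = overload_report(placement, new_load, with_penalty=False)
print(with_pen, without_pen)
```

In this toy example two of the three servers exceed their capacity once the penalty is included, yet none is overloaded by actual load alone, which mirrors the distinction drawn between Figures 5b and 5c.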