
Leveraging Stored Energy for Handling Power Emergencies in Aggressively Provisioned Datacenters

Sriram Govindan, Datacenter Compute Infrastructure Team, Microsoft, Redmond, WA, USA (srgovin@microsoft.com)
Di Wang, Anand Sivasubramaniam, Bhuvan Urgaonkar, Department of Computer Science and Engineering, The Pennsylvania State University, PA 16802 ({diw5108, anand, bhuvan}@cse.psu.edu)

Abstract

Datacenters spend $10-25 per watt in provisioning their power infrastructure, regardless of the watts actually consumed. Since peak power needs arise rarely, provisioning power infrastructure for them can be expensive. One can, thus, aggressively under-provision infrastructure, assuming that simultaneous peak draw across all equipment will happen rarely. The resulting non-zero probability of emergency events where power needs exceed provisioned capacity, however small, mandates graceful reaction mechanisms to cap the power draw instead of leaving it to disruptive circuit breakers/fuses. Existing strategies for power capping use temporal knobs local to a server that throttle the rate of execution (using power modes), and/or spatial knobs that redirect/migrate excess load to regions of the datacenter with more power headroom. We show these mechanisms to have performance-degrading ramifications, and propose an entirely orthogonal solution that leverages existing UPS batteries to temporarily augment the utility supply during emergencies. We build an experimental prototype to demonstrate such power capping on a cluster of 8 servers, each with an individual battery, and implement several online heuristics in the context of different datacenter workloads to evaluate their effectiveness in handling power emergencies. We show that (i) our battery-based solution can handle emergencies of short duration on its own, (ii) it can supplement existing reaction mechanisms to enhance their efficacy for longer emergencies, and (iii) batteries can even provide feasible options when other knobs do not suffice.

Categories and Subject Descriptors: C.0 [Computer Systems Organization]: General
General Terms: Design, Experimentation, Measurement, Performance
Keywords: UPS, Batteries, Datacenter, Peak power, Stored energy, Provisioning, Cap-ex, Peak shaving

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ASPLOS'12, March 3-7, 2012, London, England, UK. Copyright 2012 ACM 978-1-4503-0759-8/12/03...$10.00

1. Introduction

Datacenters incur capital expenditure (cap-ex) of $10-25 per watt of provisioned power capacity, regardless of whether this watt is actually consumed [4, 23]. The power delivery infrastructure amounts to several million dollars in cap-ex [5], contributing over a third of the amortized monthly datacenter costs [4]. Although not all of this goes toward power delivery for IT equipment, since there is other supporting infrastructure including HVACs, fans, etc., the IT-related power network is still a substantial component that warrants meticulous planning and provisioning. Provisioning for the theoretical peak (using face-plate ratings of equipment), which may never happen, or even for the occasional peak (which requires all equipment to simultaneously exercise its maximum draw), can prove very expensive. A study of power consumption in a Google datacenter, depicted in Figure 1, reiterates this observation: it shows a very low probability of reaching close to the provisioned peak (the probability of exceeding 90% of the potential peak is less than 1%), with a profile that is highly skewed around the average. Aggressive provisioning of power infrastructure can thus yield substantial cap-ex savings.

[Figure 1. Rack power profile of a Google datacenter [12]: CDF of rack power, normalized with respect to peak.]

The goal of this paper is to deal, in a seamless and graceful fashion, with the rare ("black swan") power emergencies (when the draw exceeds provisioned power) that arise in such aggressively provisioned datacenters. The temporal and spatial load variations, and the resulting power draw, in a datacenter offer a strong reason for leveraging statistical multiplexing to under-provision the power infrastructure. Circuit-breakers are used to ensure equipment safety and handle power spikes lasting several seconds [9], abruptly cutting off the power supply upon an overdraw. Can we employ alternate solutions that handle emergencies more gracefully by removing/easing such spikes, thereby reducing the probability of hitting these safety limits? Circuit-breakers would continue to be used, but only as the last line of defense. Also, datacenters have power upgrade cycles to handle growing IT load. The probability of emergencies can increase as we get closer to the next upgrade, and such solutions can help relieve disruptions during these periods. Gracefully handling emergencies, while meeting performance SLAs, would allow more aggressive power provisioning.¹

Software mechanisms for power capping have been examined to some depth [12, 13, 16, 19, 25, 37, 38, 47]. Broadly, we can classify existing solutions into two categories. The first set of knobs is local to servers, where temporal load-shifting/demand-shaping is used to control the rate of workload execution (and thereby control power consumption, which grows with utilization). Apart from scheduling the load (temporally spacing it out using CPU scheduling), hardware power mode control (clock throttling states, including server shutdown [7, 36], and/or dynamic voltage-frequency scaling (DVFS) states) is also used to control the rate of workload execution, with different trade-offs between performance and power [16, 47]. The second set of knobs exploits spatial non-uniformity in power profiles across (groups of) servers, and dynamically redirects/migrates load to regions of the datacenter with more headroom. While local knobs are agile and effective for handling short-lived emergencies, prolonged throttling can impact performance. While spatial migration is better for longer emergencies, there can be overheads during and after migration (load imbalance on some nodes, loss of locality, etc.), in addition to the need for headroom elsewhere in the datacenter. In some cases, the load may also not be movable to other servers. Regardless, both these strategies can have substantial performance repercussions.

This problem is analogous to supply-demand mismatch issues on the electrical grid, and the solutions above are analogous to demand-response mechanisms in that context. However, energy storage, which is used in normal grids to amplify the efficacy of demand response, has been little explored in the datacenter context. Energy storage can be used to (i) deal with short/small power spikes without even requiring other reaction mechanisms (temporal/spatial), which have performance consequences, (ii) supplement existing mechanisms to improve their efficacy in meeting application SLAs, and (iii) offer remedial solutions, even if temporary, when other mechanisms cannot find feasible options to meet application SLAs. Further, unlike grids, where energy storage may be a costly proposition, datacenters already have storage built in, in the form of UPS units that handle power disruptions. Can we tap into these for emergency handling without impinging on the availability mandates?

There are different choices for UPS placement: from a centralized (usually redundant) configuration, to a distributed version at each server as in Google datacenters [18], as well as at intermediate levels (e.g., per-rack) similar to the ones in Microsoft [33] and Facebook [11] datacenters. Server-level UPS, when incorporated with the power supply, can help eliminate double-conversion needs and the associated energy losses. We assume a server-level UPS unit, although many of our ideas also apply to other configurations. In general, UPS batteries help reduce/cap power draw at and above the level in the power hierarchy where they are placed. When using batteries beyond their original role, we must ensure that neither the datacenter's power availability (upon utility outages) nor the batteries' normal lifetime (3-5 years for lead-acid) is compromised. Since UPS units are mainly transitional devices that temporarily handle load until diesel generators are started (which takes around 15-30 seconds), we are able to use them for a couple of minutes for our purposes while still leaving enough charge for availability purposes. To facilitate aggressive power provisioning, this paper makes the following contributions:
- We present an offline theoretical framework combining all knobs (batteries, power states, and migration both within and across datacenter clusters) to find the performance-optimal way of handling an emergency for a given level of power under-provisioning. This serves as a reference point for how well we can do, and offers insights on which knobs/combinations work for given workload-emergency combinations.
- Since emergencies are inherently unpredictable (When? How severe? How long?), we develop several online heuristics employing existing temporal and spatial knobs, and combinations thereof. We also introduce battery-based emergency handling techniques that accommodate different drain rates (slow and fast), and combine them with existing knobs in interesting ways.
- We develop an experimental prototype of 8 servers with individual UPS units, and implement seven control heuristics. We examine emergencies in the context of several representative datacenter workloads with unique characteristics: (i) TPC-W [40] and Specjbb [41] server-based workloads with emergencies arising from load spikes (e.g., flash crowds), (ii) a MapReduce [22] application where the map phase introduces power spikes substantially higher than the reduce phase, (iii) a streaming media server where handling the power spike caused by a surge of new connections leads to jitter in existing streams, and (iv) a multi-programmed GPU + virus scan application pair that is tied to a particular set of servers (migration is not an option) and that, when run together, introduces a high power draw (requiring the two to be temporally spaced out).
- We consider different degrees of under-provisioning, and show that the battery by itself can sustain short emergencies (around 10-20 minutes on our prototype). Longer emergencies (30 minutes or longer) require migration, since sustained local throttling hurts performance. On the other hand, the performance overheads of migration make it less attractive for shorter emergencies. Our battery solution, in conjunction with migration, provides a seamless bridge across this spectrum of durations, and is able to reduce the performance impact for intermediate durations. In most cases, battery-based heuristics, combined with these knobs, get within 10-20% of the theoretical bounds.

¹ As Taleb says in his book [42]: "Black Swans being unpredictable, we need to adjust to their existence rather than naively try to predict them." Even if the datacenter power profile follows bell-curve behavior as in Figure 1, the extreme consequences of ignoring the highly improbable tail make emergency handling a critical problem.

2. Related Work

Tighter Power Provisioning: Statistical multiplexing-based overbooking/under-provisioning of resources, a widely used yield management technique in different domains, including IT resources in datacenters [43, 46], is now being suggested for the power infrastructure. This ranges from components within servers [13, 25], to groups/ensembles of servers and other equipment [19, 26, 35, 39, 48]. The basic underlying idea is to exploit the low likelihood of simultaneous peak power needs of all components/servers [12, 13, 25]. Such provisioning requires agile reactive techniques to ensure that power consumption stays below capacity, allowing safe operation with satisfactory workload performance. These techniques include device power state control [10, 14, 16, 32, 37, 38, 47, 49] and workload scheduling or migration within or even outside the datacenter [2, 17, 19, 30, 34, 45].

Battery-based power management: While battery management has been studied in the mobile/embedded domains [15, 50, 51] (e.g., drain rate adjustment for longevity), battery use in datacenters has been limited to mere transition devices during utility failure. Recent work has looked at using UPS batteries to reduce electricity operational costs [20, 44], but their role in tighter provisioning for reducing cap-ex is entirely novel. To our knowledge, we are the first to explore the use of stored energy (UPS batteries) for tighter provisioning.
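To make the statistical-multiplexing argument concrete, the sketch below computes the probability that the aggregate draw of a cluster exceeds its provisioned budget, \( \epsilon = \Pr(\sum_j P_j(t) > P_{budget}) \), exactly for a deliberately simple, hypothetical model: each of N servers independently draws either its idle or its peak power, so the number of busy servers is binomial. The function name `overdraw_probability`, the two-level power model, and the 30% busy probability are assumptions for illustration only. Real server power is continuous and can be correlated across servers, which can make the true tail heavier than this model suggests. The 120 W idle and 320 W peak figures mirror the prototype described in Section 6.

```python
from math import comb


def overdraw_probability(n_servers: int, p_idle: float, p_peak: float,
                         busy_prob: float, p_budget: float) -> float:
    """Exact Pr(sum_j P_j(t) > p_budget) under a toy two-level power model.

    Hypothetical model: each server independently draws p_peak watts with
    probability busy_prob and p_idle watts otherwise, so the number of busy
    servers k is Binomial(n_servers, busy_prob).
    """
    eps = 0.0
    for k in range(n_servers + 1):
        total = k * p_peak + (n_servers - k) * p_idle  # aggregate draw with k busy servers
        if total > p_budget:
            eps += comb(n_servers, k) * busy_prob**k * (1 - busy_prob)**(n_servers - k)
    return eps


if __name__ == "__main__":
    aggregate_peak = 8 * 320.0  # 8 servers at 320 W, as in the prototype
    for fraction in (1.0, 0.9, 0.8):
        eps = overdraw_probability(8, 120.0, 320.0, 0.3, fraction * aggregate_peak)
        print(f"P_budget = {fraction:.0%} of aggregate peak -> epsilon = {eps:.5f}")
```

With these toy numbers, a budget at 100% of the aggregate peak gives epsilon = 0, while budgets at 90% and 80% give roughly 0.13% and 1.1%. Shaving provisioned capacity raises epsilon quickly, which is why a graceful reaction mechanism is needed for the residual emergencies.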
3. Problem Details and Solutions

3.1 The Problem

Under-provisioning of power (equipment) does not have to be restricted to the highest level in the hierarchy. As we aggressively push such under-provisioning to equipment at lower levels, there is scope for additional cap-ex savings. However, going deeper lessens the potential for exploiting statistical multiplexing of power demands across the underlying servers. The higher burstiness (variance) at deeper levels can result in a higher probability of emergencies/violations. A more rigorous cost-benefit analysis is necessary to find out how much to under-provision at each level of the hierarchy. We consider this beyond the scope of this paper, and focus here on the following consequent problem: given a level c aggressively under-provisioned at \( P_{budget} \) to accommodate N servers under it, how do we cap the net power draw of these servers to \( P_{budget} \) at all times? We will consider different values of \( P_{budget} \) relative to the maximum possible peak draw of the N servers, henceforth referred to as a cluster (c). If \( P_j(t) \) represents the power draw of server j at time t, then under-provisioning power capacity exploits the statistical property

\[ \Pr\left( \sum_{j=1}^{N} P_j(t) > P_{budget} \right) \le \epsilon, \quad \forall t. \]

Just as in other domains where under-provisioning is used, there are consequences to be carefully traded off against its benefits. Since \( \epsilon \) is non-zero, despite being small, there will inevitably be situations when the aggregate power draw of the N servers cannot be safely accommodated by the infrastructure at level c. Since \( \epsilon \) may not be identified precisely (Section 1), we need reactive mechanisms to deal with power overdraw situations. Normally, during such an episode of power overdraw, circuit-breakers/fuses at level c would kick in to ensure safety from fire hazards and overdraw-induced equipment burnout. Such workload-oblivious reaction mechanisms are highly undesirable in the datacenter context, since they can lead to lost computation and inconsistent states, or even impact IT equipment reliability (e.g., hard disk failures [52]). Therefore, when under-provisioning, it is essential to additionally employ workload-aware reaction mechanisms that operate gracefully under such emergencies, while continuing to leave circuit-breakers/fuses as the last line of defense. Such mechanisms should also be agile enough to quickly control the power draw. Fortunately, existing work has shown that such agile power capping can be realized well within the time limits (typically sub-second to a few seconds) imposed by circuit-breakers [19, 25, 48].

3.2 Current Solution Strategies

We classify existing workload-aware reaction mechanisms into two broad categories: temporal and spatial. Temporal mechanisms adjust the rate of resource usage within a server. Since the rate of resource usage (particularly of the CPU) affects power consumption, it can be used as a knob for demand response. Two common temporal mechanisms are (i) scheduling, to defer some of the peak load to a non-emergency period, and (ii) employing power states: dynamic voltage and frequency scaling (DVFS), clock throttling, and even sleep/shutdown states.

Spatial mechanisms direct/migrate excess load to servers and/or regions of the datacenter that have headroom in their power budgets. Here again, we consider two broad strategies: (i) load redirection/migration/consolidation to one or more servers within the hierarchy at level c, allowing some servers in c to be shut down or slowed down, and (ii) load movement/migration to one or more servers elsewhere in the datacenter (outside c) with headroom for the increased load. Many existing techniques [3, 6, 27, 36] on load balancing/unbalancing for energy/power reduction fall into this category. Such load movement could be achieved by (a) request redirection, as in many network-based services [36], (b) fault-tolerant applications that detect server unavailability and automatically rebalance themselves amongst the available servers [22], and (c) dynamic process/VM migration [8].

All these mechanisms can have performance consequences, with different pros and cons in the options that they offer. Temporal mechanisms are local to a server, and do not require headroom elsewhere in the datacenter. These mechanisms are also quite agile, since power mode control and scheduling decisions can be made at a fine time granularity, with the effects materializing quickly. The downside of these mechanisms is their performance-degrading effect, especially when the applications do not provide sufficient slack in their offered load and SLA specifications. Finally, since the load is not moved/migrated elsewhere, the duration of the emergency is dictated by the application load, making these mechanisms less desirable for handling long-duration emergencies: temporal dampening mechanisms have to be applied continuously as long as the high load persists.

On the other hand, spatial techniques can work well with long periods of high load, since they can move demand to regions of the datacenter with sufficient headroom. However, these techniques have their limitations: (i) they are less agile than temporal mechanisms, which makes them inadequate for handling emergencies on their own; reacting to an emergency can take a long time, during which a temporal mechanism must be relied upon to cap the power draw, making spatial techniques suitable only for long-duration emergencies; (ii) migration can be expensive, depending on the application state that needs to be migrated, which can not only impact performance during migration (which can take a few minutes) and after migration (due to loss of locality), but can also temporarily increase the power draw of the servers concerned; (iii) they require headroom elsewhere in the datacenter, which may not always be available (and is sometimes undesirable due to administrative boundaries); and (iv) not all applications are amenable to migration, since they may require resources on specific servers (a graphics application requiring a GPU card, a virus scan requiring the local disk).

In general, a combination of these techniques can be used, depending on the duration of the emergency and the SLAs of the workloads involved. Further, these decisions need not be static (i.e., determined completely at the beginning of the emergency): an online algorithm could start with a temporal mechanism that optimistically looks for a purely local solution, and then adaptively migrate load if needed. Regardless, all such algorithms have performance-degrading consequences, either from throttling using power states/shutdown and/or from the overheads of load migration (during the movement, loss of locality after the movement, etc.).

3.3 Our Proposal: Employing Batteries

We now present an entirely orthogonal and novel solution that uses battery-based energy storage for handling emergencies and can avoid/reduce the performance consequences of existing techniques. Batteries are analogous to the buffers used in networks to smooth out spikes and alleviate mismatches between workload demand and capacity availability. Rather than adding batteries, we propose to use the UPS batteries that are already present for handling power outages. While most current datacenters employ a centralized UPS, we consider a distributed server-level UPS, similar to that in Google datacenters [18]. Our ideas are also applicable to a shared cluster-level UPS (with N times the capacity of a server-level UPS), where we only need to enforce how much draw comes from the utility versus from these UPSes. Evaluating such a configuration is part of our future work.

Battery Capacity: We denote by \( t_b \) the duration for which a UPS can sustain the associated server's peak power needs. Typically, provisioned UPS battery capacities give \( t_b \) of up to a few minutes. There can also be redundancy in the UPS units to accommodate UPS failures, implying a higher gross battery capacity. In addition to availability criteria, these capacities are also determined by the discrete capacity units that different vendors provide. For instance, many APC battery offerings come in discrete capacities of 4, 8, 16, and 24 minutes, all rated at a certain peak power draw. In our setup, we consider a UPS with \( t_b = 4 \) minutes, which is in line with (in fact, at the low end of) existing capacities. There are cost-benefit trade-offs in investing in higher battery capacities for the additional cap-ex savings achieved by aggressive power infrastructure under-provisioning. A preliminary study suggests that the cost of extra battery capacity is worthwhile (see Section 7.6), though a detailed study of such economics is beyond the scope of this paper.

Sourcing Power: While a server could potentially draw current simultaneously from the UPS and the power line (one source for each of its dual power supplies), we do not consider this option, since our experimental platform does not allow it. Consequently, at any time, a server draws all of its current either from its UPS or from the power line. It is possible to meet a power budget over an extended period of time (beyond what a single battery offers), since servers can take turns sourcing power from their batteries. For example, if reducing the power draw of 1 server using its UPS meets the overall power budget for 2 minutes, then a cluster of 8 servers can stay within the power budget for up to 16 minutes.

Battery Runtime: Batteries (including the lead-acid batteries used in most UPS units) are characterized by their runtime chart, approximated by Peukert's Law [28], which shows the time to drain a certain capacity at different power draws. The runtime of \( t_b = 4 \) minutes is at the maximum power draw of a server, and the possible duration of battery draw is much longer at lower power draws. For instance, a draw of 50% of the peak allows the battery to last 11 minutes. We exploit this property, forcing a lower draw to obtain longer battery operation.

Availability and Lifetime Concerns: Since we are using the UPS battery beyond its normal purpose (handling power outages), we need to ensure that we do not compromise datacenter availability. In recent work [21], we have modeled datacenter availability as a function of UPS battery capacity for different placement strategies across the layers of the power delivery hierarchy. Across these strategies, we have shown that leaving a 2-minute residual capacity suffices to ensure high availability of up to five nines. Consequently, in all our experiments in this paper, we always leave 2 minutes of reserve capacity (to operate at full load) in each UPS battery. Another concern is the lifetime (reliability) of the battery itself: a normal lead-acid battery typically lasts for about 3-5 years [31]. Since charge-discharge cycles can reduce its effective lifetime, we have to ensure that our approach does not require replacing batteries sooner than their expected lifetime. We leverage our recent work [20], where we conducted an extensive analysis of the impact of battery discharge cycles on battery lifetime, to address this concern. Using this analysis, we find that one can handle these relatively rare power emergencies well within the expected battery lifetime.

Efficiency and Additional Power of Batteries: Each charge-discharge cycle incurs an energy loss, which we find experimentally to be 28% of the overall energy drawn in the worst case. Since such loss is restricted to rare emergencies, we do not expect it to significantly impact operational costs. However, battery charging itself adds to the instantaneous power draw, and this must be considered when adhering to the budget. There are different options for dealing with this issue: restricting charging to non-emergency periods, slow-charging so that the instantaneous draw is not significant, and/or compensating with aggressive spatio-temporal workload throttling mechanisms. In this work, we employ the first strategy.

4. Handling Emergencies: An Offline Theoretical Framework

Having discussed the pros and cons of various strategies, we now turn our attention to combining their best features.

Is migration (spatial) of one or more workloads on the N servers under c even an option? There are different situations where a spatial knob may not be applicable, even if workloads have sufficient slack in their SLAs. First, there must be headroom elsewhere in the datacenter to accommodate the migrated load, which may not be the case if the entire datacenter is highly utilized. Second, certain workloads may not be amenable to migration: consider a workload that needs a resource only available locally, e.g., a graphics application requiring a GPU card.

Even if migration is an option, can local (temporal + battery) knobs alone handle the emergency while meeting application SLAs? If yes, there is no reason to look for headroom elsewhere in the datacenter and incur additional migration costs. The agility of temporal knobs makes them more attractive than spatial knobs in this case.

When local (temporal + battery) knobs do not suffice because of the emergency duration and the stringency of application SLAs, which workloads should we migrate, and at what time? These decisions should be based on what impacts the application SLAs the least while meeting the power budget. We develop a simple framework for such decision-making. Let us denote the remaining time for an application to complete,² at the beginning of an emergency, as t. Let \( t_m \) denote the time (from the beginning of the emergency) when the application is migrated, if at all. \( t_m \) would be a function of migration overhead: \( t_m = 0 \) implies immediate migration (for relatively stateless applications with little migration overhead), and a larger \( t_m \) implies that local knobs will be employed as much as possible until they become infeasible and migration is necessary. If \( t_m \) is larger than the emergency duration, then local knobs suffice to handle the emergency without requiring any migration. To determine \( t_m \), we denote the relative speed at which the application runs on the destination server(s) with respect to the source server(s) as r (the slowdown). We assume that there is sufficient power capacity at the destination, so that the slowdown can be approximated using a simple slowdown factor r. However, the application runs locally until time \( t_m \) before migration, and during that time it is subjected to local temporal knobs that also slow it down. This slowdown depends on \( t_m \) itself (i.e., local knobs affect execution time), and we denote it as \( l(t_m) \). Since \( t - t_m\, l(t_m) \) is the remaining time for the application at migration time \( t_m \), its remaining execution time on the remote server(s) must be scaled as \( (t - t_m\, l(t_m))/r \). The resulting total execution time of the application with migration (as depicted in Figure 2), which needs to be minimized, can now be expressed as

\[ t_f = \min_{0 \le t_m \le t} \left( t_m + \frac{t - t_m\, l(t_m)}{r} \right). \]

The relative impact of the slowdown with local (temporal + battery) and migration (spatial) knobs is captured by \( l(t_m) \) and r (both lie between 0 and 1). Of these two, r is mainly governed by the application's locality properties, since we assume the destination has enough headroom to run it without any power-related dampening. However, l depends crucially on \( t_m \), and on how the local knobs were employed to meet the power budget over \( t_m \). This, in turn, leads us to the following optimization problem: given a \( t_m \), how should the local (battery + temporal) knobs be used to meet performance SLAs?

² Even for applications without an explicit notion of remaining time (e.g., a Web server running forever), an equivalent framework, say based on execution rate, can be developed.

[Figure 2. Timeline for default operation (above), and for emergency handling with local and spatial knobs (below). Migration happens instantaneously at \( t_m \) in this illustration. Note the elongation in execution time due to l and r. The resulting lower power draw during the emergency between 0 and t, compared to default, is also shown.]

We use a generic metric R, whose minimization corresponds to meeting the application's SLA, to cast this optimization problem. R is general enough to capture a wide range of application metrics, e.g., response time (as in TPC-W, used in our evaluation), the reciprocal of throughput (in transaction-oriented applications such as Specjbb), time-to-finish (for long-running applications such as MapReduce and virus scan), the rate of playback discontinuities for a streaming media server, etc. We view the duration \( t_m \) as being divided into W equal-sized intervals. We use \( R_{ij} \) to denote the average response time offered by server j in time interval i. We can now express the problem of minimizing R (i.e., minimizing \( l(t_m) \)) for a given \( t_m \) using purely local knobs (battery + power states). This can then be iterated over different values of \( t_m \) between 0 and W to determine when to migrate.

Minimizing R Using Only Local Knobs: Let the battery on server j have up to \( B_j \) joules of energy that it can safely provide for this emergency, accounting for any residual energy that must be maintained for availability. Each server \( j = 1, \ldots, N \) can operate during an interval within this emergency in a particular power mode (server off, DVFS, and clock throttling states), ordered from \( D_0 \), where the server is off, to \( D_d \), which is the highest power-consuming state and the best in performance. We denote the intensity of a workload during this emergency as L, discretized over the spectrum of intensities between the minimum and maximum for a given workload as \( L_1, \ldots, L_l \). For instance, in Specjbb, the transaction rate specifies the intensity, and we can histogram this rate between a minimum and a maximum into l buckets. The response time \( R_{ij}(L_{ij}, D_{ij}) \) offered by server j during interval i depends on the load \( L_{ij} \) imposed on that server during i and on its power mode \( D_{ij} \), where \( L_{ij} \in [L_1, \ldots, L_l] \) and \( D_{ij} \in [D_0, D_1, \ldots, D_d] \). The power consumption of this server during i can be specified as \( P_{ij}(L_{ij}, D_{ij}) \); this can also be calculated a priori and made available to the optimizer for different (L, D) combinations. We can then phrase our objective of minimizing average response time over W by employing purely local (battery + power state) knobs as:

\[ \text{minimize} \sum_{i=1}^{W} \sum_{j=1}^{N} R_{ij}(L_{ij}, D_{ij}). \]

Let \( b_{ij} \) denote whether server j sources its power needs in the i-th interval from its battery (\( b_{ij} = 1 \)) or from the power line (\( b_{ij} = 0 \)). Since we cannot drain more than \( B_j \) for this peak, we have:

\[ \sum_{i=1}^{W} b_{ij}\, P_{ij}(L_{ij}, D_{ij}) \le B_j, \quad \forall j. \]

The resulting total power draw on the line, which has to adhere to the specified budget \( P_{budget} \), is given by:

\[ \sum_{j=1}^{N} (1 - b_{ij})\, P_{ij}(L_{ij}, D_{ij}) \le P_{budget}, \quad \forall i. \]

5. Online Heuristics

Our theoretical framework cannot be used in practice, since it requires a priori knowledge of the emergency duration and intensity. Even if such knowledge were available, solving it may be computationally prohibitive. However, we still use it as a baseline for comparison with the practical solutions we develop next. We refer to the solution offered by the framework above as Opt, and to the solution it offers using only local knobs as Opt-local.
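The two pieces of the offline framework above can be sketched as follows, under stated assumptions. `best_migration_time` evaluates the \( t_f \) expression by a grid search over \( t_m \) for a caller-supplied \( l(t_m) \); the `battery_then_throttle` model of l (full speed while the battery lasts, a fixed throttled speed afterwards) is hypothetical. `opt_local` is a brute-force stand-in for Opt-local on a toy instance with one constant load level: it enumerates every power mode \( D_{ij} \) and battery bit \( b_{ij} \), keeps plans that satisfy the battery-energy and line-power constraints, and minimizes \( \sum R_{ij} \). Its search space grows as \( (2(d+1))^{NW} \), so it only illustrates the formulation and is not how Opt is computed in the paper. All function names, response times, power draws, and energies are invented, and battery energy is taken as \( P_{ij} \) times an assumed interval length, which the formulation leaves implicit.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Choice = Tuple[int, int]  # (power mode index d, battery bit b) for one server in one interval


def best_migration_time(t: float, r: float, l: Callable[[float], float],
                        steps: int = 300) -> Tuple[float, float]:
    """Grid search for t_f = min over 0 <= t_m <= t of t_m + (t - t_m * l(t_m)) / r.

    t: remaining execution time at the start of the emergency.
    r: relative speed on the destination server(s), 0 < r <= 1.
    l: average relative speed under local knobs up to t_m, 0 < l(t_m) <= 1.
    Returns (t_m, t_f) for the best grid point; t_m = 0 means migrate immediately.
    """
    best_tm, best_tf = 0.0, t / r
    for i in range(1, steps + 1):
        t_m = t * i / steps
        remaining = t - t_m * l(t_m)  # work left at migration time (>= 0 because l <= 1)
        t_f = t_m + remaining / r
        if t_f < best_tf:
            best_tm, best_tf = t_m, t_f
    return best_tm, best_tf


def battery_then_throttle(battery_min: float, throttled_speed: float) -> Callable[[float], float]:
    """Hypothetical l(t_m): full speed while the battery lasts, throttled afterwards."""
    def l(t_m: float) -> float:
        if t_m <= battery_min:
            return 1.0
        return (battery_min + throttled_speed * (t_m - battery_min)) / t_m
    return l


def opt_local(n: int, w: int, interval_s: float, resp: Dict[int, float],
              power: Dict[int, float], battery_j: List[float], p_budget: float
              ) -> Tuple[float, Optional[List[List[Choice]]]]:
    """Exhaustive Opt-local for a toy instance with one constant load level.

    resp[d] and power[d] stand in for R_ij(L, D_ij) and P_ij(L, D_ij) (watts).
    Returns (minimum sum of R_ij, plan) where plan[i][j] = (d, b); plan is None if infeasible.
    """
    per_slot = list(product(sorted(power), (0, 1)))
    best_cost, best_plan = float("inf"), None
    for flat in product(per_slot, repeat=n * w):
        plan = [list(flat[i * n:(i + 1) * n]) for i in range(w)]
        # Line-power constraint: sum_j (1 - b_ij) * P_ij <= P_budget in every interval i.
        if any(sum((1 - b) * power[d] for d, b in plan[i]) > p_budget for i in range(w)):
            continue
        # Battery constraint: energy drawn from battery j over all intervals <= B_j.
        if any(sum(plan[i][j][1] * power[plan[i][j][0]] * interval_s for i in range(w))
               > battery_j[j] for j in range(n)):
            continue
        cost = sum(resp[d] for row in plan for d, _ in row)
        if cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan


if __name__ == "__main__":
    # Migration timing: 30 min of work left, remote speed 0.8, 10 min of battery, then 50% speed.
    t_m, t_f = best_migration_time(30.0, 0.8, battery_then_throttle(10.0, 0.5))
    print(f"migrate at t_m = {t_m:.1f} min, t_f = {t_f:.2f} min")

    # Opt-local: 2 servers, 2 one-minute intervals, 400 W line budget (all numbers invented).
    resp = {0: 100.0, 1: 2.0, 2: 1.0}     # off, throttled, full speed
    power = {0: 0.0, 1: 200.0, 2: 300.0}  # watts in each mode
    for energy in (0.0, 18_000.0):        # joules per battery
        cost, plan = opt_local(2, 2, 60.0, resp, power, [energy, energy], 400.0)
        print(f"B_j = {energy:.0f} J -> total R = {cost}, plan = {plan}")
```

In the toy migration example, the sweep settles on migrating exactly when the battery runs out (\( t_m = 10 \)): before that, staying local runs at full speed, faster than the remote speed r = 0.8, and afterwards the throttled local speed of 0.5 is slower than r. In the Opt-local example, giving each server enough battery energy for one full-power interval lowers the total R from 8 (both servers throttled) to 4 (both at full power, taking turns on battery).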
Table 1. Summary of our online heuristics

Scope | Heuristic | Description
Local to server | Throt | Use only power states
Local to server | BattFast+Throt | Drain battery first before throttling
Local to server | BattSlow+Throt | Drain battery slowly while throttling
Local to cluster | cMig | Migrate within cluster and turn off servers
Local to cluster | BattSlow+Throt+cMig | Delay cMig as far as possible
Across datacenter | dMig | Migrate to elsewhere in datacenter
Across datacenter | BattSlow+Throt+dMig | Delay dMig as far as possible

We consider seven online heuristics (Table 1), grouped by whether the knob for adhering to \( P_{budget} \) under c is (a) local to a server, (b) local to the cluster of servers under c, or (c) pertains to the entire datacenter. As before, we assume a priori knowledge of performance (\( R_{ij}(l, d) \)) and power (\( P_{ij}(l, d) \)) for different loads (l) and power states (d) of a server for the application. Further, even though the theoretical framework above allows a different power state (d) for each server at a given time, our heuristics only consider a restricted version that employs the same state across all N servers at any time. The resulting detrimental effects (if any) would materialize in our results.

Heuristics Local to a Server: One would preferably use knobs local to a server, namely battery, power state modulation, and/or temporal deferring of the load, to deal with the emergency, because of their agility and lower disruption to the datacenter. Our first set of three heuristics (Throt, BattFast+Throt, and BattSlow+Throt) employs only such local knobs. Throt employs only power state modulation (DVFS and/or clock throttling) and temporal scheduling, and is representative of the power throttling mechanisms available today. With knowledge of the power consumption in different states for the current load, this heuristic picks the least performance-impacting power state for all N servers that ensures adherence to \( P_{budget} \). BattFast+Throt and BattSlow+Throt supplement temporal knobs with the battery to reduce performance impact, and thereby sustain longer emergency handling. The two differ in battery drain rate. BattFast+Throt is relatively optimistic about the emergency duration, and drains the battery fully before resorting to power mode knobs, thereby not requiring the latter at all if the battery can sustain the entire load for a short emergency. BattSlow+Throt is conservative, and tries to prolong battery usage. Recall that we use the same power state d across all N servers at any time, and that our experimental setup controls the drain from the battery only at server granularity (i.e., a single server cannot source partly from battery and partly from the power line, which is possible with dual-power-supply servers). Consequently, we require \( (N-1)\, P(l, d) \le P_{budget} \), since the remaining server, by sourcing its power from its battery, reduces the overall draw from the power line enough to adhere to the budget. Hence, in BattSlow+Throt, a single server draws power from its battery at power state d, while the other servers (also operating at state d) draw their power from the normal supply; d is the highest power state (i.e., the least performance-impacting) that obeys the above condition. This scheme can thus sustain a longer duration of battery operation than BattFast+Throt, though its performance consequences can be felt earlier (in cases where the battery alone might have been sufficient to handle the emergency under BattFast+Throt).

Heuristics Local to Cluster c: With longer emergencies, local knobs may not suffice to meet application SLAs within the stipulated power budget. One option is to migrate the load. It may sometimes be desirable to simply re-arrange load within c during an emergency, since (a) migration outside of c may not be possible (either there is no headroom, or administrative reasons force the application to be tied to nodes within c), and/or (b) the locality of application needs (frequent communication, data stored locally, etc.) may be impacted when parts of the application are forced out of c. The downside of migrating (redistributing the load) within the cluster is that performance may suffer if the existing load is already pushing individual servers to high resource utilization (which is usually what leads to the emergency). When migrating within c, we consider two heuristics: cMig and BattSlow+Throt+cMig. In cMig, the load is migrated immediately at the beginning of the emergency from one or more servers, and these servers are subsequently shut down. The number of servers from which the load is migrated depends on how many need to be taken down to bring the total subsequent power consumption within \( P_{budget} \). Since migration is typically intended to be the option of last resort, we do not consider local knobs (either battery or power modes) after the migration. Using the same rationale, BattSlow+Throt+cMig defers the migration point by using the BattSlow+Throt strategy described above until the battery reaches the residual capacity needed for availability, and then employs cMig within c. While there are numerous ways of performing load migration (see Section 3), we restrict our evaluation to virtual-machine-based (live) migration [8], which is a convenient vehicle for performing this task at the infrastructure level without any application-level knowledge or mechanism. We can explore other strategies in future work.

Heuristics Across Clusters: Handling an extended emergency without substantial performance repercussions may require moving or migrating load outside of c to parts of the datacenter with sufficient headroom. In this paper, we do not consider the problem of where to move this load, and simply assume that it can be accommodated elsewhere. As explained above, the loss of locality (communication and storage) after migration can impact subsequent performance, and we again explore two strategies: dMig, which performs the migration right at the beginning of the emergency, and BattSlow+Throt+dMig, which delays the migration as much as possible.

6. Implementation and Experimental Setup

We use a scaled-down experimental prototype to evaluate our heuristics and compare them with Opt-local and Opt. Our prototype uses a cluster c (Figure 3) of N = 8 DELL PowerEdge servers, each with two Intel Xeon 3.4 GHz processors, running RedHat Linux 5.5. The face-plate rating of these servers is 450 W. Their idle power consumption is around 120 W, and the peak power to which we can push a server across our workloads is 320 W. The dynamic power consumption can be modulated with 4 DVFS states (P-states: 3.4 GHz, 3.2 GHz, 3.0 GHz, and 2.8 GHz) and 8 clock throttling states (T-states: 12.5%, 25%, ..., 100%). To change power states, we wrote custom drivers using the IA32_PERF_CTL and IA32_CLOCK_MODULATION MSRs.

Each server is directly connected to a 1000 W APC UPS [1], which in turn is connected to an outlet of a 30 Amp Raritan PDU. Although we have a 1000 W UPS unit connected to each server, for all our experiments we assume only a 330 W UPS (close to the maximum power consumed by our server) and drain the UPS using a correspondingly scaled-down runtime chart.³ We consider a 4-minute battery per server, which is at the lower end of existing capacities, of which we leave a residual capacity of 2 minutes, as required for availability guarantees. The UPS can report its load, power draw, and remaining battery runtime over an RS232 serial interface. The PDU can dynamically switch the supply to individual UPS units on and off with SNMP commands over Ethernet. By turning individual outlets on/off, we can selectively have a server source power from either its battery or the power line. We use a separate machine (the Power Budget Enforcer) to implement the heuristics, i.e., to send throttling, migration, and PDU on/off commands. Our cluster has a shared NAS box, which is mounted as an NFS storage volume by all the servers. We use another cluster of 8 servers (not shown in the figure) as the destination for migrating workloads in dMig and BattSlow+Throt+dMig. All our applications are hosted as VMs under Xen on each server.

[Figure 3. Experimental prototype.]

7. Evaluation

We use four case studies involving six different applications to evaluate the efficacy of our online heuristics. In each, we present (i) salient workload properties, (ii) emergencies lasting a range of durations and corresponding to different degrees of under-provisioning (10-30% of potential peak), (iii) the remedial actions corresponding to our heuristics, and (iv) a comparison of their efficacy in alleviating the emergency with respect to Opt-local and Opt. Note that in this work we are only concerned with performance during an emergency. The remedial actions for emergencies lasting several hours will typically mandate migration, and there are no further insights to be gained by studying such long durations. As discussed earlier, we have profiled the performance (R(l, d)) and power (P(l, d)) of

³ The minimum capacity of UPS units available from APC is 500 W. Although UPS units are typically over-provisioned, we assume this conservatively tight provisioning of 330 W for our experiments.
each workload a priori. In general, enterprise/internet applications typically undergo extensive profiling for right-sizing of datacenter IT resources. Even for cloud-hosted third-party applications, resource usage can be determined via offline/online profiling. Such profiling, with readily available power meters [25] on those platforms, or in combination with well-understood power models [16] that are based on resource utilization, can be used to determine \(R_{ij}(l,d)\) and \(P_{ij}(l,d)\). Since we already have plenty of ground to cover in this paper, for the purposes of this work we assume that \(R_{ij}\) and \(P_{ij}\) for a given workload, as a function of different power states, are made available. A more detailed treatment of these issues can be considered in future work.

7.1 TPC-W and SPECjbb
We study two well-known server benchmarks: TPC-W [40] and SPECjbb [41]. TPC-W emulates a 3-tiered (Apache, Tomcat, and MySQL) transactional Web-based eCommerce bookstore. The Apache front-end runs on a dedicated server, while the other two tiers run on a set of servers whose size is chosen to accommodate the workload intensity. Apache employs a request distribution module to balance requests among replicas of Tomcat. We use the clustered MySQL database engine, which provides a replicable, shared-nothing database tier. Each Tomcat and MySQL instance runs in its own Xen domain. TPC-W services a specified number of clients over persistent HTTP sessions. We use the average client response time during the emergency as the performance metric (R) for TPC-W. SPECjbb is a 3-tiered server-side Java warehouse management application. We use the average transactions/second (tps) as our performance metric (R) for SPECjbb.

Emergency Handling for TPC-W: Applications like TPC-W are known to experience significant temporal variations in load. Many such variations can be predicted (e.g., time-of-day behavior) to ensure that enough power capacity is provisioned. However, there are other variations not amenable to such prediction (e.g., flash crowds), which can cause emergencies when the infrastructure is underprovisioned. Responding to the growing workload, the datacenter incrementally adds replicas of Tomcat and MySQL on new servers until all 8 servers are utilized. This can cause an emergency, since the aggregate draw of these servers can exceed the power budget of the underprovisioned
cluster. For instance, when the workload saturates all 8 servers, the aggregate power consumption hits 1630 W. If the infrastructure is underprovisioned by 10%, 20% and 30%, then the corresponding \(P_{budget}\) limits are 1470 W, 1300 W, and 1140 W, respectively. We inject load to introduce emergency durations of 2, 8, 15, 30, and 60 minutes over these provisioned limits.

Figures 4(a) and (b) present the degradation in average response times of TPC-W with 10% and 30% underprovisioning. Throt chooses (2.8 GHz, 100% Clk) and (2.8 GHz, 25% Clk) for 10% and 30% underprovisioning, respectively, and severely degrades performance in these high-utilization regimes (going from 30% degradation to over 500% in the more underprovisioned case). On the other hand, BattFast+Throt is better, particularly for short to moderate emergency durations (up to 15 minutes). For such emergencies, it is able to source the excess power entirely from batteries without throttling. For longer durations (30 minutes and more), the batteries run out, mandating throttling, which degrades performance. The high sensitivity of performance to even small CPU rate modulation makes BattSlow+Throt mostly ineffectual: for short emergencies (up to 15 minutes), it unnecessarily throttles due to its conservative (slow) drain from batteries; for moderate durations (up to 30 minutes), it stretches the battery runtime, but the accompanying throttling hurts performance; for long durations (more than 30 minutes), the batteries run out, with subsequent consequences similar to BattFast+Throt.

Next we consider the cluster-level migration, cMig. cMig employs Xen's live migration facility [8] to seamlessly migrate a subset (let us denote its size as n) of Tomcat/MySQL replicas from their original servers and co-locate them with those on the remaining N − n servers, which in turn become overloaded. These n unoccupied servers are then turned off. We find n to be 1 and 3 for underprovisioning degrees of 10% and 30%, respectively. For the 30% underprovisioning, all active servers continue to operate at their highest power states. Since the TPC-W components were operating at their peak requirements, the co-located components experience a significant resource shortage, causing response time to nearly double. In fact, cMig fares worse than the earlier two heuristics involving the battery. The idea behind BattSlow+Throt+cMig is to improve upon cMig by postponing migration as much as possible; it starts as BattSlow+Throt and switches to cMig when the batteries run out. For small/moderate emergencies, this defaults to BattSlow+Throt with the same pros and cons, which is essentially undesirable for TPC-W. It is worse for longer emergencies, since it will end up switching to cMig, which we have already found to hurt TPC-W.

Finally, let us discuss the efficacy of dMig. We migrate 1 and 3 Tomcat/MySQL VM pairs for 10% and 30% underprovisioning, respectively. Live migration can be carried out relatively quickly for TPC-W (about 2 minutes)⁴. Furthermore, the loss of data locality suffered by migrated VMs is negligible in this case (i.e., small r), implying little performance consequence after the migration. Consequently, dMig turns out to be the most effective heuristic and is able to handle the entire range of emergencies with little performance consequence (though it requires the battery to temporarily handle the power spike during the act of migration). The only situation where dMig is an unwise choice is when the emergency lasts less than the migration duration of 2 minutes. BattSlow+Throt+dMig attempts to postpone the remote migration and ends up offering worse performance than dMig, due to TPC-W's sensitivity to even small degrees of throttling. BattFast+Throt comes close to Opt-local for emergencies shorter than 15 minutes, while BattSlow+Throt is closer to Opt-local for longer emergencies. Incidentally, when we examine (post-mortem) the decisions reached by the offline algorithm, Opt-local never chooses Throt for durations less than 15 minutes, and for longer durations it chooses heterogeneous power states across the servers, whereas our battery-based heuristics choose the same state across all servers at any time. Specifically, Opt-local uses a lower power state on servers that source from batteries, stretching the battery runtime. In this case study, Opt first drains the battery fully, and then immediately migrates the load to a remote node, since there is little performance impact after migration.

Emergency Handling for SPECjbb: Emergency handling in SPECjbb is similar to TPC-W, with similar results. The 8 servers housing SPECjbb replicas end up operating close to saturation, and their aggregate power consumption is found to be 1875 W. Figures 4(c) and (d) present the degradation in average tps during various emergencies.

Key Insights: (i) CPU throttling is undesirable for this class of applications, even for short or moderate durations. (ii) Batteries are helpful for a wide range of emergencies (up to 30 minutes). (iii) Batteries can offer lower power draws than even the deepest power states. (iv) The low/zero slowdown after migration (r) favors dMig (coming close to Opt), though the battery is still needed to handle the initial power spike during migration (which takes 1-2 minutes). (v) Opt-local and Opt point towards the possibility of heterogeneous power state assignment across servers to achieve better power/performance trade-offs than those offered by our heuristics.

⁴ Incidentally, though we do not explicitly present those details, we wish to point out that migration itself does introduce a spike in power consumption (of 10%) for this workload, and the battery is still needed (in solutions such as cMig and dMig) to temporarily get the power draw under control during the act of migration.

Legends for bar charts: Note that the legend numbers are given above some of the bars for easier readability. Opt and Opt-local lines are also drawn as reference lower bounds.

[Figure 4. TPC-W and SPECjbb results. Panels: (a) TPC-W, 10%; (b) TPC-W, 30%; (c) SPECjbb, 10%; (d) SPECjbb, 30%; emergency durations of 60, 30, 15, 8 and 2 minutes.]

[Figure 5. MapReduce results. (a) MapReduce profile: default vs. 30% with BattSlow+Throt+dMig (reduce phase truncated for clarity); (b) MapReduce, 10%; (c) MapReduce, 30%: performance degradation (%) vs. input file size (8 GB, 16 GB).]

7.2 MapReduce
This workload represents a growing and important class of parallel applications used in domains such as search engines. We run a wordcount application using Hadoop [22]. It schedules the mapper and reducer tasks across a set of specified servers and reports the application finish time as its performance metric. We consider 8 GB and 16 GB input file sizes. MapReduce inherently uses distributed storage, placing computations closer to the data that they need. If one considers a server's storage volume to be part of the VM, then cMig and dMig, which are based on infrastructure-level VM migration, would require moving all of this data (which can be quite large) to the destination node, rather than just the data needed by subsequent computation. While one could consider this in future work, in this set of experiments we assume storage to be decoupled from the VM image (by implementing the local storage as an NFS-mounted server), and only move the VM image for migration, with the subsequent computation making NFS calls to get the specific data that it needs from the source cluster c (where the data is replicated for availability even if servers go down).

Emergency Handling for MapReduce: Figure 5(a) shows the evolution of the aggregate power of the 8 servers running MapReduce (labeled as default) with the 16 GB input. We see high power variation over time, suggesting under-utilization if we provision for the peak. Specifically, the mapper phase (up to 30 min.) consumes significantly higher power, reaching up to 2020 W. Consequently, a surge of mapper activity across the cluster can sometimes lead to emergencies in an aggressively underprovisioned system. Delaying the mappers and/or spreading them temporally/spatially can delay the application and possibly impact its locality. Figures 5(b) and (c) compare the performance of our heuristics
for underprovisioning of 10% and 30% (with respect to 2020 W), respectively. For the 8 GB input, we find that battery-based techniques are able to handle the emergency with little/no performance consequences for 10% underprovisioning (comparable to Opt). The emergency duration for this input is roughly 15 minutes, which can be easily sustained by the 8 servers taking turns sourcing power from the battery for roughly 2.3 minutes each (only 1 server needs to stop drawing from the power line to meet the power bound). In fact, the power state modulation and migration mechanisms do not even kick in when the battery is complemented with these knobs. However, for 30% underprovisioning, the other knobs are also employed, and the performance progressively degrades. Still, the battery-supplemented techniques do better than those without this knob. Further, while BattFast+Throt does better than BattSlow+Throt in the 20% underprovisioning case (not shown in the figure), the results are reversed in the 30% underprovisioning case (shown in Figure 5(c)), where the emergency mandates higher power shaving, causing the battery to run out faster if it is the first knob of choice. Sustaining a longer period of operation with the battery (a slow drain rate achieved with simultaneous power state control) is a better option in such cases of high underprovisioning.

The 16 GB input extends the emergency duration to roughly 30 minutes, making it necessary to supplement the battery with power state modulation and/or migration. This degrades performance even in the 10% underprovisioning case. In both the 8 GB and 16 GB experiments, we find that migration does relatively worse than using local knobs alone. This is because of migration overheads: doing it locally (cMig) increases the load on one or more slave nodes within the cluster, delaying the progress of the application. dMig for MapReduce has a different problem, namely the loss of data locality, requiring considerable data movement across clusters, which tremendously impacts performance (though this still performs better than cMig, which overloads servers). MapReduce thus depicts a spectrum of workloads that are not as conducive to migration for emergency reaction (unlike the stateless applications in the previous subsection), since migration impacts performance not just during the migration but also subsequently. As before, supplementing migration with local knobs helps defer \(t_m\) (to about 23 minutes in the 16 GB experiment), lessening the subsequent performance slowdown (r) after migration in both the cMig and dMig cases. Migration is more competitive for longer emergency durations. For instance, consider the 16 GB degradation results in 30% underprovisioning for cMig (92%) and dMig (55%) with respect to BattSlow+Throt (34%), and compare them with those for the 8 GB input, where BattSlow+Throt suffers only 16% degradation while cMig and dMig still suffer 89% and 49% degradation. The local knobs supplemented with migration can thus help bridge the emergency duration gap until migration becomes more competitive.

Figure 5(a) presents salient decisions made by our heuristic BattSlow+Throt+dMig (the heuristic closest to Opt) during the emergency with 30% underprovisioning: (i) Duration \(t_a\): one server at a time is sourced from its battery, and all 8 servers are operated at the 2.8 GHz DVFS state until t = 23 minutes; (ii) Duration \(t_b\): 3 mapper VMs are migrated to another cluster, which takes 1-2 minutes; (iii) Duration \(t_c\): 3 machines are shut down after migration, and the remaining 5 servers operate at the highest (3.4 GHz) DVFS state; (iv) Duration \(t_d\): the reduce phase, where there is no longer an emergency, though we do not consider the option of moving the VMs back to c in our experiments. Note that we adhere to the 1420 W cap (30% underprovisioning) over the entire duration.

The performance degradation of Opt closely matches that of our heuristics that only use the server-local knobs for 10% underprovisioning, where migration is less desirable (Opt does not choose migration in this case). We observe the same behavior for the 8 GB input at 30% underprovisioning, where Opt only uses local knobs for the entire emergency. One interesting observation is that for the 16 GB input at 30% underprovisioning, Throt becomes very expensive (about a 72% decrease in throughput). Opt resorts to dMig at time \(t_m\) in this case, which closely matches the decision made by BattSlow+Throt+dMig, which initiates migration when the battery runs out of charge. It is important to note that MapReduce is very sensitive to the value of \(t_m\), and migrating before or after 24 minutes results in poor performance.

Key Insights: (i) Longer emergencies may mandate migration, but this slows down the application either due to the overload of some servers in the cluster (cMig) or due to poor data locality (dMig); (ii) BattSlow+Throt postpones migration and allows it to become more competitive; (iii) when migration is expensive, Opt refrains from migration for short emergencies (BattFast+Throt comes close) and defers migration as long as possible for long emergencies (BattSlow+Throt+dMig comes close).
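The two sizing decisions behind these heuristics, choosing the BattSlow+Throt power state and choosing how many servers cMig/dMig must vacate, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the prototype's Power Budget Enforcer: `battslow_power_state` and `servers_to_migrate` are hypothetical helpers, the per-state wattages in the example are invented for illustration, and the migration count assumes every remaining server draws an equal share of the pre-emergency aggregate power (roughly the case when all servers are near saturation).

```python
def battslow_power_state(state_watts, n_servers, p_budget):
    """BattSlow+Throt: one server runs from its battery while the other
    N - 1 servers stay on the power line, so choose the highest-power
    (least throttled) state d satisfying (N - 1) * P(l, d) <= P_budget.

    state_watts maps a power-state label to its per-server draw P(l, d)
    at the current load l. Returns None if no state satisfies the bound.
    """
    feasible = {d: p for d, p in state_watts.items()
                if (n_servers - 1) * p <= p_budget}
    if not feasible:
        return None
    # Assumes a higher draw corresponds to a less throttled state.
    return max(feasible, key=feasible.get)


def servers_to_migrate(aggregate_w, n_servers, p_budget):
    """cMig/dMig: smallest n such that the N - n servers left running fit
    under P_budget, assuming each draws an equal share of aggregate_w."""
    per_server_w = aggregate_w / n_servers
    for n in range(n_servers + 1):
        if (n_servers - n) * per_server_w <= p_budget:
            return n
    return n_servers  # only reachable for a negative budget


if __name__ == "__main__":
    # Hypothetical per-server draws for the four DVFS states (not measured values).
    states = {"3.4GHz": 252.0, "3.2GHz": 236.0, "3.0GHz": 219.0, "2.8GHz": 201.0}
    print(battslow_power_state(states, 8, 1420))  # 2.8GHz (7 * 201 = 1407 <= 1420)
    print(servers_to_migrate(1630, 8, 1470))      # 1 (TPC-W, 10%)
    print(servers_to_migrate(1630, 8, 1140))      # 3 (TPC-W, 30%)
    print(servers_to_migrate(2020, 8, 1420))      # 3 (MapReduce, 30%)
```

Under the equal-share assumption, the aggregate draws and budgets reported above yield n = 1 and 3 for TPC-W and 3 vacated servers for MapReduce: with 5 of 8 servers left, 5 × 252.5 W = 1262.5 W fits under the 1420 W cap, while 6 servers (1515 W) would not.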
[Figure 6. Media Server Results.]

7.3 Streaming Media Server
Our third case study uses a multithreaded 1.5 Mbps streaming MPEG media server that services several Java clients. It spawns a separate thread for each client, which performs password authentication before it starts streaming. A 4.5 MB buffer is used at the client to smooth out traffic variations. There could still be playback discontinuities (glitches) when the buffer becomes empty. The server runs on all machines, streaming a 60-minute video to a total of 2400 clients.

Emergency Handling: During the initial phase (say, the first 10 minutes), when the clients try to connect to the server, we observe a power spike of about 1700 W, compared to the steady-state draw of 1200 W when there is only subsequent streaming. This initial connection/authorization phase can be viewed as the emergency duration, and we evaluate our heuristics over this duration for 10%, 20% and 30% underprovisioning in Figure 6. Throt degrades with aggressive underprovisioning (from 8 glitches per client for 10% underprovisioning to over 100 for 30%), since it is not able to sustain the streaming needs. Due to the short emergency (about 10 minutes), BattFast+Throt is able to handle it without throttling and hence performs much better than BattSlow+Throt. In fact, BattFast+Throt performs very well even at 30% underprovisioning, incurring only one-fifth of the glitches of Throt. We note that cMig hurts performance even for small underprovisioning, since each server is already saturated. BattSlow+Throt+cMig is able to delay this migration (it actually avoids migration completely, because the battery is able to fully sustain the emergency), defaulting to BattSlow+Throt. Interestingly, dMig, which performed poorly for MapReduce, does very well here, since there is almost no performance impact after the migration (r is negligible). It takes only about 30 seconds for the media server VM to migrate, and we find that the client buffer is large enough to absorb the performance impact for most of this duration. While the battery serves as a buffer that allows temporary power spikes at the media servers (as in the earlier case studies), a similar effect is achieved by the client buffer, which allows the media servers to slow down temporarily (either by power state transition or migration), thus reducing their power draw. We also observe that, unlike MapReduce, it does not make sense to delay migration (BattSlow+Throt+dMig), since migration overheads are low and throttling has a severe performance impact.

Opt chooses to use the battery alone for 10% and 20% underprovisioning, equivalent to BattFast+Throt, and does not incur any degradation. For 30% underprovisioning, the battery cannot fully sustain the peak, and Opt uses datacenter-level migration, which results in just 2 glitches per client. In this application, we find \(t_m\) = 7 minutes for Opt (unlike 23 minutes in MapReduce), since migration overhead is negligible. Interestingly, when our optimization is run with just the local knobs (Opt-local), for 30% underprovisioning it is comparable to Opt and achieves as few as 3 glitches per client, compared to the best local-knob heuristic, BattFast+Throt, which incurs 27 glitches per client. When we analyzed the Opt-local results, we noticed that it sources power from the batteries of a subset of servers at any time and uses throttling to shave only the remaining 10%. This behavior lies in between our battery-aggressive BattFast+Throt and battery-conservative BattSlow+Throt heuristics, which source from more servers and from a single server, respectively, and shave the rest by throttling. This shows the importance of dynamically adjusting the battery drain rate, which we plan to investigate in future work.

Key Insights: (i) Battery-based heuristics perform very well due to the shorter emergency duration. (ii) The battery and client buffer, in combination, provide a seamless strategy for offloading the work to elsewhere in the datacenter, and hide the migration cost.

7.4 Graphics Application and Virus Scan
This case study involves two applications: a GPU application in CUDA implementing the Black-Scholes financial model on an NVIDIA card, and VirusScan (Linux AVG [29]), which needs to run on the local machine to scan its 40 GB hard drive. Not all datacenter servers may offer the required graphics support (e.g., Amazon EC2 offers separate GPU cluster instances), and moving the VirusScan elsewhere is not an option. VirusScan is a strawman we use to illustrate applications that (i) are tied to a specific server, (ii) are fairly flexible in their processing rate (low-priority), and (iii) have some kind of deadline (24 hours in this case). Other
examples include back-ups, search-engine indexing, etc. Even though the deadlines are soft in these examples, we use a hard deadline for more general illustration. We vary the rate at which GPU applications arrive, which can, in turn, impact the schedulability of VirusScan, and report throughput (GPU ops/sec) subject to the 24-hour deadline for VirusScan that needs to be met. In our servers, the GPU application running alone consumes 250 W, VirusScan at full throttle runs at 208 W and takes 45 minutes to complete, and the two together at full throttle hit 315 W.

Emergency Handling: We define an emergency in this case as an out-of-the-ordinary day wherein VirusScan does not get sufficient bandwidth (because of the power budget) to run until the last x minutes before the 24-hour deadline, and we vary x between 240 and 45 minutes (beyond 240 minutes there is sufficient bandwidth for our considered load). This setup is different from the earlier case studies, since we have multiple applications. We adapt Throt, BattFast+Throt and BattSlow+Throt for this scenario, since migration is not an option. We do not explicitly discuss results for BattSlow+Throt, whose behavior is not very different from that of BattFast+Throt. Throt performs power state modulation continuously over the last x minutes so that VirusScan finishes its remaining execution before the 24-hour deadline. CPU throttling states have little impact on the progress or power consumption of the GPU application (as it depends mainly on the GPU). Hence, if there is not enough slack at a given time to meet the VirusScan deadline, the GPU application is put on hold (suspended), and VirusScan runs at full throttle until there is slack again. BattFast+Throt employs the battery as long as possible from the beginning of the x minutes, until it is drained to the residual capacity required for availability, with VirusScan running at full throttle. Subsequently, Throt is employed.
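The slack rule behind Throt, and the battery-first prefix of BattFast+Throt, can be illustrated with a minute-by-minute simulation of a single server. This is a simplified sketch, not the evaluated implementation. The function name `gpu_minutes_under_deadline` is hypothetical, and several assumptions are built in: the per-server budget of 283.5 W assumes the 10%-underprovisioned cluster budget (0.9 × 2520 W = 2268 W) is split evenly over 8 servers; VirusScan's extra power next to the GPU application is assumed to scale linearly with its speed; `battery_minutes` is a hypothetical stand-in for this server's usable battery runtime, ignoring rotation across servers and the load-dependent runtime chart; and the GPU application is assumed to be always active (100% intensity).

```python
def gpu_minutes_under_deadline(x, scan_minutes=45.0, budget_w=283.5,
                               gpu_w=250.0, both_w=315.0, battery_minutes=0):
    """Simulate the last x minutes before the VirusScan deadline on one server.

    battery_minutes=0 models Throt; battery_minutes>0 models BattFast+Throt,
    which first runs both applications at full throttle from the battery.
    Assumes budget_w >= gpu_w and that VirusScan alone (208 W) fits the budget.
    Returns (minutes the GPU application ran, whether the deadline was met).
    """
    # Hypothetical linear model: VirusScan at speed f adds f * (both_w - gpu_w)
    # watts on top of the GPU app, so f is the fastest speed that still fits.
    f = max(0.0, min(1.0, (budget_w - gpu_w) / (both_w - gpu_w)))
    work_left = scan_minutes  # remaining VirusScan work, in full-throttle minutes
    gpu_minutes = 0
    for t in range(x):
        remaining = x - t
        if work_left <= 0:
            gpu_minutes += 1      # scan finished: GPU app runs alone
        elif t < battery_minutes:
            gpu_minutes += 1      # on battery: both run at full throttle
            work_left -= 1.0
        elif work_left - f > remaining - 1:
            work_left -= 1.0      # no slack left: suspend GPU app, scan at full throttle
        else:
            gpu_minutes += 1      # slack left: GPU app plus throttled scan
            work_left -= f
    return gpu_minutes, work_left <= 1e-9


if __name__ == "__main__":
    print(gpu_minutes_under_deadline(240))                    # (240, True)
    print(gpu_minutes_under_deadline(60))                     # (30, True)
    print(gpu_minutes_under_deadline(60, battery_minutes=2))  # (32, True)
    print(gpu_minutes_under_deadline(30))                     # (0, False)
```

With these assumed numbers, a 240-minute window never forces the GPU application off, a 60-minute window halves its running time, and a window shorter than VirusScan's 45-minute run cannot meet the deadline at all. The design choice is deliberately greedy: the GPU application keeps running for as long as the remaining slack allows, which mirrors the "suspend only when there is not enough slack" rule described above.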
[Figure 7. Performance of the GPU application during various emergencies caused by VirusScan's deadline (10% underprovisioning). Panels: (a) 80% GPU app; (b) 100% GPU app.]

A representative result for 10% power underprovisioning (with respect to the maximum possible draw of 2520 W) is given in Figure 7. The emergency is captured by different values of x on the x-axis, and we present the percentage degradation in GPU ops on the y-axis. This degradation is shown for different loads imposed by the GPU application, with intensities of 80% and 100%, i.e., the percentage of time that the GPU is active. Throt by itself suffices when there is sufficient notice (large x) to run these algorithms. Reacting closer to the deadline does not leave much room for Throt to run both applications without affecting the GPU application: there is a 50-60% slowdown in GPU ops during the last hour with Throt. Higher GPU application intensity leaves even less room for Throt to allocate sufficient bandwidth to the two, thereby impacting throughput. The battery-based solution, on the other hand, does much better across the entire spectrum, not suffering any loss for larger values of x. Only at very high GPU application intensities and with very little reaction time (in the last hour) does it result in performance loss, and even then it does significantly better than Throt. Further, the battery-based solution comes quite close to Opt, with the latter doing slightly better because of more leeway in what gets scheduled (GPU or VirusScan) at any time on different servers, and the possible heterogeneous power state assignment across the servers at a given time.

Key Insights: (i) Application intensity, slack in scheduling, and the reaction window all impact throughput. (ii) Power state modulation alone is not able to handle high intensities when sufficient slack is absent. (iii) The battery provides substantial leeway in handling emergencies when scheduling workloads with both temporal (deadlines) and spatial (need to run on specific servers) constraints.

7.5 Summary of Observations
Table 2 summarizes the results from the six different application case studies evaluated above, pointing out the efficacy of the knob(s) in handling the different performance-power characteristics.

Table 2. Summary of Results
| Application characteristics | Emergency handling (knob selection) |
|---|---|
| Short peak widths | Batteries are self-sufficient in handling peaks up to 30 mins without requiring expensive throttling |
| Medium peak widths | Batteries supplement power state throttling to reduce performance overheads |
| Long peak widths | Immediate migration is beneficial; batteries hide the power spike during migration |
| Sensitive to data locality | Migration may impact performance; batteries help postpone migration |
| Slack-based workloads | Shift peak temporally via flexible workload scheduling; batteries help create more slack |

7.6 Invest in additional battery capacity?
Even though we have considered a conservative 4-minute battery capacity in our experiments, one may ask whether the cost of additional battery capacity can be justified by the potential reduction in datacenter power infrastructure costs that we can gain by underprovisioning. Let us denote the cost of additional battery capacity as \(c_{bat}\) per unit of stored energy, so that sustaining \(e\) hours of emergency costs \(e\,c_{bat}\) per watt, and the cap-ex cost of the power infrastructure saved by underprovisioning as \(c_{cap}\) $/watt. Lead-acid battery costs (\(c_{bat}\)) reported in the literature (see DOE/Sandia data [24]) are in the 100-300 $/kWh range, and we consider conservative values as high as 500 $/kWh. Datacenter power infrastructure cap-ex is reported to grow by $10-25 for every provisioned watt [4, 23]. Since this includes costs for cooling, diesel generators, UPS, etc., it is non-trivial to isolate the $/W for the IT power infrastructure. Consequently, we study a wide range for \(c_{cap}\), from as low as 1 $/W all the way to 15 $/W. We can then calculate the Return-On-Investment (ROI) for additional battery capacity as

\[ \mathrm{ROI} = \frac{c_{cap} - e\,c_{bat}}{e\,c_{bat}} \]

and show these values in Figure 8 for emergency durations ranging from 5 minutes to 4 hours. In these calculations, battery costs are amortized over 4-year lifetimes, while infrastructure costs are amortized over 12-year lifetimes. We find a positive ROI (a very high ROI in many cases) for most of the operating regions, despite considering pessimistic scenarios. This suggests that investing in additional battery capacity may be worthwhile, although a more in-depth economic analysis is warranted in future work.

[Figure 8. ROI with additional battery capacities. Positive values indicate that the investment is worthwhile (a larger magnitude indicating higher returns), while negative values suggest not investing in higher capacities.]

8. Concluding Remarks
We have presented a framework for dealing with emergencies arising from aggressive underprovisioning of the power infrastructure. Rather than relying on disruptive fuses/circuit-breakers, IT-controlled techniques such as power state modulation and workload migration can be supplemented with our new proposal of leveraging already existing UPS batteries to gracefully deal with emergencies. We demonstrate, using an experimental prototype with several interesting use cases, that the battery-based approaches (i) are self-sufficient to deal with short-duration emergencies, (ii) supplement existing solutions to enhance their efficacy over a wider range of operating conditions, and (iii) create opportunities where other options are infeasible. We have also presented an offline theoretical framework to find bounds on how well we can perform under these emergencies, and presented several online heuristics that can adapt themselves to work under these unpredictable black swan events (When? How severe? How long?). Since these emergencies are typically a consequence of load surges, the existing solutions (power state throttling and migration) can by themselves have serious performance implications. Overall, we find that when the migration/load-redirection overhead is low, a fast local drain of the battery to control the power surge until migration is complete works quite well. At the other extreme, when migration costs are high, delaying the migration with a combination of slow battery drain and power mode control is the better option. The battery is thus an agile and useful stand-alone and/or complementary solution for addressing both short- and long-duration emergencies.

Acknowledgments
This work was supported, in part, by NSF grants CNS-0720456, CNS-0615097, CAREER award 0953541, and research awards from Google and HP. We would like to thank Luiz Barroso from Google, who provided valuable inputs on motivating and shaping the work presented in this paper.

References
[1] 1KW APC UPS - SURTA1500RMXL2U. http://www.apc.com/products/.
[2] F. Ahmad and T. N. Vijaykumar. Joint optimization of idle and cooling power in data centers while maintaining response time. In Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2010.
[3] H. Amur, J. Cipar, V. Gupta, G. R. Ganger, M. A. Kozuch, and K. Schwan. Robust and flexible power-proportional storage. In Proceedings of the ACM Symposium on Cloud Computing (SOCC), 2010.
[4] L. A. Barroso and U. Hölzle. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Morgan and Claypool Publishers, 2009.
[5] D. Bhandarkar. Watt Matters in Energy Efficiency. Server Design Summit, 2010.
[6] J. Chase, D. Anderson, P. Thakur, and A. Vahdat. Managing Energy and Server Resources in Hosting Centers. In Proceedings of the Symposium on Operating Systems Principles (SOSP), 2001.
[7] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam. Managing Server Energy and Operational Costs in Hosting Centers. In Proceedings of the Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2005.
[8] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proceedings of the Symposium on Networked Systems Design and Implementation (NSDI), 2005.
[9] Commercial circuit breakers, 2008. http://circuit-breakers.carlingtech.com/all_circuits.asp.
[10] Q. Deng, D. Meisner, L. Ramos, T. F. Wenisch, and R. Bianchini. MemScale: active low-power modes for main memory. In Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2011.
[11] Facebook Open Compute Project, 2011. http://opencompute.org.
[12] X. Fan, W.-D. Weber, and L. A. Barroso. Power provisioning for a warehouse-sized computer. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2007.
[13] W. Felter, K. Rajamani, C. Rusu, and T. Keller. A Performance-Conserving Approach for Reducing Peak Power Consumption in Server Systems. In Proceedings of the International Conference on Supercomputing (ICS), 2005.
[14] M. E. Femal and V. W. Freeh. Safe overprovisioning: Using power limits to increase aggregate throughput. In Workshop on Power-Aware Computer Systems (PACS), 2004.
[15] J. Flinn and M. Satyanarayanan. Managing battery lifetime with energy-aware adaptation. Transactions on Computer Systems (TOCS), 2004.
[16] A. Gandhi, M. Harchol-Balter, R. Das, and C. Lefurgy. Optimal power allocation in server farms. In Proceedings of the Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2009.
[17] L. Ganesh, J. Liu, S. Nath, G. Reeves, and F. Zhao. Unleash Stranded Power in Data Centers with RackPacker. In Workshop on Energy-Efficient Design (WEED), 2009.
[18] Google Server-level UPS for improved efficiency. http://news.cnet.com/8301-1001_3-10209580-92.html.
[19] S. Govindan, J. Choi, B. Urgaonkar, A. Sivasubramaniam, and A. Baldini. Statistical profiling-based techniques for effective power provisioning in data centers. In Proceedings of the European Conference on Computer Systems (EUROSYS), 2009.
[20] S. Govindan, A. Sivasubramaniam, and B. Urgaonkar. Benefits and Limitations of Tapping into Stored Energy for Datacenters. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2011.
[21] S. Govindan, D. Wang, L. Chen, A. Sivasubramaniam, and B. Urgaonkar. Towards Realizing a Low Cost and Highly Available Datacenter Power Infrastructure. In Proceedings of the Workshop on Power Aware Computing and Systems (HotPower), 2011.
[22] Hadoop MapReduce. http://hadoop.apache.org/mapreduce/.
[23] J. Hamilton. Internet-scale Service Infrastructure Efficiency. ISCA Keynote, 2009.
[24] Lead-acid battery cost. http://photovoltaics.sandia.gov/Pubs_2010/PV%20Website%20Publications%20Folder_09/Hanley_PVSC09%5B1%5D.pdf.
[25] C. Lefurgy, X. Wang, and M. Ware. Server-Level Power Control. In Proceedings of the International Conference on Autonomic Computing (ICAC), 2007.
[26] H. Lim, A. Kansal, and J. Liu. Power budgeting for virtualized data centers. In Proceedings of the 2011 USENIX Annual Technical Conference (USENIX), 2011.
[27] M. Lin, A. Wierman, L. L. H. Andrew, and E. Thereska. Dynamic right-sizing for power-proportional data centers. In Proceedings of the IEEE International Conference on Computer Communications (INFOCOM), 2011.
[28] D. Linden and T. B. Reddy. Handbook of Batteries. McGraw Hill Handbooks, 2002.
[29] Linux AVG Anti-Virus. http://free.avg.com/.
[30] K. Ma, X. Li, M. Chen, and X. Wang. Scalable power control for many-core architectures running multi-threaded applications. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2011.
[31] S. McCluer. APC White Paper 30 (Revision 11): Battery Technology for Data Centers and Network Rooms: Lead-acid Battery Options, 2005.
[32] D. Meisner, C. M. Sadler, L. A. Barroso, W. Weber, and T. F. Wenisch. Power management of online data-intensive services. In Proceedings of the International Symposium on Computer Architecture (ISCA), 2011.
[33] Microsoft Reveals its Speciality Servers, Racks, 2011. http://www.datacenterknowledge.com/archives/2011/04/25/microsoft-reveals-its-speciality-servers-racks/.
[34] J. Moore, J. Chase, P. Ranganathan, and R. Sharma. Making scheduling cool: Temperature-aware workload placement in data centers. In Proceedings of the USENIX Annual Technical Conference (USENIX), 2005.
[35] S. Pelley, D. Meisner, P. Zandevakili, T. F. Wenisch, and J. Underwood. Power Routing: Dynamic Power Provisioning in the Data Center. In Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2010.
[36] E. Pinheiro, R. Bianchini, E. Carrera, and T. Heath. Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems. In Workshop on Compilers and Operating Systems for Low Power (COLP), 2001.
[37] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu. No Power Struggles: Coordinated multi-level power management for the data center. In Proceedings of Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.
[38] L. Ramos and R. Bianchini. C-Oracle: Predictive thermal management for data centers. In Proceedings of the International Symposium on High-Performance Computer Architecture (HPCA), 2008.
[39] P. Ranganathan, P. Leech, D. Irwin, and J. Chase. Ensemble-level Power Management for Dense Blade Ser
vers.InProceedingsofInternationalSymposiumonComputerArchitecture(ISCA),2006.[40]W.Smith.TPC-W:BenchmarkingAnEcommerceSolu-tion.http://www.tpc.org/information/other/techarticles.asp.[41]SPECJBB2005:JavaBusinessBenchmark.http://www.spec.org/jbb2005/.[42]N.N.Taleb.TheBlackSwan:TheImpactoftheHighlyImprobable.RandomHouse,2007.[43]B.Urgaonkar,P.Shenoy,andT.Roscoe.ResourceOverbookingandApplicationProlinginSharedHostingPlatforms.InProceedingsoftheSymposiumonOperatingSystemsDesignandImplementation(OSDI),2002.[44]R.Urgaonkar,B.Urgaonkar,M.J.Neely,andA.Sivasubramaniam.OptimalPowerCostManagementUsingStoredEnergyinDataCen-ters.InProceedingsoftheConferenceonMeasurementandModelingofComputerSystems(SIGMETRICS),2011.[45]A.Verma,P.De,V.Mann,T.Nayak,A.Purohit,G.Dasgupta,andR.Kothari.Brownmap:Enforcingpowerbudgetinshareddatacen-ters.InProceedingsoftheConferenceonMiddleware(MIDDLE-WARE),2010.[46]C.Waldspurger.MemoryResourceManagementinVMWareESXServer.InProceedingsoftheSymposiumonOperatingSystemDesignandImplementation(OSDI),2002.[47]X.WangandM.Chen.Cluster-levelfeedbackpowercontrolforper-formanceoptimization.InProceedingsoftheInternationalSympo-siumonHigh-PerformanceComputerArchitecture(HPCA),2008.[48]X.Wang,M.Chen,andC.Lefurgy.Howmuchpoweroversubscrip-tionissafeandallowedindatacenters?InProceedingsoftheInter-nationalConferenceonAutonomicComputing(ICAC),2011.[49]A.WeiselandF.Bellosa.Processcruisecontrol-event-drivenclockscalingfordynamicpowermanagement.InProceedingsofCompilers,ArchitectureandSynthesisforEmbeddedSystems(CASES),2002.[50]H.Zeng,X.Fan,C.Ellis,A.Lebeck,andA.Vahdat.ECOSystem:ManagingEnergyasaFirstClassOperatingSystemResource.InPro-ceedingsoftheConferenceonArchitecturalSupportforProgrammingLanguagesandOperatingSystems(ASPLOS),2002.[51]F.Zhang,Z.Shi,andW.Wolf.Adynamicbatterymodelforco-designincyber-physicalsystems.InProceedingsoftheInternationalConferenceonDistributedComputingSystemsWorkshops(ICDCSW),2009.[52]Q.Zhu,Z.Chen,L.Tan,Y.Zhou,K.Keeton,andJ.Wilkes.Hiber-nator:helpingdisk
arrayssleepthroughthewinter.InProceedingsoftheSymposiumonOperatingSystemsPrinciples(SOSP),2005.