Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications

Joarder Mohammad Mustafa Kamal, Faculty of Information Technology, Monash University, Victoria, Australia. Email: joarder.kamal@monash.edu
Manzur Murshed, Faculty of Science and Technology, Federation University, Victoria, Australia. Email: manzur.murshed@federation.edu.au
Rajkumar Buyya, Dept. of Comp. and Information Systems, University of Melbourne, Victoria, Australia. Email: rbuyya@unimelb.edu.au

Abstract—Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) spanning multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefit of such a static partitioning scheme, however, is short-lived in Cloud applications with dynamically varying workload patterns where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and $k$-way min-cut clustering is applied to a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. The potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers and relying on random one-to-one cluster-to-partition mapping that naturally balances out loads.
Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster, the many-to-one version minimising migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.

Keywords—Cloud databases; workload; distributed transactions; incremental repartitioning; load-balance; data migration;

2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, 978-1-4799-7881-6/14 $31.00 © 2014 IEEE

I. INTRODUCTION

Nowadays, electronic data are being generated at an unprecedented rate by e-commerce, online business processing, digital media, and social networks. It is estimated that 2.3 trillion gigabytes of digitised data are generated every day around the globe [1]. As an example, in an average day, over 30 billion pieces of content are shared on Facebook while 4 billion hours of videos are watched on YouTube [1]. In recent years, such Internet-scale Web applications scale out instantaneously using Cloud computing technologies. Shared-nothing distributed databases, in combination with horizontal data partitioning, provide a key mechanism to handle this massive data explosion and to scale to billions of concurrent users. Unfortunately, traditional approaches can hardly adopt dynamic workload balance operations within a geo-distributed cluster [2]. With the dynamic nature of user-facing interactive Web applications driving Online Transaction Processing (OLTP) workloads, it is simply not possible for a static partitioning and placement model to work effectively by only adding more servers and hard disks to the cluster. By nature, OLTP transactions are small-sized and short-lived with an immediate response time requirement. At the same time, within a partitioned database, DTs occur frequently and span multiple servers, creating unscalable communications in transaction processing [3]. In addition, to adapt to dynamic workloads, large-scale data migrations are required, involving significant cost in terms of I/O, database resources, and potential downtime.

Workload-aware partitioning techniques [4], [5] monitor the transactional logs and periodically create workload networks using a graph or hypergraph representation. Each edge in a workload graph connects a pair of tuples originated from the same transaction, whereas a hyperedge connects all tuples within a transaction in a hypergraph. Later, these workload representations are clustered using k-way min-cut clustering, and then randomly placed across the set of physical servers within a database cluster. As long as workload characteristics do not change dramatically, and tuples from a cluster stay together in a physical server, the occurrences and adverse impacts of DTs are reduced rapidly. A number of centralised data lookup and routing techniques have also been proposed to support such data redistribution. Large-scale OLTP service providers develop partition management solutions like YouTube's Vitess [6], Tumblr's JetPants [7], and Twitter's Gizzard [8] to deal with rapid data growth. Nonetheless, the underlying data placements are not transparent to application codes, and redistributions are not aware of workload dynamics. Furthermore, none of these techniques provide any explicit way to minimise physical data migrations over WAN and global load balance at the same time for a geo-distributed shared-nothing database cluster.

In this paper, we present a proactive workload-aware incremental repartitioning framework which transparently redistributes database tuples to ensure minimum data migrations and global load balance. Transactional logs are collected periodically, then undergo a pre-processing and classification stage before generating workload networks for min-cut clustering.
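The graph-versus-hypergraph distinction above can be made concrete with a small sketch (helper names and sample data are ours, for illustration only): a transaction touching n tuples contributes n(n-1)/2 pair-wise edges to a graph but a single hyperedge to a hypergraph, with weights counting transaction frequency.

```python
from itertools import combinations
from collections import Counter

def build_graph_edges(transactions):
    """Graph representation: one weighted edge per co-accessed tuple pair;
    the weight counts how many transactions access that pair."""
    edges = Counter()
    for tuples in transactions:
        for pair in combinations(sorted(set(tuples)), 2):
            edges[pair] += 1
    return edges

def build_hyperedges(transactions):
    """Hypergraph representation: one hyperedge per distinct tuple set;
    the weight is the frequency of the transaction in the batch."""
    return Counter(frozenset(t) for t in transactions)

# A transaction touching 3 tuples yields 3 pair-wise edges but 1 hyperedge.
batch = [[1, 4, 6], [1, 4, 6], [9, 17]]
g = build_graph_edges(batch)   # 4 weighted edges
h = build_hyperedges(batch)    # 2 weighted hyperedges
```

The hyperedge count here equals the number of distinct transactions, which is why hyperedge cuts track DTs exactly while pair-wise edge cuts only approximate them.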
A unique transaction classification process is introduced to identify purely distributed transactions and non-distributed ones containing moveable data tuples that are also contained in a DT. This novel classification removes the shortcomings of selective swapping of tuple sets for local load balancing by extending the size of the workload network, and over time reduces the impact of DTs, ensuring global optimisation in both load balance and data migrations. We also perform a detailed sensitivity analysis by representing the workload networks in fine, exact, and coarse granularity using graphs, hypergraphs, and compressed hypergraphs. In contrast to previous works, a fixed number of clusters are created from the workload network for the total number of logical data partitions in the entire database instead of the number of physical servers. This provides finer control of load balance over the set of both partitions and servers. We also avoid tuple-level replications to observe the quality of incremental repartitioning under the worst-case scenario of DTs. We also propose two innovative cluster-to-partition mapping strategies that cater for minimising both physical data migrations and distribution imbalance. Our distributed data lookup mechanism ensures high scalability, and guarantees a maximum of two lookups to locate a tuple within the partitioned database.

To evaluate the quality of incremental repartitioning, we devise a set of metrics and also provide a way to administratively direct a particular repartitioning objective using a composite metric. Finally, we compare the quality of the proposed incremental repartitioning framework against a static partitioning configuration similar to [4] implementing random one-to-one cluster-to-partition mapping with different workload representations. More specifically, we compare 12 different database configurations with different settings: 3 workload representations and 3 mapping strategies for static and incremental repartitioning. Our simulation-based experimental results using realistic TPC-C workloads signify the trade-offs between different repartitioning approaches while showing the clear shortcomings of static partitioning in achieving dynamic data redistributions for OLTP databases.
The main contributions are summarised below:

- Investigating possible design choices for workload network representations and their applicability.
- Proposing a proactive transaction classification technique that identifies DTs and moveable non-DTs to create workload networks.
- Presenting two cluster-to-partition mapping strategies that ensure minimum inter-server data migrations and load imbalance across partitions and servers.
- Developing a scalable distributed data lookup technique that requires a maximum of two I/O round trips to locate a data tuple within the entire database.
- Devising a set of quality metrics for the incremental repartitioning process defining different objectives.

The remainder of this paper is organised as follows: we briefly review the related works in Section II; a high-level overview of the proposed framework is discussed in Section III; Section IV details the steps, formulations, and design philosophies with necessary illustrations; Section V discusses the experimental results compared to a static partitioning framework; and finally Section VI concludes the paper.

II. RELATED WORK

Workload-aware load balance with I/O overhead minimisation in distributed database systems was studied before for finding optimal data placement strategies in shared-nothing parallel databases [9]. Recent works primarily focus on OLTP workloads for scaling out Cloud applications to minimise the number of DTs. A workload-aware data replication and partitioning approach was first introduced by [4] for OLTP databases. The authors proposed Schism, which represents the transactional workload as a graph, and performs k-way replicated graph partitioning to minimise the effect of DTs. However, Schism usually generates very large graphs, and deals with neither dynamic workload changes nor the more general problem of repartitioning. Transactional workloads are modelled as a compressed hypergraph in [5] by hashing the data tuples' primary keys to reduce the overhead of k-way clustering.
The authors propose SWORD, an incremental repartitioning technique which moves a fixed amount of data at regular intervals upon notification of workload changes, by observing the increase in the percentage of DTs against a predefined threshold. However, this reactive approach only ensures local load balance, and does not always guarantee a reduction in DTs. Due to the selective swapping of the randomly compressed tuple sets and newly transformed DTs, the quality of the min-cut clustering may be lost, gradually leading to global data distribution imbalance. In [10], another automatic workload-aware database partitioning method is proposed along with an analytical model to estimate skew and coordination cost for DTs. It uses the same graph-based workload representation as [4], and primarily focuses on optimal database design based on workload characteristics. However, it does not consider incremental repartitioning.

Elasca is proposed in [11], where a multi-objective workload-aware online optimiser is developed for optimal partition placement ensuring minimum data movement; however, it does not support incremental repartitioning. A distributed lookup method for transactional databases requiring special knowledge nodes for coordination is proposed in [12]; however, it may perform incorrect routing due to inconsistent values. In contrast, our proposed distributed lookup operation is based on the well-known concept of roaming [13], and it always guarantees consistent results with a maximum of two lookups. In [14], a Social Partitioning and Replication middleware (SPAR) is proposed that explores the social network graph from user interactions, and then performs joint partitioning and replication to ensure local data semantics for the users. Similarly, in [15], temporal activity hypergraphs are used to model user interactions in social networks, and then min-cut clustering is used to minimise the impact of DTs with minimum load imbalance. However, none of these techniques explores the incremental repartitioning problem and the effect of data migrations on global load balance.
Fig. 1. An overview of the workload-aware incremental repartitioning framework using numbered notations from 1-6 representing the overall workflow. Steps 3a to 3g represent the flow of workload analysis, representation, clustering, and repartitioning decision generation.

III. PROPOSED SYSTEM OVERVIEW

An overview of the proposed framework is shown in Figure 1. We assume a set of coordinator nodes serving client requests and managing the executions of transactional queries.
Coordinators are connected with a set of geo-distributed data nodes where the logical partitions reside. Each logical partition contains a location catalogue where the residing tuples' locations and their current partition ids are persisted as key-value pairs. Note that individual data nodes can be synchronously replicated as master-slave within independent groups to ensure high availability, which is a common deployment practice. Thus, in this work we do not explicitly handle tuple-level replication like [4]. Coordinators also administer partition management operations (like split, merge, and migration) and incoming read/write workload balance. Streams of transactional logs are continuously pulled by the analyser node, and pre-processed for analysis in either a time- or workload-sensitive window. The analyser node can also cache the most frequently appearing tuple locations in a workload-specific catalogue which is kept updated upon inter-partition data migrations. Following Figure 1, the input of the workload-aware incremental repartitioning component (in the dotted rectangle) is transactional logs, and the output is a partition-level data migration plan. The overall process has four primary steps:

Pre-processing, Parsing, and Classification. Client applications submit transactional queries in step 1, which are then processed by a distributed transaction coordinator that manages the distributed data nodes. Upon pulling the streams of transactional workloads in step 2, individual transactions are processed to extract the contained SQL statements at step 3a. For each SQL statement, the primary keys of individual tuples are extracted, and the corresponding partition ids are retrieved from the embedded workload-specific location catalogue in step 3b. In the classification process (3c), original DTs and moveable non-DTs are identified along with their frequency counts in the current workload, and their associated costs of spanning multiple servers.

Workload Representation and k-way Clustering.
In step 3d, workload networks are generated from the extracted transactional logs gathered in the previous step using a graph or hypergraph. Tuple-level compression can further reduce the size of the workload network. Since transactional graphs cannot fully represent transactions with more than two tuples using pair-wise relationships, we cannot directly minimise the impact of DTs in the workload. However, graph representations are much simpler to produce, and they have been adopted in a wide range of application-specific usages, which also helps us understand their importance in creating workload networks. On the other hand, hypergraphs can exploit exact transactional relationships; thus the number of hyperedge cuts exactly matches the number of DTs. Yet, popular hypergraph clustering libraries are computationally slower than the graph clustering libraries, and produce less effective results [4].

In reality, with the increase in size and complexity, both of these representations are computation-intensive to manipulate. Furthermore, while compression techniques can confine an algorithm within a specified target, dramatic degradation in clustering quality and overall load balance occurs with a high compression ratio [5]. Finally, workload networks are clustered using k-way min-cut clustering employed by the graph and hypergraph clustering libraries in step 3e.

Cluster-to-Partition Mapping. At step 3f, a mapping matrix is created where each matrix element counts the tuples that are placed in a newly created cluster and originated from the same partition. The clusters produced by the min-cut clustering are then mapped to the existing set of logical partitions by following three distinct strategies. At first, we employ uniform random tuple distribution for mapping clusters to database partitions, which naturally balances the distribution of tuples over the partitions. However, there is no proactive consideration in this random strategy for minimising data migrations. The second strategy employs a straightforward but optimal approach: it maps a cluster to the respective partition which originally contains the maximum number of tuples from that cluster; hence minimum physical data migrations take place.

In many cases, this simple strategy turns out to be a many-to-one cluster-to-partition mapping, and diverges from uniform tuple distribution. Again, incremental repartitioning can create server hot-spots as similar transactions from new workload batches will always drive more new tuples to migrate into a hot server. As a consequence, overall load balance decreases over time, which is also observed in our experimental results. A way to recover from this situation is by ensuring that the cluster-to-partition mapping remains one-to-one, which is used as the third strategy. This simple, yet effective, scheme restores the original uniform random tuple distribution with the constraint of minimising data migrations. Finally, in step 3g, based on the different mapping strategies and applied heuristics, a data migration plan is generated and then forwarded to the data tuple migrator module in step 5.

Distributed Location Update and Routing. The analyser node keeps a workload-specific location catalogue for the most frequently accessed tuples, and updates the associated locations at each repartitioning cycle in step 4. The analyser also directly invokes the corresponding data nodes to perform data migrations in step 6 without interrupting the ongoing transactional services. Until a tuple fully migrates to a new partition, its existing partition serves all the query requests. Distributed databases using range partitioning require keeping a central lookup table for the clients to retrieve tuples. Hash partitioning requires the client to use a fixed hash function to look up the required tuples in the specified server. Consistent hash partitioning [16] employs a distributed lookup mechanism using a distributed hash table. However, none of these partitioning schemes provide scalable data lookup mechanisms for successive data redistribution.
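To see why a fixed hash function cannot survive data redistribution, consider a minimal sketch (partition counts, key values, and the "migration" itself are hypothetical):

```python
def hash_partition(key, num_partitions):
    # Static client-side lookup: the partition is derived solely from the key.
    return hash(key) % num_partitions

# Before repartitioning, the client computes the location directly.
p_before = hash_partition(42, 4)

# After a repartitioning cycle moves tuple 42 to another partition, the
# same formula still returns the old location; the client cannot learn
# of the move without some catalogue or routing layer.
p_actual = (p_before + 1) % 4        # hypothetical new location
stale = hash_partition(42, 4) != p_actual
```

Any scheme whose lookup is a pure function of the key is therefore incompatible with per-tuple migration, which motivates the roaming-based lookup introduced later.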
To solve this problem, we use the well-established concept of roaming from wireless telecommunications and computer data networks. The problem of location-independent routing is already solved in IPv6 using Mobile IP [17], and in GSM networks using roaming mobile stations [13]. In a similar way, the attached location catalogue within each data partition keeps track of the roaming tuples and their corresponding foreign partitions. A maximum of two lookups are required to find a tuple without client-side caching. With proper caching enabled, this lookup cost can even be amortised to one for most of the cases with a high cache hit rate.

IV. WORKLOAD-AWARE INCREMENTAL REPARTITIONING

A. Problem Formulation

Let $S = \{S_1, \ldots, S_n\}$ be the set of $n$ shared-nothing physical database servers, where each $S_i = \{P_{i,1}, \ldots, P_{i,m}\}$ denotes the set of $m$ logical partitions residing in $S_i$. Again, let $P_{i,j} = \{d_{i,j,1}, \ldots, d_{i,j,|P_{i,j}|}\}$ denote the set of tuples residing in $P_{i,j}$. We can thus get the amount of tuples residing in $S_i$ as $D_{S_i} = \sum_j |P_{i,j}|$. Finally, $D_S = \sum_i D_{S_i}$ denotes the total amount of tuples in the entire partitioned database.

TABLE I. SAMPLE DATABASE: PHYSICAL AND LOGICAL LAYOUT

Servers | Partitions
S1 (10) | P1: (5) = {2,4,6,8,10}; P3: (5) = {12,14,16,18,20}
S2 (10) | P2: (5) = {1,3,5,7,9}; P4: (5) = {11,13,15,17,19}

Let $\mathcal{W} = \{W_1, \ldots, W_\tau\}$ be the set of workload batches within the database lifetime $\tau$. Each $W_i$ represents a particular workload batch at the $i$th tick of $\tau$. The set of transactions in any $W_i$ is represented by $T = \{t_1, \ldots, t_z\}$, and each transaction is characterised as either distributed ($T_\delta$) or non-distributed ($T_{\bar\delta}$); thus $T = T_\delta \cup T_{\bar\delta}$ and $T_\delta \cap T_{\bar\delta} = \emptyset$, where $T_\delta = \{t_{\delta,1}, \ldots, t_{\delta,|T_\delta|}\}$ and $T_{\bar\delta} = \{t_{\bar\delta,1}, \ldots, t_{\bar\delta,|T_{\bar\delta}|}\}$. Again, any distributed or non-distributed transaction $t_{\delta,i}$ or $t_{\bar\delta,i}$ can occur multiple times within $W_i$; hence its frequency can be represented by either $freq(t_{\delta,i})$ or $freq(t_{\bar\delta,i})$. As any $t_{\delta,i}$ can span multiple servers, we define its cost of spanning as $cost(t_{\delta,i})$. We consider the cost of spanning multiple partitions by a transaction within a single server negligible in terms of I/O overhead.
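As a quick check of the notation, the Table I layout can be encoded and the totals $D_{S_i}$ and $D_S$ computed directly (a minimal sketch; the dictionary encoding and variable names are ours):

```python
# Table I layout: server -> {partition: set of tuple ids}.
layout = {
    "S1": {"P1": {2, 4, 6, 8, 10}, "P3": {12, 14, 16, 18, 20}},
    "S2": {"P2": {1, 3, 5, 7, 9},  "P4": {11, 13, 15, 17, 19}},
}

# D_{S_i} = sum over j of |P_{i,j}|; D_S = sum over i of D_{S_i}.
D_Si = {s: sum(len(p) for p in parts.values()) for s, parts in layout.items()}
D_S = sum(D_Si.values())
```

This reproduces the per-server counts of 10 tuples each and the database total of 20 tuples used in the running example.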
Let us define the problem of incremental repartitioning as follows.

Problem Definition: For a given transactional workload $W_i$ at the $i$th observation, $|S|$ homogeneous servers containing in total $|P|$ logical partitions, and a maximum allowed imbalance ratio $\epsilon$, find an incremental repartitioning solution $X_i$ from the output of a $k$-way balanced clustering which minimises the mean impact of DTs in $W_i$ and the imbalance in $D_S$ across partitions and servers, while requiring minimum inter-server data migrations.

In the following, we use illustrative examples based on a simple database of 20 data tuples distributed using hash-partitioning over 4 logical partitions and 2 physical servers, as shown in Table I. A sample workload batch with 7 transactions and the corresponding data tuples is shown in Table II. Finally, a detailed illustration of how the cluster-to-partition mapping strategies work with different workload representations is shown in Figure 3.

B. Workload Modelling

We model the workload networks using three distinct representations. Firstly, the graph representation (GR) produces a fine-grain workload network, although it is unable to fully capture the actual transactional relationship between different tuples.
Yet, the graph min-cut process can still generate high-quality $k$-way clusterings and minimise the impact of DTs, unless the overall graph size increases with workload variability and an adequate level of sampling is performed [4]. Secondly, the hypergraph representation (HGR) generates the most accurate and exact workload networks, and is thus also able to produce balanced clusters with min-cut hypergraph clustering. Moreover, from our empirical studies we found that k-way min-cut balanced hypergraph clustering produces more consistent results in terms of achieving the repartitioning goals, as is also mentioned in [15]. Finally, the compressed hypergraph representation (CHG) produces coarse-grain workload networks depending on the compression level. With a lower level of compression, less coarse networks are generated and $k$-way clustering performs better. However, as shown in [5], as the level of compression increases, the quality of the clustering process degrades dramatically. We formally define the individual representations as below.

Fig. 2. Transaction classification identifying DTs and moveable non-DTs (distributed: at least two data tuples reside in different servers; moveable: at least one data tuple also resides in a distributed transaction).

1) Graph Representation: A graph $G = (V, E_g)$ represents $W_i$, where each edge $e_g \in E_g$ links a pair of tuples $(v_x, v_y)$ from $V = \{v_1, \ldots, v_{|V|}\} \subseteq D_S$ for a transaction $t_i$. The individual tuples of $(v_x, v_y)$ connect to their respective sets of adjacent tuples $A_{v_x}$ and $A_{v_y}$ originated from the same $t_i$. Any edge within $t_i$ has a weight representing the frequency of the transactions in $W_i$ which co-access the pair $(v_x, v_y)$, while the vertex weight represents the tuple's size (in volume).

2) Hypergraph Representation: A hypergraph $H = (V, E_h)$ represents $W_i$, where a hyperedge $e_h \in E_h$ characterises a transaction $t_i$ and overlays its contained set of tuples $V_{t_i} \subseteq V$. A hyperedge representing $t_i$ is associated with a weight denoting the frequency of $e_h$ within $W_i$, and its vertices' weights represent data tuple sizes (in volume).
3) Compressed Hypergraph Representation: A hypergraph $H = (V, E_h)$ can be compressed by collapsing the vertices into a set of virtual vertices $V'$ using a simple hash function on the primary keys [5]. A compressed hypergraph $H_c = (V', E'_h)$ represents $W_i$, where each virtual hyperedge $e'_h \in E'_h$ constitutes the set of virtual vertices $v'_{e_h} \subseteq V'$ into which the original vertices of $e_h$ are mapped, with $|v'_{e_h}| \geq 2$. A virtual vertex weight represents the combined data volume sizes of the corresponding compressed tuples, and a hyperedge weight represents the frequency of the transactions which access the corresponding virtual vertices. $C_l$ denotes the compression level as $|V| / |V'|$, and equals 1 for no compression and $|V|$ for full compression.

Figure 3 presents the workload networks as graph, hypergraph, and compressed hypergraph (with $C_l = 0.5$) for the transactions listed in Table II.

C. Proactive Transaction Classification

In constructing the classification technique, we argue that there always exists a group of tuples which are retrieved while processing the DTs, and also participate in the execution of non-distributed but frequently occurring transactions. These particular groups of tuples, when moved into different database servers by the repartitioning process, can turn the previously non-distributed transactions into newly distributed ones. We use this intuition to classify the workload transactions into three different categories: distributed, non-distributed moveable, and non-distributed non-moveable, as shown in Figure 2.

TABLE II. SAMPLE WORKLOAD

Transaction | Data Tuples | Class
T1 | {1,4,5,6,7,8,10} | DT
T2 | {1,4,6,9,11} | DT
T3 | {9,15,17} | Moveable Non-DT
T4 | {9,17} | Moveable Non-DT
T5 | {5,7,18} | DT
T6 | {15,17} | Non-moveable Non-DT
T7 | {2,14,16} | Non-moveable Non-DT

As an example, transactions T1, T2, and T5 from the sample workload of Table II are identified as distributed, whereas T3 and T4 are labelled as moveable non-distributed. Finally, T6 and T7 are discarded as purely non-distributed transactions.
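The classification of Figure 2 can be reproduced mechanically from the layouts in Tables I and II (a sketch; the helper names are ours): a transaction is distributed if its tuples span at least two servers, and a non-distributed transaction is moveable if at least one of its tuples also appears in some DT.

```python
# Under the Table I layout, odd tuple ids live on S2 (P2, P4)
# and even tuple ids on S1 (P1, P3).
def server_of(tuple_id):
    return "S2" if tuple_id % 2 else "S1"

def classify(workload):
    """workload: dict of name -> set of tuple ids. Returns name -> class."""
    # A DT touches at least two data tuples on different servers.
    dts = {name for name, ts in workload.items()
           if len({server_of(t) for t in ts}) > 1}
    dt_tuples = set().union(*(workload[n] for n in dts)) if dts else set()
    labels = {}
    for name, ts in workload.items():
        if name in dts:
            labels[name] = "DT"
        elif ts & dt_tuples:       # shares at least one tuple with a DT
            labels[name] = "Moveable Non-DT"
        else:
            labels[name] = "Non-moveable Non-DT"
    return labels

workload = {
    "T1": {1, 4, 5, 6, 7, 8, 10}, "T2": {1, 4, 6, 9, 11},
    "T3": {9, 15, 17}, "T4": {9, 17}, "T5": {5, 7, 18},
    "T6": {15, 17}, "T7": {2, 14, 16},
}
labels = classify(workload)
```

Running this reproduces the Class column of Table II: T1, T2, T5 are DTs; T3 and T4 are moveable (they share tuple 9 with the DT T2); T6 and T7 are non-moveable.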
Clearly, a number of non-distributed moveable transactions will remain protected within the $k$-way clustering, as the min-cut clustering always tries to preserve as many transactional edges as it can. As the tuples in these moveable transactions did not participate in any DTs, they reside in isolation within the workload network. Thus, they are highly likely to be preserved together in the same cluster after $k$-way clustering. As an example, shown in Figure 3, the non-distributed moveable transactions T3 and T4, containing the tuples with ids 9, 15, and 17, remain protected as non-distributed after performing $k$-way clustering with all three workload representations using the Metis [18] and hMetis [19] libraries.

If we had added only the DTs T1, T2, and T5 to the workload sub-graphs, then at the next incremental repartitioning phase T3 and T4 would have appeared as DTs, since tuple id 9, which by that time would have already moved to another partition located in a different physical server, would cause its associated transactions to become distributed. There exists a clear trade-off between the increase in size of the workload networks and the achieved benefits. At one end, the smaller the workload network, the less computationally costly it is to process with respect to time and I/O. On the other hand, if we include all the workload tuples in the representations, it may better reduce the impact of DTs in a particular repartitioning cycle, but at the price of unwanted data migrations that create new DTs. By aggressively classifying the non-distributed moveable transactions, the quality of the overall repartitioning process increases as the impact of DTs decreases compared to a static partitioning strategy, as shown later in our experimental results.

D.
k-way Balanced Clustering of Workload

Given $G$ and a maximum allowed imbalance ratio $\epsilon$, we can define the problem as finding the $k$-way clustering $G^* = \{V_1, \ldots, V_k\}$ that minimises the transactional edge-cut with the balance constraint bounded by $(1 + \epsilon)$. Similarly, the $k$-way constrained and balanced clustering of $H$ is $H^* = \{V_1, \ldots, V_k\}$ such that the minimum number of hyperedges are cut under the imbalance ratio $\epsilon$. Analogously, the $k$-way balanced clustering of $H_c$ is $H_c^* = \{V'_1, \ldots, V'_k\}$ with an imbalance ratio $\epsilon$ aiming at minimum virtual hyperedge cuts. Note that we denote $k$ as the total number of logical partitions instead of the number of physical servers. From our empirical experiments we find that executing the $k$-way clustering process with $k$ as the number of partitions provides finer granularity in balancing the distribution of data volume over the set of physical servers.

The $k$-way balanced clustering generates clusters of similar size with respect to the number of tuples given a balance constraint, which is defined as $k \cdot \max_i(W(V_i)) / W(V)$ and tells whether the clusters are equally weighted or not. Here, $W(V_i)$ is the sum of the weights of the vertices in $V_i$. The partitions are said to be balanced if the balance measure is equal or close to 1, and imbalanced if it is greater than 1.

Fig. 3. Transactional workload modelling with 3 representations along with 4-way min-cut clustering, followed by 3 cluster-to-partition mapping strategies. (The figure shows, for each of GR, HGR, and CHG, the cluster-to-partition mapping matrices under the Random, MC, and MSM strategies, and the nine resulting database layouts with their inter-partition and inter-server data migration counts.)

E. Cluster-to-Partition Mapping Strategies

Figure 3 presents the three distinctive cluster-to-partition mapping strategies (in matrix format) beneath their respective workload network representations. The rows and columns of the matrices represent partition and cluster ids respectively. An individual matrix element represents the count of tuples from a particular partition which are placed by the clustering libraries under a specific cluster id. The shadowed locations in a mapping matrix, with the counts in boldface, represent the resulting decision block with respect to the particular cluster and partition ids. The individual tables below the matrices represent the state of the physical and logical layouts of the sample database. The last row of these tables reveals the counts of inter- and intra-server data migrations for each of the nine representative database layouts. The boldface numbers in the layout tables at the bottom denote the most balanced distribution and the least count of data migrations. Below we explain the main philosophies behind these mapping strategies in detail.
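Building such a mapping matrix is a single counting pass over the clustering output (a sketch with hypothetical tuple and partition ids; `mapping_matrix` is our own helper, not part of the framework):

```python
from collections import defaultdict

def mapping_matrix(current_partition, cluster_assignment):
    """current_partition and cluster_assignment are dicts keyed by tuple id.
    Returns matrix[p][c] = number of tuples currently in partition p that the
    k-way clustering placed in cluster c."""
    matrix = defaultdict(lambda: defaultdict(int))
    for t, p in current_partition.items():
        matrix[p][cluster_assignment[t]] += 1
    return matrix

# Hypothetical 3-tuple example: tuples 1 and 9 currently sit on P2,
# tuple 4 on P1; the clustering grouped tuples 1 and 4 into cluster C1.
part = {1: "P2", 4: "P1", 9: "P2"}
clus = {1: "C1", 4: "C1", 9: "C3"}
m = mapping_matrix(part, clus)
```

Each strategy below then reads this matrix differently: Random ignores the counts, MC picks the column-wise maximum, and MSM repeatedly picks the global maximum of shrinking sub-matrices.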
1) Random (R) Cluster-to-Partition Mapping: Naturally, the best way to achieve load balance at any granularity is to assign the clusters randomly. Clustering tools like Metis and hMetis randomly generate the cluster ids, and do not have any knowledge about how the data tuples were originally distributed within the servers or partitions. As a straightforward approach, the cluster ids can simply be mapped one-to-one to the corresponding partition ids as they are generated. Although this random assignment balances the workload tuples across the partitions, it does not necessarily guarantee minimum inter-server data migrations. As shown in Figure 3, the mapping matrices labelled Random and the database layouts GR-R, HGR-R, and CHG-R are the representatives of this class.

2) Max-Column (MC) Mapping: With this strategy we aim at minimising the physical data migration within the repartitioning process. In the cluster-to-partition mapping matrix, the maximum tuple count of an individual column is discovered, and the entire cluster column is mapped to the partition id of that maximum count. Thus, multiple clusters can be assigned to a single partition. As the maximum number of tuples originate from this designated partition, they do not move from their home partition, which reduces the overall inter-server physical data migrations. For OLTP workloads with skewed tuple distributions and dynamic data popularity, the impact of DTs can rapidly decrease under this greedy heuristic as tuples from multiple clusters may map to a single partition in the same physical server. However, this directly leads to data volume imbalance across the partitions and servers. The mapping matrices labelled MC, with corresponding database layouts GR-MC, HGR-MC, and CHG-MC, represent this mapping strategy in Figure 3.

3) Max-Sub-Matrix (MSM) Mapping: To minimise both load imbalance and data migrations, we forklift the natural advantages of the previous two strategies and combine them together.
At first, the largest tuple count within the entire mapping matrix is found and placed at the top-left diagonal position by performing successive row-column rearrangements. The next phase begins by omitting the elements in the first row and column, then recursively searching the remaining sub-matrix for the element with the maximum tuple count. Finally, all the diagonal positions of the matrix are filled up with elements having maximum tuple counts. Now, mapping the respective clusters one-to-one to the corresponding partitions results in both minimum data migrations and a balanced distribution load. Note that multiple maximum tuple counts can be found in different matrix positions, and the first such encountered element is chosen for simplicity.

TABLE III. COMPARISON OF SERVER- AND PARTITION-LEVEL balance

Method   | S balance | P balance
GR-R     | 1.1       | 1.4
GR-MC    | 1.3       | 1.6
GR-MSM   | 1.1       | 1.2
HGR-R    | 1.0       | 1.2
HGR-MC   | 1.1       | 1.4
HGR-MSM  | 1.1       | 1.2
CHG-R    | 1.1       | 1.6
CHG-MC   | 1.2       | 1.4
CHG-MSM  | 1.1       | 1.4

The MSM strategy works similarly to the MC strategy, as it prioritises the maximum tuple counts within the sub-matrices, and maps the clusters one-to-one to the partitions like the Random mapping strategy, thus preventing potential load imbalance across both the logical partitions and physical servers. In Figure 3, mapping matrices labelled MSM and the representative database layouts GR-MSM, HGR-MSM, and CHG-MSM depict this mapping strategy.

F. The Balance Measure

For illustration purposes, we reuse the same balance measure [15] mentioned earlier, while using the server and partition weights instead of cluster weights. Considering the GR-R database layout as shown in Figure 3, there are in total 20 tuples distributed among two physical servers (S1, S2) and four logical partitions (P1, ..., P4). The servers contain 11 and 9 tuples, while the partitions contain 4, 5, 7, and 4 tuples respectively, which leads to a balance value of (2 × 11)/(11 + 9) = 1.1 at the server level and (4 × 7)/(4 + 5 + 7 + 4) = 1.4 at the partition level. Table III presents the calculated balance measure for the entire nine database layouts at both the server and partition level, where boldface values indicate the lowest balance measure.
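The successive row-column rearrangement described above can be sketched as follows. This is our own reading of the MSM procedure, assuming a square mapping matrix and the paper's tie-breaking rule (first encountered maximum wins); the function name and sample matrix are hypothetical:

```python
def max_submatrix_mapping(matrix):
    """Place successive maxima on the diagonal via row/column swaps,
    then read off the resulting one-to-one cluster-to-partition map."""
    m = [row[:] for row in matrix]   # work on a copy
    n = len(m)
    rows = list(range(n))            # partition id carried by each row
    cols = list(range(n))            # cluster id carried by each column
    for d in range(n):
        # first maximum in the remaining sub-matrix wins ties
        r, c = max(((i, j) for i in range(d, n) for j in range(d, n)),
                   key=lambda ij: m[ij[0]][ij[1]])
        m[d], m[r] = m[r], m[d]                  # swap rows d and r
        rows[d], rows[r] = rows[r], rows[d]
        for row in m:                            # swap columns d and c
            row[d], row[c] = row[c], row[d]
        cols[d], cols[c] = cols[c], cols[d]
    # cluster cols[d] maps one-to-one to partition rows[d]
    return {cols[d]: rows[d] for d in range(n)}

counts = [[1, 6, 0],
          [2, 2, 5],
          [7, 1, 0]]
print(max_submatrix_mapping(counts))  # {0: 2, 1: 0, 2: 1}
```

Each cluster lands on the partition contributing most of its tuples (7, 6 and 5 here) while the mapping stays one-to-one, which is exactly the MC/Random trade-off the text describes.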
Overall, GR-MSM and HGR-MSM perform better than all others, primarily in terms of minimum data migrations and load balance. From this elaborate illustration, it is clear that k-way min-cut clustering of the workload network across partitions gives a better estimation of load balance, and finer degrees of freedom for different cluster-to-partition strategies to minimise intra- and inter-server physical data migrations.

G. Distributed Data Lookup

As mentioned in Sections I and III, any centralised lookup mechanism is always at risk of becoming the bottleneck in achieving high-availability and scalability requirements. We take a sophisticated approach by distributing the data tuple lookup process down to the individual database partition level. Thus, data migration operations are totally transparent to distributed transaction processing and coordination. By maintaining a key-value list of roaming and foreign data ids with their corresponding partition ids, individual partitions can answer the lookup queries. Tuples are assigned a permanent home partition id for their lifetime when the database is initially partitioned using range, hash, or consistent hash [16]. A home partition id only changes when a partition splits or merges, and these operations are overseen by the coordinators as shown in Figure 1, thus remaining transparent to the lookup process. As the tuple locations are managed by their home partitions, data inconsistencies are strictly prevented. Unless a tuple is fully migrated to another partition, and its roaming location is written in the catalogue, the old partition continues serving transactional processing.
When a tuple migrates to another partition during the process of incremental repartitioning, only its respective home partition needs to be aware of it. The target roaming partition will treat this migrated tuple as foreign and update its lookup table accordingly, whereas the original home partition will mark this tuple as roaming in its lookup table and update its current location with the roaming partition's id. A lookup process always queries the tuple's home partition to retrieve it. If the tuple is not found in its original location, the lookup table entry immediately informs the most recent location of the tuple and redirects the search towards the roaming partition. Thus, a maximum of two lookup operations can be required to find a tuple within the entire database.

Note that the cost of physical data migration may increase while using such a distributed lookup process. With high probability, individual data migrations in the incremental repartitioning process may involve running the location update process on up to three physical servers: those serving the home partition and the two roaming partitions (current and target). At present, we are investigating the implications of this cost, and how to include it in the formulation of the quality measures.

H. Quality Measures for Incremental Repartitioning

In evaluating the performance of incremental repartitioning, previous works [4], [5] only measure the percentage of reduction in DTs. However, this single measure fails to imply any meaningful conclusion about how the impact of distributed transactions is minimised. Further, there are no measures for overall load balance and data migrations. We propose three independent metrics to measure the successive repartitioning quality, achieving three distinct objectives: 1) minimise the impact of DTs; 2) minimise load imbalance; and 3) minimise the number of physical data migrations.
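The home/roaming/foreign bookkeeping and the two-hop lookup described above can be sketched as follows. The class and function names are our own illustration, not the paper's implementation:

```python
class Partition:
    def __init__(self, pid):
        self.pid = pid
        self.tuples = set()   # tuples physically stored here
        self.roaming = {}     # tuple id -> partition id it roamed to
        self.foreign = set()  # migrated-in tuples whose home is elsewhere

def migrate(tup, home, target):
    """Move a tuple; only its home partition records the new location."""
    home.tuples.discard(tup)
    home.roaming[tup] = target.pid   # home keeps the forwarding pointer
    target.tuples.add(tup)
    target.foreign.add(tup)          # target marks the tuple as foreign

def lookup(tup, home, partitions):
    """At most two hops: ask the home partition, then follow the pointer."""
    if tup in home.tuples:
        return home.pid
    return partitions[home.roaming[tup]].pid

p1, p2 = Partition(1), Partition(2)
parts = {1: p1, 2: p2}
p1.tuples.add("t42")
migrate("t42", home=p1, target=p2)
print(lookup("t42", home=p1, partitions=parts))  # 2
```

The coordinator never consults a central catalogue: the home partition alone holds the forwarding pointer, which is why at most two lookups suffice.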
The first metric measures the impact within a scale of 0 to 1, associating the frequency of DTs with their related I/O cost. The second metric measures the tuple-level load distribution over the set of servers using the coefficient of variation, which effectively shows the dispersion of data load over successive periods of observation. The third metric measures the mean inter-server data migrations for successive repartitioning processes. By combining all three aforementioned metrics, a composite metric is also proposed, which represents the mix of workload representation and cluster-to-partition mapping strategy for a particular incremental repartitioning cycle to achieve a certain objective. In the following, we model these three representative metrics in detail.

1) The Impact of Distributed Transactions: Considering the formal definitions provided in Section IV-A, we combine the cost of spanning multiple physical servers by any distributed transaction t, cost(t), with the frequency of t within W_i, freq(t). Here, cost(t) = |S_t|, where S_t is the set of physical servers involved in processing t, whereas cost(t) = 1 for any non-distributed transaction t. Note that, in reality, this cost represents the overhead of I/O over the network while processing the DTs. Equation (1) below defines the spanning cost of the set T_d of distributed transactions within W_i:

cost(T_d) = Σ_{t ∈ T_d} cost(t) · freq(t)    (1)

Similarly, (2) denotes the cost of the set T_nd of non-distributed transactions:

cost(T_nd) = Σ_{t ∈ T_nd} freq(t)    (2)

Finally, the actual impact of the DTs can be defined as:

I_d = cost(T_d) / (cost(T_d) + cost(T_nd))    (3)

2) Load Balance: The measure of load balance across the physical servers is determined from the growth of the data volume within the set of physical servers. If we compute the standard deviation of the data volume D_S over all the physical servers, then the variation of the distribution of tuples within the servers can be observed. This is equivalent to the balance measure discussed in Section IV-F. The coefficient of variation (C_v) defines the ratio between σ_DS and µ_DS over all servers S under deployment, and is independent of the unit of measurement.
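Equations (1)-(3) can be computed in one pass over the workload profile. A minimal sketch, assuming each transaction type is summarised as a (servers spanned, frequency) pair; the function name and the sample workload are our own illustration:

```python
def impact_of_dts(transactions):
    """I_d per eq. (3): transactions is a list of (servers, freq) pairs.
    cost = number of servers spanned for a DT, 1 for a local txn."""
    cost_dt = sum(s * f for s, f in transactions if s > 1)   # eq. (1)
    cost_ndt = sum(f for s, f in transactions if s <= 1)     # eq. (2)
    return cost_dt / (cost_dt + cost_ndt)                    # eq. (3)

# Hypothetical workload: two DTs spanning 3 and 2 servers, one local txn.
print(impact_of_dts([(3, 10), (2, 5), (1, 60)]))  # 0.4
```

Here the DT spanning cost is 3·10 + 2·5 = 40 against a non-DT cost of 60, giving I_d = 40/100 = 0.4 on the 0-to-1 scale the text describes.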
C_v tells the variability of the tuple distribution within the servers in relation to the mean data volume µ_DS. Equation (4) below determines the C_v of the load balance measure for the entire cluster at any instance of observation:

L_b = σ_DS / µ_DS    (4)

where µ_DS = (1/n) Σ_{i=1}^{n} D_Si and σ_DS = sqrt((1/n) Σ_{i=1}^{n} (D_Si - µ_DS)^2).

3) Inter-Server Data Migrations: For any given W_i, the total of inter-server data migrations within cycle i can be normalised by dividing by the mean data volume µ_DS. As shown in (5), D_m measures the quality of inter-server data migration with respect to the given workload W_i:

D_m = M_v / µ_DS    (5)

where M_v is the total number of migrations during the current observation window.

I. Composite Metric (C_m)

Let C_m be the composite metric with weight factors W_Id, W_Lb, and W_Dm for the objective measures I_d, L_b, and D_m respectively, where W_Id + W_Lb + W_Dm = 1, providing two degrees of freedom to choose between different repartitioning goals. Besides, I_d, L_b, and D_m are further normalised by 0-1 normalisation for unification purposes. Given the application and system requirements, system administrators can set specific goals towards achieving certain quality objectives (minimise I_d, L_b, or D_m) for the incremental repartitioning process. Based on different weight distributions, it is thus possible to find a repartitioning sweet spot preferring particular choices of workload network representation and cluster-to-partition mapping strategy. Thus, by fine-tuning the combinations of the weight distribution, one can instantly tackle unpredictable situations by tweaking the direction of the incremental repartitioning process to maintain an acceptable level of transactional services. We define C_m according to the following equation:

C_m = W_Id · I_d + W_Lb · L_b + W_Dm · D_m    (6)

V. EXPERIMENTAL RESULTS

To understand the effectiveness of the proposed ideas, we built a workload-driven simulation framework for the distributed database system presented in Figure 1. We evaluate our proposed methods against a static partitioning framework implementing the three workload network representations (graph, hypergraph, and compressed hypergraph) using the random
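Equations (4)-(6) are straightforward to compute; the following sketch mirrors them, reusing the GR-R server volumes from Section IV-F (function names are our own illustration):

```python
import statistics

def load_balance(volumes):
    """L_b per eq. (4): coefficient of variation of per-server volumes."""
    mu = statistics.fmean(volumes)
    sigma = statistics.pstdev(volumes)  # population std. dev., as in (4)
    return sigma / mu

def migration_cost(migrations, volumes):
    """D_m per eq. (5): total migrations normalised by mean volume."""
    return migrations / statistics.fmean(volumes)

def composite(weights, metrics):
    """C_m per eq. (6): weighted sum of the 0-1 normalised metrics."""
    assert abs(sum(weights) - 1.0) < 1e-9  # two degrees of freedom
    return sum(w * m for w, m in zip(weights, metrics))

vols = [11, 9]                       # tuples per server (GR-R example)
print(round(load_balance(vols), 2))  # 0.1
print(migration_cost(4, vols))       # 0.4  (4 hypothetical migrations)
```

With weights (W_Id, W_Lb, W_Dm) an administrator biases C_m towards one objective; lower C_m indicates a better repartitioning outcome, as noted under Figure 5.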
cluster-to-partition mapping strategy. A workload-aware static configuration redistributes the workload tuples only once and does not consider subsequent changes in the transactional profile. This exhibits the worst-case scenario for an incremental repartitioning framework, and we consider it as the baseline for comparison. Tuple-level replication is not used in these settings, as discussed in Sections I and IV.

12 independent databases are compared, using 3 workload network representations in combination with 3 cluster-to-partition mapping strategies. More specifically, we compare the databases GR-R, GR-MC, GR-MSM, HGR-R, HGR-MC, HGR-MSM, CHG-R, CHG-MC, and CHG-MSM, as described through Sections IV-B and IV-E, against the static partitioning framework for all of the three individual quality measures. Our goal is to evaluate the effectiveness of the proposed techniques with respect to I_d, L_b, and D_m (as detailed in Section IV-H) for incremental repartitioning cycles. Hence, we do not compare the results against performance measures like transactional throughput and latency.

A. Experimental Setup

We use a realistic experimental setup of a distributed database as depicted in Figure 1, and use the popular TPC-C transactional workloads developed in our workload-driven simulator. A typical TPC-C database contains 9 tables and 5 transactions, and simulates an order processing transactional system within geo-distributed districts and associated warehouses. Among these, the Stock and Order-Line tables are exceptionally fat in volume, and thus all the logical database partitions are not homogeneous in size. New tuples are inserted into the Order and Order-Line tables using the New-Order transaction, which usually occupies nearly 44.5% of the workload.
A fixed number of transactions is generated under the 5 transaction types in each workload batch, having a fixed birth and death rate. We further hash-partition these 9 database tables by their primary ids, and place them into 10 data node servers with a total of 90 logical partitions. Note that it is possible to hash-partition the TPC-C tables by Warehouse id to balance the workload distribution; however, we intentionally avoid this to exhibit the worst-case scenario of DTs in a popular OLTP benchmark. The five types of transactions are weighted from heavy to light in terms of transactional processing, and they occur in high to low frequencies. The synthetic data generation process follows Zipf's distribution for generating the Warehouse and Item tables, and uses the database relationship constraints to generate the others. We use the Metis [18] and hMetis [19] k-way min-cut clustering libraries with their default settings. The entire simulation process runs 10 times, with 100 incremental repartitioning cycles in each run for all the 12 representative database configurations, and the results are then averaged.

Fig. 4. Individual effects of I_d, L_b, and D_m in incremental repartitioning, compared with the static partitioning scheme under different workload representations. (Box plots of the normalised I_d, L_b, and D_m for the Static Partitioning, Random (R), Max-Column (MC), and Max-Sub-Matrix (MSM) strategies under the GR, HGR, and CHG representations.)
B. Result Analysis

1) Independent Measures of I_d, L_b, and D_m: Figure 4 presents the comparative results of the independent repartitioning quality measures for the 12 database configurations under test. Results are shown using box plots, with small red circles denoting mean values. The individual quality measures for different settings are grouped together into separate boxes. Within each box, 4 different strategies are compared: 1) Static Partitioning, followed by 3 incremental repartitioning approaches using the 2) Random (R), 3) Max-Column (MC), and 4) Max-Sub-Matrix (MSM) cluster-to-partition mapping strategies (explained in Section IV-E). We expect the Static Partitioning scheme to exhibit the worst-case scenarios for all the individual metrics, as it performs the workload-aware data redistribution only once and does not run for the remaining 99 cycles. As shown in Figure 4, the values of I_d are very high for all the workload representations, load balance varies within a wide range of L_b, and the mean data migrations are almost zero.

In evaluating I_d using (3), databases with MC-based mapping strategies outperform all others due to aggressive data migrations and many-to-one cluster-to-partition mapping. For this very reason, the data volume distributions of GR-MC, HGR-MC, and CHG-MC lead to complete imbalance over the 100 repartitioning cycles in each run.
MSM databases perform somewhat similarly to the Random strategy implementations. Although the graph and compressed hypergraph based GR-MSM and CHG-R perform better, overall both HGR-R and HGR-MSM show good results, leaving few or no outliers at all. While comparing the results of L_b (4), both the Random and MSM based databases perform well. Although HGR-MSM shows much more stable results in comparison to the others, GR-MSM wins over all of them. This supports our intuition that randomness and one-to-one cluster-to-partition mapping can naturally balance data distributions across the database cluster. Inter-server data migrations (D_m using (5)) are specifically low with the MC databases; however, GR-MSM and HGR-MSM both perform reliably in terms of I_d, L_b, and D_m over successive repartitioning cycles, showing the effectiveness of our proposed techniques. In the CHG-based configurations, CHG-MC performs better than HGR-MC for all the quality measures; however, HGR-MSM outperforms CHG-MSM in all aspects. When ranking the measures of these three individual metrics accordingly, GR-MSM wins over all others, followed by CHG-MC and GR-MC, while HGR-MSM achieves fourth place. Both of our proposed cluster-to-partition mapping strategies perform significantly better than the Random configuration in terms of minimising inter-server data migrations and load imbalance. Overall, the baseline Static Partitioning scheme is found ineffective in handling dynamic OLTP workloads, which justifies our comprehensive studies with Incremental Repartitioning using different combinations of workload representations and mapping strategies.
2) Combined Effect Using the Composite Metric, C_m: To understand the combined effect of I_d, L_b, and D_m through the composite metric C_m using (6), we use different combinations of the respective weight factors, providing that W_Id + W_Lb + W_Dm = 1. Figure 5 shows the resulting measure of C_m in a 2-d perspective plot using a coloured scale, where L_b and I_d are plotted on the X-axis and Y-axis respectively. The locations representing the values of D_m can be determined by calculating 1 - (I_d + L_b) in the individual subplots. We can set specific preferences to prioritise one particular repartitioning quality measure over another. The individual extremes of I_d, L_b, and D_m can be found at the (1,0), (0,1), and (0,0) locations. By following the colour codes from the legend, one can easily identify how individual repartitioning objectives would be met. From the plots, as anticipated in Section IV, MC-based databases do not favour L_b while, on the contrary, dramatically reducing I_d. We can also identify the repartitioning choices for general-purpose OLTP applications as GR-MSM and GR-R, followed by CHG-MSM and CHG-R, while all of the HGR-based settings are highly tunable depending on the repartitioning objectives in response to different administrative situations. A key observation here is that the choices of workload representation and mapping strategy are not bound to any specific combination.

Fig. 5. Combined effect of I_d, L_b, and D_m through the composite metric C_m, shown for the GR-R, GR-MC, GR-MSM, HGR-R, HGR-MC, HGR-MSM, CHG-R, CHG-MC, and CHG-MSM layouts over normalised L_b (X-axis) and I_d (Y-axis). Note that lower values of C_m indicate better solutions.

To confirm this, we also conduct a two-way ANOVA test and
analyse the interaction plots. However, we did not find any true evidence of interactions between the choices of representation and mapping strategy. Results from the ANOVA table also support this finding. This series of observations strongly supports our arguments presented in Sections III and IV, and justifies the goal of a sensitivity analysis within a broad design space which, to the best of our knowledge, has not been done before.

VI. CONCLUSION AND FUTURE WORK

In this paper, we present a workload-aware incremental repartitioning framework for OLTP databases which minimises 1) the impact of DTs using k-way balanced min-cut clustering; 2) the overall load imbalance through the randomness of the one-to-one cluster-to-partition mapping strategies; and 3) the physical data migrations by applying heuristics. Our innovative transaction classification technique ensures global minimisation of the overall load imbalance and data migrations compared to the worst-case scenario of a Static Partitioning framework implementing random cluster-to-partition mapping for different workload representations. The elaborate modelling approach clearly identifies the inter-related goals within the repartitioning process, and provides effective heuristics to achieve them based on operational requirements. By adopting the concept of roaming, the proposed distributed data lookup technique transparently decentralises lookup operations from the distributed transaction coordinator, guaranteeing high scalability. Our philosophical arguments broaden the decision space with a comprehensive sensitivity analysis combining different workload representations and mapping strategies. The proposed set of quality metrics presents a sophisticated way to measure the quality of successive repartitioning, and our simulation results outperform the Static Partitioning strategies in achieving the individual repartitioning objectives. The use of the composite metric shows an effective way of operational intelligence for Cloud applications suffering from dynamic workload behaviours. At present, we are investigating the following as our future directions: incremental repartitioning enabling replication in a single step, enforcing different balance criteria
in the cluster-to-partition mapping heuristics, and including the cost of data migrations in modelling the quality measures.

REFERENCES

[1] IBM Data Hub, www.ibmbigdatahub.com/infographic/four-vs-big-data, [Online].
[2] L. Gu, D. Zeng, P. Li, and S. Guo, "Cost minimization for big data processing in geo-distributed data centers," IEEE Transactions on Emerging Topics in Computing, vol. 99, no. PrePrints, 2014.
[3] R. Johnson, I. Pandis, and A. Ailamaki, "Eliminating unscalable communication in transaction processing," The VLDB Journal, vol. 23, no. 1, pp. 1-23, Feb. 2014.
[4] C. Curino, E. Jones, Y. Zhang, and S. Madden, "Schism: a workload-driven approach to database replication and partitioning," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 48-57, Sep. 2010.
[5] A. Quamar, K. A. Kumar, and A. Deshpande, "SWORD: scalable workload-aware data placement for transactional workloads," in Proceedings of the 16th International Conference on Extending Database Technology, ser. EDBT '13. NY, USA: ACM, 2013, pp. 430-441.
[6] Vitess: scaling MySQL databases for large scale web services, https://github.com/youtube/vitess, [Online].
[7] MySQL toolkit for managing billions of rows and hundreds of database machines, https://github.com/tumblr/jetpants, [Online].
[8] A flexible sharding framework for creating eventually-consistent distributed datastores, https://github.com/twitter/gizzard/, [Online].
[9] M. Mehta and D. J. DeWitt, "Data placement in shared-nothing parallel database systems," The VLDB Journal, vol. 6, no. 1, pp. 53-72, Feb. 1997.
[10] A. Pavlo, C. Curino, and S. Zdonik, "Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '12. NY, USA: ACM, 2012, pp. 61-72.
[11] T. Rafiq, "Elasca: Workload-aware elastic scalability for partition based database systems," Master's thesis, University of Waterloo, Canada, May 2013. [Online]. Available: http://uwspace.uwaterloo.ca/handle/10012/7525
[12] B. P. Swift, "Data placement in a scalable transactional data store," Master's thesis, Vrije Universiteit, Amsterdam, Netherlands, Feb. 2012. [Online]. Available: http://www.globule.org/publi/DPSTDS master2012.pdf
[13] Roaming in GSM Network, http://en.wikipedia.org/wiki/Roaming, [Online].
[14] J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez, "The little engine(s) that could: Scaling online social networks," IEEE/ACM Transactions on Networking (TON), vol. 20, no. 4, pp. 1162-1175, Aug. 2012.
[15] A. Turk, R. O. Selvitopi, H. Ferhatosmanoglu, and C. Aykanat, "Temporal workload-aware replicated partitioning for social networks," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2014.
[16] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web," in Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, ser. STOC '97. NY, USA: ACM, 1997, pp. 654-663.
[17] Mobile IP, http://en.wikipedia.org/wiki/Mobile_IP, [Online].
[18] G. Karypis and V. Kumar, "Multilevel k-way partitioning scheme for irregular graphs," Journal of Parallel and Distributed Computing, vol. 48, no. 1, pp. 96-129, Jan. 1998.
[19] G. Karypis and V. Kumar, "Multilevel k-way hypergraph partitioning," in Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, ser. DAC '99. NY, USA: ACM, 1999, pp. 343-348.