
Workload-Aware Incremental Repartitioning of Shared-Nothing Distributed Databases for Scalable Cloud Applications

Joarder Mohammad Mustafa Kamal, Faculty of Information Technology, Monash University, Victoria, Australia. Email: joarder.kamal@monash.edu
Manzur Murshed, Faculty of Science and Technology, Federation University, Victoria, Australia. Email: manzur.murshed@federation.edu.au
Rajkumar Buyya, Dept. of Comp. and Information Systems, University of Melbourne, Victoria, Australia. Email: rbuyya@unimelb.edu.au

Abstract: Cloud applications often rely on shared-nothing distributed databases that can sustain rapid growth in data volume. Distributed transactions (DTs) that involve data tuples from multiple geo-distributed servers can adversely impact the performance of such databases, especially when the transactions are short-lived and require immediate responses. The k-way min-cut graph clustering algorithm has been found effective in reducing the number of DTs with an acceptable level of load balancing. The benefits of such a static partitioning scheme, however, are short-lived in Cloud applications with dynamically varying workload patterns, where the DT profile changes over time. This paper addresses this emerging challenge by introducing incremental repartitioning. In each repartitioning cycle, the DT profile is learnt online and k-way min-cut clustering is applied to a special sub-graph representing all DTs as well as those non-DTs that have at least one tuple in a DT. The latter ensures that the min-cut algorithm minimally reintroduces new DTs from the non-DTs while maximally transforming existing DTs into non-DTs in the new partitioning. Potential load imbalance risk is mitigated by applying the graph clustering algorithm on the finer logical partitions instead of the servers, and by relying on random one-to-one cluster-to-partition mapping that naturally balances out loads. Inter-server data migration due to repartitioning is kept in check with two special mappings favouring the current partition of the majority of tuples in a cluster: a many-to-one version minimising migration, and a one-to-one version minimising migration without affecting load balancing. A distributed data lookup process, inspired by the roaming protocol in mobile networks, is introduced to efficiently handle data migration without affecting scalability. The effectiveness of the proposed framework is evaluated comprehensively on realistic TPC-C workloads using the graph, hypergraph, and compressed hypergraph representations used in the literature. Simulation results convincingly support incremental repartitioning against static partitioning.

Keywords: Cloud databases; workload; distributed transactions; incremental repartitioning; load-balance; data migration

I. INTRODUCTION

Nowadays, electronic data are being generated at an unprecedented scale by e-commerce, online business processing, digital media, and social networks. It is estimated that 2.3 trillion gigabytes of digitised data are generated every day around the globe [1].
As an example, on an average day over 30 billion pieces of content are shared on Facebook, while 4 billion hours of video are watched on YouTube [1]. In recent years, such Internet-scale Web applications have scaled out instantaneously using Cloud computing technologies. Shared-nothing distributed databases, in combination with horizontal data partitioning, provide a key mechanism to handle this massive data explosion and to scale to billions of concurrent users. Unfortunately, traditional approaches can hardly accommodate dynamic workload balance operations within a geo-distributed cluster [2]. With the dynamic nature of user-facing interactive Web applications driving Online Transaction Processing (OLTP) workloads, it is simply not possible for a static partitioning and placement model to work effectively by only adding more servers and hard disks to the cluster. By nature, OLTP transactions are small-sized and short-lived, with an immediate response-time requirement. At the same time, within a partitioned database, DTs occur frequently and span multiple servers, creating unscalable communications in transaction processing [3]. In addition, to accommodate dynamic workloads, large-scale data migrations are required, involving significant cost in terms of I/O, database resources, and potential downtime.

Workload-aware approaches for data replication and partitioning [4], [5] monitor the transactional logs and periodically create workload networks using graph or hypergraph representations. Each edge in a workload graph connects a pair of tuples originated from the same transaction, whereas a hyperedge connects all tuples within a transaction in a hypergraph. Later, these workload representations are clustered using k-way min-cut clustering, and then randomly placed across the set of physical servers within a database cluster. As long as workload characteristics do not change dramatically, and tuples from a cluster stay together in a physical server, the occurrences and adverse impacts of DTs are reduced rapidly. A number of centralised data lookup and routing mechanisms have also been built to support such data redistribution. Large-scale OLTP service providers develop partition management solutions like YouTube's Vitess [6], Tumblr's JetPants [7], and Twitter's Gizzard [8] to deal with rapid data growth. Nonetheless, the underlying data placements are not transparent to application code, and redistributions are not aware of workload dynamics. Furthermore, none of these techniques provides any explicit way to minimise physical data migrations over WAN and to maintain global load balance at the same time for a geo-distributed shared-nothing database cluster.

In this paper, we present a proactive workload-aware incremental repartitioning framework which transparently redistributes database tuples to ensure minimum data migrations and global load balance. Transactional logs are collected periodically, then undergo a pre-processing and classification stage before workload networks are generated for min-cut clustering.
A unique transaction classification process is introduced to identify purely distributed transactions and non-distributed ones containing moveable data tuples that are also contained in a DT. This novel classification removes the shortcomings of selective swapping of tuple sets for local load balancing by extending the size of the workload network, and over time reduces the impact of DTs, ensuring global optimisation in both load balance and data migrations. We also perform a detailed sensitivity analysis by representing the workload networks in fine, exact, and coarse granularity using graphs, hypergraphs, and compressed hypergraphs. In contrast to previous works, a fixed number of clusters is created from the workload network for the total number of logical data partitions in the entire database, instead of the number of physical servers. This provides finer control of load balance over the set of both partitions and servers. We also avoid tuple-level replication to observe the quality of incremental repartitioning under the worst-case scenario of DTs. We also propose two innovative cluster-to-partition mapping strategies that cater for minimising both physical data migrations and distribution imbalance. Our distributed data lookup mechanism ensures high scalability, and guarantees a maximum of two lookups to locate a tuple within the partitioned database.

To evaluate the quality of incremental repartitioning, we devise a set of metrics and also provide a way to administratively direct a particular repartitioning objective using a composite metric. Finally, we compare the quality of the proposed incremental repartitioning framework against a static partitioning configuration similar to [4] implementing random one-to-one cluster-to-partition mapping with different workload representations. More specifically, we compare 12 different database configurations with different settings: 3 workload representations and 3 mapping strategies for static and incremental repartitioning. Our simulation-based experimental results using realistic TPC-C workloads signify the trade-offs between different repartitioning approaches while showing the clear shortcomings of static partitioning in achieving dynamic data redistributions for OLTP databases.

The main contributions are summarised below:
- Investigating possible design choices for workload network representations and their applicability.
- Proposing a proactive transaction classification technique that identifies DTs and moveable non-DTs to create workload networks.
- Presenting two cluster-to-partition mapping strategies that ensure minimum inter-server data migrations and load imbalance across partitions and servers.
- Developing a scalable distributed data lookup technique that requires a maximum of two I/O round trips to locate a data tuple within the entire database.
- Devising a set of quality metrics for the incremental repartitioning process, defining different objectives.

The remainder of this paper is organised as follows: we briefly review the related works in Section II; a high-level overview of the proposed framework is discussed in Section III; Section IV details the steps, formulations, and design philosophies with necessary illustrations; Section V discusses the experimental results compared to a static partitioning framework; and finally Section VI concludes the paper.

II. RELATED WORK

Workload-aware load balancing with I/O overhead minimisation in distributed database systems was studied before for finding optimal data placement strategies in shared-nothing parallel databases [9]. Recent works primarily focus on OLTP workloads for scaling out Cloud applications to minimise the number of DTs. A workload-aware data replication and partitioning approach was first introduced by [4] for OLTP databases. The authors proposed Schism, which represents the transactional workload as a graph, and performs k-way replicated graph partitioning to minimise the effect of DTs.
However,Schismusuallygeneratesverylargegraphs,does notdealwithdynamicworkloadchanges,andthemoregen- eralproblemofrepartitioning.Transactionalworkloadsare modeledascompressedhypergraphin[5]byhashingdata tuplesprimarykeytoreducetheoverheadof k -wayclustering. TheauthorsproposeSWORD,anincrementalrepartitioning techniquewhichmovesa“xedamountofdatainaregular intervaluponnotifyingworkloadchanges,andbyobserving theincreaseinthepercentageofDTsfromaprede“ned threshold.However,thisreactiveapproachonlyensureslocal loadbalance,anddoesnotalwaysguaranteesreductioninDTs. Duetotheselectiveswappingoftherandomlycompressed tuplesetsandnewlytransformedDTs,thequalityofmin- cutclusteringmaylost,andgraduallyleadtoglobaldata distributionimbalance.In[10],anotherautomaticworkload- awaredatabasepartitioningmethodisproposedalongwithan analyticalmodeltoestimateskewandcoordinationcostfor DTs.Itusesthesamegraphbasedworkloadrepresentation of[4],andprimarilyfocusesonoptimaldatabasedesign basedonworkloadcharacteristics.However,itdidnotconsider incrementalrepartitioning. Elascaisproposedin[11],whereamulti-object workload-awareonlineoptimiserisdevelopedforoptimalpar- titionplacementensuringminimumdatamovement,however itdoesnotsupportincrementalrepartitioning.Adistributed lookupmethodfortransactionaldatabasesrequiringspecial knowledgenodesforcoordinationisproposedin[12],how- everitmayperformincorrectroutingduetoinconsistent values.Incontrast,ourproposeddistributedlookupoperation isbasedonthewellknownconceptof roaming [13],and italwaysguaranteesconsistentresultswithamaximumof twolookups.In[14],aSocialPartitioningandReplication middleware…(SPAR)isproposedthatexploresthesocialnet- workgraphfromuserinteraction,andthenperformsjoint partitioningandreplicationtoensurelocaldatasemanticsfor theusers.Similarly,in[15],temporalactivityhypergraphsare usedtomodeluserinteractionsinsocialnetwork,andthen min-cutclusteringisusedtominimisetheimpactofDTswith minimumloadimbalance.However,noneofthesetechniques exploretheincrementalrepartitioningproblem,andtheeffect ofdatamigrationsingloballoadbalance. 214 C 1 C 2 C 3 C 4 P 1 1 1 2 0 P 2 1 0 0 1 P 3 0 0 1 0 P 4 3 0 0 0 (3c) Classifying Transaction Generate workload networks (3b) Parsing and explaining SQL queries (3a) Process transactional logs 1 71 23 9 1 2 43 12 33 47 1 3 71 9 1 ƒ. N 43 23 1 67 3 GTID- 1 { select ƒ, insert ƒ, delete ƒ, update ...}; GTID- 2 ; GTID- 3 ; ƒƒ ; GTID- N Extract primary keys from transactions (3e) Perform k -way balanced clustering (3f) Mapping Clusters-to-Partitions Heuristics Constraints P 1  d 9, d 33 , d 23 , d 71 P 2  d 1 P 3   P 4  d 3 , d 12 , d 43 , d 47 , d 67 Cluster-to-Partition mapping matrix with da ta tuple counts (3g) Generate data tuple migration plans Migration plan Cluster 1 Cluster 4 71 67 1 43 12 33 9 3 47 23 Cluster 2 Cluster 3 Database and System Load Statistics Data Node Data Node Data Node Data Node Logical Data Partition Data Migration Plan Cloud App. Data Tuple Migrator Workload Analyser (6) Data Migration Distributed Transaction Manager ODBC Driver Partition Management Location Catalogue Load Balancer Coordinator Nodes Transaction Streams Analyser Node (1) (2) (3) (5) Client (GTID = Global Transaction ID) Workload-Aware Incremental Repartitioning Roaming Data Tuples in a Foreign Partition Data Tuple in a Home Partition (3d) Representing workloads Graph Hypergraph Minimum edge-cut Identify movable distributed transactions Workload Specific Location Cache (4) Feed data tuples partition and node location ....... 
[Fig. 1. An overview of the workload-aware incremental repartitioning framework, using numbered notations from 1 to 6 representing the overall workflow. Steps 3a to 3g represent the flow of workload analysis, representation, clustering, and repartitioning decision generation.]

III. PROPOSED SYSTEM OVERVIEW

An overview of the proposed framework is shown in Figure 1. We assume a set of coordinator nodes serving client requests and managing the execution of transactional queries. Coordinators are connected with a set of geo-distributed data nodes where the logical partitions reside. Each logical partition contains a location catalogue in which the residing tuples' locations and their current partition ids are persisted as key-value pairs. Note that individual data nodes can be synchronously replicated as master-slave within independent groups to ensure high availability, which is a common deployment practice. Thus, in this work we do not explicitly handle tuple-level replication like [4]. Coordinators also administer partition management operations (like split, merge, and migration) and incoming read/write workload balance. Streams of transactional logs are continuously pulled by the analyser node, and pre-processed for analysis in either a time- or workload-sensitive window. The analyser node can also cache the most frequently appearing tuple locations in a workload-specific catalogue, which is kept up to date upon inter-partition data migrations. Following Figure 1, the input of the workload-aware incremental repartitioning component (in the dotted rectangle) is transactional logs, and the output is a partition-level data migration plan. The overall process has four primary steps.

Pre-processing, Parsing, and Classification. Client applications submit transactional queries in step 1, which are then processed by a distributed transaction coordinator that manages the distributed data nodes. Upon pulling the streams of transactional workloads in step 2, individual transactions are processed to extract the contained SQL statements at step 3a. For each SQL statement, the primary keys of individual tuples are extracted, and the corresponding partition ids are retrieved from the embedded workload-specific location catalogue in step 3b. In the classification process (3c), original DTs and moveable non-DTs are identified along with their frequency counts in the current workload, and their associated costs of spanning multiple servers.

Workload Representation and k-way Clustering. In step 3d, workload networks are generated from the extracted transactional logs gathered in the previous step, using a graph or a hypergraph. Tuple-level compression can further reduce the size of the workload network. Since transactional graphs cannot fully represent transactions with more than two tuples using pair-wise relationships, we cannot directly minimise the impact of DTs in the workload with them. However, graph representations are much simpler to produce, and they have been adopted in a wide range of application-specific usages, which also helps us understand their importance in creating workload networks. On the other hand, hypergraphs can exploit exact transactional relationships; thus the number of hyperedge cuts exactly matches the number of DTs. Yet, popular hypergraph clustering libraries are computationally slower than graph clustering libraries, and produce less effective results [4].

In reality, with the increase in size and complexity, both of these representations are computation-intensive to manipulate. Furthermore, while compression techniques can confine an algorithm within a specified target, dramatic degradation in clustering quality and overall load balance occurs with a high compression ratio [5]. Finally, the workload networks are clustered using k-way min-cut clustering employed by the graph and hypergraph clustering libraries in step 3e.
Cluster-to-Partition Mapping. At step 3f, a mapping matrix is created where each element counts the tuples that are placed in a newly created cluster and originated from a particular partition. The clusters produced by the min-cut clustering are then mapped to the existing set of logical partitions following three distinct strategies. First, we employ uniform random tuple distribution for mapping clusters to database partitions, which naturally balances the distribution of tuples over the partitions. However, there is no proactive consideration in this random strategy for minimising data migrations. The second strategy employs a straightforward but optimal approach: it maps a cluster to the partition which originally contains the maximum number of tuples from that cluster, hence minimum physical data migrations take place.

In many cases, this simple strategy turns out to be a many-to-one cluster-to-partition mapping, and diverges from uniform tuple distribution. Moreover, incremental repartitioning can create server hot-spots, as similar transactions from new workload batches will keep driving new tuples to migrate into a hot server. As a consequence, overall load balance decreases over time, which is also observed in our experimental results. A way to recover from this situation is to ensure that the cluster-to-partition mapping remains one-to-one, which is used as the third strategy. This simple, yet effective, scheme restores the original uniform random tuple distribution with the constraint of minimising data migrations. Finally, in step 3g, based on the different mapping strategies and applied heuristics, a data migration plan is generated and then forwarded to the data tuple migrator module in step 5.

Distributed Location Update and Routing. The analyser node keeps a workload-specific location catalogue for the most frequently accessed tuples, and updates the associated locations at each repartitioning cycle in step 4. The analyser also directly invokes the corresponding data nodes to perform data migrations in step 6 without interrupting the ongoing transactional services. Until a tuple has fully migrated to a new partition, its existing partition serves all the query requests. Distributed databases using range partitioning require keeping a central lookup table for the clients to retrieve tuples. Hash partitioning requires the client to use a fixed hash function to look up the required tuples in the specified server. Consistent hash partitioning [16] employs a distributed lookup mechanism using a distributed hash table. However, none of these partitioning schemes provides a scalable data lookup mechanism for successive data redistribution.

To solve this problem, we use the well-established concept of roaming from wireless telecommunications and computer data networks. The problem of location-independent routing is already solved in IPv6 using Mobile IP [17], and in GSM networks using roaming mobile stations [13]. In a similar way, the location catalogue attached to each data partition keeps track of the roaming tuples and their corresponding foreign partitions. A maximum of two lookups is required to find a tuple without client-side caching. With proper caching enabled, this lookup cost can even be amortised to one for most cases with a high cache hit ratio.

IV. WORKLOAD-AWARE INCREMENTAL REPARTITIONING

A. Problem Formulation

Let $S = \{S_1, \ldots, S_n\}$ be the set of $n$ shared-nothing physical database servers, where each $S_i = \{P_{i,1}, \ldots, P_{i,m}\}$ denotes the set of $m$ logical partitions residing in $S_i$. Again, let $P_{i,j} = \{d_{i,j,1}, \ldots, d_{i,j,|P_{i,j}|}\}$ denote the set of tuples residing in $P_{i,j}$. We can thus obtain the number of tuples residing in $S_i$ as $D_{S_i} = \sum_j |P_{i,j}|$. Finally, $D_S = \sum_i D_{S_i}$ denotes the total number of tuples in the entire partitioned database.

TABLE I. SAMPLE DATABASE: PHYSICAL AND LOGICAL LAYOUT

Servers | Partitions
S1 (10) | P1: (5) = {2, 4, 6, 8, 10}; P3: (5) = {12, 14, 16, 18, 20}
S2 (10) | P2: (5) = {1, 3, 5, 7, 9}; P4: (5) = {11, 13, 15, 17, 19}
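To make the notation concrete, the following minimal Python sketch (our illustration, not part of the paper) encodes the sample layout of Table I and computes $D_{S_i}$ and $D_S$; the dictionary and function names are ours.

```python
# A minimal sketch (our illustration) encoding the sample layout of
# Table I: servers hold logical partitions, partitions hold tuple ids.

layout = {
    "S1": {"P1": {2, 4, 6, 8, 10}, "P3": {12, 14, 16, 18, 20}},
    "S2": {"P2": {1, 3, 5, 7, 9},  "P4": {11, 13, 15, 17, 19}},
}

def tuples_per_server(layout):
    """D_{S_i}: total number of tuples residing in each server."""
    return {s: sum(len(p) for p in parts.values())
            for s, parts in layout.items()}

D_Si = tuples_per_server(layout)   # {'S1': 10, 'S2': 10}
D_S = sum(D_Si.values())           # 20 tuples in the whole database
print(D_Si, D_S)
```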
Let $\mathcal{W} = \{W_1, \ldots, W_\tau\}$ be the set of workload batches within the database lifetime $\tau$, where each $W_i$ represents the workload batch observed at the $i$-th tick of $\tau$ and $\tau = \sum_i \tau_i$. The set of transactions in any $W_i$ is represented by $T = \{t_1, \ldots, t_z\}$, and each transaction can be characterised as either distributed ($T_\delta$) or non-distributed ($T_{\bar\delta}$); thus $T = T_\delta \cup T_{\bar\delta}$ and $T_\delta \cap T_{\bar\delta} = \emptyset$, where $T_\delta = \{t_{\delta,1}, \ldots, t_{\delta,|T_\delta|}\}$ and $T_{\bar\delta} = \{t_{\bar\delta,1}, \ldots, t_{\bar\delta,|T_{\bar\delta}|}\}$. Again, any distributed or non-distributed transaction $t_{\delta,i}$ or $t_{\bar\delta,i}$ can occur multiple times within $W_i$; hence, its frequency is represented by $freq(t_{\delta,i})$ or $freq(t_{\bar\delta,i})$. As any $t_{\delta,i}$ can span multiple servers, we define its cost of spanning as $cost(t_{\delta,i})$. We consider the cost of a transaction spanning multiple partitions within a single server negligible in terms of I/O overhead. Let us define the problem of incremental repartitioning as follows.

Problem Definition: For a given transactional workload $W_i$ at the $i$-th observation, $S$ homogeneous servers containing in total $P$ logical partitions, and a maximum allowed imbalance ratio $\epsilon$, find an incremental repartitioning solution $X_i$ from the output of a k-way balanced clustering $\Pi$ which minimises the mean impact of DTs in $W_i$ and the imbalance in $D_S$ across partitions and servers, with minimum inter-server data migrations.

In the following, we use illustrative examples based on a simple database with 20 data tuples distributed using hash-partitioning over 4 logical partitions and 2 physical servers, as shown in Table I. A sample workload batch with 7 transactions and the corresponding data tuples is shown in Table II. Finally, a detailed illustration of how the cluster-to-partition mapping strategies work with the different workload representations is shown in Figure 3.

B. Workload Modelling

We model the workload networks using three distinct representations. Firstly, the graph representation (GR) produces a fine-grain workload network, although it is unable to fully capture the actual transactional relationships between more than two tuples. Yet, the graph min-cut process can still generate a high-quality k-way clustering and minimise the impact of DTs, provided the overall graph size does not grow unchecked with workload variability and an adequate level of sampling is performed [4]. Secondly, the hypergraph representation (HGR) generates the most accurate and exact workload networks, and is thus also able to produce balanced clusters with min-cut hypergraph clustering. Moreover, from our empirical studies we found that k-way min-cut balanced hypergraph clustering produces more consistent results in terms of achieving the repartitioning goals, as is also mentioned in [15]. Finally, the compressed hypergraph representation (CHG) produces coarse-grain workload networks depending on the compression level. With a lower level of compression, less coarse networks are generated and k-way clustering performs better. However, as shown in [5], as the level of compression increases, the quality of the clustering process degrades dramatically. We formally define the individual representations below.

[Fig. 2. Transaction classification identifying DTs and moveable non-DTs: a transaction is distributed if at least two of its data tuples reside in different servers; a non-distributed transaction is moveable if at least one of its data tuples also resides in a distributed transaction, and non-moveable otherwise.]

1) Graph Representation: A graph $G = (V, E_g)$ represents $W_i$, where each edge $e_g \in E_g$ links a pair of tuples $(v_x, v_y)$ from $V = \{v_1, \ldots, v_{|V|}\} \subseteq D_S$ for a transaction $t_i$, and each vertex $v$ corresponds to a tuple $d_{a,b,c}$ of the database. The individual tuples of $(v_x, v_y)$ connect to their respective sets of adjacent tuples $A_{v_x}$ and $A_{v_y}$ originated from the same $t_i$. Any edge within $t_i$ has a weight representing the frequency of $t_i$ in $W_i$ co-accessing the pair $(v_x, v_y)$, while a vertex weight represents the tuple's size (in volume).
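As a concrete illustration (our sketch, not the paper's implementation), the following fragment builds the weighted edge set of GR for the distributed and moveable transactions of Table II; the unit frequencies are an assumption for a single batch.

```python
from itertools import combinations
from collections import Counter

# A minimal sketch (our illustration) of the graph representation (GR):
# each transaction contributes an edge per co-accessed tuple pair, and
# edge weights accumulate the frequencies of the transactions touching it.

transactions = {          # DTs and moveable non-DTs of Table II
    "T1": {1, 4, 5, 6, 7, 8, 10},
    "T2": {1, 4, 6, 9, 11},
    "T3": {9, 15, 17},
    "T4": {9, 17},
    "T5": {5, 7, 18},
}
freq = {t: 1 for t in transactions}   # per-batch transaction frequencies

edges = Counter()
for t, tuples in transactions.items():
    for u, v in combinations(sorted(tuples), 2):
        edges[(u, v)] += freq[t]      # edge weight = co-access frequency

print(len(edges), "weighted edges, e.g. (9, 17) ->", edges[(9, 17)])  # 2
```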
2) Hypergraph Representation: A hypergraph $H = (V, E_h)$ represents $W_i$, where a hyperedge $e_h \in E_h$ characterises a transaction $t_i$ and overlays its contained set of tuples $V_{t_i} \subseteq V$. A hyperedge representing $t_i$ is associated with a weight denoting the frequency of $e_h$ within $W_i$, and its vertex weights represent the data tuples' sizes (in volume).

3) Compressed Hypergraph Representation: A hypergraph $H = (V, E_h)$ can be compressed by collapsing the vertices into a set of virtual vertices $V'$ using a simple hash function on the primary keys [5]. A compressed hypergraph $H_c = (V', E'_h)$ represents $W_i$, where each virtual hyperedge $e'_h \in E'_h$ constitutes the set of virtual vertices $v'_{e_h} \subseteq V'$ into which the original vertices of $e_h$ are mapped, with $|v'_{e_h}| \geq 2$. A virtual vertex weight represents the combined data volume of the corresponding compressed tuples, and a hyperedge weight represents the frequency of the transactions which access the corresponding virtual vertices. $C_l$ denotes the compression level as $|V| / |V'|$, and equals 1 for no compression and $|V|$ for full compression.

Figure 3 presents the workload networks as a graph, a hypergraph, and a compressed hypergraph (with $C_l = 0.5$) for the transactions listed in Table II.

C. Proactive Transaction Classification

In constructing the classification technique, we argue that there always exists a group of tuples which are retrieved while processing the DTs, and which also participate in the execution of non-distributed but frequently occurring transactions. When these particular groups of tuples move into different database servers due to the database repartitioning process, they can turn previously non-DTs into newly distributed ones. We use this intuition to classify the workload transactions into three different categories: distributed, non-distributed moveable, and non-distributed non-moveable, as shown in Figure 2.

TABLE II. SAMPLE WORKLOAD

Transaction | Data Tuples | Class
T1 | {1, 4, 5, 6, 7, 8, 10} | DT
T2 | {1, 4, 6, 9, 11} | DT
T3 | {9, 15, 17} | Moveable Non-DT
T4 | {9, 17} | Moveable Non-DT
T5 | {5, 7, 18} | DT
T6 | {15, 17} | Non-moveable Non-DT
T7 | {2, 14, 16} | Non-moveable Non-DT

As an example, transactions T1, T2, and T5 from the sample workload of Table II are identified as distributed, whereas T3 and T4 are labelled as moveable non-distributed. Finally, T6 and T7 are discarded as purely non-distributed transactions.

Clearly, a number of non-distributed moveable transactions will remain protected within the k-way clustering, as the min-cut clustering always tries to preserve as many transactional edges as it can. Since most of the tuples in these moveable transactions do not participate in any DTs, they reside in relative isolation within the workload network. Thus, they are highly likely to be preserved together in the same cluster after k-way clustering. As the example in Figure 3 shows, the non-distributed moveable transactions T3 and T4, containing the tuples with ids 9, 15, and 17, remain protected as non-distributed after performing k-way clustering with all three workload representations using the Metis [18] and hMetis [19] libraries.
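A hedged sketch of this classification rule, building on the `layout` and `transactions` dictionaries from the earlier fragments (the function names are ours, not the paper's):

```python
# A minimal sketch (our illustration) of the proactive transaction
# classification of Section IV-C. A transaction is distributed (DT) if its
# tuples span more than one server; a non-DT is moveable if it shares at
# least one tuple with some DT, and non-moveable (discarded) otherwise.

def server_of(tuple_id, layout):
    for server, partitions in layout.items():
        if any(tuple_id in part for part in partitions.values()):
            return server
    raise KeyError(tuple_id)

def classify(transactions, layout):
    spans = {t: {server_of(d, layout) for d in ds}
             for t, ds in transactions.items()}
    dts = {t for t, servers in spans.items() if len(servers) > 1}
    dt_tuples = set().union(*(transactions[t] for t in dts)) if dts else set()
    labels = {}
    for t, ds in transactions.items():
        if t in dts:
            labels[t] = "DT"
        elif ds & dt_tuples:
            labels[t] = "Moveable Non-DT"       # shares a tuple with a DT
        else:
            labels[t] = "Non-moveable Non-DT"   # safe to discard
    return labels

transactions.update({"T6": {15, 17}, "T7": {2, 14, 16}})
print(classify(transactions, layout))   # reproduces the classes of Table II
```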
If we had added only the DTs T1, T2, and T5 to the workload sub-graphs, without the moveable non-DTs, then at the next incremental repartitioning phase T3 and T4 would have appeared as DTs: tuple id 9, which by that time would have already been moved to a partition located in a different physical server, would cause its associated transactions to become distributed. There exists a clear trade-off between the increase in size of the workload networks and the achieved benefits. At one end, the smaller the workload network, the less computationally costly it is to process with respect to time and I/O. On the other hand, if we include all the workload tuples in the representations, it may reduce the impact of DTs better in a particular repartitioning cycle, but at the price of unwanted data migrations that create new DTs. By aggressively classifying the non-distributed moveable transactions, the quality of the overall repartitioning process increases, as the impact of DTs decreases compared to a static partitioning strategy, as shown later in our experimental results.

D. k-way Balanced Clustering of Workload

Given $G$ and a maximum allowed imbalance ratio $\epsilon$, we can define the problem as finding the k-way clustering $\Pi_G = \{V_1, \ldots, V_k\}$ that minimises the transactional edge-cut with the balance constraint bounded by $(1 + \epsilon)$. Similarly, the k-way constrained and balanced clustering of $H$ is $\Pi_H = \{V_1, \ldots, V_k\}$ such that a minimum number of hyperedges is cut at imbalance ratio $\epsilon$. Analogously, the k-way balanced clustering of $H_c$ is $\Pi_{H_c} = \{V'_1, \ldots, V'_k\}$ with an imbalance ratio $\epsilon$, aiming at minimum virtual hyperedge cuts. Note that we set $k$ to the total number of logical partitions instead of the number of physical servers. From our empirical experiments, we find that executing the k-way clustering process with $k$ as the number of partitions provides finer granularity in balancing the distribution of data volume over the set of physical servers.

[Fig. 3. Transactional workload modelling with the 3 representations along with 4-way min-cut clustering, followed by the 3 cluster-to-partition mapping strategies. For each of the nine resulting database layouts (GR-R, GR-MC, GR-MSM, HGR-R, HGR-MC, HGR-MSM, CHG-R, CHG-MC, CHG-MSM), the figure shows the cluster-to-partition mapping matrix with data tuple counts, the resulting physical and logical layout, and the inter-partition and inter-server data migration counts.]
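The clustering step itself is delegated to the Metis [18] and hMetis [19] libraries in the paper; the fragment below sketches one plausible invocation through the PyMetis binding (an assumption on our part; the paper does not prescribe a particular binding), reusing the `edges` built earlier and omitting edge and vertex weights for brevity, although the paper's formulation uses both.

```python
import pymetis  # assumption: the PyMetis binding of Metis [18] is available

# A minimal sketch (our illustration) of step 3e: k-way balanced min-cut
# clustering of the workload graph, with k set to the number of logical
# partitions (4 in the sample database) rather than the number of servers.

vertices = sorted({v for edge in edges for v in edge})
index = {v: i for i, v in enumerate(vertices)}
adjacency = [[] for _ in vertices]
for (u, v), _w in edges.items():
    adjacency[index[u]].append(index[v])
    adjacency[index[v]].append(index[u])

k = 4  # total number of logical partitions
edge_cut, membership = pymetis.part_graph(k, adjacency=adjacency)
clusters = {v: membership[index[v]] for v in vertices}
print("edge-cut:", edge_cut, "| cluster of tuple 9:", clusters[9])
```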
The k-way balanced clustering generates clusters of similar size with respect to the number of tuples, given a balance constraint defined as $k \cdot \max_i(W(V_i)) / W(V)$, which tells whether the clusters are equally weighted or not. Here, $W(V_i)$ is the sum of the weights of the vertices in $V_i$. The partitions are said to be balanced if the balance measure equals or is close to 1, and imbalanced if it is greater than 1.

E. Cluster-to-Partition Mapping Strategies

Figure 3 presents the three distinct cluster-to-partition mapping strategies (in matrix format) beneath their respective workload network representations. The rows and columns of the matrices represent partition and cluster ids respectively. An individual matrix element represents the count of tuples from a particular partition which are placed by the clustering libraries under a specific cluster id. The shadowed locations in a mapping matrix, with counts in boldface, represent the resulting decision block with respect to the particular cluster and partition id. The individual tables below the matrices represent the state of the physical and logical layouts of the sample database. The last row of these tables shows the counts of inter- and intra-server data migrations for each of the nine representative database layouts, with boldface numbers denoting the most balanced distribution and the lowest data migration count. Below we explain the main philosophies behind these mapping strategies in detail.

1) Random (R) Cluster-to-Partition Mapping: Naturally, the best way to achieve load balance at any granularity is to assign the clusters randomly. Clustering tools like Metis and hMetis randomly generate the cluster ids, and do not have any knowledge about how the data tuples are originally distributed within the servers or partitions. As a straightforward approach, the cluster ids can simply be mapped one-to-one to the corresponding partition ids as they are generated. Although this random assignment balances the workload tuples across the partitions, it does not necessarily guarantee minimum inter-server data migrations. As shown in Figure 3, the mapping matrices labelled Random and the database layouts GR-R, HGR-R, and CHG-R are the representatives of this class.

2) Max-Column (MC) Mapping: With this strategy, we aim at minimising the physical data migration within the repartitioning process. In the cluster-to-partition mapping matrix, the maximum tuple count of each individual column is discovered, and the entire cluster column is mapped to the partition id holding that maximum count. Thus, multiple clusters can be assigned to a single partition. As the maximum number of tuples originates from this designated partition, those tuples do not move from their home partition, which reduces the overall inter-server physical data migrations. For OLTP workloads with skewed tuple distributions and dynamic data popularity, the impact of DTs can rapidly decrease under this greedy heuristic, as tuples from multiple clusters may map to a single partition in the same physical server. However, this directly leads to data volume imbalance across the partitions and servers. The mapping matrices labelled MC, with the corresponding database layouts GR-MC, HGR-MC, and CHG-MC, represent this mapping strategy in Figure 3.
3) Max-Sub-Matrix (MSM) Mapping: To minimise both load imbalance and data migrations, we combine the natural advantages of the previous two strategies. At first, the largest tuple count within the entire mapping matrix is found and placed at the top-left diagonal position by performing successive row-column rearrangements. The next phase begins by omitting the elements in the first row and column, and then recursively searching the remaining sub-matrix for the element with the maximum tuple count. Eventually, all the diagonal positions of the matrix are filled with elements having maximum tuple counts. Mapping the respective clusters one-to-one to the corresponding partitions then results in both minimum data migrations and distribution load balance. Note that multiple maximum tuple counts can be found at different matrix positions, and the first such encountered element is chosen for simplicity.

The MSM strategy works similarly to the MC strategy, as it prioritises the maximum tuple counts within the sub-matrices, but it maps the clusters one-to-one to the partitions like the Random mapping strategy, thus preventing potential load imbalance across both the logical partitions and physical servers. In Figure 3, the mapping matrices labelled MSM, and the representative database layouts GR-MSM, HGR-MSM, and CHG-MSM, depict this mapping strategy. A code sketch of both heuristics is given after the balance measure below.

TABLE III. COMPARISON OF SERVER- AND PARTITION-LEVEL BALANCE

Method | Server-level balance | Partition-level balance
GR-R | 1.1 | 1.4
GR-MC | 1.3 | 1.6
GR-MSM | 1.1 | 1.2
HGR-R | 1.0 | 1.2
HGR-MC | 1.1 | 1.4
HGR-MSM | 1.1 | 1.2
CHG-R | 1.1 | 1.6
CHG-MC | 1.2 | 1.4
CHG-MSM | 1.1 | 1.4

F. The Balance Measure

For illustration purposes, we reuse the balance measure [15] mentioned earlier, while using the server and partition weights instead of the cluster weights. Considering the GR-R database layout shown in Figure 3, there are in total 20 tuples distributed among two physical servers ($S_1$, $S_2$) and four logical partitions ($P_1, \ldots, P_4$). The servers contain 11 and 9 tuples, while the partitions contain 4, 5, 7, and 4 tuples respectively, which leads to a balance value of $(2 \times 11)/(11 + 9) = 1.1$ at the server level and $(4 \times 7)/(4 + 5 + 7 + 4) = 1.4$ at the partition level. Table III presents the calculated balance measure for all nine database layouts at both the server and partition level, where boldface values in the original table indicate the lowest balance measure. Overall, GR-MSM and HGR-MSM perform better than all others, primarily in terms of minimum data migrations and load balance. From this elaborate illustration, it is clear that k-way min-cut clustering of the workload network across partitions gives a better estimation of load balance, and finer degrees of freedom for the different cluster-to-partition strategies to minimise intra- and inter-server physical data migrations.
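The following sketch (ours, with invented function names) condenses the MC and MSM heuristics and the balance measure into code; the mapping matrix `M` corresponds to the matrices of Figure 3, and the final lines reproduce the GR-R balance values computed above.

```python
import numpy as np

# A minimal sketch (our illustration) of the MC and MSM cluster-to-partition
# mappings of Section IV-E and the balance measure of Section IV-F.
# M[p][c] counts the tuples of cluster c currently living in partition p.

def mc_mapping(M):
    """Max-Column: map each cluster to the partition holding most of its
    tuples (many-to-one; minimises migrations, may imbalance load)."""
    return {c: int(np.argmax(M[:, c])) for c in range(M.shape[1])}

def msm_mapping(M):
    """Max-Sub-Matrix: repeatedly pick the globally largest remaining count
    and fix that cluster-to-partition pair (one-to-one mapping)."""
    M = M.astype(float).copy()
    mapping = {}
    for _ in range(min(M.shape)):
        p, c = np.unravel_index(np.argmax(M), M.shape)  # first max found
        mapping[int(c)] = int(p)
        M[p, :] = -1   # omit this partition's row ...
        M[:, c] = -1   # ... and this cluster's column
    return mapping

def balance(weights):
    """k * max(W(V_i)) / W(V): equals 1 when perfectly balanced."""
    return len(weights) * max(weights) / sum(weights)

# GR-R layout of Figure 3: servers hold (11, 9) tuples, partitions (4,5,7,4)
print(round(balance([11, 9]), 1))       # 1.1 at the server level
print(round(balance([4, 5, 7, 4]), 1))  # 1.4 at the partition level
```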
G. Distributed Data Lookup

As mentioned in Sections I and III, any centralised lookup mechanism is always at risk of becoming the bottleneck in achieving high-availability and scalability requirements. We take a sophisticated approach that distributes the data tuple lookup process down to the individual database partition level. Thus, data migration operations are totally transparent to distributed transaction processing and coordination. By maintaining a key-value list of roaming and foreign data ids with their corresponding partition ids, individual partitions can answer the lookup queries. Tuples are assigned a permanent home partition id for their lifetime when the database is initially partitioned using range, hash, or consistent-hash [16] partitioning. The home partition id only changes when a partition splits or merges, and these operations are overseen by the coordinators as shown in Figure 1, and thus remain transparent to the lookup process. As the tuple locations are managed by their home partitions, data inconsistencies are strictly prevented. Unless a tuple has fully migrated to another partition and its roaming location has been written in the catalogue, the old partition continues serving transactional processing.

When a tuple migrates to another partition within the process of incremental repartitioning, only its respective home partition needs to be aware of it. The target roaming partition will treat this migrated tuple as foreign and update its lookup table accordingly, whereas the original home partition will mark this tuple as roaming in its lookup table and update its current location with the roaming partition's id. A lookup process always queries a tuple's home partition first to retrieve it. If the tuple is not found at its original location, the lookup table entry immediately gives the most recent location of the tuple and redirects the search towards the roaming partition. Thus, a maximum of two lookup operations is required to find a tuple within the entire database.

Note that the cost of physical data migration may increase when using such a distributed lookup process. With high probability, an individual data migration in the incremental repartitioning process may involve running the location update process on up to three physical servers: the one serving the home partition and the two serving the roaming partitions, i.e., the current and target partitions. At present, we are investigating the implications of this cost, and how to include it in the formulation of the quality measures.
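A minimal sketch (ours, with invented class and method names) of the roaming-based two-lookup protocol described above:

```python
# A minimal sketch (our illustration) of the roaming-based distributed
# lookup of Section IV-G. Each partition keeps a catalogue mapping tuple
# ids to either home-side entries (possibly roaming elsewhere) or foreign
# entries for tuples hosted on behalf of another home partition.

class Partition:
    def __init__(self, pid):
        self.pid = pid
        self.catalogue = {}   # tuple_id -> ("local"|"roaming"|"foreign", pid)

    def migrate_out(self, tid, target):
        """Home partition marks the tuple roaming; target marks it foreign."""
        self.catalogue[tid] = ("roaming", target.pid)
        target.catalogue[tid] = ("foreign", self.pid)

def lookup(tid, home, partitions):
    """At most two lookups: ask the home partition, follow one redirect."""
    state, pid = partitions[home].catalogue.get(tid, ("local", home))
    if state == "roaming":                    # second (and last) lookup
        return partitions[pid], 2
    return partitions[home], 1

parts = {p: Partition(p) for p in ("P1", "P2", "P3", "P4")}
parts["P2"].catalogue[9] = ("local", "P2")    # tuple 9's home is P2
parts["P2"].migrate_out(9, parts["P4"])       # tuple 9 roams to P4
print(lookup(9, "P2", parts)[1], "lookups")   # -> 2 lookups
```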
H. Quality Measures for Incremental Repartitioning

In evaluating the performance of incremental repartitioning, previous works [4], [5] only measure the percentage of reduction in DTs. However, this single measure fails to support any meaningful conclusion about how the impact of distributed transactions is minimised. Further, there are no measures for overall load balance and data migrations. We propose three independent metrics to measure successive repartitioning quality against three distinct objectives: 1) minimise the impact of DTs; 2) minimise load imbalance; and 3) minimise the number of physical data migrations.

The first metric measures the impact, on a scale of 0 to 1, by associating the frequency of DTs with their related I/O costs. The second metric measures the tuple-level load distribution over the set of servers using the coefficient of variation, which effectively shows the dispersion of data load over successive periods of observation. The third metric measures the mean inter-server data migrations over successive repartitioning processes. By combining all three aforementioned metrics, a composite metric is also proposed, which represents the mix of workload representation and cluster-to-partition mapping strategy for a particular incremental repartitioning cycle to achieve a certain objective. In the following, we model these three representative metrics in detail.

1) The Impact of Distributed Transactions: Considering the formal definitions provided in Section IV-A, we combine the cost of spanning multiple physical servers by any distributed transaction $t_{\delta,i}$, $cost(t_{\delta,i})$, with the frequency of $t_{\delta,i}$ within $W_i$, $freq(t_{\delta,i})$. Here, $cost(t_{\delta,i}) = |S_{t_{\delta,i}}|$, the number of physical servers involved in processing $t_{\delta,i}$, whereas $cost(t_{\bar\delta,i}) = 1$ for any non-distributed transaction $t_{\bar\delta,i}$. Note that in reality this cost represents the overhead of I/O over the network while processing the DTs. Equation (1) below defines the spanning cost of $T_\delta$ within $W_i$ for all $t_{\delta,i} \in T_\delta$:

$cost(T_\delta) = \sum_{t_{\delta,i} \in T_\delta} cost(t_{\delta,i}) \cdot freq(t_{\delta,i})$  (1)

Similarly, (2) denotes $cost(T_{\bar\delta})$ for all $t_{\bar\delta,i} \in T_{\bar\delta}$:

$cost(T_{\bar\delta}) = \sum_{t_{\bar\delta,i} \in T_{\bar\delta}} freq(t_{\bar\delta,i})$  (2)

Finally, the actual impact of $T_\delta$ can be defined as:

$I_d = \dfrac{cost(T_\delta)}{cost(T_\delta) + cost(T_{\bar\delta})}$  (3)

2) Load Balance: The measure of load balance across the physical servers is determined from the growth of data volume within the set of physical servers. If we compute the standard deviation of data volume, $\sigma_{D_S}$, over all the physical servers, the variation in the distribution of tuples within the servers can be observed. This is equivalent to the balance measure discussed in Section IV-F. The coefficient of variation ($C_v$) is defined as the ratio between $\sigma_{D_S}$ and $\mu_{D_S}$ for all servers $S$ under deployment, and is independent of the unit of measurement. $C_v$ captures the variability of the tuple distribution within the servers in relation to the mean data volume $\mu_{D_S}$. Equation (4) below determines the load balance measure for the entire cluster at any instance of observation:

$L_b = \sigma_{D_S} / \mu_{D_S}$  (4)

where $\mu_{D_S} = \frac{1}{n} \sum_{i=1}^{n} D_{S_i}$ and $\sigma_{D_S} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (D_{S_i} - \mu_{D_S})^2}$.

3) Inter-Server Data Migrations: For any given $W_i$, the total number of inter-server data migrations within $\tau_i$ can be normalised by dividing by the mean data volume $\mu_{D_S}$. As shown in (5), $D_m$ measures the quality of inter-server data migration with respect to the given workload $W_i$:

$D_m = M_v / \mu_{D_S}$  (5)

where $M_v$ is the total number of migrations during the current observation window.
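The three metrics translate directly into code; the sketch below (ours) recomputes $I_d$ for the sample workload of Table II, where each of the three DTs spans two servers:

```python
import statistics

# A minimal sketch (our illustration) of the three quality metrics of
# Section IV-H. dts/non_dts list (frequency, server-span) per transaction;
# server_loads lists D_{S_i}; migrations counts M_v.

def impact_of_dts(dts, non_dts):
    """I_d, equation (3): spanning cost of DTs over total cost."""
    cost_d = sum(freq * span for freq, span in dts)     # eq. (1)
    cost_nd = sum(freq for freq, _ in non_dts)          # eq. (2)
    return cost_d / (cost_d + cost_nd)

def load_balance(server_loads):
    """L_b, equation (4): coefficient of variation of server data volume."""
    return statistics.pstdev(server_loads) / statistics.mean(server_loads)

def data_migrations(migrations, server_loads):
    """D_m, equation (5): migrations normalised by mean data volume."""
    return migrations / statistics.mean(server_loads)

# Table II: T1, T2, T5 are DTs spanning 2 servers; T3, T4, T6, T7 are not.
print(impact_of_dts(dts=[(1, 2)] * 3,
                    non_dts=[(1, 1)] * 4))   # I_d = 6 / (6 + 4) = 0.6
print(load_balance([11, 9]))                 # L_b = 1 / 10 = 0.1
```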
I. Composite Metric ($C_m$)

Let $C_m$ be the composite metric with weight factors $\omega_{I_d}$, $\omega_{L_b}$, and $\omega_{D_m}$ for the objective measures $I_d$, $L_b$, and $D_m$ respectively, where $\omega_{I_d} + \omega_{L_b} + \omega_{D_m} = 1$, providing two degrees of freedom to choose between different repartitioning goals. Additionally, $I_d$, $L_b$, and $D_m$ are further normalised by 0-1 normalisation for unification purposes. Given the application and system requirements, system administrators can set a specific goal towards achieving certain quality objectives (minimise $I_d$, $L_b$, or $D_m$) for the incremental repartitioning process. Based on different weight distributions, it is thus possible to find a repartitioning sweet spot preferring particular choices of workload network representation and cluster-to-partition mapping strategy. Thus, by fine-tuning the weight distribution, one can instantly tackle unpredictable situations by tweaking the direction of the incremental repartitioning process to maintain an acceptable level of transactional services. We define $C_m$ according to the following equation:

$C_m = \omega_{I_d} I_d + \omega_{L_b} L_b + \omega_{D_m} D_m$  (6)
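Equation (6) is a plain weighted sum; a small sketch (ours) with hypothetical administrator-chosen weights:

```python
# A minimal sketch (our illustration) of the composite metric C_m of
# equation (6). The metric values are assumed to be 0-1 normalised; the
# weights must sum to 1, leaving two degrees of freedom.

def composite_metric(i_d, l_b, d_m, w_id=1/3, w_lb=1/3, w_dm=1/3):
    assert abs(w_id + w_lb + w_dm - 1.0) < 1e-9
    return w_id * i_d + w_lb * l_b + w_dm * d_m

# e.g. an administrator prioritising migration cost over the other goals:
print(composite_metric(0.6, 0.1, 0.3, w_id=0.2, w_lb=0.2, w_dm=0.6))
```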
V. EXPERIMENTAL RESULTS

To understand the effectiveness of the proposed ideas, we built a workload-driven simulation framework for the distributed database system presented in Figure 1. We evaluate our proposed methods against a static partitioning framework implementing the three workload network representations (graph, hypergraph, and compressed hypergraph) using the random cluster-to-partition mapping strategy. A workload-aware static configuration redistributes workload tuples only once and does not consider subsequent changes in the transactional profile. This exhibits the worst-case scenario for an incremental repartitioning framework, and we consider it the baseline of comparison. Tuple-level replication is not used in these settings, as discussed in Sections I and IV.

Twelve independent databases are compared, using the 3 workload network representations in combination with the 3 cluster-to-partition mapping strategies. More specifically, we compare the databases GR-R, GR-MC, GR-MSM, HGR-R, HGR-MC, HGR-MSM, CHG-R, CHG-MC, and CHG-MSM, as described through Sections IV-B and IV-E, against the static partitioning framework for each of the three individual quality measures. Our goal is to evaluate the effectiveness of the proposed techniques with respect to $I_d$, $L_b$, and $D_m$ (as detailed in Section IV-H) over incremental repartitioning cycles. Hence, we do not compare the results against performance measures like transactional throughput and latency.

A. Experimental Setup

We use a realistic experimental setup of a distributed database as depicted in Figure 1, and use the popular TPC-C transactional workload implemented in our workload-driven simulator. A typical TPC-C database contains 9 tables and 5 transaction types, and simulates an order-processing transactional system within geo-distributed districts and their associated warehouses. Among these tables, Stock and Order-Line are exceptionally fat in volume, and thus the logical database partitions are not homogeneous in size. New tuples are inserted into the Order and Order-Line tables by the New-Order transaction, which usually occupies nearly 44.5% of the workload.

A fixed number of transactions is generated under the 5 transaction types in each workload batch, with fixed birth and death rates. We further hash-partition the 9 database tables by their primary ids, and place them into 10 data node servers holding a total of 90 logical partitions. Note that it is possible to hash-partition the TPC-C tables by Warehouse id to balance the workload distribution; however, we intentionally avoid this to exhibit the worst-case scenario of DTs in a popular OLTP benchmark. The five types of transactions are weighted from heavy to light in terms of transactional processing, and they occur with high to low frequencies. The synthetic data generation process follows Zipf's distribution for generating the Warehouse and Item tables, and uses the database relationship constraints to generate the others. We use the Metis [18] and hMetis [19] k-way min-cut clustering libraries with their default settings. The entire simulation process runs 10 times, with 100 incremental repartitioning cycles in each run, for all the 12 representative database configurations, and the results are then averaged.

[Fig. 4. Individual effects of I_d, L_b, and D_m in incremental repartitioning compared with the static partitioning scheme under the different workload representations (GR, HGR, CHG) and strategies (Static Partitioning, Random (R), Max-Column (MC), Max-Sub-Matrix (MSM)); all measures are normalised to the 0-1 scale.]

B. Result Analysis

1) Independent Measures of $I_d$, $L_b$, and $D_m$: Figure 4 presents the comparative results of the independent repartitioning quality measures for the 12 database configurations under test. Results are shown using box plots, with small red circles denoting mean values. The individual quality measures for the different settings are grouped into separate boxes. Within each box, 4 different strategies are compared: 1) Static Partitioning, followed by the 3 incremental repartitioning approaches using the 2) Random (R), 3) Max-Column (MC), and 4) Max-Sub-Matrix (MSM) cluster-to-partition mapping strategies (explained in Section IV-E). We expect the Static Partitioning scheme to exhibit the worst-case scenarios for all the individual metrics, as it performs the workload-aware data redistribution only once and does not run for the remaining 99 cycles. As shown in Figure 4, its values of $I_d$ are too high for all the workload representations, its load balance varies within a wide range of $L_b$, and its mean data migrations are almost zero.

In evaluating $I_d$ using (3), databases with MC-based mapping strategies outperform all others due to aggressive data migrations and many-to-one cluster-to-partition mapping. For the very same reason, the data volume distributions of GR-MC, HGR-MC, and CHG-MC lead to complete imbalance over the 100 repartitioning cycles in each run. MSM databases perform somewhat similarly to the Random strategy implementations. Although the graph- and compressed-hypergraph-based GR-MSM and CHG-R perform better, overall both HGR-R and HGR-MSM show good results with few or no outliers at all. When comparing the results of $L_b$ (4), both Random- and MSM-based databases perform well. Although HGR-MSM shows much more stable results in comparison to the others, GR-MSM wins over all of them. This supports our intuition that randomness and one-to-one cluster-to-partition mapping can naturally balance data distributions across the database cluster. Inter-server data migrations ($D_m$ using (5)) are specifically low with the MC databases; however, GR-MSM and HGR-MSM both perform reliably in terms of $I_d$, $L_b$, and $D_m$ over successive repartitioning cycles, showing the effectiveness of our proposed techniques. Among the CHG-based configurations, CHG-MC performs better than HGR-MC for all the quality measures; however, HGR-MSM outperforms CHG-MSM in all aspects. When ranking the measures of these three individual metrics, GR-MSM wins over all others, followed by CHG-MC and GR-MC, while HGR-MSM achieves fourth place. Both of our proposed cluster-to-partition mapping strategies perform significantly better than the Random configuration in terms of minimising inter-server data migrations and load imbalance. Overall, the baseline Static Partitioning scheme is found ineffective in handling dynamic OLTP workloads, which justifies our comprehensive study of Incremental Repartitioning with different combinations of workload representations and mapping strategies.
2) Combined Effect Using the Composite Metric $C_m$: To understand the combined effect of $I_d$, $L_b$, and $D_m$ through the composite metric $C_m$ of (6), we use different combinations of the respective weight factors under the constraint $\omega_{I_d} + \omega_{L_b} + \omega_{D_m} = 1$. Figure 5 shows the resulting measure of $C_m$ in a 2-D perspective plot using a coloured scale, where $L_b$ and $I_d$ are plotted on the X-axis and Y-axis respectively. The locations representing the values of $D_m$ can be determined by calculating $1 - (\omega_{I_d} + \omega_{L_b})$ in the individual subplots. We can set specific preferences to prioritise one particular repartitioning quality measure over another. The individual extremes of $I_d$, $L_b$, and $D_m$ can be found at the (1,0), (0,1), and (0,0) locations. By following the colour codes from the legend, one can easily identify how individual repartitioning objectives would be met.

[Fig. 5. Combined effect of I_d, L_b, and D_m through the composite metric C_m for the nine database layouts (GR-R, GR-MC, GR-MSM, HGR-R, HGR-MC, HGR-MSM, CHG-R, CHG-MC, CHG-MSM). Note that lower values of C_m indicate better solutions.]

From the plots, as anticipated in Section IV, MC-based databases do not favour $L_b$ while, on the contrary, dramatically reducing $I_d$. We can also identify the repartitioning choices for a general-purpose OLTP application as GR-MSM and GR-R, followed by CHG-MSM and CHG-R, while all of the HGR-based settings are highly tunable depending on the repartitioning objectives in response to different administrative situations. A key observation here is that the choices of workload representation and mapping strategy are not bound to any specific combination. To confirm this, we also conduct a two-way ANOVA test and analyse the interaction plots. However, we did not find any true evidence of interaction between the choices of representation and mapping strategy, and the results from the ANOVA table also support this finding. This series of observations strongly supports our arguments presented in Sections III and IV, and justifies the goal of a sensitivity analysis within a broad design space, which, to the best of our knowledge, has not been done before.
VI. CONCLUSION AND FUTURE WORK

In this paper, we present a workload-aware incremental repartitioning framework for OLTP databases which minimises: 1) the impact of DTs, using k-way balanced min-cut clustering; 2) the overall load imbalance, through the randomness of the one-to-one cluster-to-partition mapping strategies; and 3) the physical data migrations, by applying heuristics. Our innovative transaction classification technique ensures global minimisation of overall load imbalance and data migrations compared to the worst-case scenario of a static partitioning framework implementing random cluster-to-partition mapping for different workload representations. The elaborate modelling approach clearly identifies the inter-related goals within the repartitioning process, and provides effective heuristics to achieve them based on operational requirements. By adopting the concept of roaming, the proposed distributed data lookup technique transparently decentralises lookup operations away from the distributed transaction coordinator, guaranteeing high scalability. Our arguments broaden the decision space with a comprehensive sensitivity analysis combining different workload representations and mapping strategies. The proposed set of quality metrics presents a sophisticated way to measure the quality of successive repartitioning, and in our simulation results it outperforms the static partitioning strategies in achieving the individual repartitioning objectives. The use of the composite metric shows an effective way of providing operational intelligence for Cloud applications suffering from dynamic workload behaviours. At present, we are investigating the following as our future directions: incremental repartitioning enabling replication in a single step, enforcing different balance criteria in the cluster-to-partition mapping heuristics, and including the cost of data migrations in modelling the quality measures.

REFERENCES

[1] IBM Data Hub, www.ibmbigdatahub.com/infographic/four-vs-big-data, [Online].
[2] L. Gu, D. Zeng, P. Li, and S. Guo, "Cost minimization for big data processing in geo-distributed data centers," IEEE Transactions on Emerging Topics in Computing, vol. 99, no. PrePrints, 2014.
[3] R. Johnson, I. Pandis, and A. Ailamaki, "Eliminating unscalable communication in transaction processing," The VLDB Journal, vol. 23, no. 1, pp. 1-23, Feb. 2014.
[4] C. Curino, E. Jones, Y. Zhang, and S. Madden, "Schism: a workload-driven approach to database replication and partitioning," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 48-57, Sep. 2010.
[5] A. Quamar, K. A. Kumar, and A. Deshpande, "SWORD: scalable workload-aware data placement for transactional workloads," in Proceedings of the 16th International Conference on Extending Database Technology, ser. EDBT '13. NY, USA: ACM, 2013, pp. 430-441.
[6] "Vitess: scaling MySQL databases for large scale web services," https://github.com/youtube/vitess, [Online].
[7] "MySQL toolkit for managing billions of rows and hundreds of database machines," https://github.com/tumblr/jetpants, [Online].
[8] "A flexible sharding framework for creating eventually-consistent distributed datastores," https://github.com/twitter/gizzard/, [Online].
[9] M. Mehta and D. J. DeWitt, "Data placement in shared-nothing parallel database systems," The VLDB Journal, vol. 6, no. 1, pp. 53-72, Feb. 1997.
[10] A. Pavlo, C. Curino, and S. Zdonik, "Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, ser. SIGMOD '12. NY, USA: ACM, 2012, pp. 61-72.
[11] T. Rafiq, "Elasca: Workload-aware elastic scalability for partition based database systems," Master's thesis, University of Waterloo, Canada, May 2013. [Online]. Available: http://uwspace.uwaterloo.ca/handle/10012/7525
[12] B. P. Swift, "Data placement in a scalable transactional data store," Master's thesis, Vrije Universiteit, Amsterdam, Netherlands, Feb. 2012. [Online]. Available: http://www.globule.org/publi/DPSTDS_master2012.pdf
[13] "Roaming in GSM Network," http://en.wikipedia.org/wiki/Roaming, [Online].
[14] J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, and P. Rodriguez, "The little engine(s) that could: Scaling online social networks," IEEE/ACM Transactions on Networking (TON), vol. 20, no. 4, pp. 1162-1175, Aug. 2012.
[15] A. Turk, R. O. Selvitopi, H. Ferhatosmanoglu, and C. Aykanat, "Temporal workload-aware replicated partitioning for social networks," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2014.
[16] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web," in Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, ser. STOC '97. NY, USA: ACM, 1997, pp. 654-663.
[17] "Mobile IP," http://en.wikipedia.org/wiki/Mobile_IP, [Online].
[18] G. Karypis and V. Kumar, "Multilevel k-way partitioning scheme for irregular graphs," Journal of Parallel and Distributed Computing, vol. 48, no. 1, pp. 96-129, Jan. 1998.
[19] G. Karypis and V. Kumar, "Multilevel k-way hypergraph partitioning," in Proceedings of the 36th Annual ACM/IEEE Design Automation Conference, ser. DAC '99. NY, USA: ACM, 1999, pp. 343-348.