
Consistent Main-Memory Database Federations under Deferred Disk Writes

Rodrigo Schmidt*,† and Fernando Pedone†

* École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
† Università della Svizzera Italiana (USI), CH-6904 Lugano, Switzerland
E-mails: rodrigo.schmidt@epfl.ch, fernando.pedone@unisi.ch

The work presented in this paper has been partially supported by the Hasler Foundation, Switzerland (project #1899).

Abstract

Current cluster architectures provide the ideal environment to run federations of main-memory database systems (FMMDBs). In FMMDBs, data resides in the main memory of the federation servers, significantly improving performance by avoiding I/O during the execution of read operations. To maximize the performance of update transactions as well, some applications resort to deferred disk writes. This means that update transactions commit before their modifications are written on stable storage, and durability must be ensured outside the database. While deferred disk writes in centralized MMDBs relax only the durability property of transactions, in FMMDBs transaction atomicity may also be violated in case of failures. We address this issue from the perspective of log-based rollback-recovery in distributed systems and provide an efficient solution to the problem.

Keywords: dependency tracking, consistency, rollback-recovery, distributed transactions, MMDBs.

1. Introduction

Continuous technology improvements have reduced the cost and boosted the performance and memory capacity of commodity computers. As a consequence, powerful computer clusters are becoming increasingly affordable and common. These architectures provide the ideal environment for mechanisms targeting high-performance computing, such as main-memory database systems (MMDBs [11]). Although originally designed for specific classes of applications (e.g., telecommunication) running on single servers, recent work has suggested that MMDBs can also be used in broader contexts (e.g., web servers [18]) and environments (e.g., clustered architectures [24]). In short, MMDBs overcome the latency limitations of traditional disk-based databases by storing the data items in the main memory of the servers [12]. By avoiding disk I/O, both transaction throughput and response time can be improved. Moreover, as transactions do not have to wait for data to be fetched from disk, concurrency becomes less important for performance, and some approaches have considered lowering the overhead of transaction synchronization by reducing concurrency (e.g., locking tables instead of rows, or executing transactions sequentially [11, 15]).

For recovery reasons, MMDBs also keep a copy of the database on disk. Queries execute entirely on data in main memory, but update transactions have to modify the state on disk. In fact, accessing the disk is the main overhead incurred by update transactions executing in an MMDB. To maximize performance in such cases, some applications resort to deferred disk writes. This means that update transactions commit before their modifications are written on stable storage. Since disk access is deferred until after transactions commit, the log records of several transactions can be grouped and written to disk asynchronously in a single operation. This approach alone harms the durability property of transactions, but some applications may prefer to ensure durability outside the database for performance reasons. As an example of such applications, database replication schemes based on atomic broadcast primitives (e.g., the database state machine approach [19]) in the crash-recovery model have durability ensured by the group communication primitive (see the work in [22]); it is therefore redundant to also ensure it in each database replica.

This paper considers a federation of main-memory database systems (FMMDB) where data is partitioned among different servers running local MMDBs. Global transaction termination is implemented by atomically grouping the commit decisions of various local sub-transactions. As in a centralized database, applications can choose to use deferred disk writes in order to improve the system's performance. Deferred disk writes, however, introduce additional complexities in an FMMDB. In a single-server system, only the durability property
may be violated in case of a database crash, provided that log writes respect the commit order of their respective transactions. By contrast, in a federation a crash may render a server inconsistent with respect to the others, compromising atomicity as well. Consider a simple federation composed of two database servers. If a transaction $t$ updates data in both servers and commits, and one of the servers crashes before making the updates locally persistent, then the failed server will have forgotten $t$'s local execution when it recovers. In this case, atomicity is violated because only part of $t$ persists: the part in the server that did not crash.

We address the problem of deferred disk writes in a federation of MMDBs using a novel approach that borrows from the theory of rollback-recovery in distributed systems [9]. The basis of this theory is the identification of dependencies between process states. This allows the recognition of consistent global states (i.e., those composed of local states such that none depends on another), to which the application should be rolled back in case of failure. Efficiently applying these results in the context of transaction processing systems, however, is not straightforward and requires revisiting the original theory. Transaction processing systems create dependencies between database states differently from usual message-passing distributed systems. In the latter, dependencies are based on causality¹; in the former, dependencies are created by read and write operations on database objects during the execution of transactions.

Consider, for example, a simple distributed transaction execution composed of two servers and one client. Two transactions, $t_1$ and $t_2$, execute sequentially. Figure 1 depicts the execution, where read requests are denoted by R, write requests by W, and commit requests by C; $\alpha$, $\beta$, and $\gamma$ represent the database states at the servers. Database server $S_i$ changes its state after an update transaction commits at $S_i$; the state remains the same if the transaction only reads the local state or aborts. In a usual message-passing system, state $\alpha$ would precede $\gamma$, since there is a causal path between the two states (depicted in bold in Figure 1). However, since $t_1$ only reads $\alpha$, it turns out that $\alpha$
and $\gamma$ are in fact concurrent. This example shows that causality is too strong to capture database-state dependencies, and a more appropriate formalism is needed.

[Figure 1: False (causal) dependency]

¹ Event $e$ causally precedes $e'$ iff (i) they execute in the same process, $e$ before $e'$, or (ii) $e$ refers to the sending of a message and $e'$ refers to its receipt, or (iii) $e$ and $e'$ are related by the transitive closure of the two previous conditions [17].

We revisit the original dependency definitions, developed for message-passing systems, and propose a new one based on database states, which is minimal for distributed transaction environments and allows an efficient tracking implementation. Moreover, this paper illustrates the applicability of our approach in the context of an FMMDB with deferred disk writes. Our solution is optimistic in the sense that we do not force servers to synchronize their accesses to disk (e.g., using a two-phase-commit-like protocol); instead, we track dependencies between database states during normal execution and, in case of failure, bring the system to a consistent state during recovery.

This paper is structured as follows. Section 2 introduces our computational and execution models. Section 3 explores consistency and dependencies in a transactional system. Section 4 presents our algorithms to ensure correctness of execution in a federation of main-memory databases with deferred disk writes. We compare our approach with existing work in the field in Section 5 and conclude the paper in Section 6. Due to space limitations, the proofs of the theorems and of correctness are presented in the full paper [25].

2. System model

We assume a system composed of two disjoint sets of processes: the set of servers $S = \{S_1, S_2, \ldots, S_n\}$ and the set of clients $C = \{C_1, C_2, \ldots, C_m\}$. Servers are stateful (their state is given by the data values stored on them), and clients are stateless (their state can be recreated from the servers' state in case of a crash). We assume that clients interact only with servers, by submitting transaction requests and waiting for their responses. All communication between clients and servers is done through message exchange.

The system is asynchronous: we make no assumptions about the time needed for processes to execute and for messages to be transmitted.² Communication links may lose messages, but if both sender and receiver remain up "long enough," lost messages can be retransmitted and are eventually received. A process can fail by crashing, stopping its execution and losing its volatile state, but it eventually recovers. Servers are equipped with stable storage whose contents survive crashes. The system execution alternates between normal execution periods and recovery sessions. A recovery session starts when a failure is noticed and ends once the servers are guaranteed to be in a globally consistent state.

² The implementation of a distributed transactional environment may require stronger assumptions (e.g., failure suspicion). The ideas described in this paper, however, are oblivious to such assumptions.

2.1. Database servers and transactions

Servers store disjoint subsets of the entire database accessible to the clients and run local main-memory databases. We call the complete set of servers $S$ a main-memory database federation. Each server executes local transactions, where a transaction is a (most likely short) sequence of read and write operations on data items, followed by either a commit or an abort operation, but not both. A transaction is called read-only if it does not contain any write operations, and update otherwise. Transactions are abstracted by the following traditional properties [13]:

Atomicity: A transaction's changes to the database state are atomic: either all happen or none happen.
Consistency: A transaction is a correct transformation of the database state.
Isolation: Any execution of a set of transactions is equivalent to a serial execution of the same transactions.

Durability is relaxed as a result of deferred disk writes. If there is a failure after a transaction commits but before it is made durable, the transaction is lost. In that case, after recovery, the execution has to proceed as if the transaction had never executed. Lost transactions differ from aborted ones because they commit and their results may have been seen by other transactions. A transaction that is not lost throughout the execution is called persistent. We redefine transaction durability under deferred disk writes through the two properties below:

Weak Durability: If an update transaction commits and the system does not crash for "long enough," the transaction is persistent.
Consistent Persistence: A persistent transaction is preceded only by other persistent transactions.

In order to make the previous definitions sound, two notions still have to be defined: equivalence between executions of sets of transactions, and precedence between transactions. Let a transaction history $H$ be a partial order on all the operations executed by a set of transactions, necessarily defined for all conflicting operations; two operations are said to conflict if they both operate on the same data item and at least one of them is a write [4]. $H$ represents a real (not necessarily serial) execution of the transactions in the system. Two histories over the same set of transactions are equivalent if they order the conflicting operations of non-aborted persistent transactions in the same way. We say that a transaction $t_1$ directly precedes a transaction $t_2$ in $H$ if there is a pair of conflicting operations $(o_1 \in t_1, o_2 \in t_2)$ such that $o_1$ precedes $o_2$ in $H$. The precedence relation between transactions is the transitive closure of the direct precedence relation. Having clarified our definitions, we stress that our concern is to extend Weak Durability and Consistent Persistence from the local database servers to the federation, and to ensure that none of the other transaction properties are violated in the presence of failures.

We assume that the concurrency control in each server is based on shared read locks and exclusive write locks on the whole local database, characterizing the multiple-read single-write behavior found in some MMDBs (e.g., [15]). This allows us to abstract client operations as Reads and Writes performed over an entire database state. We show how our approach can be extended to more complex concurrency control mechanisms, such as two-phase locking, in Section 4.5.

A server $S_i$ updates its state to a new one after committing a transaction that wrote some value on the server. This creates a sequence of states $\sigma_i^0, \sigma_i^1, \ldots$, where $\sigma_i^j$ represents $S_i$'s state after committing the $j$-th local update transaction.

2.2. Clients' execution model

Clients execute a sequence of steps. In each step, a client (a) performs some local computation, (b) submits a request to a database in the federation, and (c) waits for its response. We abstract the set of possible database requests by the following primitives, where op represents an operation to be submitted to the database.
Details about their implementation are given in Section 4.2.

Read(tid, S_i, op): Operation op reads some data item stored in $S_i$ on behalf of transaction tid.
Write(tid, S_i, op): Operation op updates some data item stored in $S_i$, or creates it, on behalf of tid.
Commit(tid): Requests the global commit of transaction tid in the federation.
Abort(tid): Requests the global abort of transaction tid in the federation.

To start a new transaction, a client generates a new unique transaction identification number (tid), to be used in all servers. When a server receives the first operation on behalf of tid (either a Read or a Write), it creates a new transaction abstraction in the local database and associates it with tid, so that future operations are submitted to the database within the same local transaction abstraction. When all the operations on all servers referring to a given transaction have been executed, the client issues the Commit request to ensure global commit. After a Commit or Abort request, the client issues no further requests with the same tid.

At any point during a transaction's execution, a server that participates in it can unilaterally abort its local sub-transaction. This is done, for example, if the local sub-transaction is involved in a deadlock or the server suspects that the client responsible for the transaction has crashed. To ensure the atomic commit of transactions in the absence of failures, we use a simple blocking protocol: the client sends a message to all involved servers asking them to prepare to commit. Every involved server sends its commit/abort vote to the client and to the other servers. A server commits the transaction iff it receives a "commit" vote from every involved server. Moreover, if the client receives the "commit" vote from every server, it knows the transaction has been committed. To abort a transaction, a client simply sends an "abort" message to all involved servers. If the client fails and some server does not receive such a message, this server will eventually abort the transaction unilaterally. Clearly, this algorithm (derived from two-phase commit [4, 13]) works in the absence of failures. Section 4.2 shows how Atomicity is preserved in the presence of failures even though no disk write is executed during transaction commit.

3. Consistent global database states

When a failure occurs, we must make sure that the system will restart from a previous consistent global state. In this section we precisely define the notion of consistency, analyze the conditions that make a global database state consistent, and show what our algorithm must do to make it recoverable.

3.1. Database-state dependencies

When it comes to the creation of database-state dependencies, we are only interested in committed transactions. Therefore, we consider only committed transactions in the definitions and theorems presented in this section and, for simplicity, omit this condition from their statements. Some extra notation is also necessary. We use $RW(t)$ to represent the set of server states accessed by transaction $t$ throughout its execution. $W(t) \subseteq RW(t)$ is the set of server states updated by $t$. This means that if $\sigma_i^k \in W(t)$ and $t$ commits, a new database state $\sigma_i^{k+1}$ is created by $t$ at server $S_i$. Furthermore, we define $R(t) = RW(t) \setminus W(t)$ to be the set of server states read by $t$.

State dependencies in the transactional model are due to the three well-known types of transaction dependencies: write-read, write-write, and read-write [4, 13]. Definition 1 below captures the notion of transaction dependency in a simplified manner using our terminology: write-read and write-write dependencies are represented by condition (a), read-write dependencies by condition (b), and transitive dependencies by condition (c). In this context, a database state precedes another one if the former is overwritten by a transaction that either creates the latter or precedes the transaction that creates it. This means that the first state will already have been overwritten by the time the second one is created and, therefore, no transaction (or external viewer) can see both of them together in the same global database state. Definition 2 presents this idea more formally.

Definition 1. Transaction $t$ precedes $t'$ ($t \rightarrow t'$) iff
(a) $\exists \sigma_c^{\gamma} \mid \sigma_c^{\gamma-1} \in W(t) \wedge \sigma_c^{\gamma} \in RW(t')$; or
(b) $\exists \sigma_c^{\gamma} \mid \sigma_c^{\gamma} \in R(t) \wedge \sigma_c^{\gamma} \in W(t')$; or
(c) $\exists t'' \mid t \rightarrow t'' \wedge t'' \rightarrow t'$.

Definition 2. State $\sigma_a^{\alpha}$ precedes $\sigma_b^{\beta}$ ($\sigma_a^{\alpha} \rightarrow \sigma_b^{\beta}$) iff
(a) $\exists t \mid \sigma_a^{\alpha} \in W(t)$; and
(b) $\exists t' \mid \sigma_b^{\beta-1} \in W(t')$; and
(c) $t = t' \vee t \rightarrow t'$.

3.2. Consistent and recoverable database states

A global state of the federation is a set composed of one local state for each database server in the system. We base our consistency criterion on the notion of serializability [4] and formalize it in Definition 3.

Definition 3. A global database state $\{\sigma_1^{k_1}, \ldots, \sigma_n^{k_n}\}$ in a given history $H$ is consistent iff it represents the database state after the serial execution of an ordered set of transactions $T = (t_1, t_2, \ldots, t_l)$ such that:
(a) all transactions in $T$ are non-aborted persistent transactions in $H$;
(b) $\forall t \in T: t' \rightarrow t \text{ in } H \Rightarrow t' \in T$; and
(c) $\forall t_a, t_b \in T: t_a \rightarrow t_b \text{ in } H \Rightarrow a < b$.

From Definition 3, a global state is consistent if it is created by the execution, in a correct order, of a subset of the executed transactions that is left-closed under the transaction dependency relation. Theorem 1 gives a simpler characterization of a consistent global state, based on the database-state dependency relation introduced in Definition 2.

Theorem 1. A global state $G = \{\sigma_1^{k_1}, \ldots, \sigma_n^{k_n}\}$ is consistent iff $\forall \sigma_i^{k_i}, \sigma_j^{k_j} \in G: \sigma_i^{k_i} \not\rightarrow \sigma_j^{k_j}$.

As an example, consider Figure 2(a), which shows a possible execution scenario in which five transactions are applied to a federation of two database servers. We omit message exchanges between clients and servers and depict only the operations performed against the databases, grouped by transaction, where W means a database write and R means a database read. Figure 2(b) shows the dependencies between the database states created by the executed transactions; only direct dependencies are depicted, and transitive ones are omitted. Based on these dependencies, a total of seven consistent global states can be identified according to Theorem 1, all of them depicted in Figure 2(b). Global state number 4 is reached after the serial execution of $(t_1, t_2, t_3)$, and global state number 6 is reached by $T = (t_1, t_2, t_3, t_4)$.

By the Weak Durability property described in Section 2.1, if a server crashes, it might not recover in the same state it was in just before the crash. According to Consistent Persistence, locally ensured by the MMDB running in the server, an entire suffix of the local execution may be lost after a failure. As this new local state may be inconsistent with the state of the other servers, ensuring Consistent Persistence globally may require the entire system to roll back to a previous consistent global state. Clearly, we want this state to be as recent as possible, so that as few committed transactions as possible are rolled back. To satisfy this condition, we have to distinguish between stable database states, already written on the server's disk, and volatile database states, whose local durability has not been ensured yet. A consistent global state is recoverable if it is composed of stable database states. When some database servers crash, the recovery algorithm must roll the system back to its most recent recoverable consistent global state, or recovery line. A non-faulty server that wants to make its volatile states part of the recovery line should make them stable before executing the recovery algorithm.

[Figure 2: Consistent global states, panels (a) and (b)]

[Figure 3: Recovery-line determination]

The main determiner of the recovery line in a history $H$ is the last stable state of each server $S_i$, which we denote by $last_i$. As Theorem 2 shows, the recovery line for a given execution scenario is composed, for each server, of the last state not preceded by any state $last_j$.

Theorem 2. The recovery line $R$ for a given history is determined by
$$R = \bigcup_{i=1}^{n} \left\{ \sigma_i^{k} \mid k = \max\left(\gamma \mid \forall S_j : last_j \not\rightarrow \sigma_i^{\gamma}\right) \right\}$$

Figure 3 depicts an example of recovery-line determination based on the scenario presented in Figure 2 (volatile states are depicted between square brackets, e.g., $[\sigma_i^j]$). The figure shows a dependency graph in which all the states that depend on some state $last_i$ are drawn as empty circles. Therefore, the recovery line is formed by the state represented by the last filled circle in each database server.

4. Database-oriented rollback-recovery

4.1. Thrifty dependency tracking

Definition 2 relates database-state dependencies to transaction dependencies. Theorem 3 below shows that it is also possible to keep track of database-state dependencies without having to gather information about transaction dependencies.

Theorem 3. Server state $\sigma_a^{\alpha}$ precedes $\sigma_b^{\beta}$ ($\sigma_a^{\alpha} \rightarrow \sigma_b^{\beta}$) iff
(a) $\exists t \mid \sigma_a^{\alpha} \in W(t) \wedge \sigma_b^{\beta-1} \in RW(t)$; or
(b) $\exists t, \sigma_c^{\gamma} \mid \sigma_a^{\alpha} \rightarrow \sigma_c^{\gamma} \wedge \sigma_b^{\beta-1}, \sigma_c^{\gamma} \in RW(t)$; or
(c) $\exists t, \sigma_c^{\gamma} \mid \sigma_a^{\alpha} \rightarrow \sigma_c^{\gamma} \wedge \sigma_b^{\beta-1} \in RW(t) \wedge \sigma_c^{\gamma-1} \in W(t)$.

Theorem 3 follows from the fact that a transaction $t$ accesses a consistent partial state of the federation and generates, after its execution, another consistent partial state. These states work like partial snapshots of the execution and, therefore, impose constraints on the ordering of events. As in the real world, if one event is captured in a snapshot and another is not (i.e., it took place after the snapshot was taken), then the snapshot is a "proof" that the first event happened before the second.

We exemplify conditions (a), (b), and (c) of Theorem 3 in Figure 4, where $S_{Before}$ refers to the (partial) federation state accessed by transaction $t$, either a read-only or an update transaction, and $S_{After}$ refers to the federation state generated after $t$'s execution. In the figure, scenarios (a1) and (a2) correspond to condition (a) of Theorem 3, and scenarios (b) and (c) correspond to conditions (b) and (c), respectively.

Figure 4(a1) depicts the situation where $\sigma_a^{\alpha} \in W(t)$ and $\sigma_b^{\beta-1} \in R(t)$. When $t$ commits, the new state it creates contains $\sigma_a^{\alpha+1}$ and $\sigma_b^{\beta-1}$. Therefore, as $\sigma_a^{\alpha}$ necessarily precedes this state and $\sigma_b^{\beta}$ succeeds it, it is clear that $\sigma_a^{\alpha} \rightarrow \sigma_b^{\beta}$. Figure 4(a2) represents the case where $\sigma_a^{\alpha}, \sigma_b^{\beta-1} \in W(t)$. As $\sigma_b^{\beta}$ is created by $t$, it did not exist before $t$'s commit, whereas $\sigma_a^{\alpha}$ existed only until $t$ committed, since it is updated by $t$. This means that, as no other transaction can see a state between $S_{Before}$ and $S_{After}$, $\sigma_a^{\alpha} \rightarrow \sigma_b^{\beta}$.

In Figure 4(b), $\sigma_a^{\alpha} \rightarrow \sigma_c^{\gamma}$ and $\sigma_b^{\beta-1}, \sigma_c^{\gamma} \in RW(t)$. This means that $\sigma_b^{\beta-1}$ and $\sigma_c^{\gamma}$ belong to the federation state accessed by $t$. As in the situation depicted in Figure 4(a1), $\sigma_a^{\alpha}$ must precede $\sigma_b^{\beta}$. Lastly, consider the case where $\sigma_a^{\alpha} \rightarrow \sigma_c^{\gamma}$ and $\sigma_b^{\beta-1}, \sigma_c^{\gamma-1} \in W(t)$, shown in Figure 4(c). The state generated after $t$'s commit contains $\sigma_b^{\beta}$ and $\sigma_c^{\gamma}$. Since $\sigma_a^{\alpha}$ precedes $\sigma_c^{\gamma}$, $\sigma_a^{\alpha}$ has already been overwritten before $\sigma_c^{\gamma}$ is created. As $\sigma_c^{\gamma}$ and $\sigma_b^{\beta}$ are created together, $\sigma_a^{\alpha} \rightarrow \sigma_b^{\beta}$. The scenario of condition (c) where $\sigma_a^{\alpha} \rightarrow \sigma_c^{\gamma}$, $\sigma_b^{\beta-1} \in R(t)$, and $\sigma_c^{\gamma-1} \in W(t)$ resembles the situation depicted in Figure 4(b), with $S_{Before}$ exchanged for $S_{After}$.

[Figure 4: Dependencies based on the server states accessed by a transaction, scenarios (a1), (a2), (b), and (c)]

4.2. Dependency tracking algorithm

Theorem 3 leads to a simple way to gather database-state dependencies on the fly during the system's execution. Assume each state $\sigma_i^k$ has an associated data structure $D(\sigma_i^k)$ representing the set of states it depends on (we show later how this structure can be implemented efficiently). To update $D(\sigma_i^k)$, every transaction $t$ executes, upon committing, the steps described in Algorithm 1, where $D(S_i)$ is an auxiliary data structure local to $S_i$, initially empty. $D(S_i)$ represents the dependencies that must be attributed to the next state to be created at server $S_i$. Lines 1-3 are directly associated with the three possible database-state precedences presented in Theorem 3. Line 4 associates a dependency data structure with every new database state created by the transaction.

Algorithm 1 Dependency tracking
During commit of transaction $t$ at $S_i$:
1: $\forall \sigma_b^{\beta-1} \in RW(t): D(S_b) \leftarrow D(S_b) \cup W(t)$
2: $\forall \sigma_b^{\beta-1}, \sigma_c^{\gamma} \in RW(t): D(S_b) \leftarrow D(S_b) \cup D(\sigma_c^{\gamma})$
3: $\forall \sigma_b^{\beta-1} \in RW(t): D(S_b) \leftarrow D(S_b) \cup \bigcup_{\sigma_c^{\gamma} \in W(t)} D(S_c)$
4: $\forall \sigma_j^{k} \in W(t): D(\sigma_j^{k+1}) \leftarrow D(S_j)$

We now explain how Algorithm 1 can be implemented in practice, starting with how MMDBs write database state changes on stable storage. In MMDBs, data changes are stored on disk only after an update transaction has issued a commit request. This means that no action has to be undone in case of failures: the transaction log is typically redo-only and can be implemented by simply storing the set of operations performed by each transaction [8]. Regardless of its particular implementation details, each entry in a redo-only log represents the new state created by the respective update transaction. We can therefore associate the database state $\sigma_c^{\gamma}$ with the $\gamma$-th entry in the log of server $S_c$. To keep track of $\sigma_c^{\gamma}$'s dependencies, all we have to do is write the structure $D(\sigma_c^{\gamma})$ together with its transaction's entry in $S_c$'s transaction log.

For a practical implementation, we must implement the data structure $D(\sigma_c^{\gamma})$ efficiently with respect to space. As dependencies are transitive and continuous along the sequence of states of a database server, it is not difficult to see that, to keep track of the complete set of dependencies of a given state $\sigma_c^{\gamma}$, we need to store only the last state of each server on which $\sigma_c^{\gamma}$ depends. If $\sigma_c^{\gamma}$ depends on $\sigma_a^{\alpha}$ ($\alpha > 0$), clearly it also depends on $\sigma_a^0, \ldots, \sigma_a^{\alpha-1}$. Therefore, a complete set of state dependencies can be represented by a dependency vector $DV$ with $n$ entries, in which $DV[i]$ stores the index of the most recent state from server $S_i$ on which the state depends. This idea and nomenclature are
inspired by dependency tracking for rollback-recovery in the message-passing model [28].

We divide our dependency tracking algorithm into two parts, the client stub and the server wrapper, both shown in Algorithm 2. Only one when clause executes at a time, and only after its condition holds. If the conditions of more than one when clause hold at the same time, any one of them is chosen to execute. We assume, however, that the execution is fair: unless the server crashes, every when clause whose condition holds will eventually be executed. To submit transaction operations to the local MMDB, a server uses the submit interface. Moreover, to make it clear that our approach does not introduce any extra disk operations, all log operations are handled by our algorithm; that is, all submit calls access only data in the server's main memory.

On the client side, it is only necessary to keep track of the set of servers accessed during the execution of a transaction (line 2).³ All operations performed by the client stub are straightforward and have little to do with dependency tracking. Dependency tracking takes place at commit, using the synchronization messages exchanged by the servers to ensure transactions' atomicity. While analyzing the algorithm, recall that we assume Isolation is ensured by a simple database-locking mechanism and global Atomicity during normal execution is given by a variation of two-phase commit, described in Sections 2.1 and 2.2, respectively. Although we make no explicit use of these two properties, they ensure that the dependencies captured by our algorithm are consistent with the dependencies actually created in the distributed database.

³ For code simplicity, let us assume that a single client does not execute two transactions concurrently.

Briefly, each server keeps two dependency vectors during execution, $DV$ and $DV_{last}$. $DV$ implements $D(S_i)$ (the dependencies to be attributed to the next state created), and $DV_{last}$ stores the dependencies of the current database state. Together with its answer to the PREPARE request issued by the client, a server sends a dependency vector containing the dependencies that the transaction should forward to all accessed servers, based on the operations performed in the local database (lines 35-41). This information is sent not only to the client but also to the other involved servers. Finally, when a server $S_i$ has received the votes of all servers involved in the transaction, it updates its $DV$ (lines 45-46) and, if the transaction wrote some data in the database, performs a local state transition (lines 48-49).

A correct implementation of Algorithm 1 is ensured by the dependencies propagated by the servers in the VOTE messages. Dependencies corresponding to line 1 of Algorithm 1 are gathered in line 38 of Algorithm 2. Dependencies given by line 2 of Algorithm 1 are gathered in line 39 of Algorithm 2 if the server was only read by the transaction, or in line 37 if the server was updated. Line 37 also captures the dependencies corresponding to line 3 of Algorithm 1. Correctness proofs of Algorithms 1 and 2 appear in [25].

As mentioned before, the atomic commit mechanism we assume can block processes in case of failure, forcing them to wait for a message from a process that has crashed. A blocked process is unblocked when the crashed server on which it depends recovers and starts the global recovery procedure explained in the next section. During the recovery phase, all running transactions are aborted, and global state consistency is ensured by the rollback-recovery mechanism. When execution resumes, no server is blocked anymore. A blocked client has to wait for a recovery notification to unblock, and then check with the database servers whether some transaction was lost. Unblocked clients may also start a recovery procedure after receiving such a notification if they rely on something outside the database to ensure transaction durability.

4.3. Rollback-recovery

Once dependency tracking is performed efficiently during the execution, we can use one of the numerous existing approaches to orchestrate rollback-recovery in the message-passing model [14, 26, 28]. We illustrate the idea by extending the algorithm presented in [26], adapted to our execution model. The system runs as a sequence of incarnations, each started after recovery from some failure. Each server keeps track of the current incarnation. In order to start a new one, the servers must reach agreement on the recovery line used for the federation restart. Therefore, processes exchange messages containing information about their last stable database state. When a server $S_i$ has received all this information, it computes its local state in the recovery line based on Theorem 2 and rolls back to it by erasing inconsistent log entries. Due to the possibility of failures, information about the current incarnation and the last recovery line used for recovery must be kept in the stable storage of each server. A detailed description of this algorithm is presented in [25].

4.4. Algorithm analysis

Algorithm 2 incurs no extra cost during transaction execution with respect to the number of messages and communication steps. The algorithm just piggybacks a vector timestamp on messages related to the transaction commit and updates local variables according to the timestamps received. Our approach ensures the minimum possible "window of vulnerability" for transactions, since this window depends only on the time each server takes to physically write the transaction's log entry on stable storage. Every server does so at its own pace, without synchronizing with the others; as soon as all of them complete their writes, the transaction is durable.

Algorithm 2 Complete algorithm for dependency tracking

CLIENT STUB
 1: Data Structures
 2:   Π: set of servers
 3: Begin_Transaction()
 4:   Π ← ∅
 5:   return unique tid
 6: Read/Write(tid, S_i, op)
 7:   Π ← Π ∪ {S_i}
 8:   send ⟨READ/WRITE, tid, op⟩ to S_i
 9:   wait for ⟨result⟩ from S_i
10:   return result
11: Commit(tid)
12:   send ⟨PREPARE, tid, Π⟩ to all S_i ∈ Π
13:   wait for ⟨VOTE, tid, v_i, DV_i⟩ from all S_i ∈ Π
14:   return (∀S_i ∈ Π: v_i = YES)
15: Abort(tid)
16:   send ⟨ABORT, tid⟩ to all S_i ∈ Π
SERVER WRAPPER
17: Data Structures
18:   opSet_tid: ordered set of operations
19:   DV, DV_last: array[1..n] of integer
20:   Π_tid: set of servers
21: Initialization
22:   ∀tid: opSet_tid ← ∅; Π_tid ← S
23:   ∀1 ≤ j ≤ n: DV[j] ← −1
24:   DV_last ← DV
25: The server continuously waits for an event:
26: when receive ⟨READ, tid, op⟩ from C_i
27:   result ← submit(tid, op)
28:   send ⟨result⟩ to C_i
29: when receive ⟨WRITE, tid, op⟩ from C_i
30:   result ← submit(tid, op)
31:   append op to opSet_tid
32:   send ⟨result⟩ to C_i
33: when receive ⟨PREPARE, tid, Π⟩ from C_i
34:   Π_tid ← Π
35:   if willing to commit then
36:     if opSet_tid ≠ ∅ then
37:       DV_aux ← DV
38:       DV_aux[i] ← DV_last[i] + 1
39:     else DV_aux ← DV_last
40:     send ⟨VOTE, tid, YES, DV_aux⟩ to {C_i} ∪ Π_tid
41:   else send ⟨VOTE, tid, NO, ⊥⟩ to {C_i} ∪ Π_tid
42: when ∃tid such that ∀S_i ∈ Π_tid: received ⟨VOTE, tid, v_i^tid, DV_i^tid⟩ from S_i
43:   if ∀S_i ∈ Π_tid: v_i^tid = YES then
44:     submit(tid, COMMIT)
45:     for all S_i ∈ Π_tid do
46:       ∀j: DV[j] ← max(DV[j], DV_i^tid[j])
47:     if opSet_tid ≠ ∅ then
48:       DV_last ← DV
49:       asynchronously write entry ⟨opSet_tid, DV⟩ in the transaction log
50: when receive ⟨ABORT, tid⟩ from C_i
51:   submit(tid, ABORT)

Alternative solutions to the problem of ensuring consistency in a federation of main-memory databases under deferred disk writes are possible. For instance, non-blocking synchronous checkpointing approaches for the message-passing model, such as [6] and [16], can be adapted to the transactional model by considering database-state dependencies as we have defined them. These algorithms, however, incur $O(n^2)$ control messages during disk-write synchronization, and they may force either the propagation of timestamps in application messages, to overcome the absence of FIFO communication channels [9], or two disk writes per synchronization, to record the fact that the current instance has finished and new ones are allowed [16]. Although some difficulties can be avoided by stronger system assumptions, as in [24], synchronous algorithms always increase the window of vulnerability of every server to that of the slowest server.

Table 1 summarizes the comparison between the approaches mentioned. We aggregate synchronous checkpointing protocols (e.g., [6] and [16]) since they behave similarly with respect to the variables analyzed in the table. Moreover, "MySQL Cluster" refers to the synchronous approach adopted in [24]. We represent the disk latency (i.e., the time it takes for a disk write request to be completed) of server $S_i$ by $dlat(S_i)$, and use MAX to denote $\max(\{dlat(S_i) \mid S_i \in S\})$. The network latency, used to quantify a communication step, is represented by $\delta$. Besides requiring FIFO channels, synchronous checkpointing protocols include the clients in their synchronization, since clients are involved in the creation of database-state dependencies. MySQL Cluster assumes partially synchronous channels (i.e., with bounded message delivery time) and has clients coordinate the task in order to simplify the algorithm. In contrast, our approach makes no assumptions about communication channels and only propagates timestamps on some of the messages already exchanged by the system. As the role of the client in synchronous approaches is not very clear, and may force more messages to be exchanged, for such approaches we show only the lower bound on the extra messages required for the servers' synchronization.

Table 1. Comparison of the different approaches

Algorithm           | Communication channels | Client synchronization | S_i's window of vulnerability | Extra messages per execution
Sync. Checkpointing | FIFO                   | Clients participate    | MAX + 2δ                      | Ω(n²)
MySQL Cluster       | Partially Sync.        | Clients coordinate     | MAX + 3δ                      | Ω(n)
Our approach        | Any                    | None                   | dlat(S_i)                     | 0

4.5. Dealing with complex concurrency control

So far, we have assumed a very simple concurrency control mechanism inside every database server, with concurrent access for read-only transactions and exclusive access for update transactions. However, our results can be easily extended to more general cases. For example, the well-known two-phase locking (2PL) algorithm can be seen as an extension of our simple concurrency control in which each piece of data plays the role of a "virtual database": multiple transactions can read the data concurrently, but only one can update it at a time. As a consequence, though, vector timestamps will have as many entries as there are virtual databases.⁴ Clearly, the implementation of such a system can be simplified, since all virtual databases inside the same physical one will always be synchronized with each other. Reducing the size of the timestamps will involve either the use of direct instead of transitive dependency tracking (and a more complex recovery algorithm [9, 26]), or accepting false dependencies, as happens when logical clocks are used instead of vector clocks to gather causal dependencies between events [17]. Studying such alternatives is out of the scope of this paper and is the subject of future work.

5. Related work

Although MMDBs do not represent a new concept in database design, only recently have they been applied t
o4AlthoughinpracticethismightnotincurinlargeoverheadssinceinmostMMDBsconcurrencycontrolisusuallyperformedatacoarsegranularity[11].moregeneralscenarios.Specically,toourknowledge,theonlyworkthatmakesuseofMMDBsinaclusterofserversis[24](derivedfrom[23]),whereperformanceandavailabilityareenhancedbyreplicatingandfrag-mentingthedatabaseamongthedatabaseserversinthesystem.Toensuregoodperformanceforupdatetrans-actionsaswell,theapproachmakesuseofdeferreddiskwrites,evenfortransactionsthataccessmultipleservers.Inthiscase,consistencyisensuredbysynchronizingtheservers'diskwritesasmentionedintheprevioussection.Rollback-recoveryhasbeenextensivelystudiedinthemessage-passingmodel[1,6,9,14,16,26,28].Nev-ertheless,veryfewoftheseworkshavebeenexploitedindifferentenvironments.Theworkin[2]presentsaframeworktoanalyzeconsistencyindifferentshared-memoryandmessage-passingsystems.In[3],theirresultsareextendedtothetransactionalmodel,moti-vatedbytheproblemofbuildingaconsistentsnapshotofacentralizeddatabasewithoutstoppingtheexecu-tionoftransactions.Actually,theproblemofbuild-ingaconsistentdatabasesnapshothastriggeredalotofresearchontheanalysisofdatabase-statedependen-cies[3,10,20,21,27].Differentapproacheshavecon-sidereddependenciescreatedbetweentransactionsduetoconcurrencycontrol[5]orbetweendataaccessedwithinasingleprocesswhichshouldbeconsistentlytransferedtostablestorage[7].Someoftheideaspre-sentedintheseworks,speciallyin[3]and[10],resembleourtransactionandstatedependenciesdenitions.How-ever,noneofthempresentapracticalcharacterizationofdatabase-statedependencies(e.g.,Theorem3).Ourap-proachdiffersfromtheseworksby(a)assumingadis-tributedscenariowheresynchronizationbetweendiffer-entprocessesmustbeminimized,and(b)aimingatap-plyingrollback-recoverytechniquestobringtheappli-cationbacktoaconsistentstateincaseoffailure.[7,5]6.ConcludingremarksInthispaperwetackledtheproblemofdeferreddiskwritesinfederationsofmain-memorydatabasesystems.Ourapproachwasmotivatedbypreviousresearchonrollback-recoveryformessage-p
assingdistributedsys-tems.Wedescribedhowdatabase-statedependenciesarecreatedinthetransactionalmodelandhowtheycan9 betrackedefcientlyduringexecution.Apossibleex-tensiontoouralgorithmsistousedirectinsteadoftran-sitivedependencytracking[9,26],asthiscanpossiblyleadtosmallertimestampsiftransactionsdonottendtoaccessmanyservers.Moreover,ouralgorithmsborrowfromoptimisticmessagelogging.Itisalsopossibletoexploitotherrollback-recoverytechniques,likecausalmessageloggingandquasi-synchronouscheckpointing,andcomparetheirperformanceandadvantagesunderdifferenttransactionscenarios.Researchdomainsthatmaytakeadvantageofthistheoryincludeoptimisticconcurrencycontrolmechanismsandmanagementofnestedtransactions.Investigatingsuchissuesisthesub-jectoffuturework.AcknowledgmentsWethanktheanonymousreviewersfortheircom-mentsthathelpedusimprovethepaper.References[1]L.AlvisiandK.Marzullo.MessageLogging:Pes-simistic,Optimistic,CausalandOptimal.IEEETrans.onSoftwareEngineering,24(2):149–159,Feb.1998.[2]R.Baldoni,J.-M.Helari,andM.Raynal.Consistentrecordsinasynchronouscomputations.ActaInformat-ica,35(6):441–455,June1998.[3]R.Baldoni,F.Quaglia,andM.Raynal.Consistentcheck-pointingfortransactionsystems.TheComputerJournal,44(2):92–100,2001.[4]P.Bernstein,V.Hadzilacos,andN.Goodman.Con-currencyControlandRecoveryinDatabasesSystems.Addison-Wesley,1987.[5]B.Bhargava.Concurrencycontrolindatabasesystems.IEEETransactionsonKnowledgeandDataEngineer-ing,11(1):3–16,1999.[6]M.ChandyandL.Lamport.DistributedSnapshots:De-terminingGlobalStatesofDistributedSystems.ACMTrans.onComputerSystems,3(1):63–75,Feb.1985.[7]F.Cristian,S.Mishra,andY.S.Hyun.Implementationandperformanceofastable-storageserviceinUnix.InProceedingsofthe15thIEEESymposiumonReliableDistributedSystems,1996.[8]D.DeWitt,R.H.Katz,F.Olken,L.D.Shapiro,M.Stonebraker,andD.A.Wood.Implementationtech-niquesformainmemorydatabasesystems.InSIG-MOD'84,ProceedingsofAnnualMeeting,Boston,Mas-sachusetts,June18-21,pages1–8.ACMPress,1984.[9]E.N.Elnozahy,L.Alvisi,Y.M.Wang,andD.B.Jo
hnson.ASurveyofRollback-RecoveryProtocolsinMessage-PassingSystems.ACMComputingSurveys,34(3):375–408,Sept.2002.[10]I.C.GarciaandL.E.Buzato.Asynchronousconstruc-tionofconsistentglobalsnapshotsintheobjectandac-tionmodel.InProc.ofthe4thIEEEInt.ConferenceonCon®gurableDistributedSystems,1998.[11]H.Garcia-MolinaandK.Salem.Mainmemorydatabasesystems:Anoverview.IEEETransactionsonKnowledgeandDataEngineering,4(6):509–516,Dec.1992.[12]J.Gray.Therevolutionindatabasearchitecture.Techni-calReportMSR-TR-2004-31,MicrosoftResearch,2004.[13]J.N.GrayandA.Reuter.TransactionProcessing:Con-ceptsandTechniques.MorganKaufmann,1993.[14]D.B.JohnsonandW.Zwaenepoel.Recoveryindis-tributedsystemsusingoptimisticmessageloggingandcheckpointing.JournalofAlgorithms,11(3):462–491,1990.[15]K.Knizhnik.Fastdb:Main-memoryrelationaldatabasemanagementsystem.http://www.garret.ru/knizh-nik/fastdb.html.[16]R.KooandS.Toueg.Checkpointingandrollback-recoveryfordistributedsystems.IEEETrans.onSoft-wareEngineering,13:23–31,Jan.1987.[17]L.Lamport.Time,clocks,andtheorderingofeventsinadistributedsystem.Commun.ACM,21(7):558–565,July1978.[18]D.Morse.In-memorydatabasewebserver.DedicatedSystemsMagazine,4:12–14,2000.[19]F.Pedone,R.Guerraoui,andA.Schiper.Thedatabasestatemachineapproach.JournalofDistributedandPar-allelDatabasesandTechnology,14(1):71–98,2003.[20]S.PilarskiandT.Kameda.Checkpointingfordistributeddatabases:Startingfromthebasics.IEEETrans.onPar-allelandDistributedSystems,3(5):602–610,1992.[21]C.Pu.On-the-¯y,incremental,consistentreadingofen-tiredatabases.Algorithmica,1(3):271–287,1986.[22]L.RodriguesandM.Raynal.Atomicbroadcastinasynchronouscrash-recoverydistributedsystemsanditsuseinquorum-basedreplication.IEEETransactionsonKnowledgeandDataEngineering,15(5):1206–1217,2003.[23]M.Ronström.TheNDBcluster–Aparalleldataserverfortelecommunicationsapplications.EricssonReviewno.4,1997.[24]M.RonströmandL.Thalmann.Mysqlclusterarchitec-tureoverview.MySQLTechnicalWhitePaper,2004.[25]R.SchmidtandF.Pedone.Consistentmain-memorydataba
sefederationsunderdeferreddiskwrites.Tech-nicalReportIC/2005/17,SchoolofComputerandCom-municaitonSciences,EPFL,2005.[26]A.P.SistlaandJ.L.Welch.Ef®cientdistributedrecoveryusingmessagelogging.InProceedingsofthe8thACMSymposiumonthePrinciplesofDistributedComputing,pages233–238,1989.[27]S.H.SonandA.K.Agrawala.Distributedcheckpointingforgloballyconsistentstatesofdatabases.IEEETrans.onSoftwareEngineering,15(19):1157–1166,1989.[28]R.StromandS.Yemini.OptimisticRecoveryinDis-tributedSystems.ACMTrans.onComputingSystems,3(3):204–226,Aug.1985.10
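The piggybacking scheme discussed in the comparison of approaches (a vector timestamp carried on commit messages the system already exchanges, log entries written asynchronously by each server at its own pace, and durability once every server has completed its write) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the algorithm of this paper: the class `Server` and the functions `merge`, `commit`, `flush`, and `is_durable` are hypothetical; each server's vector entry is simply a counter of its local update commits; dependencies are tracked transitively by component-wise maximum; and concurrency control, aborts, and recovery are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical per-server state for transitive dependency tracking."""
    sid: int                  # index of this server in the vector
    n: int                    # number of servers in the federation
    seq: int = 0              # number of update transactions committed locally
    stable: int = 0           # highest local sequence number already on disk
    dv: list = field(default_factory=list)       # dependency vector DV
    pending: list = field(default_factory=list)  # log entries awaiting a disk write

    def __post_init__(self):
        if not self.dv:
            self.dv = [0] * self.n


def merge(vectors):
    """Component-wise maximum of equally sized vectors."""
    return [max(entries) for entries in zip(*vectors)]


def commit(participants, op_set):
    """Commit an update transaction at every participant and return its timestamp."""
    proposals = []
    for s in participants:
        s.seq += 1                    # new local state created by this transaction
        v = list(s.dv)
        v[s.sid] = s.seq
        proposals.append(v)           # would be piggybacked on the commit messages
    ts = merge(proposals)             # transitive dependencies of the transaction
    for s in participants:
        s.dv = list(ts)
        # Deferred disk write: the entry is only queued, not yet stable.
        s.pending.append((s.seq, op_set, list(ts)))
    return ts


def flush(server):
    """Complete this server's pending disk writes, at its own pace."""
    for seq, _op_set, _ts in server.pending:
        server.stable = max(server.stable, seq)
    server.pending.clear()


def is_durable(ts, servers):
    """Durable once every server has flushed all states the transaction depends on."""
    return all(s.stable >= ts[s.sid] for s in servers)


if __name__ == "__main__":
    servers = [Server(sid=i, n=3) for i in range(3)]
    t1 = commit([servers[0], servers[1]], {"x": 1})  # [1, 1, 0]
    t2 = commit([servers[1], servers[2]], {"y": 2})  # [1, 2, 1]: depends on t1
    flush(servers[1])
    flush(servers[2])
    print(is_durable(t2, servers))  # False: S_0 has not written t1's entry yet
    flush(servers[0])
    print(is_durable(t2, servers))  # True
```

In this run, the second transaction depends transitively on the first through server S_1, so in this sketch it becomes durable only after S_0 flushes the first transaction's entry, even though S_0 did not take part in the second transaction. No message beyond the commit messages and no inter-server synchronization is needed to reach that conclusion.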
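The window-of-vulnerability column of Table 1 (MAX plus two network latencies for synchronous checkpointing, MAX plus three for MySQL Cluster, and only dlat(S_i) for the approach of this paper) can be evaluated for concrete latencies. The sketch below is illustrative only: the function name and the sample latency values are hypothetical, `delta` stands for the network latency of one communication step, and all values are assumed to be in the same time unit.

```python
def windows_of_vulnerability(dlat, delta):
    """Return, per server, its window of vulnerability under each approach of Table 1.

    dlat  -- mapping from server name to its disk latency dlat(S_i)
    delta -- network latency of one communication step
    """
    worst = max(dlat.values())  # MAX = max{dlat(S_i) | S_i in S}
    return {
        server: {
            "sync_checkpointing": worst + 2 * delta,  # MAX + 2*delta
            "mysql_cluster": worst + 3 * delta,       # MAX + 3*delta
            "our_approach": d,                        # only this server's disk latency
        }
        for server, d in dlat.items()
    }


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds: one fast disk, one slow disk.
    w = windows_of_vulnerability({"S1": 4.0, "S2": 10.0}, delta=0.5)
    print(w["S1"])  # {'sync_checkpointing': 11.0, 'mysql_cluster': 11.5, 'our_approach': 4.0}
```

With the synchronous approaches, the fast server S1 inherits the slow disk of S2; with per-server asynchronous writes, each server is exposed only for its own disk latency.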