Coordination Avoidance in Database Systems

Peter Bailis, Alan Fekete†, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
UC Berkeley and †University of Sydney

ABSTRACT

Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems. However, uninhibited coordination-free execution can compromise application correctness, or consistency. When is coordination necessary for correctness? The classic use of serializable transactions is sufficient to maintain correctness but is not necessary for all applications, sacrificing potential scalability. In this paper, we develop a formal framework, invariant confluence, that determines whether an application requires coordination for correct execution. By operating on application-level invariants over database states (e.g., integrity constraints), invariant confluence analysis provides a necessary and sufficient condition for safe, coordination-free execution. When programmers specify their application invariants, this analysis allows databases to coordinate only when anomalies that might violate invariants are possible. We analyze the invariant confluence of common invariants and operations from real-world database systems (i.e., integrity constraints) and applications and show that many are invariant confluent and therefore achievable without coordination. We apply these results to a proof-of-concept coordination-avoiding database prototype and demonstrate sizable performance gains compared to serializable execution, notably a 25-fold improvement over prior TPC-C New-Order performance on a 200 server cluster.

1. INTRODUCTION

Minimizing coordination is key in high-performance, scalable database design. Coordination (informally, the requirement that concurrently executing operations synchronously communicate or otherwise stall in order to complete) is expensive: it limits concurrency between operations and undermines the effectiveness of scale-out across servers. In the presence of partial system failures, coordinating operations may be forced to stall indefinitely, and, in the failure-free case, communication delays can increase
latency [9, 28]. In contrast, coordination-free operations allow aggressive scale-out, availability [28], and low latency execution [1]. If operations are coordination-free, then adding more capacity (e.g., servers, processors) will result in additional throughput; operations can execute on the new resources without affecting the old set of resources. Partial failures will not affect non-failed operations, and latency between any database replicas can be hidden from end-users.

Unfortunately, coordination-free execution is not always safe. Uninhibited coordination-free execution can compromise application-level correctness, or consistency.¹ In canonical banking application examples, concurrent, coordination-free withdrawal operations can result in undesirable and “inconsistent” outcomes like negative account balances: application-level anomalies that the database should prevent. To ensure correct behavior, a database system must coordinate the execution of these operations that, if otherwise executed concurrently, could result in inconsistent application state.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, Aug. 31st-Sept. 4th, 2015, Kohala Coast, Hawaii. Proceedings of the VLDB Endowment, Vol. 8, No. 3. Copyright 2014 VLDB Endowment 2150-8097/14/11.

This tension between coordination and correctness is evidenced by the range of database concurrency control policies. In traditional database systems, serializable isolation provides concurrent operations (transactions) with the illusion of executing in some serial order [15]. As long as individual transactions maintain correct application state, serializability guarantees correctness [30]. However, each pair of concurrent operations (at least one of which is a write) can potentially compromise serializability and therefore will require coordination to execute [9, 21]. By isolating users at the level of reads and
writes, serializability can be overly conservative and may in turn coordinate more than is strictly necessary for consistency [29, 39, 53, 58]. For example, hundreds of users can safely and simultaneously retweet Barack Obama on Twitter without observing a serial ordering of updates to the retweet counter. In contrast, a range of widely-deployed weaker models require less coordination to execute but surface read and write behavior that may in turn compromise consistency [2, 9, 22, 48]. With these alternative models, it is up to users to decide when weakened guarantees are acceptable for their applications [6], leading to confusion regarding (and substantial interest in) the relationship between consistency, scalability, and availability [1, 9, 12, 18, 21, 22, 28, 40].

In this paper, we address the central question inherent in this trade-off: when is coordination strictly necessary to maintain application-level consistency? To do so, we enlist the aid of application programmers to specify their correctness criteria in the form of invariants. For example, our banking application writer would specify that account balances should be positive (e.g., by schema annotations), similar to constraints in modern databases today. Using these invariants, we formalize a necessary and sufficient condition for invariant-preserving and coordination-free execution of an application's operations, the first such condition we have encountered. This property, invariant confluence (I-confluence), captures the potential scalability and availability of an application, independent of any particular database implementation: if an application's operations are I-confluent, a database can correctly execute them without coordination. If operations are not I-confluent, coordination is required to guarantee correctness. This provides a basis for coordination avoidance: the use of coordination only when necessary.

While coordination-free execution is powerful, are any useful operations safely executable without coordination? I-confluence analysis determines when concurrent execution of specific operations can be “merged” into valid database state; we accordingly
analyze invariants and operations from several real-world databases and applications. Many production databases today already support invariants in the form of primary key, uniqueness, foreign key, and row-level check constraints [9, 42]. We analyze these and show many are I-confluent, including forms of foreign key constraints, unique value generation, and check constraints, while others, like primary key constraints, are, in general, not. We also consider entire applications and apply our analysis to the workloads of the OLTP-Benchmark suite [23]. Many of the operations and invariants are I-confluent. As an extended case study, we examine the TPC-C benchmark [55], the preferred standard for evaluating new concurrency control algorithms [23, 35, 46, 52, 54]. We show that ten of twelve of TPC-C's invariants are I-confluent under the workload transactions and, more importantly, compliant TPC-C can be implemented without any synchronous coordination across servers. We subsequently scale a coordination-avoiding database prototype linearly, to over 12.7M TPC-C New-Order transactions per second on 200 servers, a 25-fold improvement over prior results.

¹ Our use of the term “consistency” in this paper refers to application-level correctness, as is traditional in the database literature [15, 21, 25, 30, 56]. As we discuss in Section 5, replicated data consistency (and isolation [2, 9]) models like linearizability [28] can be cast as application criteria if desired.

Overall, I-confluence offers a concrete grasp on the challenge of minimizing coordination while ensuring application-level correctness. In seeking a necessary and sufficient (i.e., “tight”) condition for safe, coordination-free execution, we require the programmer to specify her correctness criteria. If either these criteria or application operations are unavailable for inspection, users must fall back to using serializable transactions or, alternatively, perform the same ad-hoc analyses they use today [12]. Moreover, it is already well known that coordination is required to prevent several read/write isolation anomalies like non-linearizable operations [9, 28]. However, when users can correctly specify their application correctness criteria and operations, they can maximize scalability without requiring expertise in the milieu of weak read/
write isolation models [2, 9]. We have also found I-confluence to be a useful design tool: studying specific combinations of invariants and operations can indicate the existence of more scalable algorithms [18].

In summary, this paper offers the following high-level takeaways:

1. Serializable transactions preserve application correctness at the cost of always coordinating between conflicting reads and writes.
2. Given knowledge of application transactions and correctness criteria (e.g., invariants), it is often possible to avoid this coordination (by executing some transactions without coordination, thus providing availability, low latency, and excellent scalability) while still preserving those correctness criteria.
3. Invariant confluence offers a necessary and sufficient condition for this correctness-preserving, coordination-free execution.
4. Many common integrity constraints found in SQL and standardized benchmarks are invariant confluent, allowing order-of-magnitude performance gains over coordinated execution.

While coordination cannot always be avoided, this work evidences the power of application invariants in scalable and correct execution of modern applications on modern hardware. Application correctness does not always require coordination, and I-confluence analysis can explain both when and why this is the case.

Overview. The remainder of this paper proceeds as follows: Section 2 describes and quantifies the costs of coordination. Section 3 introduces our system model and Section 4 contains our primary theoretical result. Readers may skip to Section 5 for practical applications of I-confluence to real-world invariant-operation combinations. Section 6 subsequently applies these combinations to real applications and presents an experimental case study of TPC-C. Section 7 describes related work, and Section 8 concludes.

2. CONFLICTS AND COORDINATION

As repositories for application state, databases are traditionally tasked with maintaining correct data on behalf of users. During concurrent access to data, a database ensuring correctness must therefore decide which user operations can execute simultaneously and which, if any, must coordinate, or block. In this section, we explore the relationship between the correctness criteria that a database attempts to maintain and the coordination costs of doing so.

By example. As a running example, we consider a database-backed payroll application that maintains information about employees and departments within a small business. In the application, a.) each employee is assigned a unique ID number and b.) each employee belongs to exactly one department. A database ensuring correctness must maintain these application-level properties, or invariants, on behalf of the application (i.e., without application-level intervention). In our payroll application, this is non-trivial: for example, if the application attempts to simultaneously create two employees, then the database must ensure the employees are assigned distinct IDs.

Serializability and conflicts. The classic answer to maintaining application-level invariants is to use serializable isolation: execute each user's ordered sequence of operations, or transactions, such that the end result is equivalent to some sequential execution [15, 30, 53]. If each transaction preserves correctness in isolation, composition via serializable execution ensures correctness. In our payroll example, the database would execute the two employee creation transactions such that one transaction appears to execute after the other, avoiding duplicate ID assignment.

While serializability is a powerful abstraction, it comes with a cost: for arbitrary transactions (and for all implementations of serializability's more conservative variant, conflict serializability), any two operations to the same item, at least one of which is a write, will result in a read/write conflict. Under serializability, these conflicts require coordination or, informally, blocking communication between concurrent transactions: to provide a serial ordering, conflicts must be totally ordered across transactions [15]. For example, given database state $\{x = \bot, y = \bot\}$, if transaction $T_1$ writes $x = 1$ and reads from $y$ and $T_2$ writes $y = 1$ and reads from $x$, a database cannot both execute $T_1$ and $T_2$ entirely concurrently and maintain serializability [9, 21].

The costs of coordination. The coordination overheads above incur three primary penalties: increased latency (due to stalled execution), decreased throughput, and, in the event of partial failures, unavailability. If a transaction takes $d$ seconds to execute, the maximum throughput of conflicting transactions operating on the same items under a general-purpose (i.e., interactive, non-batched) transaction model is limited by $1/d$, while coordinating operations will also have to wait. On a single system, delays can be small, permitting tens to hundreds of thousands of conflicting transactions per item per second. In a partitioned database system, where different items are located on different servers, or in a replicated database system, where the same item is located (and is available for operations) on multiple servers, the cost increases: delay is lower-bounded by network latency. On a local area network, delay may vary from several microseconds (e.g., via InfiniBand or RDMA) to several milliseconds on today's cloud infrastructure, permitting anywhere from a few hundred transactions to a few hundred thousand transactions per second. On a wide-area network, delay is lower-bounded by the speed of light (worst-case on Earth, around 75 ms, or about 13 operations per second [9]). Under network partitions [13], as delay tends towards infinity, these penalties lead to unavailability [9, 28]. In contrast, operations executing without coordination can proceed concurrently and will not incur these penalties.
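
The $1/d$ bound above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function name and the two local-area delays are assumptions chosen to fall in the ranges the text mentions, and only the 75 ms wide-area delay is taken from the text.

```python
def max_conflicting_throughput(delay_seconds: float) -> float:
    """Upper bound (transactions/s) on conflicting transactions over the same items.

    If each conflicting transaction occupies the items for `delay_seconds`,
    at most 1/d of them can complete per second.
    """
    if delay_seconds <= 0:
        raise ValueError("delay must be positive")
    return 1.0 / delay_seconds


# Illustrative delays. Only the 75 ms wide-area figure comes from the text;
# the two local-area values are assumed examples.
EXAMPLE_DELAYS = {
    "LAN, 5 microseconds (assumed)": 5e-6,
    "cloud LAN, 5 milliseconds (assumed)": 5e-3,
    "WAN worst case, 75 milliseconds": 75e-3,
}

for label, delay in EXAMPLE_DELAYS.items():
    bound = max_conflicting_throughput(delay)
    print(f"{label}: at most {bound:,.0f} conflicting txn/s")
```

With these inputs the bound spans roughly five orders of magnitude, which is the spread between "a few hundred thousand" and "about 13" operations per second described above.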

Figure 1: Microbenchmark performance of coordinated and coordination-free execution of transactions of varying size writing to eight items located on eight separate multi-core servers.

Quantifying coordination overheads. To further understand the costs of coordination, we performed two sets of measurements: one using a database prototype and one using traces from prior studies.

We first compared the throughput of coordinated and coordination-free transaction execution. We partitioned a set of eight data items across eight servers and ran one set of transactions with an optimized variant of two-phase locking (providing serializability) [15] and ran another set of transactions without coordination (Figure 1; see [10, Appendix A] for more details). With single-item, non-distributed transactions, the coordination-free implementation achieves, in aggregate, over 12M transactions per second and bottlenecks on physical resources, namely CPU cycles. In contrast, the lock-based implementation achieves approximately 1.1M transactions per second: it is unable to fully utilize all multi-core processor contexts due to lock contention. For distributed transactions, coordination-free throughput decreases linearly (as an N-item transaction performs N writes), while the throughput of coordinating transactions drops by over three orders of magnitude.

While the above microbenchmark demonstrates the costs of a particular implementation of coordination, we also studied the effect of more fundamental, implementation-independent overheads (i.e., also applicable to optimistic and scheduling-based concurrency control mechanisms). We determined the maximum attainable throughput for coordinated execution within a single datacenter (based on data from [60]) and across multiple datacenters (based on data from [9]) due to blocking coordination during atomic commitment [15]. For an N-server transaction, classic two-phase commit (C-2PC) requires N (parallel) coordinator to server RTTs, while decentralized two-phase commit (D-2PC) requires N (parallel) server to server broadcasts, or $N^2$ messages. Figure 2 shows that, in the local area, with only two servers (e.g., two replicas or two coordinating operations on items residing on different servers), throughput is bounded by 1125 transactions/s (via D-2PC; 668/s via C-2PC). Across eight servers, D-2PC throughput drops to 173 transactions/s (resp. 321 for C-2PC) due to long-tailed latency distributions. In the wide area, the effects are more stark: if coordinating from Virginia to Oregon, D-2PC message delays are 83 ms per commit, allowing 12 operations per second. If coordinating between all eight EC2 availability zones, throughput drops to slightly over 2 transactions/s in both algorithms. ([10, Appendix A] provides more details.)

These results should be unsurprising: coordinating, especially over the network, can incur serious performance penalties. In contrast, coordination-free operations can execute without incurring these costs. The costs of actual workloads can vary: if coordinating operations are rare, concurrency control will not be a bottleneck. For example, a serializable database executing transactions with disjoint read and write sets can perform as well as a non-serializable database without compromising correctness [34]. However, as these results demonstrate, minimizing the amount of coordination and its degree of distribution can therefore have a tangible impact on performance, latency, and availability [1, 9, 28]. While we study real applications in Section 6, these measurements highlight the worst of coordination costs on modern hardware.

Figure 2: Atomic commitment latency as an upper bound on throughput over LAN and WAN networks. a.) Maximum transaction throughput over local-area network in [60]. b.) Maximum throughput over wide-area network in [9] with transactions originating from a coordinator in Virginia (VA; OR: Oregon, CA: California, IR: Ireland, SP: São Paulo, TO: Tokyo, SI: Singapore, SY: Sydney).

Our goal: Minimize coordination. In this paper, we seek to minimize the amount of coordination required to correctly execute an application's transactions. As discussed in Section 1, serializability is sufficient to maintain correctness but is not always necessary; that is, many (but not all) transactions can be executed concurrently without necessarily compromising application correctness. In the remainder of this paper, we identify when safe, coordination-free execution is possible. If serializability requires coordinating between each possible pair of conflicting reads and writes, we will only coordinate between pairs of operations that might compromise application-level correctness. To do so, we must both raise the specification of correctness beyond the level of reads and writes and directly account for the process of reconciling the effects of concurrent transaction execution at the application level.

3. SYSTEM MODEL

To characterize coordination avoidance, we first present a system model. We begin with an informal overview. In our model, transactions operate over independent (logical) “snapshots” of database state. Transaction writes are applied at one or more snapshots initially when the transaction commits and then are integrated into other snapshots asynchronously via a “merge” operator that incorporates those changes into the snapshot's state. Given a set of invariants describing valid database states, as Table 1 outlines, we seek to understand when it is possible to ensure invariants are always satisfied (global validity) while guaranteeing a response (transactional availability) and the existence of a common state (convergence), all without communication during transaction execution (coordination-freedom). This model need not directly correspond to a given implementation (e.g., see the database architecture in Section 6); rather, it serves as a useful abstraction. The remainder of this section further defines these concepts; readers more interested in their application should proceed to Section 4. We provide greater detail and additional discussion in [10].

Table 1: Key properties of the system model and their effects.

Property                    Effect
Global validity             Invariants hold over committed states
Transactional availability  Non-trivial response guaranteed
Convergence                 Updates are reflected in shared state
Coordination-freedom        No synchronous coordination

Databases. We represent a state of the shared database as a set $D$ of unique versions of data items located on an arbitrary set of database servers, and each version is located on at least one server. We use $\mathcal{D}$ to denote the set of possible database states, that is, the set of sets of versions. The database is initially populated by an initial state $D_0$ (typically but not necessarily empty).

Transactions, Replicas, and Merging. Application clients submit requests to the database in the form of transactions, or ordered groups of operations on data items that should be executed together. Each transaction operates on a logical replica, or set of versions of the items mentioned in the transaction. At the beginning of the transaction, the replica contains a subset of the database state and is formed from all of the versions of the relevant items that can be found at one or more physical servers that are contacted during transaction execution. As the transaction executes, it may add versions (of items in its write set) to its replica. Thus, we define a transaction $T$ as a transformation on a replica: $T : \mathcal{D} \to \mathcal{D}$. We treat transactions as opaque transformations that can contain writes (which add new versions to the replica's set of versions) or reads (which return a specific set of versions from the replica). (Later, we will discuss transactions operating on data types such as counters.)

Upon completion, each transaction can commit, signaling success, or abort, signaling failure. Upon commit, the replica state is subsequently merged ($\sqcup : \mathcal{D} \times \mathcal{D} \to \mathcal{D}$) into the set of versions at at least one server. We require that the merged effects of a committed transaction will eventually become visible to other transactions that later begin execution on the same server.² Over time, effects propagate to other servers, again through the use of the merge operator. Though not strictly necessary, we assume this merge operator is commutative, associative, and idempotent [5, 50]. In our initial model, we define merge as set union
of the versions contained at different servers. (Section 5 discusses additional implementations.) For example, if server $R_x = \{v\}$ and $R_y = \{w\}$, then $R_x \sqcup R_y = \{v, w\}$.

In effect, each transaction can modify its replica state without modifying any other concurrently executing transactions' replica state. Replicas therefore provide transactions with partial “snapshot” views of global state (that we will use to simulate concurrent executions, similar to revision diagrams [17]). Importantly, two transactions' replicas do not necessarily correspond to two physically separate servers; rather, a replica is simply a partial “view” over the global state of the database system. For now, we assume transactions are known in advance (see also [10, Section 8]).

² This implicitly disallows servers from always returning the initial database state when they have newer writes on hand. This is a relatively pragmatic assumption but also simplifies our later reasoning about admissible executions. This assumption could possibly be relaxed by adapting Newman's lemma [24], but we do not consider the possibility here.

Invariants. To determine whether a database state is valid according to application correctness criteria, we use invariants, or predicates over replica state: $I : \mathcal{D} \to \{true, false\}$ [25]. In our
payroll example, we could specify an invariant that only one user in a database has a given ID. This invariant, as well as almost all invariants we consider, is naturally expressed as a part of the database schema (e.g., via DDL); however, our approach allows us to reason about invariants even if they are known to the developer but not declared to the system. Invariants directly capture the notion of ACID Consistency [15, 30], and we say that a database state is valid under an invariant $I$ (or I-valid) if it satisfies the predicate:

Definition 1. A replica state $R \in \mathcal{D}$ is I-valid iff $I(R) = true$.

We require that $D_0$ be valid under invariants. Section 4.3 provides additional discussion regarding our use of invariants.

Figure 3: An example coordination-free execution of two transactions, $T_1$ and $T_2$, on two servers. Each transaction writes to its local replica, then, after commit, the servers asynchronously exchange state and converge to a common state ($D_3$).

Availability. To ensure each transaction receives a non-trivial response, we adopt the following definition of availability [9]:

Definition 2. A system provides transactionally available execution iff, whenever a client executing a transaction $T$ can access servers containing one or more versions of each item in $T$, then $T$ eventually commits or aborts itself either due to an abort operation in $T$ or if committing the transaction would violate a declared invariant over $T$'s replica state. $T$ will commit in all other cases.

Under the above definition, a transaction can only abort if it explicitly chooses to abort itself or if committing would violate invariants over the transaction's replica state.³

Convergence. Transactional availability allows replicas to maintain valid state independently, but it is vacuously possible to maintain “consistent” database states by letting replicas diverge (contain different state) forever. This guarantees safety (nothing bad happens) but not liveness (something good happens) [49]. To enforce state sharing, we adopt the following definition:

Definition 3. A system is convergent iff, for each pair of servers, in the absence of new writes to the servers and in the absence of indefinite communication delays between the servers, the servers eventually contain the same versions for any item they both store.

To capture the process of reconciling divergent states, we use the previously introduced merge operator: given two divergent server states, we apply the merge operator to produce convergent state. We assume the effects of merge are atomically visible: either all effects of a merge are visible or none are. This assumption is not always necessary but it simplifies our discussion and, as we later discuss, is maintainable without coordination [9, 11].

³ This basic definition precludes fault tolerance (i.e., durability) guarantees beyond a single server failure [9]. We can relax this requirement and allow communication with a fixed number of servers (e.g., F + 1 servers for F-fault tolerance; F is often small [22]) without affecting our results. This does not affect scalability because, as more replicas are added, the communication overhead required for durability remains constant.

Maintaining validity. To make sure that both divergent and convergent database states are valid and, therefore, that transactions never observe invalid states, we introduce the following property:

Definition 4. A system is globally I-valid iff all replicas always contain I-valid state.

Coordination. Our system model is missing one final constraint on coordination between concurrent transaction execution:

Definition 5. A system provides coordination-free execution for a set of transactions $T$ iff the progress of executing each $t \in T$ is only dependent on the versions of the items $t$ reads (i.e., $t$'s replica state).

That is, in a coordination-free execution, each transaction's progress towards commit/abort is independent of other operations (e.g., writes, locking, validations) being performed on behalf of other transactions. This precludes blocking synchronization or communication across concurrently executing transactions.

By example. Figure 3 illustrates a coordination-free execution of two transactions $T_1$ and $T_2$ on two separate, fully-replicated physical servers. Each transaction commits on its local replica, and the result of each transaction is reflected in the transaction's local server state. After the transactions have completed, the servers exchange state and, after applying the merge operator, converge to the same state. Any transactions executing later on either server will obtain a replica that includes the effects of both transactions.

4. CONSISTENCY SANS COORDINATION

With a system model and goals in hand,
we now address the question: when do applications require coordination for correctness? The answer depends not just on an application's transactions or on an application's invariants. Rather, the answer depends on the combination of the two under consideration. Our contribution in this section is to formulate a criterion that will answer this question for specific combinations in an implementation-agnostic manner.

In this section, we focus almost exclusively on providing a formal answer to this question. The remaining sections of this paper are devoted to practical interpretation and application of these results.

4.1 I-confluence: Criteria Defined

To begin, we introduce the central property (adapted from the constraint programming literature [24]) in our main result: invariant confluence (hereafter, I-confluence). Applied in a transactional context, the I-confluence property informally ensures that divergent but I-valid database states can be merged into a valid database state, that is, the set of valid states reachable by executing transactions and merging their results is closed (w.r.t. validity) under merge. In the next sub-section, we show that I-confluence analysis directly determines the potential for safe, coordination-free execution.

We say that $S_i$ is an I-T-reachable state if, given an invariant $I$ and set of transactions $T$ (with merge function $\sqcup$), there exists a (partially ordered) sequence of transaction and merge function invocations that yields $S_i$, and each intermediate state produced by transaction execution or merge invocation is also I-valid. We call these previous states ancestor states. Note that each ancestor state is either I-T-reachable or is instead the initial state ($D_0$).

We can now formalize the I-confluence property:

Definition 6 (I-confluence). A set of transactions $T$ is I-confluent with respect to invariant $I$ if, for all I-T-reachable states $D_i$, $D_j$ with a common ancestor state, $D_i \sqcup D_j$ is I-valid.
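
To make Definition 6 concrete, the sketch below tests a single "diamond" (one common ancestor, two divergent replicas, one merge) for the payroll invariant that employee IDs are unique, using set union as the merge operator as in Section 3. It is an illustration under stated assumptions rather than the paper's implementation: the `(name, ID)` encoding of versions and all function names are hypothetical.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical encoding: one version per employee record, as (name, employee ID).
Version = Tuple[str, int]
State = FrozenSet[Version]
Transaction = Callable[[State], State]
Invariant = Callable[[State], bool]


def merge(a: State, b: State) -> State:
    """Set-union merge: commutative, associative, and idempotent."""
    return a | b


def ids_unique(state: State) -> bool:
    """Invariant I: no two employee records share an ID."""
    ids = [emp_id for _, emp_id in state]
    return len(ids) == len(set(ids))


def insert_employee(name: str, emp_id: int) -> Transaction:
    """Transaction that adds one record with a caller-chosen (specific) ID."""
    def txn(replica: State) -> State:
        return replica | {(name, emp_id)}
    return txn


def diamond_merges_validly(ancestor: State, txn_i: Transaction,
                           txn_j: Transaction, invariant: Invariant) -> bool:
    """Run txn_i and txn_j on separate replicas of `ancestor`, then test the merge."""
    d_i, d_j = txn_i(ancestor), txn_j(ancestor)
    if not (invariant(ancestor) and invariant(d_i) and invariant(d_j)):
        raise ValueError("ancestor and both divergent states must be I-valid")
    return invariant(merge(d_i, d_j))


D_s: State = frozenset({("alice", 1)})
# Both replicas pick ID 2: each is valid locally, but the merged state is not.
clash = diamond_merges_validly(D_s, insert_employee("bob", 2),
                               insert_employee("carol", 2), ids_unique)
# The replicas pick different IDs: the merged state stays valid.
no_clash = diamond_merges_validly(D_s, insert_employee("bob", 2),
                                  insert_employee("carol", 3), ids_unique)
print(clash, no_clash)
```

A diamond whose merge is invalid is a counterexample showing that the transaction set is not I-confluent for that invariant. A diamond whose merge is valid is only one data point, since the definition quantifies over all pairs of reachable states with a common ancestor.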

Figure 4: An I-confluent execution illustrated via a diamond diagram. If a set of transactions $T$ is I-confluent, then all database states reachable by executing and merging transactions in $T$ starting with a common ancestor ($D_s$) must be mergeable ($\sqcup$) into an I-valid database state.

Figure 4 depicts an I-confluent merge of two I-T-reachable states, each starting from a shared, I-T-reachable state $D_s$. Two sequences of transactions $t_{in} \ldots t_{i1}$ and $t_{jm} \ldots t_{j1}$ each independently modify $D_s$. Under I-confluence, the states produced by these sequences ($D_{in}$ and $D_{jm}$) must be valid under merge.⁴

I-confluence holds for specific combinations of invariants and transactions. In our payroll database example from Section 2, removing a user from the database is I-confluent with respect to the invariant that user IDs are unique. However, two transactions that remove two different users from the database are not I-confluent with respect to the invariant that there exists at least one user in the database at all times. Section 5 discusses additional combinations of invariants (with greater precision).

4.2 I-confluence and Coordination

We can now apply I-confluence to our goals from Section 3:

Theorem 1. A globally I-valid system can execute a set of transactions $T$ with coordination-freedom, transactional availability, and convergence if and only if $T$ is I-confluent with respect to $I$.

We provide a full proof of Theorem 1 in [10, Appendix B] (which is straightforward) but provide a sketch here. The backwards direction is by construction: if I-confluence holds, each replica can check each transaction's modifications locally and replicas can merge independent modifications to guarantee convergence to a valid state. The forwards direction uses a partitioning argument [28] to derive a contradiction: we construct a scenario under which a system cannot determine whether a non-I-confluent transaction should commit without violating one of our desired properties (either compromising validity or availability, diverging forever, or coordinating).

Theorem 1 establishes I-confluence as a necessary and sufficient condition for invariant-preserving, coordination-free execution. If I-confluence holds, there exists a correct, coordination-free execution strategy for the transactions; if not, no possible implementation can guarantee these properties for the provided invariants and
transactions. That is, if I-confluence does not hold, there exists at least one execution of transactions on separate replicas that will violate the given invariants when servers converge. To prevent invalid states from occurring, at least one of the transaction sequences will have to forego availability or coordination-freedom, or the system will have to forego convergence. I-confluence analysis is independent of any given implementation, and effectively “lifts” prior discussions of scalability, availability, and low latency [1, 9, 28] to the level of application (i.e., not “I/O” [6]) correctness. This provides a useful handle on the implications of coordination-free execution without requiring reasoning about low-level properties such as physical data location and the number of servers.

⁴ We require these states to have a common ancestor to rule out the possibility of merging states that could not have arisen from transaction execution (e.g., even if no transaction assigns IDs, merging two states that each have unique but overlapping sets of IDs could be invalid).

4.3 Discussion and Limitations

I-confluence captures a simple (informal) rule: coordination can only be avoided if all local commit decisions are globally valid. (Alternatively, commit decisions are composable.) If two independent decisions to commit can result in invalid converged state, then replicas must coordinate in order to ensure that only one of the decisions is to commit. Given the existence of an unsafe execution and the inability to distinguish between safe and invalid executions using only local information, a globally valid system must coordinate in order to prevent the invalid execution from arising.

Use of invariants. Our use of invariants in I-confluence is key to achieving a necessary and not simply sufficient condition. By directly capturing application-level correctness criteria via invariants, I-confluence analysis only identifies “true” conflicts. This allows I-confluence analysis to perform a more accurate assessment of whether coordination is needed compared to related conditions such as commutativity (Section 7).

However, the reliance on invariants also has drawbacks. I-confluence analysis only guards against violations of any provided invariants. If invariants are incorrectly or incompletely specified, an I-confluent database system
may violate application-level correctness. If users cannot guarantee the correctness and completeness of their invariants and operations, they should opt for a more conservative analysis or mechanism such as employing serializable transactions. Accordingly, our development of I-confluence analysis provides developers with a powerful option, but only if used correctly. If used incorrectly, I-confluence allows incorrect results, or, if not used at all, developers must resort to existing alternatives.

This final point raises several questions: can we specify invariants in real-world use cases? Classic database concurrency control models assume that “the [set of application invariants] is generally not known to the system but is embodied in the structure of the transaction” [25, 56]. Nevertheless, since 1976, databases have introduced support for a finite set of invariants [14, 26, 29, 32, 37] in the form of primary key, foreign key, uniqueness, and row-level “check” constraints [42]. We can (and, in this paper, do) analyze these invariants, which can, like many program analyses [18], lead to new insights about execution strategies. We have found the process of invariant specification to be non-trivial but feasible in practice; Section 6 describes some of our experiences.

(Non-)determinism. I-confluence analysis effectively captures points of unsafe non-determinism [6] in transaction execution. As we have seen in many of our examples thus far, total non-determinism under concurrent execution can compromise application-level consistency [5, 36]. But not all non-determinism is bad: many desirable properties (e.g., classical distributed consensus among processes) involve forms of acceptable non-determinism (e.g., any proposed outcome is acceptable as long as all processes agree) [31]. In many cases, maximizing safe concurrency requires non-determinism.

I-confluence analysis allows this non-deterministic divergence of database states but makes two useful guarantees about those states. First, the requirement for global validity ensures safety (in the form of invariants). Second, the requirement for convergence ensures liveness (in the form of convergence). Accordingly, via its use of invariants, I-confluence allows users to scope non-determinism while permitting only those states that are acceptable.
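The informal rule that opens Section 4.3 (local commit decisions must remain valid after merge) can be illustrated with a small brute-force search. The following is a minimal sketch and not the formal procedure of the paper: it assumes database states are Python frozensets of records, that merge is set union, and that each transaction is a function from a state to a new state. All names (`reachable_states`, `find_counterexample`, `insert`, and the two invariants) are hypothetical. Because the search is bounded by `depth`, it can only refute I-confluence by finding a counterexample; returning `None` does not prove the property.

```python
from itertools import combinations


def reachable_states(initial, transactions, invariant, depth):
    """States reachable from `initial` (assumed valid) by applying at most
    `depth` transactions, where a replica commits a transaction only if the
    resulting local state satisfies `invariant`."""
    seen = {initial}
    frontier = {initial}
    for _ in range(depth):
        next_frontier = set()
        for state in frontier:
            for txn in transactions:
                candidate = txn(state)
                if invariant(candidate) and candidate not in seen:
                    next_frontier.add(candidate)
        seen |= next_frontier
        frontier = next_frontier
    return seen


def find_counterexample(initial, transactions, invariant, depth=2):
    """Return two locally valid states (sharing `initial` as ancestor) whose
    set-union merge violates `invariant`, or None if none is found."""
    states = reachable_states(initial, transactions, invariant, depth)
    for s1, s2 in combinations(states, 2):
        if not invariant(s1 | s2):
            return s1, s2
    return None


def insert(record):
    """Hypothetical transaction: add one (name, id) record to the state."""
    return lambda state: state | frozenset({record})


def ids_unique(state):
    ids = [emp_id for _, emp_id in state]
    return len(ids) == len(set(ids))


def last_name_present(state):
    return all(name is not None for name, _ in state)


EMPTY = frozenset()
txns = [insert(("Stan", 5)), insert(("Mary", 5))]

# Inserting caller-chosen IDs under a uniqueness invariant: a conflict exists.
unique_cx = find_counterexample(EMPTY, txns, ids_unique)

# A per-record NOT NULL style invariant: no conflicting pair is found.
not_null_cx = find_counterexample(
    EMPTY, txns + [insert((None, 7))], last_name_present
)
```

In this sketch, `unique_cx` holds the two single-record states for Stan and Mary, whose union contains a duplicate ID, while `not_null_cx` is `None` because the transaction that would insert a missing name is never committed locally.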
Invariant             Operation                     I-C?   Proof #
Attribute Equality    Any                           Yes    1
Attribute Inequality  Any                           Yes    2
Uniqueness            Choose specific value         No     3
Uniqueness            Choose some value             Yes    4
AUTO_INCREMENT        Insert                        No     5
Foreign Key           Insert                        Yes    6
Foreign Key           Delete                        No     7
Foreign Key           Cascading Delete              Yes    8
Secondary Indexing    Update                        Yes    9
Materialized Views    Update                        Yes    10
>                     Increment [Counter]           Yes    11
<                     Increment [Counter]           No     12
>                     Decrement [Counter]           No     13
<                     Decrement [Counter]           Yes    14
[NOT] CONTAINS        Any [Set, List, Map]          Yes    15, 16
SIZE=                 Mutation [Set, List, Map]     No     17

Table 2: Example SQL (top) and ADT invariant I-confluence along with references to formal proofs in [10, Appendix C].

5. APPLYING INVARIANT CONFLUENCE

As a test for coordination requirements, I-confluence exposes a trade-off between the operations a user wishes to perform and the properties she wishes to guarantee. At one extreme, if a user's transactions do not modify database state, she can guarantee any satisfiable invariant. At the other extreme, with no invariants, a user can safely perform any operations she likes. The space in between contains a spectrum of interesting and useful combinations.

Until now, we have been largely concerned with formalizing I-confluence for abstract operations; in this section, we begin to leverage this property. We examine a series of practical invariants by considering several features of SQL, ending with abstract data types and revisiting our payroll example along the way. We will apply these results to full applications in Section 6.

In this section, we focus on providing intuition and informal explanations of our I-confluence analysis. Interested readers can find a more formal analysis in [10, Appendix C], including discussion of invariants not presented here. For convenience, we reference specific proofs from [10, Appendix C] inline.

5.1 I-confluence for Relations

We begin by considering several constraints found in SQL.

Equality. As a warm-up, what if an application wants to prevent a particular value from appearing in a database? For example, our payroll application from Section 2 might require that every user have a last name, marking the LNAME column with a NOT NULL constraint. While not particularly exciting, we can apply I-confluence analysis to insertions and updates of databases with (in-)equality constraints (Claims 1, 2 in [10, Appendix C]). Per-record inequality invariants are I-confluent, which we can show by contradiction: assume two database states S1 and S2 are each I-T-reachable under per-record inequality invariant Ie but that Ie(S1 ⊔ S2) is false. Then there must be an r ∈ S1 ⊔ S2 that violates Ie (i.e., r has the forbidden value). r must appear in S1, S2, or both. But that would imply that one of S1 or S2 is not I-valid under Ie, a contradiction.

Uniqueness. We can also consider common uniqueness invariants (e.g., PRIMARY KEY and UNIQUE constraints). For example, in our payroll example, we wanted user IDs to be unique. In fact, our earlier discussion in Section 2 already provided a counterexample showing that arbitrary insertion of users is not I-confluent under these invariants: {Stan:5} and {Mary:5} are both I-T-reachable states that can be created by a sequence of insertions (starting at S0 = {}), but their merge, {Stan:5, Mary:5}, is not I-valid. Therefore, uniqueness is not I-confluent for inserts of unique values (Claim 3). However, reads and deletions are both I-confluent under uniqueness invariants: reading and removing items cannot introduce duplicates.

Can the database safely choose unique values on behalf of users (e.g., assign a new user an ID)? In this case, we can achieve uniqueness without coordination, as long as we have a notion of replica membership (e.g., server or replica IDs). The difference is subtle (“grant this record this specific, unique ID” versus “grant this record some unique ID”), but, in a system model with membership (as is practical in many contexts), is powerful. If replicas assign unique IDs within their respective portion of the ID namespace, then merging locally valid states will also be globally valid (Claim 4).

Foreign Keys. We can consider more complex invariants, such as foreign key constraints. In our payroll example, each employee belongs to a department, so the application could specify a constraint via a schema declaration to capture this relationship (e.g., EMP.D_ID FOREIGN KEY REFERENCES DEPT.ID).

Are foreign key constraints maintainable without coordination? Again, the answer depends on the actions of transactions modifying the data governed by the invariant. Insertions under foreign key constraints are I-confluent (Claim 6). To show this, we again attempt to find two I-T-reachable states that, when merged, result in invalid state. Under foreign key constraints, an invalid state will contain a record with a “dangling pointer”: a record missing a corresponding record on the opposite side of the association. If we assume there exists some invalid state S1 ⊔ S2 containing a record r with an invalid foreign key to record f, but S1 and S2 are both valid, then r must appear in S1, S2, or both. But, since S1 and S2 are both valid, r must have a corresponding foreign key record (f) that “disappeared” during merge. Merge (in the current model) does not remove versions, so this is impossible.

From the perspective of I-confluence analysis, foreign key constraints concern the visibility of related updates: if individual database states maintain referential integrity, a non-destructive merge function such as set union cannot cause tuples to “disappear” and compromise the constraint. This also explains why models such as read committed [2] and read atomic [2] isolation as well as causal consistency [9] are also achievable without coordination: simply restricting the visibility of updates in a given transaction's read set does not require coordination between concurrent operations.

Deletions and modifications under foreign key constraints are more challenging. Arbitrary deletion of records is unsafe: a user might be added to a department that was concurrently deleted (Claim 7). However, performing cascading deletions (e.g., SQL DELETE CASCADE), where the deletion of a record also deletes all matching records on the opposite end of the association, is I-confluent under foreign key constraints (Claim 8). We can generalize this discussion to updates (and cascading updates).

Materialized Views. Applications often pre-compute results to speed query performance via a materialized view [53] (e.g., UNREAD_CNT as SELECT COUNT(*) FROM emails WHERE read_date IS NULL). We can consider a class of invariants that specify that materialized views reflect primary data; when a transaction (or merge invocation) modifies data, any relevant materialized views should be updated as well. This requires installing updates at the same time as the changes to the primary data are installed (a problem related to maintaining foreign key constraints). However, given that a view only reflects primary data, there are no “conflicts.” Thus, materialized view maintenance updates are I-confluent (Claim 10).

5.2 I-confluence for Data Types

So far, we have considered databases that store growing sets of immutable versions. We have used this model to analyze several useful constraints, but, in practice, databases do not (often) provide these semantics, leading to a variety of interesting anomalies. For example, if we implement a user's account balance using a “last writer wins” merge policy [50], then performing two concurrent withdrawal transactions might result in a database state reflecting only one transaction (a classic example of the Lost Update anomaly) [2, 9]. To avoid variants of these anomalies, many optimistic, coordination-free database designs have proposed the use of abstract data types (ADTs), providing merge functions for a variety of uses such as counters, sets, and maps [19, 44, 50, 58] that ensure that all updates are reflected in final database state. For example, a database can represent a simple counter ADT by recording the number of times each transaction performs an increment operation on the counter [50].

I-confluence analysis is also applicable to these ADTs and their associated invariants. For example, a row-level “greater-than” (>) threshold invariant is I-confluent for counter increment and assign (←) but not decrement (Claims 11, 13), while a row-level “less-than” (<) threshold invariant is I-confluent for counter decrement and assign but not increment (Claims 12, 14). This means that, in our payroll example, we can provide coordination-free support for concurrent salary increments but not concurrent salary decrements. ADTs (including lists, sets, and maps) can be combined with standard relational constraints like materialized view maintenance (e.g., the “total salary” row should contain the sum of employee salaries in the employee table). This analysis presumes that user programs explicitly use ADTs, and, as with our generic set-union merge, I-confluence ADT analysis requires a specification of the ADT merge behavior ([10, Appendix C] provides several examples).

5.3 Discussion and Limitations

We have analyzed a number of combinations of invariants and operations (shown in Table 2). These results are by no means comprehensive, but they are expressive for many applications (Section 6). In this section, we discuss lessons from this classification process.

Analysis mechanisms. Here (and in [10, Appendix C]), we manually analyzed particular invariant and operation combinations, demonstrating each to be I-confluent or not. To study actual applications, we can apply these labels via simple static analysis. Specifically, given invariants (e.g., captured via SQL DDL) and transactions (e.g., expressed as stored procedures), we can examine each invariant and each operation within each transaction and identify pairs that we have labeled as I-confluent or non-I-confluent. Any pairs labeled as I-confluent can be marked as safe, while, for soundness (but not completeness), any unrecognized operations or invariants can be flagged as potentially non-I-confluent. Despite its simplicity (both conceptually and in terms of implementation), this technique, coupled with the results of Table 2, is sufficiently powerful to automatically characterize the I-confluence of the applications we consider in Section 6 when expressed in SQL (with support for multi-row aggregates like Invariant 8 in Table 3).

By growing our recognized list of I-confluent pairs on an as-needed basis (via manual analysis of the pair), the above technique has proven useful, due in large part to the common re-use of invariants like foreign key constraints. However, one could use more complex forms of program analysis. For example, one might analyze the I-confluence of arbitrary invariants, leaving the task of proving or disproving I-confluence to an automated model checker or SMT solver. While I-confluence, like monotonicity and commutativity (Section 7), is undecidable for arbitrary programs, others have recently shown this alternative approach (e.g., in commutativity analysis [18, 40] and in invariant generation for view serializable transactions [47]) to be fruitful for restricted languages. We view language design and more automated analysis as an interesting area for more speculative research.

Recency and session support. Our proposed invariants are declarative, but a class of useful semantics (recency, or real-time guarantees on reads and writes) are operational (i.e., they pertain to transaction execution rather than the state(s) of the database). For example, users often wish to read data that is up-to-date as of a given point in time (e.g., “read latest” [20] or linearizable [28] semantics). While traditional isolation models do not directly address these recency guarantees [2], they are often important to programmers. Are these models I-confluent? We can attempt to simulate recency guarantees in I-confluence analysis by logging the result of all reads and any writes with a timestamp and requiring that all logged timestamps respect their recency guarantees (thus treating recency guarantees as invariants over recorded read/write execution traces). However, this is a somewhat pointless exercise: it is well known that recency guarantees are unachievable with transactional availability [9, 21, 28]. Thus, if application reads face these requirements, coordination is required. Indeed, when application “consistency” means “recency,” systems cannot circumvent speed-of-light delays.

If users wish to “read their writes” or desire stronger “session” guarantees [45] (e.g., maintaining recency on a per-user or per-session basis), they must maintain affinity or “stickiness” [9] with a given (set of) replicas. These guarantees are also expressible in the I-confluence model and do not require coordination between different users' or sessions' transactions.

Physical and logical replication. We have used the concept of replicas to reason about concurrent transaction execution. However, as previously noted, our use of replicas is simply a formal device and is independent of the actual concurrency control mechanisms at work. Specifically, reasoning about replicas allows us to separate the analysis of transactions from their implementation: just because a transaction is executed with (or without) coordination does not mean that all query plans or implementations require (or do not require) coordination [9]. However, in deciding on an implementation, there is a range of design decisions yielding a variety of performance trade-offs. Simply because an application is I-confluent does not mean that all implementations will perform equally well. Rather, I-confluence ensures that a coordination-free implementation exists.

Requirements and restrictions. Our techniques are predicated on the ability to correctly and completely specify invariants and inspect user transactions; without such a correctness specification, for arbitrary transaction schedules, serializability is, in a sense, the “optimal” strategy [38]. By casting correctness in terms of admissible application states rather than as a property of read-write schedules, we achieve a more precise statement of coordination overheads. However, as we have noted, this does not obviate the need for coordination in all cases. Finally, when full application invariants are unavailable, individual, high-value transactions may be amenable to optimization via I-confluence coordination analysis.

6. EXPERIENCES WITH COORDINATION

When achievable, coordination-free execution enables scalability limited to that of available hardware. This is powerful: an I-confluent application can scale out without sacrificing correctness, latency, or availability. In Section 5, we saw combinations of invariants and transactions that were I-confluent and others that were not. In this section, we apply these combinations to the workloads of the OLTP-Bench suite [23], with a focus on the TPC-C benchmark. Our focus is on the coordination required in order to correctly execute each and the resulting, coordination-related performance costs.
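The labeling technique described under “Analysis mechanisms” in Section 5.3 amounts to a table lookup over (invariant, operation) pairs. The following is a minimal sketch under stated assumptions: invariants and operations are assumed to have been extracted already (e.g., from SQL DDL and stored procedures) into lowercase string labels, and the label vocabulary and the names `TABLE_2`, `classify`, and `analyze` are hypothetical. Only the Yes/No entries come from Table 2.

```python
# Rows of Table 2: (invariant, operation) -> is the pair I-confluent?
TABLE_2 = {
    ("attribute equality", "any"): True,
    ("attribute inequality", "any"): True,
    ("uniqueness", "choose specific value"): False,
    ("uniqueness", "choose some value"): True,
    ("auto_increment", "insert"): False,
    ("foreign key", "insert"): True,
    ("foreign key", "delete"): False,
    ("foreign key", "cascading delete"): True,
    ("secondary indexing", "update"): True,
    ("materialized views", "update"): True,
    ("> threshold", "increment"): True,
    ("< threshold", "increment"): False,
    ("> threshold", "decrement"): False,
    ("< threshold", "decrement"): True,
    ("[not] contains", "any"): True,
    ("size=", "mutation"): False,
}

# Labels ordered from least to most restrictive.
SEVERITY = ["safe", "potentially non-I-confluent", "needs coordination"]


def classify(invariant, operation):
    """Label one pair. Unrecognized pairs are flagged conservatively,
    which keeps the analysis sound but not complete."""
    for key in ((invariant, operation), (invariant, "any")):
        if key in TABLE_2:
            return "safe" if TABLE_2[key] else "needs coordination"
    return "potentially non-I-confluent"


def analyze(pairs):
    """Given (transaction, invariant, operation) triples, return the most
    restrictive label seen for each transaction."""
    report = {}
    for txn, invariant, operation in pairs:
        label = classify(invariant, operation)
        current = report.get(txn, "safe")
        report[txn] = max(current, label, key=SEVERITY.index)
    return report


# Illustrative input only: treating sequential ID assignment as the
# AUTO_INCREMENT row of Table 2 is an assumption of this sketch.
report = analyze([
    ("New-Order", "foreign key", "insert"),
    ("New-Order", "materialized views", "update"),
    ("New-Order", "auto_increment", "insert"),
    ("Payment", "materialized views", "update"),
])
```

Here `report` maps New-Order to "needs coordination" (because of the ID assignment pair) and Payment to "safe". A pair that is absent from the table, such as a hypothetical `("geo fence", "update")`, falls through to "potentially non-I-confluent" until it has been analyzed by hand and added.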
#    Informal Invariant Description               Type     Txns   I-C
1    YTD wh sales = sum(YTD district sales)       MV       P      Yes
2    Per-district order IDs are sequential        SID+FK   N, D   No
3    New order IDs are sequentially assigned      SID      N, D   No
4    Per-district, item order count = roll-up     MV       N      Yes
5    Order carrier is set iff order is pending    FK       N, D   Yes
6    Per-order item count = line item roll-up     MV       N      Yes
7    Delivery date set iff carrier ID set         FK       D      Yes
8    YTD wh = sum(historical wh)                  MV       D      Yes
9    YTD district = sum(historical district)      MV       P      Yes
10   Customer balance matches expenditures        MV       P, D   Yes
11   Orders reference New-Orders table            FK       N      Yes
12   Per-customer balance = cust. expenditures    MV       P, D   Yes

Table 3: TPC-C Declared “Consistency Conditions” (3.3.2.x) and I-confluence analysis results (Invariant type: MV: materialized view, SID: sequential ID assignment, FK: foreign key; Transactions: N: New-Order, P: Payment, D: Delivery).

6.1 TPC-C Invariants and Execution

The TPC-C benchmark is the gold standard for database concurrency control [23] both in research and in industry [55], and in recent years has been used as a yardstick for distributed database concurrency control performance [52, 54, 57]. How much coordination does TPC-C actually require for a compliant execution?

The TPC-C workload is designed to be representative of a wholesale supplier's transaction processing requirements. The workload has a number of application-level correctness criteria that represent basic business needs (e.g., order IDs must be unique) as formulated by the TPC-C Council and which must be maintained in a compliant run. We can interpret these well-defined “consistency criteria” as invariants and subsequently use I-confluence analysis to determine which transactions require coordination and which do not.

Table 3 summarizes the twelve invariants found in TPC-C as well as their I-confluence analysis results as determined by Table 2. We classify the invariants into three broad categories: materialized view maintenance, foreign key constraint maintenance, and unique ID assignment. As we discussed in Section 5, the first two categories are I-confluent (and therefore maintainable without coordination) because they only regulate the visibility of updates to multiple records. Because these (10 of 12) invariants are I-confluent under the workload transactions, there exists some execution strategy that does not use coordination. However, simply because these invariants are I-confluent does not mean that all execution strategies will scale well: for example, using locking would not be coordination-free.

As one coordination-free execution strategy (which we implement in Section 6.2) that respects the foreign key and materialized view invariants, we can use RAMP transactions, which provide atomically visible transactional updates across servers without relying on coordination for correctness [11]. In brief, RAMP transactions employ limited multi-versioning and metadata to ensure that readers and writers can always proceed concurrently: any client whose reads overlap with another client's writes to the same item(s) can use metadata stored in the items to fetch any “missing” writes from the respective servers. A standard RAMP transaction over data items suffices to enforce foreign key constraints, while a RAMP transaction over commutative counters as described in [11] is sufficient to enforce the TPC-C materialized view constraints.

Two of TPC-C's invariants are not I-confluent with respect to the workload transactions and therefore do require coordination. On a per-district basis, order IDs should be assigned sequentially (both uniquely and sequentially, in the New-Order transaction) and orders should be processed sequentially (in the Delivery transaction). If the database is partitioned by warehouse (as is standard [52, 54, 57]), the former is a distributed transaction (by default, 10% of New-Order transactions span multiple warehouses). The benchmark specification allows the latter to be run asynchronously and in batch mode on a per-warehouse (non-distributed) basis, so we, like others [54, 57], focus on New-Order. Including additional transactions like the read-only Order-Status in the workload mix would increase performance due to the transactions' lack of distributed coordination and (often considerably) smaller read/write footprints.

Avoiding New-Order Coordination. New-Order is not I-confluent with respect to the TPC-C invariants, so we can always fall back to using serializable isolation. However, the per-district ID assignment records (10 per warehouse) would become a point of contention, limiting our throughput to effectively 100W/RTT for a W-warehouse TPC-C benchmark with the expected 10% distributed transactions. Others [57] (including us, in prior work [9]) have suggested disregarding consistency criteria 3.3.2.3 and 3.3.2.4, instead opting for unique but non-sequential ID assignment: this allows inconsistency and violates the benchmark compliance criteria.

During a compliant run, New-Order transactions must coordinate. However, as discussed above, only the ID assignment operation is non-I-confluent; the remainder of the operations in the transaction can execute coordination-free. With some effort, we can avoid distributed coordination. A naïve implementation might grab a lock on the appropriate district's “next ID” record, perform (possibly remote) remaining reads and writes, then release the lock at commit time. Instead, as a more efficient solution, New-Order can defer ID assignment until commit time by introducing a layer of indirection. New-Order transactions can generate a temporary, unique, but non-sequential ID (tmpID) and perform updates with this ID using a RAMP transaction (which, in turn, handles the foreign key constraints) [11]. Immediately prior to transaction commit, the New-Order transaction can assign a “real” ID by atomically incrementing the current district's “next ID” record (yielding realID) and recording the [tmpID, realID] mapping in a special ID lookup table. Any read requests for the ID column of the Order, New-Order, or Order-Line tables can be safely satisfied (transparently to the end user) by joining with the ID lookup table on tmpID. In effect, the New-Order ID assignment can use a nested atomic transaction [44] upon commit, and all coordination between any two transactions is confined to a single server.

6.2 Evaluating TPC-C New-Order

We subsequently implemented the above execution strategy in a distributed database prototype to quantify the overheads associated with coordination in TPC-C New-Order. In brief, the coordination-avoiding query plan scales linearly to over 12.7M transactions per second on 200 servers while substantially outperforming distributed two-phase locking. Our goal here is to demonstrate, beyond the microbenchmarks of Section 2, that safe but judicious use of coordination can have a meaningful positive effect on performance.

Implementation and Deployment. We employ a multi-versioned storage manager, with RAMP-Fast transactions for snapshot reads and atomically visible writes/“merge” (providing a variant of regular register semantics, with writes visible to later transactions after commit) [11] and implement the nested atomic transaction for ID assignment as a sub-procedure inside RAMP-Fast's server-side commit procedure (using spinlocks). We implement transactions as stored procedures and fulfill the TPC-C “Isolation Requirements” by using read and write buffering as proposed in [9]. As is common [35, 46, 52, 54], we disregard per-warehouse client limits and “think time” to increase load per warehouse. In all, our base prototype architecture is similar to that of [11]: a JVM-based partitioned, main-memory, mastered database.

Figure 5: TPC-C New-Order throughput across eight servers.

For an apples-to-apples comparison with a coordination-intensive technique within the same system, we also implemented textbook two-phase locking (2PL) [15], which provides serializability but also requires distributed coordination. We totally order lock requests across servers to avoid deadlock, batching lock requests to each server and piggybacking read and write requests on lock request RPC. As a validation of our implementation, our 2PL prototype achieves per-warehouse (and sometimes aggregate) throughput similar to (and often in excess of) several recent serializable database implementations (of both 2PL and other approaches) [35, 46, 52, 54].

By default, we deploy our prototype on eight EC2 cr1.8xlarge instances in the Amazon EC2 us-west-2 region (with non-co-located clients) with one warehouse per server (recall there are 10 “hot” district ID records per warehouse) and report the average of three 120 second runs.

Basic behavior. Figure 5 shows performance across a variety of configurations, which we detail below. Overall, the coordination-avoiding query plan far outperforms the serializable execution. The coordination-avoiding query plan performs some coordination, but, because coordination points are not distributed (unlike 2PL), physical resources (and not coordination) are the bottleneck.

Varying load. As we increase the number of clients, the coordination-avoiding query plan throughput increases linearly, while 2PL throughput increases to 40K transactions
per second, then levels off. As in our microbenchmarks in Section 2, the former utilizes available hardware resources (bottlenecking on CPU cycles at 640K transactions per second), while the latter bottlenecks on logical contention.

Physical resource consumption. To understand the overheads of each component in the coordination-avoiding query plan, we used JVM profiling tools to sample thread execution while running at peak throughput, attributing time spent in functions to relevant modules within the database implementation (where possible):

Code Path                                             Cycles
Storage Manager (Insert, Update, Read)                45.3%
Stored Procedure Execution                            14.4%
RPC and Networking                                    13.2%
Serialization                                         12.6%
ID Assignment Synchronization (spinlock contention)   0.19%
Other                                                 14.3%

The coordination-avoiding prototype spends a large portion of execution in the storage manager, performing B-tree modifications and lookups and result set creation, and in RPC/serialization. In contrast to 2PL, the prototype spends less than 0.2% of time coordinating, in the form of waiting for locks in the New-Order ID assignment; the (single-site) assignment is fast (a linearizable integer increment and store, followed by a write and fence instruction on the spinlock), so this should not be surprising. We observed large throughput penalties due to garbage collection (GC) overheads (up to 40%), an unfortunate cost of our highly compact (several thousand lines of Scala), JVM-based implementation. However, even in this current prototype, physical resources are the bottleneck, not coordination.

Varying contention. We subsequently varied the number of “hot,” or contended, items by increasing the number of warehouses on each server. Unsurprisingly, 2PL benefits from decreased contention, rising to over 87K transactions per second with 64 warehouses. In contrast, our coordination-avoiding implementation is largely unaffected (and, at 64 warehouses, is even negatively impacted by increased GC pressure). The coordination-avoiding query plan is effectively agnostic to read/write contention.

Varying distribution. We also varied the percentage of distributed transactions. The coordination-avoiding query plan incurred a 29% overhead moving from no distributed transactions to all distributed transactions due to increased serialization overheads and less efficient batching of RPCs. However, the 2PL implementation decreased in throughput by over 90% (in line with prior results [46, 54], albeit exaggerated here due to higher contention) as more requests stalled due to coordination with remote servers.

Scaling out. Finally, we examined our prototype's scalability, again deploying one warehouse per server. As Figure 6 demonstrates, our prototype scales linearly, to over 12.74 million transactions per second on 200 servers (in light of our earlier results, and for economic reasons, we do not run 2PL at this scale). Per-server throughput is largely constant after 100 servers, at which point our deployment spanned all three us-west-2 datacenters and experienced slightly degraded per-server performance. While we make use of application semantics, we are unaware of any other compliant multi-server TPC-C implementation that has achieved greater than 500K New-Order transactions per second [35, 46, 52, 54].

Summary. We present these quantitative results as a proof of concept that executing even challenging workloads like TPC-C that contain complex integrity constraints is not necessarily at odds with scalability if implemented in a coordination-avoiding manner. Distributed coordination need not be a bottleneck for all applications, even if conflict serializable execution indicates otherwise. Coordination avoidance ensures that physical resources, and not logical contention, are the system bottleneck whenever possible.

6.3 Analyzing Additional Applications

These results begin to quantify the effects of coordination-avoiding concurrency control. If considering application-level invariants, databases only have to pay the price of coordination when necessary. We were surprised that the “current industry standard for evaluating the performance of OLTP systems” [23] was so amenable to
coordination-avoiding execution, at least for compliant execution as defined by the official TPC-C specification.

Figure 6: Coordination-avoiding New-Order scalability.

For greater variety, we also studied the workloads of the recently assembled OLTP-Bench suite [23], performing a similar analysis to that of Section 6.1. We found (and confirmed with an author of [23]) that for nine of fourteen remaining (non-TPC-C) OLTP-Bench applications, the workload transactions did not involve integrity constraints (e.g., did not modify primary key columns), one (CH-benCHmark) matched TPC-C, and two specifications implied (but did not explicitly state) a requirement for unique ID assignment (AuctionMark's new-purchase order completion, SEATS's NewReservation seat booking; achievable like TPC-C order IDs). The remaining two benchmarks, sibench and smallbank, were specifically designed (by an author of this paper) as research benchmarks for serializable isolation. Finally, the three “consistency conditions” required by the newer TPC-E benchmark are a proper subset of the twelve conditions from TPC-C considered here (and are all materialized counters). It is possible (even likely) that these benchmarks are underspecified, but according to official specifications, TPC-C contains the most coordination-intensive invariants among all but two of the OLTP-Bench workloads.

Anecdotally, our conversations and experiences with real-world application programmers and database developers have not identified invariants that are radically different than those we have studied here. A simple thought experiment identifying the invariants required for a social networking site yields a number of invariants but none that are particularly exotic (e.g., username uniqueness, foreign key constraints between updates, privacy settings [11, 20]). Nonetheless, we view the further study of real-world invariants to be a necessary area for future investigation. In the interim, these preliminary results hint at what is possible with coordination-avoidance as well as the costs of coordination if applications are not I-confluent.

7. RELATED WORK

Database system designers have long sought to manage the trade-off between consistency and coordination. As we have discussed, serializability and its many implementations (including lock-based, optimistic, and pre-scheduling mechanisms) [15, 16, 25, 30, 52–54, 57] are sufficient for maintaining application correctness. However, serializability is not always necessary: as discussed in Section 1, serializable databases do not allow certain executions that are correct according to application semantics. This has led to a large class of application-level (or semantic) concurrency control models and mechanisms that admit greater concurrency. There are several surveys on this topic, such as [29, 53], and, in our solution, we integrate many concepts from this literature.

Commutativity. One of the most popular alternatives to serializability is to exploit commutativity: if transaction return values (e.g., of reads) and/or final database states are equivalent despite reordering, they can be executed simultaneously [18, 41, 58]. Commutativity is often sufficient for correctness but is not necessary. For example, if an analyst at a wholesaler creates a report on daily cash flows, any concurrent sale transactions will not commute with the report (the results will change depending on whether the sale completes before or after the analyst runs her queries). However, the report creation is I-confluent with respect to, say, the invariant that every sale in the report references a customer from the customers table. [18, 39] provide additional examples of safe non-commutativity.

Monotonicity and Convergence. The CALM Theorem [7] shows that monotone programs exhibit deterministic outcomes despite re-ordering. CRDT objects [50] similarly ensure convergent outcomes that reflect all updates made to each object. These outcome determinism and convergence guarantees are useful liveness properties [49] (e.g., a converged CRDT OR-Set reflects all concurrent additions and removals) but do not provide safety: they do not prevent users from observing inconsistent data [40] (e.g., the CRDT OR-Set does not, by itself, enforce invariants, such as ensuring that no employee belongs to two departments), and are therefore not sufficient to guarantee correctness for all applications. Further understanding the relationship between I-confluence and CALM is an interesting area for further exploration (e.g., as I-confluence adds safety to confluence, is there a natural extension of monotone logic that incorporates I-confluent invariants, say, via an “invariant-scoped” form of monotonicity?).

Use of Invariants. A large number of database designs (including, in restricted forms, many commercial databases today) use various forms of application-supplied invariants, constraints, or other semantic descriptions of valid database states as a specification for application correctness (e.g., [14, 21, 26, 29, 32, 33, 37, 40–42, 47]). We draw inspiration and, in particular, our use of invariants from this prior work. However, we are not aware of related work that discusses when coordination is strictly required to enforce a given set of invariants. Moreover, our practical focus here is primarily oriented towards invariants found in SQL and from modern applications.

In this work, we provide a necessary and sufficient condition for safe, coordination-free execution. In contrast with many of the conditions above (esp. commutativity and monotonicity), we explicitly require more information from the application in the form of invariants. (Kung and Papadimitriou [38] suggest this information is required for general-purpose non-serializable yet safe execution.) When invariants are unavailable, many of these more conservative approaches may still be applicable. Our use of analysis-as-design-tool is inspired by this literature, in particular [18].

Coordination costs. In this work, we determine when transactions can run entirely concurrently and without coordination. In contrast, a large number of alternative models (e.g., [4, 8, 26, 33, 37, 42, 43]) assume serializable or linearizable (and therefore coordinated) updates to shared state. These assumptions are standard (but not universal [17]) in the concurrent programming literature [8, 49]. (Additionally, unlike much of this literature, we only consider a single set of invariants per database rather than per-operation invariants.) For example, transaction chopping [51] and later application-aware extensions [3, 14] decompose transactions into a set of smaller transactions, providing increased concurrency, but in turn require that individual transactions execute in a serializable (or strict serializable) manner. This reliance on coordinated updates is at odds with our goal of coordination-free execution. However, these alternative techniques are useful in reducing the duration and distribution of coordination once it is established that coordination is required.

Term rewriting. In term rewriting systems, I-confluence guarantees that arbitrary rule application will not violate a given invariant [24], generalizing Church-Rosser confluence [36]. We adapt this concept and effectively treat transactions as rewrite rules, database states as constraint states, and the database merge operator as a special join operator (in the term-rewriting sense) defined for all states. Rewriting system concepts (including confluence [4]) have previously been integrated into active database systems [59] (e.g., in triggers, rule processing), but we are not familiar with a concept analogous to I-confluence in the existing database literature.

Coordination-free algorithms and semantics. Our work is influenced by the distributed systems literature, where coordination-free execution across replicas of a given data item has been captured as “availability” [12, 28]. A large class of systems provides availability via “optimistic replication” (i.e., perform operations locally, then replicate) [48]. We, like others [17], adopt the use of the merge operator to reconcile divergent database states [45] from this literature. Both traditional database systems [2] and more recent proposals [40, 41] allow the simultaneous use of “weak” and “strong” isolation; we seek to understand when strong mechanisms are needed rather than an optimal implementation of either. Unlike “tentative update” models [27], we do not require programmers to specify compensatory actions (beyond merge, which we expect to typically be generic and/or system-supplied) and do not reverse transaction commit decisions. Compensatory actions could be captured under I-confluence as a specialized merge procedure.

The CAP Theorem [1, 28] recently popularized the tension between strong semantics and coordination and pertains to a specific model (linearizability). The relationship between serializability and coordination requirements has also been well documented in the database literature [21]. We recently classified a range of weaker isolation models by availability, labeling semantics achievable without coordination as “Highly Available Transactions” [9]. Our research here addresses when particular applications require coordination.

In our evaluation, we make use of our recent RAMP
transactional-gorithms[11],whichguaranteecoordination-free,atomicallyvisibleupdates.RAMPtransactionsareanimplementationofI-conuentsemantics(i.e.,ReadAtomicisolation,usedinourimplementationforforeignkeyconstraintmaintenance).OurfocusinthispaperiswhenRAMPtransactions(andanyothercoordination-freeorI-conuentsemantics)areappropriateforapplications.Summary.TheI-conuencepropertyisanecessaryandsufcientconditionforsafe,coordination-freeexecution.Sufcientcondi-tionssuchascommutativityandmonotonicityareusefulinreducingcoordinationoverheadsbutarenotalwaysnecessary.Here,weexplorethefundamentallimitsofcoordination-freeexecution.Todoso,weexplicitlyconsideramodelwithoutsynchronouscommu-nication.Thisiskeytoscalability:if,bydefault,operationsmustcontactacentralizedvalidationservice,performatomicupdatestosharedstate,orotherwisecommunicate,thenscalabilitywillbecompromised.Finally,weonlyconsiderasinglesetofinvariantsfortheentireapplication,reducingprogrammeroverheadwithoutaffectingourI-conuenceresults.8.CONCLUSIONACIDtransactionsandassociatedstrongisolationlevelsdomi-natedtheeldofdatabaseconcurrencycontrolfordecades,duein 
largeparttotheireaseofuseandabilitytoautomaticallyguaranteeapplicationcorrectnesscriteria.However,thispowerfulabstractioncomeswithaheftycost:concurrenttransactionsmustcoordinateinordertopreventread/writeconictsthatcouldcompromiseequiv-alencetoaserialexecution.Atlargescaleand,increasingly,ingeo-replicatedsystemdeployments,thecoordinationcostsneces-sarilyassociatedwiththeseimplementationsproducesignicantoverheadsintheformofpenaltiestothroughput,latency,andavail-ability.Inlightofthesetrends,wedevelopedaformalframework,calledinvariantconuence,inwhichapplicationinvariantsareusedasabasisfordeterminingifandwhencoordinationisstrictlyneces-sarytomaintaincorrectness.Withthisframework,wedemonstratedthat,infact,many—butnotall—commondatabaseinvariantsandin-tegrityconstraintsareactuallyachievablewithoutcoordination.Byapplyingtheseresultstoarangeofactualtransactionalworkloads,wedemonstratedanopportunitytoavoidcoordinationinmanycasesthattraditionalserializablemechanismswouldotherwiseco-ordinate.Theorder-of-magnitudeperformanceimprovementswedemonstratedviacoordination-avoidingconcurrencycontrolstrate-giesprovidecompellingevidencethatinvariant-basedcoordinationavoidanceisapromisingapproachtomeaningfullyscalingfuturedatamanagementsystems.Acknowledgments.TheauthorswouldliketothankPeterAlvaro,NeilConway,ShelFinkelstein,andJoshRosenforhelpfulfeedbackonearlierversionsofthiswork,DanCrankshaw,JoeyGonzalez,NickLanham,andGenePangforvariousengineeringcontributions,andYunjingYuforsharingtheBobtaildataset.ThisresearchissupportedinpartbyNSFCISEExpeditionsAwardCCF-1139158,LBNLAward7076018,DARPAXDataAwardFA8750-12-2-0331,theNSFGraduateResearchFellowship(grantDGE-1106400),andgiftsfromAmazonWebServices,Google,SAP,theThomasandStaceySiebelFoundation,Adobe,Apple,Inc.,Bosch,C3Energy,Cisco,Cloudera,EMC,Ericsson,Facebook,GameOnTalis,Guavus,HP,Huawei,Intel,Microsoft,NetApp,Pivotal,Splunk,Virdata,VMware,andYahoo!.9.REFERENCES[1]D.J.Abadi.Consistencytradeoffsinmoderndistributeddatabasesystemdesign:CAPisonlypa
rtofthestory.IEEEComputer,45(2):37–42,2012.[2]A.Adya.Weakconsistency:ageneralizedtheoryandoptimisticimplementationsfordistributedtransactions.PhDthesis,MIT,1999.[3]D.Agrawaletal.Consistencyandorderability:semantics-basedcorrectnesscriteriafordatabases.ACMTODS,18(3):460–486,Sept.1993.[4]A.Aiken,J.Widom,andJ.M.Hellerstein.Behaviorofdatabaseproductionrules:Termination,conuence,andobservabledeterminism.InSIGMOD1992.[5]P.Alvaro,N.Conway,J.M.Hellerstein,andW.Marczak.ConsistencyanalysisinBloom:aCALMandcollectedapproach.InCIDR2011.[6]P.Alvaroetal.Consistencywithoutborders.InSoCC2013.[7]T.J.Ameloot,F.Neven,andJ.VanDenBussche.Relationaltransducersfordeclarativenetworking.J.ACM,60(2):15:1–15:38,May2013.[8]H.Attiya,R.Guerraoui,D.Hendler,etal.Lawsoforder:Expensivesynchronizationinconcurrentalgorithmscannotbeeliminated.InPOPL2011.[9]P.Bailis,A.Davidson,A.Fekete,A.Ghodsi,J.M.Hellerstein,andI.Stoica.HighlyAvailableTransactions:Virtuesandlimitations.InVLDB2014.[10]P.Bailis,A.Fekete,M.J.Franklin,A.Ghodsi,etal.Coordinationavoidanceindatabasesystems(Extendedversion).2014.arXiv:1402.2237.[11]P.Bailis,A.Fekete,A.Ghodsi,J.M.Hellerstein,andI.Stoica.ScalableatomicvisibilitywithRAMPtransactions.InSIGMOD2014.[12]P.BailisandA.Ghodsi.EventualConsistencytoday:Limitations,extensions,andbeyond.ACMQueue,11(3),2013.[13]P.BailisandK.Kingsbury.Thenetworkisreliable:Aninformalsurveyofreal-worldcommunicationsfailures.ACMQueue,12(7):20,2014.[14]A.J.BernsteinandP.M.Lewis.Transactiondecompositionusingtransactionsemantics.DistributedandParallelDatabases,4(1):25–47,1996.[15]P.Bernstein,V.Hadzilacos,andN.Goodman.Concurrencycontrolandrecoveryindatabasesystems.Addison-wesleyNewYork,1987.[16]P.A.Bernstein,D.W.Shipman,andJ.B.Rothnie,Jr.Concurrencycontrolinasystemfordistributeddatabases(SDD-1).ACMTODS,5(1):18–51,Mar.1980.[17]S.Burckhardt,D.Leijen,M.Fähndrich,andM.Sagiv.Eventuallyconsistenttransactions.InESOP.2012.[18]A.T.Clementsetal.Thescalablecommutativityrule:designingscalablesoftwareformulticoreprocessors.InSOS
P2013.[19]N.Conwayetal.Logicandlatticesfordistributedprogramming.InSoCC2012.[20]B.F.Cooper,R.Ramakrishnan,U.Srivastava,A.Silberstein,P.Bohannon,etal.PNUTS:Yahoo!'shosteddataservingplatform.InVLDB2008.[21]S.Davidson,H.Garcia-Molina,andD.Skeen.Consistencyinpartitionednetworks.ACMComputingSurveys,17(3):341–370,1985.[22]G.DeCandia,D.Hastorun,M.Jampani,G.Kakulapati,A.Lakshman,etal.Dynamo:Amazon'shighlyavailablekey-valuestore.InSOSP2007.[23]D.E.Difallah,A.Pavlo,C.Curino,andP.Cudre-Mauroux.OLTP-Bench:Anextensibletestbedforbenchmarkingrelationaldatabases.InVLDB2014.[24]G.Duck,P.Stuckey,andM.Sulzmann.Observableconuenceforconstrainthandlingrules.InICLP2007.[25]K.P.Eswaranetal.Thenotionsofconsistencyandpredicatelocksinadatabasesystem.Commun.ACM,19(11):624–633,1976.[26]H.Garcia-Molina.Usingsemanticknowledgefortransactionprocessinginadistributeddatabase.ACMTODS,8(2):186–213,June1983.[27]H.Garcia-MolinaandK.Salem.Sagas.InSIGMOD1987.[28]S.GilbertandN.Lynch.Brewer'sconjectureandthefeasibilityofconsistent,available,partition-tolerantwebservices.SIGACTNews,33(2):51–59,2002.[29]P.Godfreyetal.Logicsfordatabasesandinformationsystems,chapterIntegrityconstraints:Semanticsandapplications,pages265–306.Springer,1998.[30]J.Gray.Thetransactionconcept:Virtuesandlimitations.InVLDB1981.[31]J.GrayandL.Lamport.Consensusontransactioncommit.ACMTODS,31(1):133–160,Mar.2006.[32]P.W.GrefenandP.M.Apers.Integritycontrolinrelationaldatabasesystems–anoverview.Data&KnowledgeEngineering,10(2):187–223,1993.[33]A.GuptaandJ.Widom.Localvericationofglobalintegrityconstraintsindistributeddatabases.InSIGMOD1993,pages49–58.[34]R.Johnson,I.Pandis,andA.Ailamaki.Eliminatingunscalablecommunicationintransactionprocessing.TheVLDBJournal,pages1–23,2013.[35]E.P.Jones,D.J.Abadi,andS.Madden.Lowoverheadconcurrencycontrolforpartitionedmainmemorydatabases.InSIGMOD2010.[36]J.W.Klop.Termrewritingsystems.StichtingMathematischCentrumAmsterdam,1990.[37]H.K.KorthandG.Speegle.Formalmodelofcorrectnesswithoutserializabilty.InSIGMOD1988.[38
]H.-T.KungandC.H.Papadimitriou.Anoptimalitytheoryofconcurrencycontrolfordatabases.InSIGMOD,1979.[39]L.Lamport.Towardsatheoryofcorrectnessformulti-userdatabasesystems.Technicalreport,CCA,1976.Describedin[3,49].[40]C.Li,J.Leitao,A.Clement,N.Preguiça,R.Rodrigues,etal.Automatingthechoiceofconsistencylevelsinreplicatedsystems.InUSENIXATC2014.[41]C.Li,D.Porto,A.Clement,J.Gehrke,etal.Makinggeo-replicatedsystemsfastaspossible,consistentwhennecessary.InOSDI2012.[42]Y.Lin,B.Kemme,R.Jiménez-Peris,etal.Snapshotisolationandintegrityconstraintsinreplicateddatabases.ACMTODS,34(2),July2009.[43]S.Lu,A.Bernstein,andP.Lewis.Correctexecutionoftransactionsatdifferentisolationlevels.IEEETKDE,16(9),2004.[44]N.A.Lynch,M.Merritt,W.Weihl,andA.Fekete.AtomicTransactions:InConcurrentandDistributedSystems.MorganKaufmannPublishersInc.,1993.[45]K.Petersen,M.J.Spreitzer,D.B.Terry,M.M.Theimer,andA.J.Demers.Flexibleupdatepropagationforweaklyconsistentreplication.InSOSP1997.[46]K.Ren,A.Thomson,andD.J.Abadi.Lightweightlockingformainmemorydatabasesystems.VLDB2013.[47]S.Roy,L.Kot,etal.Writesthatfallintheforestandmakenosound:Semantics-basedadaptivedataconsistency,2014.arXiv:1403.2307.[48]Y.SaitoandM.Shapiro.Optimisticreplication.ACMCSUR,37(1),Mar.2005.[49]F.B.Schneider.Onconcurrentprogramming.Springer,1997.[50]M.Shapiroetal.Acomprehensivestudyofconvergentandcommutativereplicateddatatypes.TechnicalReport7506,INRIA,2011.[51]D.Shasha,F.Llirbat,E.Simon,andP.Valduriez.Transactionchopping:algorithmsandperformancestudies.ACMTODS,20(3):325–363,Sept.1995.[52]M.Stonebraker,S.Madden,D.J.Abadi,S.Harizopoulos,etal.Theendofanarchitecturalera:(it'stimeforacompleterewrite).InVLDB2007.[53]M.TamerÖzsuandP.Valduriez.Principlesofdistributeddatabasesystems.Springer,2011.[54]A.Thomson,T.Diamond,S.Weng,K.Ren,P.Shao,andD.Abadi.Calvin:Fastdistributedtransactionsforpartitioneddatabasesystems.InSIGMOD2012.[55]TPCCouncil.TPCBenchmarkCrevision5.11,2010.[56]I.L.Traiger,J.Gray,C.A.Galtieri,andB.G.Lindsay.Transactionsandconsistencyindis
tributeddatabasesystems.ACMTODS,7(3):323–342,1982.[57]S.Tu,W.Zheng,E.Kohler,B.Liskov,andS.Madden.Speedytransactionsinmulticorein-memorydatabases.InSOSP2013.[58]W.Weihl.Specicationandimplementationofatomicdatatypes.PhDthesis,MassachusettsInstituteofTechnology,1984.[59]J.WidomandS.Ceri.Activedatabasesystems:Triggersandrulesforadvanceddatabaseprocessing.MorganKaufmann,1996.[60]Y.Xuetal.Bobtail:avoidinglongtailsinthecloud.InNSDI2013.
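
Illustrative sketch. The related-work discussion describes treating transactions as rewrite rules over database states and using a merge operator to reconcile divergent states. The Python sketch below is one rough way to picture that idea for a single integer balance; it is not the paper's formalism or implementation. All names (`merge`, `reachable`, `bounded_i_confluent`, `deposit`, `withdraw`) are hypothetical, the merge is assumed to sum the deltas each replica applied to a common ancestor, and the check is a bounded brute-force search from one initial state, so it can find a counterexample but cannot prove I-confluence in general.

```python
from itertools import product


def merge(base, left, right):
    """Assumed merge: combine two divergent balances by summing their deltas from `base`."""
    return base + (left - base) + (right - base)


def reachable(initial, transactions, invariant, depth):
    """States one replica can reach from `initial` in at most `depth` locally valid steps.

    A step is kept only if the resulting state satisfies the invariant, mirroring a
    replica that refuses to commit a transaction that violates the invariant locally.
    """
    seen = {initial}
    frontier = {initial}
    for _ in range(depth):
        frontier = {t(s) for s in frontier for t in transactions if invariant(t(s))}
        seen |= frontier
    return seen


def bounded_i_confluent(initial, transactions, invariant, depth=3):
    """Return False if some pair of locally valid divergent states merges into an invalid state."""
    states = reachable(initial, transactions, invariant, depth)
    return all(invariant(merge(initial, a, b)) for a, b in product(states, repeat=2))


def non_negative(balance):
    """Invariant: the balance never drops below zero."""
    return balance >= 0


def deposit(balance):
    return balance + 10


def withdraw(balance):
    return balance - 60


if __name__ == "__main__":
    # Deposits alone never violate the invariant after merging.
    print(bounded_i_confluent(100, [deposit], non_negative))            # True
    # Two replicas can each withdraw 60 from 100; merged, the balance is -20.
    print(bounded_i_confluent(100, [deposit, withdraw], non_negative))  # False
```

Under these assumptions, the second call fails because each replica's withdrawal is valid in isolation (100 to 40) while the merged balance is -20, which is the kind of case where coordination would be needed to preserve the invariant.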