Consistency Without Borders

Peter Alvaro, Peter Bailis, Neil Conway, Joseph M. Hellerstein

Abstract

Distributed consistency is a perennial research topic; in recent years it has become an urgent practical matter as well. The research literature has focused on enforcing various flavors of consistency at the I/O layer, such as li[...]

[...] and language levels, with various tradeoffs between efficiency, generality, and engineering complexity. In this paper, we motivate the further (and, in some cases, renewed) study of these alternative approaches to distributed consistency. We offer an informal taxonomy of strategies and associated insights both from our own work as well as recent developments from other researchers. We identify opportunities for further exploration and highlight several areas in which non-I/O-level mechanisms have already begun to succeed in the wild. As a community, we have an opportunity to demonstrate that correctness at scale is not in conflict with availability, performance, and programmer productivity.

2 Case Study

To illustrate how consistent outcomes can be achieved at several different places in the software stack, consider a scenario in which several programs manipulate a directed graph. This system can be divided into (at least) two tiers: the storage tier (e.g., a DBMS or key-value store) manages the persistence of the graph data structure, while the application tier accesses the graph by submitting read and write operations against the graph store. To improve fault tolerance and scalability, we assume the graph is replicated and partitioned. We consider two applications that use this graph store:

1. The deadlock detector queries a “waits-for” graph that records dependencies between processes. The task is to check whether the graph contains a cycle, which indicates a deadlock [21].
2. The garbage collector uses a “refers-to” graph to record references between a collection of distributed objects. The objective is to detect strongly connected components that are not reachable from a distinguished “root” object; such components can safely be reclaimed [1].

Both programs have similar correctness requirements. For the deadlock detector, all deadlocks should eventually be reported, with no false positives. Similarly, the garbage collector should ensure that all unreachable components are eventually discovered, and no “live” objects are returned as suitable for garbage collection.

How should we map these application-level semantics down to the low-level storage abstraction? In the remainder of the paper, we will consider these semantics at the traditional extremes (Section 3), and via convergent objects (Section 4), distributed dataflows (Section 5), and whole program analysis (Section 6).

Before we do so, we note that neither example application requires a strong consistency guarantee such as linearizability or serializability to maintain correctness. Both deadlock and unreferenced memory are stable properties [20]: once such a property holds, it will persist (until a corrective action is taken, such as aborting one of the participants in a deadlock). However, deadlock is also a strong stable property [52]: it can be detected given a subset of the global graph. This implies that the deadlock detector only requires very weak semantic guarantees from the graph store: as long as all waits-for edges are eventually observed (regardless of order), all deadlocked transactions will be detected. In contrast, the garbage collector requires global knowledge: just because one partition of the graph store contains no references to an object does not imply there are no references globally.

Hence, garbage collection requires stronger consistency guarantees than deadlock detection. However, neither requires “strong” consistency (and its concomitant costs in decreased availability and increased latency) to achieve correct behavior.

3 Consistency at the Extremes

To guarantee that application-level invariants are never violated, programmers are often forced to choose between one of two “extreme” strategies: generic I/O-level interfaces that control the order of events such as messages or reads and writes, and custom, typically ad hoc solutions that force application logic to assume all responsibility for ensuring that correctness invariants are preserved. Both approaches have significant limitations.

3.1 I/O-Level Consistency

Database systems have long provided guarantees about the interleaving of “conflicting” operations on shared state [13]. These guarantees are defined in terms of storage operations like reads and writes: for example, conflict serializability defines a conflict as two operations on the same data item submitted by different transactions, in which at least one of the operations is a write [50]. Although originally defined for centralized systems, these consistency models have subsequently been applied to distributed data management [12]. A wide variety of consistency models have been proposed that make different tradeoffs between latency, availability, and the space of permissible operation interleavings (e.g., [7, 25, 44, 45, 57]).

Similarly, distributed systems often rely on ordering guarantees on messages that reference shared state. Techniques such as state machine replication [53] ensure consistency among replicas of a distributed service by guaranteeing that messages are delivered in the same order to all replicas. Group communication systems [15] provide a variety of ordering guarantees for broadcast messages.

[...]sition of a confluent replicated graph store and a confluent deadlock detector yields a confluent composite dataflow, and allows the system to execute without synchronization. The garbage collection component would be annotated as non-confluent but, as is common practice [41], partitioned into generations or “epochs.” If the dataflow is enhanced to produce sealing punctuations that indicate when individual allocators will produce no more edges within a given epoch, Blazes can synthesize a simple, barrier-based coordination strategy that prevents the garbage collector from executing until the graph partition is sealed; that is, the mark phase has ended for a given epoch. This strategy is much less expensive than a general coordination protocol: rather than waiting for coordination on every message, only a single coordination round is required per epoch.

The principal drawback of the dataflow approach is the need for manual component annotations: annotating modules can be burdensome and error-prone, especially for complex components. Incorrect annotations corrupt the analysis and can result in unsafe optimizations. For reusable modules (like the CRDTs discussed in Section 4), it may be possible to have an expert supply annotations. This amortizes the cost of annotation and reduces the risk of errors, but is only applicable for commonly used components.

This drawback aside, flow-level approaches to consistency occupy an interesting middle ground: they are more broadly applicable than language- or application-level approaches, and more powerful than object-level approaches, which cannot capture composition across services.

6 Language-Level Consistency

Flow-level consistency only requires an abstract dataflow graph depicting the system architecture, and hence can be used with existing programs and off-the-shelf stream processors such as Storm [43]. However, it also requires that users manually add semantic annotations, which is burdensome and error-prone. These concerns are exacerbated as the complexity of the system increases. In this section, we consider a more radical approach: if the entire system is written in a high-level language that directly encodes both dependencies and appropriate semantic properties, the compiler can automatically analyze the consistency properties of entire applications.

6.1 Dependency Analysis

Database systems are a prominent example of the power of automatic dependency analysis. Because all data has a uniform representation (relations) and declarative rules are used to compute derived data (e.g., views), the system can easily observe how base data is used to compute derived data. This allows powerful capabilities like automatic materialized view maintenance [35], constraint inference [17, 46], and provenance analysis [22].

To enable similarly powerful lineage analysis for large-scale distributed systems, several technical challenges must be addressed. First, we need a uniform representation for all system state, including process-local knowledge, system events like timers and interrupts, and network messages. Second, we need a notion of dependencies that accounts for both synchronous, process-local dependencies (local computation) and asynchronous, cross-process dependencies (communication). We call the combination of these ideas data-centric programming [2]: all system state is represented in a uniform manner (as relations), which enables the system logic to be written as declarative queries over that state. An extended language that admits asynchronous queries can capture communication within the same declarative framework [5]. The most recent data-centric language designed by our group is called Bloom [4, 16].

6.2 Semantics

Dependency analysis reveals how inputs, outputs, and intermediate state are related; in addition, we need knowledge of semantics, that is, how these data values change over time and which invariants are preserved. Semantic properties and coordination requirements are closely related: if a program's semantics allow a situation in which a correctness invariant might be violated, then we might use a coordination protocol to prevent such a scenario from arising.

An important semantic property is monotonicity: intuitively, a monotonic operator is one that processes its inputs in an order-insensitive manner and never retracts a previous output in the face of new information. Typical examples of monotonic operators include set union, join, projection, and selection [4], as well as CRDT-like lattices with algebraic composition [24]. The CALM Theorem states that, if a program can be expressed entirely using monotonic logic, it is guaranteed to be confluent (that is, deterministic) despite the effects of network non-determinism [6, 38]. Hence, monotonic operations form a “safe” vocabulary for distributed programming: because the program's output is a deterministic function of its input, it is much easier to check that correctness invariants are preserved.

For data-centric languages such as Bloom, there is a simple conservative test to determine the monotonicity of individual rules or entire programs; essentially, monotonicity is part of the language's type system [4]. Because monotonicity implies confluence, this test can identify a program's consistency requirements. For example, con[...]
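To make the notion of monotonicity from Section 6.2 concrete, the following is a minimal Python sketch of the deadlock detector from the case study. It is an illustration only, not code from Bloom or Blazes; the names `deadlocked`, `run`, and `EDGES`, and the four-process waits-for graph, are hypothetical. The detector accumulates edges with set union and reports every process that lies on a cycle. The sketch replays the same edges in every possible delivery order, with one duplicate delivery, and checks two things: the final output is the same for all orders, and no intermediate output is ever retracted.

```python
from itertools import permutations


def deadlocked(edges):
    """Return the set of processes that lie on some cycle of the waits-for graph."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)

    def reachable(start):
        # Nodes reachable from `start` by following one or more edges.
        seen, stack = set(), list(succ.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(succ.get(node, ()))
        return seen

    # A process is deadlocked if it can reach itself.
    return {p for p in succ if p in reachable(p)}


def run(deliveries):
    """Replay edge deliveries one at a time; record the detector output after each."""
    observed, outputs = set(), []
    for edge in deliveries:
        observed |= {edge}  # set union: grow-only, so duplicates are harmless
        outputs.append(deadlocked(observed))
    return outputs


# Hypothetical waits-for edges: p1, p2, p3 form a cycle; p4 only waits on p1.
EDGES = [("p1", "p2"), ("p2", "p3"), ("p3", "p1"), ("p4", "p1")]

final_results = set()
never_retracts = True
for order in permutations(EDGES):
    outputs = run(list(order) + [order[0]])  # re-deliver one edge as a duplicate
    final_results.add(frozenset(outputs[-1]))
    # Monotonic: each output is a superset of the one before it.
    never_retracts &= all(a <= b for a, b in zip(outputs, outputs[1:]))

print(final_results)   # a single element means every order gave the same answer
print(never_retracts)
```

Because the detector's output can only grow as edges arrive, a replica may report a deadlock as soon as it sees one without waiting for the rest of the graph. The garbage collector would not pass the same check: a component that looks unreachable on a partial graph may become reachable when a later edge arrives, so an early output might have to be retracted.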
[...]ing whether a given application can produce serializable outcomes when run at a lower isolation level has been studied in the database literature [31].

Beyond determinism. Work on eventual consistency often tries to guarantee deterministic behavior. For example, confluence analysis identifies program fragments that produce deterministic outcomes despite non-deterministic network behavior. Similarly, CRDTs ensure that all replicas of an object converge to the same state, regardless of duplicated or reordered messages. However, determinism is too strong for some common application-level invariants. Consider the simple invariant: “A purchase request returns a confirmation if inventory is non-zero; otherwise it returns failure.” This is non-deterministic: the set of successful purchases depends on the order in which messages are delivered and processed.

What is the best way to reason about non-deterministic but well-defined correctness criteria? One strategy is to simply encode the space of acceptable outcomes as a disjunction (e.g., “Purchase X succeeds and Y fails OR purchase X fails and Y succeeds”). A confluent system that satisfies this disjunction ensures that an acceptable outcome is always produced. However, enumerating the space of acceptable outcomes scales poorly as application complexity grows. Is there a more natural model than this enumerated choice of outcomes, and, if so, can we build program analysis tools to support it? More fundamentally, beyond monotonicity, are there design patterns that assist in achieving such “controlled non-determinism,” and can such patterns be codified into theorems, analysis techniques, and language constructs?

8 Conclusion

The development of reliable distributed applications depends upon programmers' ability to reason about consistency. By narrowly focusing on I/O-level consistency, traditional research in this area risks increasing irrelevance: as the latency and availability costs of traditional consistency protocols have become prohibitive at scale, developers have begun to avoid consistency mechanisms entirely, instead relying on ad hoc, application-specific rules for conflict resolution and reconciliation. We believe that the solution is to meet application developers on their home turf: to explore a variety of consistency mechanisms, analysis tools, and programming constructs that operate at different layers of the software stack. The goal should be to help programmers judiciously employ consistency of the appropriate strength and to reason about consistency wherever it is most natural. The core tension lies in balancing expressivity and efficiency with generality and modularity. We have sketched examples and insights from our experience straddling these boundaries, but we suspect that further progress will require the research community to reconsider long-held assumptions about software architecture and the division between storage and application logic.

Acknowledgments

We would like to thank Emily Andrews, Alex Rasmussen, and the anonymous reviewers for their helpful feedback on this paper, and particularly our shepherd, Phil Bernstein. This work was supported by the Air Force Office of Scientific Research (grant FA95500810352), DARPA XData Award FA8750-12-2-0331, the Natural Sciences and Engineering Research Council of Canada, the National Science Foundation (grants CNS-0722077, IIS-0713661, and IIS-0803690), NSF CISE Expeditions award CCF-1139158, the National Science Foundation Graduate Research Fellowship (grant DGE-1106400), and gifts from Amazon, Cisco, Clearstory Data, Cloudera, EMC, Ericsson, Facebook, FitWave, General Electric, Google, Hortonworks, Intel, Microsoft, NetApp, NTT, Oracle, SAP, Samsung, Splunk, VMware, and Yahoo!.

References

[1] S. E. Abdullahi and G. A. Ringwood. Garbage collecting the Internet: a survey of distributed garbage collection. ACM Computing Surveys, 30(3):330–373, 1998.
[2] P. Alvaro, T. Condie, N. Conway, K. Elmeleegy, J. M. Hellerstein, and R. Sears. BOOM Analytics: Exploring data-centric, declarative programming for the cloud. In EuroSys, 2010.
[3] P. Alvaro, N. Conway, J. M. Hellerstein, and D. Maier. Blazes: coordination analysis for distributed programs. http://arxiv.org/abs/1309.3324, 2013. In submission.
[4] P. Alvaro, N. Conway, J. M. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In CIDR, 2011.
[5] P. Alvaro, W. R. Marczak, N. Conway, J. M. Hellerstein, D. Maier, and R. Sears. Dedalus: Datalog in time and space. In O. de Moor, G. Gottlob, T. Furche, and A. Sellers, editors, Datalog Reloaded, volume 6702 of Lecture Notes in Computer Science, pages 262–281. Springer Berlin / Heidelberg, 2011.
[...]
[58] P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3):555–568, 2003.
[59] W. Vogels. Eventually consistent. Communications of the ACM, 52(1):40–44, 2009.
[60] W. E. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12):1488–1505, 1988.