Blazes Coordination Analysis for Distributed Programs

Uploaded On 2015-05-06


[...] Hellerstein, David Maier
UC Berkeley: palvaro@cs.berkeley.edu, nrc@cs.berkeley.edu, hellerstein@cs.berkeley.edu
Portland State University: maier@cs.pdx.edu

Abstract: Distributed consistency is perhaps the most discussed topic in distributed systems today. Coordina [...]


[...] efficient and manageable protocol of asynchronous point-to-point communication between producers and consumers, called sealing, that indicates when partitions of a stream have stopped changing. These partitions are identified and "chased" through a dataflow via techniques from functional dependency analysis, another surprising application of database theory to distributed consistency.

The BLAZES architecture is depicted in Figure 1. BLAZES can be directly applied to existing programming platforms based on distributed stream or dataflow processing, including Twitter Storm [21], Apache S4 [23], and Spark Streaming [24]. Programmers of stream processing engines interact with BLAZES in a "grey box" manner: they provide simple semantic annotations to the black-box components in their dataflows, and BLAZES performs the analysis of all dataflow paths through the program. BLAZES can also take advantage of the richer analyzability of declarative languages like Bloom. Bloom programmers are freed from the need to supply annotations, since Bloom's language semantics allow BLAZES to infer component properties automatically.

We make the following contributions in this paper:

Consistency Anomalies and Properties. We present a spectrum of consistency anomalies that arise in distributed dataflows. We identify key properties of both streams and components that affect consistency.

Composition of Properties. We show how to analyze the composition of consistency properties in complex programs via a term-rewriting technique over dataflow paths, which translates local component properties into end-to-end stream properties.

Custom Coordination Code. We distinguish two alternative coordination strategies, ordering and sealing, and show how we can automatically generate application-aware coordination code that uses the cheaper sealing technique in many cases.

We conclude by evaluating the performance benefits offered by using BLAZES as an alternative to generic, order-based coordination mechanisms available in both Storm and Bloom.

B. Running Examples

We consider two running examples: a streaming analytic query implemented using the Storm stream processing system and a distributed ad-tracking network implemented using the Bloom distributed programming language.

Streaming analytics with Storm: Figure 2 shows the architecture of a Storm topology that computes a continuous word count over the Twitter stream. Each "tweet" is associated with a numbered batch (the unit of replay) and is sent via random partitioning to exactly one Splitter component, which divides tweets into their constituent words. The words are hash partitioned to the Count component, which tallies the number of occurrences of each word in the current batch. When a batch ends, the Commit component records the batch number and frequency for each word in a backing store. Storm ensures fault-tolerance via replay: if component instances fail or time out, stream sources redeliver their inputs.

Fig. 1: The BLAZES framework. In the "grey box" system, programmers supply a configuration file representing an annotated dataflow. In the "white box" system, this file is automatically generated via static analysis.

Fig. 2: Physical architecture of a Storm word count topology.

It is up to the programmer to ensure that accurate counts are committed to the store despite these at-least-once delivery semantics. One approach is to make the Storm topology transactional, i.e., one that processes tuples in atomic batches, ensuring that certain components (called committers) emit the batches in a total order. By recording the last successfully processed batch identifier, a programmer may ensure at-most-once processing in the face of possible replay by incurring the extra overhead of synchronizing the processing of batches. Note that batches are independent in the word counting application; because the streaming query groups outputs by batch id, there is no need to order batches with respect to each other. BLAZES can aid a topology designer in avoiding unnecessarily conservative ordering constraints, which (as we will see in Section VIII) results in up to a 3× improvement in throughput in our experiments.

Ad-tracking with Bloom: Figure 3 depicts an ad-tracking network, in which a collection of ad servers deliver advertisements to users (not shown) and send click logs (edges labeled "c") to a set of reporting server replicas. Reporting servers compute a continuous query; analysts make requests ("q") for subsets of the query answer (e.g., by visiting a "dashboard") and receive results via the stream labeled "r". To improve response times for common queries, a caching tier is interposed between analysts and reporting servers. An analyst poses a request about a particular ad to a cache server. If the cache contains an [...]

Severity  Label    Confluent  Stateless
1         CR       X          X
2         CW       X
3         OR_gate             X
4         OW_gate

Fig. 7: The C.O.W.R. component annotations. A component path is either Confluent or Order-sensitive, and either changes component state (a Write path) or does not (a Read-only path). Component paths with higher severity annotations can produce more stream anomalies.

[...] indicating the termination of a campaign, the nonmonotonic query CAMPAIGN can produce deterministic outputs.

IV. ANNOTATED DATAFLOW GRAPHS

So far, we have focused on the consistency anomalies that can affect individual "black box" components. In this section, we extend our discussion by presenting a grey box model in which programmers provide simple annotations about the semantic properties of components. In Section V, we show how BLAZES can use these annotations to automatically derive the consistency properties of entire dataflow graphs.

A. Annotations and Labels

In this section, we describe a language of annotations and labels that enriches the "black box" model (Section II) with additional semantic information. Programmers supply annotations about paths through components and about input streams; using this information, BLAZES derives labels for each component's output streams.

1) Component Annotations: BLAZES provides a small, intuitive set of annotations that capture component properties relevant to stream consistency. A review of the implementation or analysis of a component's input/output behavior should be sufficient to choose an appropriate annotation. Figure 7 lists the component annotations supported by BLAZES. Each annotation applies to a path from an input interface to an output interface; if a component has multiple input or output interfaces, each path can have a different annotation.

The CR annotation indicates that a path through a component is confluent and stateless; that is, it produces deterministic output regardless of its input order, and its inputs do not modify the component's state. CW denotes a path that is confluent and stateful.

The annotations OR_gate and OW_gate denote non-confluent paths that are stateless or stateful, respectively. The gate subscript is a set of attribute names that indicates the partitions of the input streams over which the non-confluent component operates. This annotation allows BLAZES to determine whether an input stream containing end-of-partition punctuations can produce deterministic executions without using global coordination. Supplying gate is optional; if the programmer does not know the partitions over which the component path operates, the annotations OR and OW indicate that each record belongs to a different partition.

Consider a reporting server component implementing the query WINDOW. When it receives a request referencing a particular advertisement and window, it returns a response if the advertisement has fewer than 1000 clicks within that window. An appropriate label for the path from request inputs to outputs is OR_{id,window}: a stateless order-sensitive path operating over partitions with composite key id, window. Requests do not affect the internal state of the component, but they do return potentially nondeterministic results that depend on the outcomes of races between queries and click records. Note however that if we were to delay the results of queries until we were certain that there would be no new records for a particular advertisement or a particular window,² the output would be deterministic. Hence WINDOW is compatible with click streams partitioned (and emitting appropriate punctuations) on id or window.

² This rules out races by ensuring (without enforcing an ordering on message delivery) that the query comes after all relevant click records.

2) Stream Annotations: Programmers can also supply optional annotations to describe the semantics of streams. The Seal_key annotation means that the stream is punctuated on the subset key of the stream's attributes; that is, the stream contains punctuations on key, and there is at least one punctuation corresponding to every stream record. For example, a stream representing messages between a client and server might have the label Seal_session, to indicate that clients will send messages indicating that sessions are complete. To ensure progress, there must be a punctuation for every session identifier that appears in any message.

Programmers can use the Rep annotation to indicate that a stream is replicated. A replicated stream connects a producer component instance (or instances) to more than one consumer component instance, and produces the same contents for all stream instances (unlike, for example, a partitioned stream). The Rep annotation carries semantic information both about expected execution topology and programmer intent, which BLAZES uses to determine when nondeterministic stream contents can lead to replica divergence. Rep is an optional Boolean flag that may be combined with other annotations and labels.

V. COORDINATION ANALYSIS AND SYNTHESIS

BLAZES uses component and stream annotations to determine if a given dataflow is guaranteed to produce deterministic outcomes; if it cannot make this guarantee, it augments the program with coordination code. In this section, we describe the program analysis and synthesis process.

A. Analysis

To derive labels for the output streams in a dataflow graph, BLAZES starts by enumerating all paths between pairs of sources and sinks. To rule out infinite paths, it reduces each cycle in the graph to a single node with a collapsed label by selecting the label of highest severity among the cycle [...]

[...] unnecessary to ensure deterministic replay, and hence consistent outcomes.

1) Component annotations: To annotate the three components of the Storm word count query, we provide the following file to BLAZES:

    Splitter:
      annotation:
        - { from: tweets, to: words, label: CR }
    Count:
      annotation:
        - { from: words, to: counts, label: OW, subscript: [word, batch] }
    Commit:
      annotation: { from: counts, to: db, label: CW }

Splitter is a stateless, confluent component: we give it the annotation CR. We annotate Count as OW_{word,batch}: it is stateful (accumulating counts over time) and order-sensitive, but potentially sealable on word or batch (or both). Lastly, Commit is also stateful (the backing store to which it stores the final counts is persistent), but since it is append-only and does not record the order of appends, we annotate it CW.

2) Analysis: In the absence of any seal annotations, BLAZES derives an output label of Run for the word count dataflow. Without coordination, nondeterministic input orders may produce nondeterministic output contents due to the order-sensitive nature of the Count component. To ensure that replay (Storm's internal fault-tolerance strategy) is deterministic, BLAZES will recommend that the topology be coordinated; the programmer can achieve this by making the topology "transactional" (in Storm terminology), totally ordering the batch commits.

If, on the other hand, the input stream is sealed on batch, BLAZES recognizes the compatibility between the stream punctuations and the Count component, which operates over grouping sets of word, batch. Because a batch is atomic (its contents may be completely determined once a seal record arrives) and independent (emitting a processed batch never affects any other batches), the topology will produce deterministic outputs under all interleavings.

B. Ad-reporting system

Next we describe how we might annotate the various components of the ad-reporting system. As we discuss in Section VII, these annotations can be automatically extracted from the Bloom syntax; for exposition, in this section we discuss how a programmer might manually annotate an analogous dataflow written in a language without Bloom's static-analysis capabilities. As we will see, ensuring deterministic outputs will require different mechanisms for the different queries listed in Figure 6.

1) Component annotations: Below is the BLAZES annotation file for the ad serving network:

    Cache:
      annotation:
        - { from: request, to: response, label: CR }
        - { from: response, to: response, label: CW }
        - { from: request, to: request, label: CR }
    Report:
      Rep: true
      annotation:
        - { from: click, to: response, label: CW }
      POOR: { from: request, to: response, label: OR, subscript: [id] }
      THRESH: { from: request, to: response, label: CR }
      WINDOW: { from: request, to: response, label: OR, subscript: [id, window] }
      CAMPAIGN: { from: request, to: response, label: OR, subscript: [id, campaign] }

The cache is clearly a stateful component, but since its state is append-only and order-independent we may annotate it CW. Because the data-collection path through the reporting server simply appends clicks and impressions to a log, we annotate this path CW also.

All that remains is to annotate the read-only path through the reporting component corresponding to the various continuous queries listed in Figure 6. Report is a replicated component, so we supply the Rep annotation for all instances. We annotate the query path corresponding to THRESH as CR: it is confluent because it never emits a record until the ad impressions reach the given threshold. We annotate queries POOR and CAMPAIGN as OR_id and OR_{id,campaign}, respectively. These queries can return different contents in different executions, recording the effect of message races between click and request messages.

We give query WINDOW the annotation OR_{id,window}. Unlike POOR and CAMPAIGN, WINDOW includes the input stream attribute window in its grouping clause. Its outputs are therefore partitioned by values of window, making it compatible with an input stream sealed on window.

2) Analysis: Having annotated all of the instances of the reporting server component for different queries, we may now consider how BLAZES automatically derives output stream labels for the global dataflow. If we supply THRESH, BLAZES derives a final label of Async for the output path from cache to sink. All components are confluent, so the complete dataflow produces deterministic outputs without coordination. If we chose, we could encapsulate the service as a single component with annotation CW.

Given query POOR with no input stream annotations, BLAZES derives a label of Diverge. The poor performers query is not confluent: it produces nondeterministic output contents. Because these outputs mutate a stateful, replicated component (i.e., the cache) that affects system outputs, the output stream is tainted by divergent replica state. Preventing replica divergence will require a coordination strategy that controls message delivery order to the reporting server.

If, however, the input stream is sealed on campaign, BLAZES recognizes the compatibility between the stream partitioning and the component path annotation OR_{id,campaign}, synthesizes a protocol that allows the partition to be processed when it has stopped changing, and gives the dataflow the label Async. Implementing this sealing strategy does not require global coordination, but merely some synchronization between stream producers and consumers.

Similarly, WINDOW (given an input stream sealed on [...]
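The choice between no coordination, sealing, and ordering that runs through these case studies can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the BLAZES implementation: the names PathAnnotation, seal_compatible, and coordination_strategy are hypothetical, and compatibility is approximated as "the seal key is a non-empty subset of the path's gate". That approximation matches the examples in the text (Count with batch, CAMPAIGN with campaign, WINDOW with window) but leaves out the functional-dependency analysis and the replication (Rep) reasoning that the paper describes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

CONFLUENT = {"CR", "CW"}        # order-insensitive paths
ORDER_SENSITIVE = {"OR", "OW"}  # non-confluent paths


@dataclass(frozen=True)
class PathAnnotation:
    """Hypothetical in-memory form of one entry of an annotation file."""
    label: str                              # one of CR, CW, OR, OW
    gate: Optional[FrozenSet[str]] = None   # the "subscript" attribute set


def seal_compatible(path: PathAnnotation, seal_key: FrozenSet[str]) -> bool:
    """Assumed rule: a seal on `seal_key` suits a path whose gate contains it.

    A path with no gate is treated as incompatible with every seal.
    """
    if not path.gate or not seal_key:
        return False
    return seal_key <= path.gate


def coordination_strategy(path: PathAnnotation,
                          seal_key: Optional[FrozenSet[str]] = None) -> str:
    """Return "none", "seal", or "order" for a single component path."""
    if path.label in CONFLUENT:
        return "none"   # deterministic regardless of input order
    if path.label not in ORDER_SENSITIVE:
        raise ValueError(f"unknown label: {path.label}")
    if seal_key is not None and seal_compatible(path, seal_key):
        return "seal"   # wait for end-of-partition punctuations only
    return "order"      # fall back to ordered message delivery


if __name__ == "__main__":
    count = PathAnnotation("OW", frozenset({"word", "batch"}))
    campaign = PathAnnotation("OR", frozenset({"id", "campaign"}))
    print(coordination_strategy(count))                           # no seal given
    print(coordination_strategy(count, frozenset({"batch"})))     # sealed on batch
    print(coordination_strategy(campaign, frozenset({"campaign"})))
```

Under these assumptions, Count requires ordering when the input carries no seal and only sealing when the input is sealed on batch, while a CR path such as THRESH needs no coordination at all.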
Fig. 8: The effect of coordination on throughput for a Storm topology computing a streaming word count.

We used a single dedicated node (as the documentation recommends) for the Storm master and three Zookeeper servers. In each experiment, we allowed the topology to "warm up" and reach steady state by running it for 10 minutes.

Figure 8 plots the throughput of the coordinated and uncoordinated implementations of the word count dataflow as a function of the cluster size. The overhead of conservatively deploying a transactional topology is considerable. The uncoordinated dataflow has a peak throughput roughly 1.8 times that of its coordinated counterpart in a 5-node deployment. As we scale up the cluster to 20 nodes, the difference in throughput grows to 3×.

B. Ad reporting

To compare the performance of the sealing and ordering coordination strategies, we conducted a series of experiments using a Bloom implementation of the ad tracking network introduced in Section I-B. For ad servers, which simply generate click logs and forward them to reporting servers, we used 10 micro instances. We created 3 reporting servers using medium instances. Our Zookeeper cluster consisted of 3 small instances.

Ad servers generate a workload of 1000 log entries per server, dispatching 50 click log messages in batch and sleeping periodically. During the workload, we pose a number of requests to the reporting servers, all of which implement the continuous query CAMPAIGN.

Although this system, implemented in the Bloom language prototype, does not illustrate the volume we would expect in a high-performance implementation, we will see that it highlights some important relative patterns across different coordination strategies.

1) Baseline: No Coordination: For the first run, we do not enable the BLAZES preprocessor. Thus click logs and requests flow in an uncoordinated fashion to the reporting servers. The uncoordinated run provides a lower bound for performance of appropriately coordinated implementations. However, it does not have the same semantics. We confirmed by observation that certain queries posed to multiple reporting server replicas returned inconsistent results. The line labeled "Uncoordinated" in Figures 9 and 10 shows the log records processed over time for the uncoordinated run, for systems with 5 and 10 ad servers, respectively.

2) Ordering Strategy: In the next run we enabled the BLAZES preprocessor but did not supply any input stream annotations. BLAZES recognized the potential for inconsistent answers across replicas and synthesized a coordination strategy based on ordering. The synthesized strategy inserted calls to Zookeeper so that all click log entries and requests were delivered in the same order to all replicas. The line labeled "Ordered" in Figures 9 and 10 plots the records processed over time for this strategy.

The ordering strategy ruled out inconsistent answers from replicas but incurred a significant performance penalty. Scaling up the number of ad servers by a factor of two had little effect on the performance of the uncoordinated implementation, but increased the processing time in the coordinated run by a factor of three.

3) Sealing Strategies: For the last experiments we provided the input annotation Seal_campaign and embedded punctuations in the ad click stream indicating when there would be no further log records for a particular campaign. Recognizing the compatibility between the sealed stream and the aggregate query in CAMPAIGN (a "group-by" on id, campaign), BLAZES synthesized a seal-based coordination strategy.

Using the seal-based strategy, reporting servers do not need to wait until events are globally ordered; instead, events are processed as soon as a reporting server can determine that they belong to a sealed partition. The reporting servers use Zookeeper only to determine the set of ad servers responsible for each campaign; that is, one call to Zookeeper per campaign. When a reporting server has received seal messages from all producers for a given campaign, it emits the partition for processing.

In Figures 9 and 10 we evaluate the sealing strategy for two alternative partitionings of click records: in "Independent seal" each campaign is mastered at exactly one ad server, while in "Seal," all ad servers produce click records for all campaigns. Note that both seal-based runs closely track the performance of the uncoordinated run; doubling the number of ad servers effectively doubles system throughput.

To highlight the differences between the two seal-based runs, Figure 11 plots the 10-server run but omits the ordering strategy. As we would expect, "independent seals" result in lower latencies because reporting servers may process partitions as soon as a single seal message appears (since each partition has a single producer). By contrast, the step-like shape of the non-independent seal strategy reflects the fact that reporting servers delay processing input partitions until they have received a seal record from every producer. Partitioning the data across ad servers so as to place advertisement content close to consumers (i.e., partitioning by ad id) caused campaigns to be spread across ad servers, conflicting with the coordination strategy. We revisit the notion of "coordination locality" in Section X.

IX. RELATED WORK

Our approach to automatically coordinating distributed services draws inspiration from the literature on both distributed [...]