
Eventual Consistency: How soon is eventual? An Evaluation of Amazon S3's Consistency Behavior

David Bermbach and Stefan Tai
Karlsruhe Institute of Technology, Karlsruhe, Germany
firstname.lastname@kit.edu

ABSTRACT

Over the last few years, Cloud storage systems and so-called NoSQL datastores have found widespread adoption. In contrast to traditional databases, these storage systems typically sacrifice consistency in favor of latency and [...]


[...] models come in slightly different flavors, ranging from traditional strict consistency, which requires all replicas of all data items to be identical as well as all semantic relationships between data items to be observed, to consistency guarantees which can be found in systems like the Google File System [8], where replicas are treated as consistent once every copy includes every single update at least once. On the other hand, there are client-centric consistency models that do not care about the internal state of a storage system. Instead, they focus on the consistency guarantees which can actually be observed by one or more clients, e.g., whether stale data is returned or not.

In consequence, when measuring how soon eventual consistency is (that is, measuring the length of the inconsistency window), there are again two different perspectives. A Cloud storage provider or anyone with access to the source code of a storage system (in the following just provider) would rather focus on a data-centric perspective. To a consumer, in contrast, it really does not matter whether a Cloud storage system internally contains a huge number of stale replicas as long as the provider has implemented mechanisms to deal with those. As long as no stale data is observed, the customer is satisfied.

3. APPROACH AND IMPLEMENTATION

Measuring the length of the inconsistency window is trivial from a provider perspective: By adding detailed logging or notification functionality to the storage system it is easily possible to have the actual timestamps of each replica update readily available. By calculating the difference between the latest and the first timestamp it is, hence, possible to get the desired result.

A customer, in contrast, might only have black-box access to the storage system (e.g., in case of Cloud storage systems) or might not have the means or knowledge to change the source code of a storage system. From this perspective it is more important to know how long after issuing an update it is still possible to read the old version. For eventually consistent storage systems, this value is typically expected to be greater than zero. It can be experimentally determined by the following steps:

1. Create a timestamp.
2. Write a version number to the storage system.
3. Continuously read until the old version number is no longer returned, then create a new timestamp.
4. Calculate the difference between the write timestamp and the second timestamp (time of the last read of the previous version).
5. Repeat these steps to achieve statistical significance.

Depending on the latency of step 2, an alternative approach might create another timestamp between steps 2 and 3 and use the mean of that timestamp and the one from step 1 for the calculation in step 4 (instead of the timestamp from step 1 alone). Please note that it is necessary to use the last read of the old version and not the first read of the new version because, for systems where monotonic read consistency (footnote 5) is violated, the timestamp of the last read of the old version may be long after the timestamp of the first read of the new version.

Footnote 5: Monotonic read consistency is defined as follows: After having returned version n to a specific client, the system guarantees to return only versions ≥ n [20].

Our system identifies the last read of a particular version using an internal buffer: for every version, each reader remembers the last time it could read that version. Once the buffer is full, the inconsistency window is calculated for the oldest version only, before it is removed from the buffer. For our experiments we have chosen combinations of buffer size and write interval which guarantee that the highest observed inconsistency window easily fits into the buffer, i.e., bufferSize × writeInterval ≫ maxInconsistencyWindow. For example, a peak inconsistency window of about 33 s was combined with a configuration which enables us to capture values as large as 100 s.

Independent from our work, Wada et al. [21] propose a very similar approach, already with interesting results. In our opinion, their approach has a fundamental flaw, though: only one reader is used in their experimental setup. By using only one reader, especially when running in the same data center as the writer or, even worse, on the same machine, it is improbable to actually discover staleness. This is due to two facts:

1. A distributed storage system usually uses some kind of load balancer. Depending on the intelligence of the load balancer it is not unlikely that all requests from the same IP range are forwarded to the same replica or that there is even a caching layer in between. Accuracy can be greatly increased by running additional geographically distributed readers.
2. One reader can, depending on the latency L of the storage system, only achieve a resolution of 1/L, i.e., send only 1/L requests per unit of time. Anything that happens in between is unknown. This resolution of the results can be almost linearly improved by adding more reader instances.

For these reasons, we have implemented a system where one writer periodically writes a local timestamp plus a version number to the storage system. Next, there is a number of readers (the actual number depends on the storage system) which are geographically distributed. These reader instances continuously poll the storage system and remember for each version the latest point in time where they could still read that specific version. After collecting this data from all readers, we then consider the difference between the latest read timestamp of version n and the write timestamp of version n+1. This is because the client-observable inconsistency window is the period of time after submitting an update where it is still possible to read the previous version.

Figure 1 shows an example which shall serve to better explain how we derive our results. The data used is not real data as we usually have about 1,000 reads in between two writes. We have observed similar logs in real monitoring data, though. In this example, the storage system violates monotonic read consistency. The figure shows a timeline in the left column, the data the writer wrote in the second column, and what the two readers read at different points in time in the other two columns. Based on the highlighted last reads for a given version it is then possible to calculate the table in the right part of the figure. For example, after 5 units of time (TU) the writer writes version B to the storage system. Reader 1 reads the old [...]

Figure 3: Length of LOW and SAW Periods over Time on S3

[...] counterparts. For our purposes we placed a bucket in the region eu-west (Ireland) since we had, during our MiniStorage tests, observed that we could not start EC2 instances in us-east1a whereas we could start instances in all availability zones of eu-west. When we repeated our MiniStorage test for S3, starting additional readers in certain intervals, we observed that our results were fairly constant beyond 8 readers. To nevertheless play it safe, we deployed 12 readers, 4 per availability zone. Our writer as well as the collector were deployed in zone a; all instances again were small instances.

We chose an update interval of 10 s to give each update enough time (in our mind) to propagate without interfering with older updates. The poll interval per reader was set to 10 ms. We started the test on August 29, 2011 8.30h AM (UTC) and kept it running for a week.

In contrast to the findings of Wada et al. [21], who could not observe any inconsistencies at all, and in contrast to our expectations of seeing a normal distribution of inconsistency window lengths, our results show some strange periodicities. First, there is a long-term periodicity: Roughly every 12 hours the behavior of S3 abruptly changes between what we will call a LOW phase and a SAW phase. Figure 3 shows the length of those periods in comparison.

During the LOW phase we actually find a random distribution (footnote 8) with a mean value of 28 ms and a median of 15 ms. Please note that these values may be exact but could be off by at least a factor of 2 due to the accuracy limitations of NTP [15], which we use for clock synchronization. We believe, though, that median and mean values between 0 and 100 ms are realistic.

Footnote 8: The distribution has three local maxima: the absolute maximum at 7 ms, the next smaller local maximum at 26 ms and another small local maximum at 90 ms.

During our SAW periods we can observe a curve which resembles a sawtooth wave, hence the name. It really does not matter which SAW phase we select an excerpt from; the periodicity always follows the same pattern: First, the inconsistency window's length is close to zero. Then, it increases by about one or two seconds with every test until it peaks at about eleven seconds before dropping straight down to the next minimum. The only difference is that the minimum can be found in the interval between zero and five seconds and the maximum between ten and twelve seconds. The wavelength of this pattern fluctuates between eight and twelve tests, i.e., for our test setup the pattern restarts every 80 to 120 s. Figure 4 shows an excerpt from one of the SAW phases.

Figure 4: Observed Inconsistency Window Length during SAW Periods Over Time on S3 (Excerpt)

We have been researching the question of consistency monitoring for quite a while now. Repeated tests on S3 showed the exact same results. Already in July and August 2010, we experimentally analyzed consistency guarantees of S3 via an independent implementation which also used a slightly different algorithm. Even back then (where it was only a by-product of our evaluation of [3]) we observed remarkably similar behavior.

Figure 5 shows the full results of our one week evaluation of S3. Due to the sheer number of test runs and, hence, the density of the curve, it is not possible to see the sawtooth pattern during the SAW phases, but it is still easily possible to distinguish SAW and LOW phases.

Another finding was that the availability zones seem different in terms of accessing the latest version. While our writer was in zone a, the longest inconsistency window length was observed in zone a in 28% of all tests. The same is true for zone c, while zone b had the maximum in 49% of all tests (footnote 9). This indicates that zone b seems to have a slightly poorer connection to the other two zones, e.g., by being located in a different building.

Footnote 9: The total is not 100% as for about 5% of all tests two zones observed the same inconsistency window.

Furthermore, regarding locations we could not see differences between the zones: They all showed the same sawtooth wave and had their maxima and minima at the exact same time; only the amplitudes were slightly different, which creates the results from the last paragraph.

We also tested our results for violations of monotonic read consistency. From a total of 353,357,884 reads, 42,565,840 or about 12% of all requests violated monotonic read consistency [20]. In exchange, we observed an availability of more than eight nines (99.9999997%; only one request returned an error).

6. DISCUSSION

In summary, we observed an unexpected, very interesting consistency behavior of Amazon S3, but have so far not been able to come up with a satisfying explanation of our experimental findings. Possible explanations could be caching effects or measures to counter DDoS attacks [...]

[...] encountered strange periodicities, namely our so-called SAW and LOW phases which alternate approximately twice a day. Furthermore, we described the sawtooth wave-like behavior of S3 during SAW phases before discussing potential explanations.

Our approach of geographically distributed readers combined with a writer fits into current research regarding benchmarking of distributed datastores as well as systems building on top of that. Our results provide concrete data that serves as criteria for an application developer to determine whether an eventual consistency datastore provides acceptable consistency guarantees.

In future endeavors, we will try to determine dependencies between files on S3, e.g., how periodicities of files within the same bucket or across multiple buckets correlate. Furthermore, we are currently benchmarking Apache Cassandra and the Google App Engine datastore. We plan to publish these results as well as to extend our efforts to additional storage systems in a follow-up paper. Finally, Yu and Vahdat [22] as well as similar models know other consistency dimensions beyond staleness, e.g., order error. We are investigating means to also measure these dimensions.

9. REFERENCES

[1] E. Anderson, X. Li, M. Shah, J. Tucek, and J. Wylie. What consistency does your key-value store actually provide. In Proceedings of the Sixth Workshop on Hot Topics in System Dependability (HotDep), 2010.
[2] R. Baldoni, A. Corsaro, L. Querzoni, S. Scipioni, and S. Tucci-Piergiovanni. An adaptive coupling-based algorithm for internal clock synchronization of large scale dynamic systems. In Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems - Volume Part I, pages 701–716. Springer-Verlag, 2007.
[3] D. Bermbach, M. Klems, M. Menzel, and S. Tai. Metastorage: A federated cloud storage system to manage consistency-latency tradeoffs. In Proceedings of the 4th International Conference on Cloud Computing (IEEE Cloud 2011). IEEE, 2011.
[4] B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proceedings of the VLDB Endowment, 1(2):1277–1288, 2008.
[5] B. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143–154. ACM, 2010.
[6] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: amazon's highly available key-value store. In Proc. SOSP, 2007.
[7] A. Fox and E. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the 7th Workshop on Hot Topics in Operating Systems, 1999, pages 174–178. IEEE, 2002.
[8] S. Ghemawat, H. Gobioff, and S. Leung. The Google file system. ACM SIGOPS Operating Systems Review, 37(5):29–43, 2003.
[9] M. Klems, M. Menzel, and R. Fischer. Consistency benchmarking: Evaluating the consistency behavior of middleware services in the cloud. In Proceedings of the 8th International Conference on Service Oriented Computing (ICSOC). Springer, Dec. 2010.
[10] D. Kossmann, T. Kraska, and S. Loesing. An evaluation of alternative architectures for transaction processing in the cloud. In Proceedings of the 2010 international conference on Management of data, pages 579–590. ACM, 2010.
[11] T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann. Consistency Rationing in the Cloud: Pay only when it matters. Proceedings of the VLDB Endowment, 2(1):253–264, 2009.
[12] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, et al. Oceanstore: An architecture for global-scale persistent storage. ACM SIGARCH Computer Architecture News, 28(5):190–201, 2000.
[13] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010.
[14] M. Menzel, M. Schoenherr, and S. Tai. (mc2)2: criteria, requirements and a software prototype for cloud infrastructure decisions. Software: Practice and Experience, 2011.
[15] ntp.org. NTP Algorithm. http://www.ntp.org/ntpfaq/NTP-s-algo.htm (accessed on September 6, 2011).
[16] S. Sakr, L. Zhao, H. Wada, and A. Liu. Clouddb autoadmin: Towards a truly elastic cloud-based data store. In The 9th IEEE International Conference on Web Services (ICWS 2011), Washington DC, USA, July 2011.
[17] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: A highly available file system for a distributed workstation environment. IEEE Transactions on computers, pages 447–459, 1990.
[18] A. S. Tanenbaum and M. V. Steen. Distributed Systems - Principles and Paradigms. Pearson Education, Upper Saddle River, NJ, 2nd edition, 2007.
[19] D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. ACM SIGOPS Operating Systems Review, 29(5):172–182, 1995.
[20] W. Vogels. Eventually consistent. Queue, 6:14–19, October 2008.
[21] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu. Data consistency properties and the tradeoffs in commercial cloud storages: the consumers' perspective. In 5th biennial Conference on Innovative Data Systems Research, CIDR, volume 11, 2011.
[22] H. Yu and A. Vahdat. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems (TOCS), 20(3):239–282, 2002.
[23] L. Zhao, A. Liu, and J. Keung. Evaluating cloud platform architecture with the care framework. In 2010 Asia Pacific Software Engineering Conference, pages 60–69. IEEE, 2010.
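
The evaluation step described in Section 3 (comparing the latest read of version n, across all readers, with the write timestamp of version n+1) and the check for monotonic read violations can be illustrated with a short sketch. This is not the authors' implementation: the function names, the log layout as (timestamp, version) pairs, and the choice to floor negative windows at zero are assumptions made for illustration. The sample data is invented; only the write of version B after 5 time units follows the example in the text.

```python
def last_reads(read_log):
    """Map each version to the latest timestamp at which any reader saw it.

    read_log: iterable of (timestamp, version) pairs merged from all readers.
    """
    latest = {}
    for ts, version in read_log:
        if version not in latest or ts > latest[version]:
            latest[version] = ts
    return latest


def inconsistency_windows(write_log, read_log):
    """Client-observable inconsistency window per written version.

    write_log: list of (timestamp, version) pairs in write order.
    The window of version n+1 is the latest read of version n minus the
    write timestamp of version n+1. Flooring at zero is an assumption:
    it treats "old version never seen after the write" as no staleness.
    """
    latest = last_reads(read_log)
    windows = {}
    for (_, old_version), (write_ts, new_version) in zip(write_log, write_log[1:]):
        if old_version in latest:
            windows[new_version] = max(0, latest[old_version] - write_ts)
    return windows


def count_monotonic_read_violations(reader_log, order):
    """Count reads by ONE reader that return a version older than one it
    has already seen. order maps each version to its sequence number."""
    violations = 0
    highest = -1
    for _, version in sorted(reader_log):
        seq = order[version]
        if seq < highest:
            violations += 1
        else:
            highest = seq
    return violations


if __name__ == "__main__":
    # Invented logs in abstract time units (TU).
    writes = [(0, "A"), (5, "B")]
    reader_1 = [(1, "A"), (6, "A"), (7, "B"), (8, "A"), (9, "B")]
    reader_2 = [(2, "A"), (6, "B"), (8, "B")]
    print(inconsistency_windows(writes, reader_1 + reader_2))
    print(count_monotonic_read_violations(reader_1, {"A": 0, "B": 1}))
```

In this invented log, reader 1 still sees version A at 8 TU although it already read B at 7 TU. Taking the first read of B (6 TU) instead of the last read of A would therefore understate the window, which is why the last read of the old version is used.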