ABSTRACT

Over the last few years, Cloud storage systems and so-called NoSQL datastores have found widespread adoption. In contrast to traditional databases, these storage systems typically sacrifice consistency in favor of latency and [...]
[...] models come in slightly different flavors, ranging from traditional strict consistency, which requires all replicas of all data items to be identical as well as all semantic relationships between data items to be observed, to consistency guarantees which can be found in systems like the Google File System [8], where replicas are treated as consistent once every copy includes every single update at least once. On the other hand, there are client-centric consistency models that do not care about the internal state of a storage system. Instead, they focus on the consistency guarantees which can actually be observed by one or more clients, e.g., whether stale data is returned or not.

In consequence, when measuring how soon eventual consistency is (that is, measuring the length of the inconsistency window), there are again two different perspectives. A Cloud storage provider or anyone with access to the source code of a storage system (in the following just provider) would rather focus on a data-centric perspective. To a consumer, in contrast, it really does not matter whether a Cloud storage system internally contains a huge number of stale replicas as long as the provider has implemented mechanisms to deal with those. As long as no stale data is observed, the customer is satisfied.

3. APPROACH AND IMPLEMENTATION

Measuring the length of the inconsistency window is trivial from a provider perspective: By adding detailed logging or notification functionality to the storage system, it is easily possible to have the actual timestamps of each replica update readily available. By calculating the difference between the latest and the first timestamp it is, hence, possible to get the desired result.

A customer, in contrast, might only have black-box access to the storage system (e.g., in the case of Cloud storage systems) or might not have the means or knowledge to change the source code of a storage system. From this perspective, it is more important to know for how long after issuing an update the old version can still be read. For eventually consistent storage systems, this value is typically expected to be greater than zero. It can be experimentally determined by the following steps:

1. Create a timestamp.
2. Write a version number to the storage system.
3. Continuously read until the old version number is no longer returned, then create a new timestamp.
4. Calculate the difference between the write timestamp and the second timestamp (time of the last read of the previous version).
5. Repeat these steps to achieve statistical significance.

Depending on the latency of step 2, an alternative approach might create another timestamp between steps 2 and 3 and use the mean of those two timestamps for the calculation in step 4 (instead of the timestamp from step 1 alone). Please note that it is necessary to use the last read of the old version and not the first read of the new version: for systems where monotonic read consistency (footnote 5) is violated, the timestamp of the last read of the old version may be long after the timestamp of the first read of the new version.

Footnote 5: Monotonic read consistency is defined as follows: After having returned version n to a specific client, the system guarantees to return only versions ≥ n [20].

Our system identifies the last read of a particular version using an internal buffer: for every version, each reader remembers the last time it could read that version. Once the buffer is full, the inconsistency window is calculated for the oldest version only, before it is removed from the buffer. For our experiments we have chosen combinations of buffer size and write interval which guarantee that the highest observed inconsistency window easily fits into the buffer, i.e., bufferSize × writeInterval ≫ maxInconsistencyWindow. For example, a peak inconsistency window of about 33 s was combined with a configuration which enables us to capture values as large as 100 s.

Independent from our work, Wada et al. [21] propose a very similar approach, already with interesting results. In our opinion, their approach has a fundamental
flaw, though: only one reader is used in their experimental setup. By using only one reader, especially when running in the same data center as the writer or, even worse, on the same machine, it is improbable to actually discover staleness. This is due to two facts:

1. A distributed storage system usually uses some kind of load balancer. Depending on the intelligence of the load balancer, it is not unlikely that all requests from the same IP range are forwarded to the same replica or that there is even a caching layer in between. Accuracy can be greatly increased by running additional geographically distributed readers.
2. One reader can, depending on the latency L of the storage system, only achieve a resolution of 1/L, i.e., send only 1/L requests per unit of time. Anything that happens in between is unknown. This resolution of the results can be almost linearly improved by adding more reader instances.

For these reasons, we have implemented a system where one writer periodically writes a local timestamp plus a version number to the storage system. Next, there is a number of readers (the actual number depends on the storage system) which are geographically distributed. These reader instances continuously poll the storage system and remember for each version the latest point in time where they could still read that specific version. After collecting this data from all readers, we then consider the difference between the latest read timestamp of version n and the write timestamp of version n+1. This is because the client-observable inconsistency window is the period of time after submitting an update where it is still possible to read the previous version.

Figure 1 shows an example which shall serve to better explain how we derive our results. The data used is not real data, as we usually have about 1,000 reads in between two writes. We have observed similar logs in real monitoring data, though. In this example, the storage system violates monotonic read consistency. The figure shows a timeline in the left column, the data the writer wrote in the second column, and what the two readers read at different points in time in the other two columns. Based on the highlighted last reads for a given version, it is then possible to calculate the table in the right part of the figure. For example, after 5 units of time (TU) the writer writes version B to the storage system. Reader
1 reads the old [...]

Figure 3: Length of LOW and SAW Periods over Time on S3

[...] counterparts. For our purposes we placed a bucket in the region eu-west (Ireland) since we had, during our MiniStorage tests, observed that we could not start EC2 instances in us-east-1a whereas we could start instances in all availability zones of eu-west.

When we repeated our MiniStorage test for S3, starting additional readers in certain intervals, we observed that our results were fairly constant beyond 8 readers. To nevertheless play it safe, we deployed 12 readers, 4 per availability zone. Our writer as well as the collector were deployed in zone a; all instances again were small instances.

We chose an update interval of 10 s to give each update enough time (in our mind) to propagate without interfering with older updates. The poll interval per reader was set to 10 ms. We started the test on August 29, 2011, at 8:30 AM (UTC) and kept it running for a week.

In contrast to the findings of Wada et al. [21], who could not observe any inconsistencies at all, and in contrast to our expectations of seeing a normal distribution of inconsistency window lengths, our results show some strange periodicities. First, there is a long-term periodicity: Roughly every 12 hours the behavior of S3 abruptly changes between what we will call a LOW phase and a SAW phase. Figure 3 shows the length of those periods in comparison.

During the LOW phase we actually find a random distribution (footnote 8) with a mean value of 28 ms and a median of 15 ms. Please note that these values may be exact but could be off by at least a factor of 2 due to the accuracy limitations of NTP [15], which we use for clock synchronization. We believe, though, that median and mean values between 0 and 100 ms are realistic.

During our SAW periods we can observe a curve which resembles a sawtooth wave, hence the name. It really does not matter which SAW phase we select an excerpt from; the periodicity always follows the same pattern: First, the inconsistency window's length is close to zero. Then, it increases by about one or two seconds with every test until it peaks at about eleven seconds before dropping straight down to the next minimum. The only difference that can be found is that the minimum can be found in the interval between zero and five seconds and the maximum can be found
between ten and twelve seconds. The wavelength of this pattern fluctuates between eight and twelve tests, i.e., for our test setup the pattern restarts every 80 to 120 s. Figure 4 shows an excerpt from one of the SAW phases.

Footnote 8: The distribution has three local maxima: the absolute maximum at 7 ms, the next smaller local maximum at 26 ms and another small local maximum at 90 ms.

Figure 4: Observed Inconsistency Window Length during SAW Periods Over Time on S3 (Excerpt)

We have been researching the question of consistency monitoring for quite a while now. Repeated tests on S3 showed the exact same results. Already in July and August 2010, we experimentally analyzed consistency guarantees of S3 via an independent implementation which also used a slightly different algorithm. Even back then (where it was only a by-product of our evaluation of [3]) we observed remarkably similar behavior.

Figure 5 shows the full results of our one-week evaluation of S3. Due to the sheer number of test runs and, hence, the density of the curve, it is not possible to see the sawtooth pattern during the SAW phases, but it is still easily possible to distinguish SAW and LOW phases.

Another finding was that the availability zones seem to differ in terms of accessing the latest version. While our writer was in zone a, the longest inconsistency window length was observed in zone a in 28% of all tests. The same is true for zone c, while zone b had the maximum in 49% of all tests (footnote 9). This indicates that zone b seems to have a slightly poorer connection to the other two zones, e.g., by being located in a different building.

Footnote 9: The total is not 100% as for about 5% of all tests two zones observed the same inconsistency window.

Apart from that, we could not see differences between the zones: They all showed the same sawtooth wave and had their maxima and minima at the exact same time. Only the amplitudes were slightly different, which creates the results from the previous paragraph.

We also tested our results for violations of monotonic read consistency. From a total of 353,357,884 reads, 42,565,840 or about 12% of all requests violated monotonic read consistency [20]. In exchange, we observed an availability of more than eight nines (99.9999997%; only one request returned an error).

6. DISCUSSION

In summary, we observed an unexpected, very interesting consistency behavior of Amazon S3, but have so far not been able to come up with a satisfying explanation of our experimental findings. Possible explanations could be caching effects or measures to counter DDoS attacks [...]

[...] encountered strange periodicities, namely our so-called SAW and LOW phases which alternate approximately twice a day. Furthermore, we described the sawtooth wave-like behavior of S3 during SAW phases before discussing potential explanations.

Our approach of geographically distributed readers combined with a writer fits into current research regarding benchmarking of distributed datastores as well as systems building on top of that. Our results provide concrete data that serves as criteria for an application developer to determine whether an eventually consistent datastore provides acceptable consistency guarantees.

In future endeavors, we will try to determine dependencies between files on S3, e.g., how periodicities of files within the same bucket or across multiple buckets correlate. Furthermore, we are currently benchmarking Apache Cassandra and the Google App Engine datastore. We plan to publish these results as well as to extend our efforts to additional storage systems in a follow-up paper.

Finally, Yu and Vahdat [22] as well as similar models know other consistency dimensions beyond staleness, e.g., order error. We are investigating means to also measure these dimensions.

9. REFERENCES

[1] E. Anderson, X. Li, M. Shah, J. Tucek, and J. Wylie. What consistency does your key-value store actually provide. In Proceedings of the Sixth Workshop on Hot Topics in System Dependability (HotDep), 2010.
[2] R. Baldoni, A. Corsaro, L. Querzoni, S. Scipioni, and S. Tucci-Piergiovanni. An adaptive coupling-based algorithm for internal clock synchronization of large scale dynamic systems. In Proceedings of the 2007 OTM Confederated international conference on On the move to meaningful internet systems - Volume Part I, pages 701-716. Springer-Verlag, 2007.
[3] D. Bermbach, M. Klems, M. Menzel, and S. Tai. Metastorage: A federated cloud storage system to manage consistency-latency tradeoffs. In Proceedings of the 4th International Conference on Cloud Computing (IEEE Cloud 2011). IEEE, 2011.
[4] B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. Proceedings of the VLDB Endowment, 1(2):1277-1288, 2008.
[5] B. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In Proceedings of the 1st ACM symposium on Cloud computing, pages 143-154. ACM, 2010.
[6] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: amazon's highly available key-value store. In Proc. SOSP, 2007.
[7] A. Fox and E. Brewer. Harvest, yield, and scalable tolerant systems. In Proceedings of the 7th Workshop on Hot Topics in Operating Systems, 1999, pages 174-178. IEEE, 2002.
[8] S. Ghemawat, H. Gobioff, and S. Leung. The Google file system. ACM SIGOPS Operating Systems Review, 37(5):29-43, 2003.
[9] M. Klems, M. Menzel, and R. Fischer. Consistency benchmarking: Evaluating the consistency behavior of middleware services in the cloud. In Proceedings of the 8th International Conference on Service Oriented Computing (ICSOC). Springer, Dec. 2010.
[10] D. Kossmann, T. Kraska, and S. Loesing. An evaluation of alternative architectures for transaction processing in the cloud. In Proceedings of the 2010 international conference on Management of data, pages 579-590. ACM, 2010.
[11] T. Kraska, M. Hentschel, G. Alonso, and D. Kossmann. Consistency Rationing in the Cloud: Pay only when it matters. Proceedings of the VLDB Endowment, 2(1):253-264, 2009.
[12] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, et al. Oceanstore: An architecture for global-scale persistent storage. ACM SIGARCH Computer Architecture News, 28(5):190-201, 2000.
[13] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35-40, 2010.
[14] M. Menzel, M. Schoenherr, and S. Tai. (mc2)2: criteria, requirements and a software prototype for cloud infrastructure decisions. Software: Practice and Experience, 2011.
[15] ntp.org. NTP Algorithm. http://www.ntp.org/ntpfaq/NTP-s-algo.htm (accessed on September 6, 2011).
[16] S. Sakr, L. Zhao, H. Wada, and A. Liu. Clouddb autoadmin: Towards a truly elastic cloud-based data store. In The 9th IEEE International Conference on Web Services (ICWS 2011), Washington DC, USA, July 2011.
[17] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: A highly available file system for a distributed workstation environment. IEEE Transactions on computers, pages 447-459, 1990.
[18] A. S. Tanenbaum and M. V. Steen. Distributed Systems - Principles and Paradigms. Pearson Education, Upper Saddle River, NJ, 2nd edition, 2007.
[19] D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser. Managing update conflicts in Bayou, a weakly connected replicated storage system. ACM SIGOPS Operating Systems Review, 29(5):172-182, 1995.
[20] W. Vogels. Eventually consistent. Queue, 6:14-19, October 2008.
[21] H. Wada, A. Fekete, L. Zhao, K. Lee, and A. Liu. Data consistency properties and the tradeoffs in commercial cloud storages: the consumers' perspective. In 5th biennial Conference on Innovative Data Systems Research, CIDR, volume 11, 2011.
[22] H. Yu and A. Vahdat. Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems (TOCS), 20(3):239-282, 2002.
[23] L. Zhao, A. Liu, and J. Keung. Evaluating cloud platform architecture with the care framework. In 2010 Asia Pacific Software Engineering Conference, pages 60-69. IEEE, 2010.
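
As an illustration of the calculation described in Section 3 (the latest read timestamp of version n over all readers, minus the write timestamp of version n+1), the following minimal Python sketch may be helpful. It is not the authors' implementation. The function names, the log format of (timestamp, version) pairs, the use of consecutive integer version numbers, the clamping of negative differences to zero, and the sample data are all assumptions made for illustration only. The second function counts reads that return an older version than one the same reader has already seen, which is one plausible way to count violations of monotonic read consistency.

```python
from typing import Dict, List, Tuple

# Hypothetical log format: one list per reader of (read timestamp, version read).
ReaderLog = List[Tuple[float, int]]


def inconsistency_windows(
    write_times: Dict[int, float], reader_logs: List[ReaderLog]
) -> Dict[int, float]:
    """Map each written version to the time its predecessor stayed readable."""
    # Latest point in time at which any reader still saw each version.
    last_read: Dict[int, float] = {}
    for log in reader_logs:
        for timestamp, version in log:
            if timestamp > last_read.get(version, float("-inf")):
                last_read[version] = timestamp

    windows: Dict[int, float] = {}
    for version, written_at in write_times.items():
        previous = version - 1  # assumes consecutive integer versions
        if previous in last_read:
            # Assumption: a negative difference (old version never read
            # after the update) is reported as a window of zero.
            windows[version] = max(0.0, last_read[previous] - written_at)
    return windows


def count_monotonic_read_violations(log: ReaderLog) -> int:
    """Count reads that return a version older than one already seen."""
    violations = 0
    highest_seen = None
    for _, version in sorted(log):  # process reads in timestamp order
        if highest_seen is not None and version < highest_seen:
            violations += 1
        else:
            highest_seen = version
    return violations


if __name__ == "__main__":
    # Invented sample data: version 1 written at t=0, version 2 at t=5.
    writes = {1: 0.0, 2: 5.0}
    reader_1 = [(1.0, 1), (4.0, 1), (6.0, 1), (7.0, 2), (8.0, 1), (9.0, 2)]
    reader_2 = [(2.0, 1), (5.5, 2), (7.5, 2)]
    print(inconsistency_windows(writes, [reader_1, reader_2]))  # {2: 3.0}
    print(count_monotonic_read_violations(reader_1))  # 1
```

In the sample data, reader 1 still reads version 1 at t=8 after having already seen version 2 at t=7. Using the first read of the new version (t=5.5) instead of the last read of the old version (t=8) would therefore underestimate the window, which is the reason given in Section 3 for relying on the last read.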