Exponentially Decayed Aggregates on Data Streams Graha

Uploaded On 2015-05-25


attcom Iowa State University sntiastateedu

Abstract: In a massive stream of sequential events such as stock feeds, sensor readings, or IP traffic measurements, tuples pertaining to recent events are typically more important than older ones. It is impo


III. EXPONENTIALLY DECAYED QUANTILES

We describe our approach for computing quantiles on timestamp ordered data under exponential decay, which is the first deterministic algorithm for this problem. Given a parameter $0 < \varepsilon < 1$, the q-digest [1] summarizes the frequency distribution $f_i$ of a multiset defined by a stream of $N$ items drawn from the domain $[0 \ldots W-1]$. The q-digest can be used to estimate the rank $r(q)$ of an item $q$, which is defined as the number of items dominated by $q$, i.e., $r(q) = \sum_{i<q} f_i$. The data structure maintains an appropriately defined set of dyadic ranges of the form $[i 2^j \ldots (i+1) 2^j - 1]$ and their associated counts. It is easy to see that an arbitrary range of integers $[a \ldots b]$ can be uniquely partitioned into at most $2\log(b-a)$ dyadic ranges, with at most 2 dyadic ranges of each length. The q-digest has the following properties:

- Each (range, count) pair $(r, c(r))$ has $c(r) \le \frac{\varepsilon N}{\log_2 W}$, unless $r$ represents a single item.
- Given a range $r$, denote its parent range as $par(r)$, and its left and right child ranges as $left(r)$ and $right(r)$ respectively. For every $(r, c(r))$ pair, we have that $c(par(r)) + c(left(par(r))) + c(right(par(r))) \ge \frac{\varepsilon N}{\log_2 W}$.
- If the range $r$ is present in the data structure, then the range $par(r)$ is also present in the data structure.

Given query point $q \in [0 \ldots W-1]$, we can compute an estimate of the rank of $q$, denoted by $\hat{r}(q)$, as the sum of the counts of all ranges to the left of $q$, i.e. $\hat{r}(q) = \sum_{(r=[l,h],\, c(r)),\, h<q} c(r)$. The following accuracy guarantee can be shown for the estimate of the rank: $\hat{r}(q) \le r(q) \le \hat{r}(q) + \varepsilon N$. Similarly, given a query point $q$ one can estimate $f_q$, the frequency of item $q$, as $\hat{f}_q = \hat{r}(q+1) - \hat{r}(q)$, with the following accuracy guarantee: $\hat{f}_q - \varepsilon N \le f_q \le \hat{f}_q + \varepsilon N$. The q-digest can be maintained in space $O(\frac{\log W}{\varepsilon})$ [1], [7]. Updates to a q-digest can be performed in (amortized) time $O(\log\log W)$, by binary searching the $O(\log W)$ dyadic ranges containing the new item to find the appropriate place to record its count; and queries take $O(\frac{\log W}{\varepsilon})$.

Now observe that: (1) the q-digest can be modified to accept updates with arbitrary (i.e. fractional) non-negative weights; and (2) multiplying all counts in the data structure by a constant $\gamma$ gives an accurate summary of the input scaled by $\gamma$. It is easy to check that the properties of the data structure still hold after these transformations, e.g. that the sum of the counts is $D$, the sum of the (possibly scaled) input weights; no count for a range exceeds $\frac{\varepsilon D}{\log U}$; etc.

Thus given an item arrival of $\langle x_i, t_i \rangle$ at time $t$, we can create a summary of the exponentially decayed data. Let $t'$ be the last time the data structure was updated; we multiply every count in the data structure by the scalar $\exp(-\lambda(t - t'))$ so that it reflects the current decayed weights of all items, and then update the q-digest with the item $x_i$ with weight $\exp(-\lambda(t - t_i))$. Note that this may be time consuming, since it affects every entry in the data structure. We can be more "lazy" by tracking $D$, the current decayed count, exactly, and keeping a timestamp $t_r$ on each counter $c(r)$ denoting the last time it was touched. Whenever we require the current value of range $r$, we can multiply it by $\exp(-\lambda(t - t_r))$, and update $t_r$ to $t$. This ensures that the asymptotic space and time costs of maintaining an exponentially decayed q-digest are as before.

Algorithm IV.1: HEAVYHITTERUPDATE($x_i, w_i, t_i, \lambda$)
  Input: item $x_i$, timestamp $t_i$, weight $w_i$, decay factor $\lambda$
  Output: current estimate of item weight
  if $\exists j : item[j] = x_i$
    then $j \leftarrow item^{-1}(x_i)$
    else $j \leftarrow \arg\min_k(count[k])$; $item[j] \leftarrow x_i$
  $count[j] \leftarrow count[j] + w_i \exp(\lambda t_i)$
  return $count[j] \exp(-\lambda t_i)$

Fig. 1. Pseudocode for Heavy Hitters with exponential decay

To see the correctness of this approach, let $S(r)$ denote the subset of input items which the algorithm is representing by the range $r$: when the algorithm processes a new update $\langle x_i, t_i \rangle$ and updates a range $r$, we (notionally) set $S(r) = S(r) \cup \{i\}$; when the algorithm merges a range $r'$ together into range $r$ by adding the count of (the child range) $r'$ into the count of $r$ (the parent), we set $S(r) = S(r) \cup S(r')$, and $S(r') = \emptyset$ (since $r'$ has given up its contents). Our algorithm maintains $c(r) = \sum_{i \in S(r)} w_i \exp(-\lambda(t - t_i))$; it is easy to check that every operation which modifies the counts (adding a new item, merging two range counts, applying the decay functions) maintains this invariant. In line with the original q-digest algorithm, every item summarized in $S(r)$ is a member of the range $r$, i.e. $i \in S(r) \Rightarrow x_i \in r$, and at any time each tuple $i$ from the input is represented in exactly one range $r$.

To estimate the decayed rank of $x$ at time $t$, $r(x, t) = \sum_{i,\, x_i < x} w_i \exp(-\lambda(t - t_i))$, we compute $\hat{r}(x, t) = \sum_{r=[l \ldots h],\, h < x} c(r)$. By the above analysis of $c(r)$, we correctly include all items that are surely less than $x$, and omit all items that are surely greater than $x$. The uncertainty depends only on the ranges containing $x$, and the sum of these ranges is at most $\varepsilon \sum_r c(r) = \varepsilon D$. This allows us to quickly find a $\phi$-quantile with the desired error bounds by binary searching for $x$ whose approximate rank is $\phi D$. In summary,

Theorem 1: Under a fixed exponential decay function $\exp(-\lambda(t - t_i))$, we can answer $\varepsilon$-approximate decayed quantile queries in space $O(\frac{1}{\varepsilon} \log U)$ and time per update $O(\log\log U)$. Queries take time $O(\frac{\log U}{\varepsilon})$.
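
The lazy decay bookkeeping used in Section III can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the class name `LazyDecayedCounts` and its methods are hypothetical, ranges are plain `(lo, hi)` tuples chosen by the caller, and the q-digest compression thresholds are omitted entirely. It assumes each call is made with a non-decreasing current time `now` and an item timestamp `t_i <= now`. What it does show is the invariant $c(r) = \sum_{i \in S(r)} w_i \exp(-\lambda(t - t_i))$ being preserved by inserts, merges, and lazy decay, together with the decayed rank estimate.

```python
import math


class LazyDecayedCounts:
    """Hypothetical sketch: counters c(r), each stored with its last-touched time t_r."""

    def __init__(self, lam):
        self.lam = lam          # decay parameter lambda
        self.counts = {}        # range (lo, hi) -> [count, t_r]
        self.total = 0.0        # D, the decayed sum of all weights
        self.total_time = 0.0   # time at which `total` was last brought current

    def _touch(self, r, now):
        """Bring c(r) up to time `now`: multiply by exp(-lam * (now - t_r))."""
        entry = self.counts.setdefault(r, [0.0, now])
        entry[0] *= math.exp(-self.lam * (now - entry[1]))
        entry[1] = now
        return entry

    def add(self, r, t_i, now, w=1.0):
        """Record an item with timestamp t_i and weight w in range r."""
        decayed = w * math.exp(-self.lam * (now - t_i))
        self._touch(r, now)[0] += decayed
        # Keep D exact by decaying it to `now` before adding the new weight.
        self.total *= math.exp(-self.lam * (now - self.total_time))
        self.total_time = now
        self.total += decayed

    def merge(self, child, parent, now):
        """Move the child's count into the parent; S(child) becomes empty."""
        moved = self._touch(child, now)[0]
        self._touch(parent, now)[0] += moved
        del self.counts[child]

    def rank(self, x, now):
        """Estimate the decayed rank of x: sum c(r) over ranges with hi < x."""
        return sum(self._touch(r, now)[0] for r in list(self.counts) if r[1] < x)


# Usage: with lam = ln 2, a weight halves for every unit of elapsed time.
digest = LazyDecayedCounts(lam=math.log(2))
digest.add((0, 0), t_i=0.0, now=0.0)   # weight 1 at time 0
digest.add((2, 3), t_i=1.0, now=2.0)   # weight 0.5 when recorded at time 2
print(digest.rank(4, now=2.0))         # approximately 0.25 + 0.5 = 0.75
```

Only the counters that a query or update actually reads are rescaled, which is the point of the lazy scheme. In this sketch `rank` scans every stored range for simplicity, whereas a real q-digest would visit only the ranges relevant to the query.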
IV. EXPONENTIALLY DECAYED HEAVY HITTERS

Prior work by Manjhi et al. [3] computed Heavy Hitters on timestamp ordered data under exponential decay by modifying algorithms for the problem without decay. We take a similar tack, but our approach means that we can also easily accommodate out-of-order arrivals, which is not the case in [3]. A first observation is that we can use the same (exponentially decayed) q-digest data structure to also answer heavy hitters queries, since the data structure guarantees error at most $\varepsilon D$ in the count of any single item; it is straightforward to scan the data structure to find and estimate all possible heavy hitters in time linear in the data structure's size. Thus Theorem 1 also applies to heavy hitters. However, we can reduce the required