kittie-lecroy
Uploaded On 2016-03-16


1 Introduction

In recent years, the data-stream model has enjoyed significant attention because of the need to process massive data sets (e.g. Henzinger et al., 1999; Alon et al., 1999; Feigenbaum et al., 2002). A streaming computation is a sublinear space algorithm that reads the input in sequential order and any item not explicitly remembered is inaccessible. A fundamental problem in the model is the estimation of distances between two objects that are determined by the stream, e.g., the network traffic matrices at two routers. Estimation of distances allows us to construct approximate representations, e.g., histograms, wavelets, Fourier summaries, or equivalently, find models of the input stream, since this problem reduces to finding the "closest" representation in a suitable class. In this paper, the objects of interest are empirical probability distributions defined by a stream of updates as follows.

Definition 1. For a data stream $S = \langle a_1, \ldots, a_m \rangle$ where $a_i \in \{p, q\} \times [n]$ we define empirical distributions $p$ and $q$ as follows. Let $m(p)_i = |\{j : a_j = \langle p, i \rangle\}|$, $m(p) = |\{j : a_j = \langle p, \cdot \rangle\}|$ and $p_i = m(p)_i / m(p)$. Similarly for $q$.

One of the cornerstones in the theory of data stream algorithms has been the result of Alon et al. (1999). They showed that it is possible to estimate $\ell_2(p,q) := \|p - q\|_2$ (the Euclidean distance) up to a $(1+\epsilon)$ factor using only $\mathrm{poly}(\epsilon^{-1}, \log n)$ space. The algorithm can, in retrospect, be viewed in terms of the famous embedding result of Johnson & Lindenstrauss (1984). This result implies that for any two vectors $p$ and $q$ and a $k \times n$ matrix $A$ whose entries are independent Normal(0,1) random variables (scaled appropriately),
\[
(1+\epsilon)^{-1}\, \ell_2(p,q) \le \ell_2(Ap, Aq) \le (1+\epsilon)\, \ell_2(p,q)
\]
with high probability for some $k = \mathrm{poly}(\epsilon^{-1}, \log n)$. Alon, Matias, and Szegedy demonstrated that an "effective" $A$ can be stored in small space and can be used to maintain a small-space, update-able summary, or sketch, of $p$ and $q$. The $\ell_2$ distance between $p$ and $q$ can then be estimated using only the sketches of $p$ and $q$. While Brinkman & Charikar (2003) proved that there was no analog of the Johnson-Lindenstrauss result for $\ell_1$, Indyk (2000) demonstrated that $\ell_1(p,q)$ could also be estimated in $\mathrm{poly}(\epsilon^{-1}, \log n)$ space by using Cauchy(0,1) random variables rather than Normal(0,1) random variables. The results extended to all $\ell_p$-measures with $0 < p \le 2$ using stable distributions. Over a sequence of papers (Saks & Sun, 2002; Chakrabarti et al., 2003; Cormode et al., 2003; Bar-Yossef et al., 2004; Indyk & Woodruff, 2005; Bhuvanagiri et al., 2006; Cormode & Ganguly, 2007) $\ell_p$ and Hamming distances have become well understood. Concurrently several methods of creating summary representations of streams have been proposed (Broder et al., 2000; Charikar et al., 2002; Cormode & Muthukrishnan, 2005) for a variety of applications; in terms of distances they can be adapted to compute the Jaccard coefficient (symmetric difference over union) for two sets. One of the principal motivations of this work is to characterize the distances that can be sketched.

[...] between boosting algorithms and logistic loss, and subsequently over a series of papers (Lafferty et al., 1997; Lafferty, 1999; Kivinen & Warmuth, 1999; Collins et al., 2002), the study of Bregman divergences and information geometry has become the method of choice for studying exponential loss functions. The connection between loss functions and $f$-divergences has been investigated more recently by Nguyen et al. (2005).

Definition 3 (Decomposable Bregman Divergences). Let $p$ and $q$ be two $n$-point distributions. A strictly convex function $F : (0,1] \to \mathbb{R}$ gives rise to a Bregman divergence,
\[
B_F(p,q) = \sum_{i \in [n]} \big[ F(p_i) - F(q_i) - (p_i - q_i) F'(q_i) \big].
\]

Perhaps the most familiar Bregman divergence is $\ell_2^2$ with $F(z) = z^2$. The Kullback–Leibler divergence is also a Bregman divergence with $F(z) = z \log z$, and the Itakura–Saito divergence with $F(z) = -\log z$. Lafferty et al. (1997) suggest $F(z) = -z^{\alpha} + \alpha z - \alpha + 1$ for $\alpha \in (0,1)$, and $F(z) = z^{\alpha} - \alpha z + \alpha - 1$ for $\alpha < 0$.

The principal use of Bregman divergences is in finding optimal models. Given a distribution $q$ we are interested in finding a $p$ that best matches the data, and this is posed as the convex optimization problem $\min_p B_F(p,q)$. It is easy to verify that any positive linear combination of Bregman divergences is a Bregman divergence and that the Bregman balls are convex in the first argument but often not in the second. This is the particular appeal of the technique, that the divergence depends on the data naturally, and the divergences have come to be known as Information Geometry techniques. Furthermore, there is a natural convex duality between the optimum representation $p$ under $B_F$, and the divergence $B_F$. This connection to convex optimization is one of the many reasons for the emerging heavy use of Bregman divergences in the learning literature.

Given that we can estimate $\ell_1$ and $\ell_2$ distances between two streams in small space, it is natural to ask which other $f$-divergences and Bregman divergences are sketchable.

Our Contributions: In this paper we take several steps towards a characterization of the distances that can be sketched. Our first results, in Section 3, are negative and help us understand why the $\ell_1$ and $\ell_2$ distances are special among the $f$-divergences and Bregman divergences. We prove the Shift Invariant Theorem that characterizes a large family of distances that cannot be approximated multiplicatively in the data-stream model. This theorem pertains to decomposable distances, i.e., distances $d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^+$ for which there exists a $\phi : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ such that $d(x,y) = \sum_{i \in [n]} \phi(x_i, y_i)$. The theorem suggests that unless $\phi(x_i, y_i)$ is a function of $x_i - y_i$ the measure $d$ cannot be sketched. For all $f$-divergences for which $f$ is twice differentiable and $f''$ is strictly positive, no polynomial [...]

Lemma 4. Let $f$ be a real-valued function that is convex on $(0,\infty)$ and satisfies $f(1) = 0$. Then there exists a real-valued function $g$ that is convex on $(0,\infty)$ and satisfies $g(1) = 0$ such that
1. $D_f = D_g$.
2. $g$ is positive and if $f$ is differentiable at 1 then $g'(1) = 0$.
3. If $D_f$ is bounded then $g(0) = \lim_{u \to 0} g(u)$ and $g^*(0) = \lim_{u \to 0} g^*(u)$ exist.

Proof. For $p = (1/2, 1/2)$ and $q = (0, 1)$,
\[
D_f(p,q) = \frac{f(0) + f(2)}{2} \quad \text{and} \quad D_f(q,p) = 0.5 \lim_{u \to 0} u f(1/u) + f(0.5).
\]
Hence, if $D_f$ is bounded then $f(0) = \lim_{u \to 0} f(u)$ and $f^*(0) = \lim_{u \to 0} f^*(u) = \lim_{u \to 0} u f(1/u)$ exist. Let $c = \lim_{u \to 1^-} \frac{f(1) - f(u)}{1 - u}$. If $f$ is differentiable then $c = f'(1)$. Otherwise, this limit still exists because $f$ is convex and defined on $(0,\infty)$. Then $g(u) = f(u) - c(u - 1)$ satisfies the necessary conditions.
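The normalization in Lemma 4 can be checked numerically. The following is a minimal sketch, not taken from the paper: the definition of $D_f$ is not part of this excerpt, so the code assumes $D_f(p,q) = \sum_i p_i f(q_i/p_i)$, which is the form that reproduces the worked values in the proof. The helper names `f_divergence` and `normalize`, and the two example distributions, are hypothetical.

```python
import math

def f_divergence(f, p, q):
    """Assumed form D_f(p, q) = sum_i p_i * f(q_i / p_i); requires every p_i > 0."""
    return sum(pi * f(qi / pi) for pi, qi in zip(p, q))

def normalize(f, c):
    """Return g(u) = f(u) - c * (u - 1), the function built in the proof of Lemma 4."""
    return lambda u: f(u) - c * (u - 1.0)

# Generator f(u) = 2 - 2*sqrt(u); it is differentiable at 1 with c = f'(1) = -1.
f = lambda u: 2.0 - 2.0 * math.sqrt(u)
g = normalize(f, c=-1.0)

# Illustrative distributions (made up for this example).
p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]

d_f = f_divergence(f, p, q)
d_g = f_divergence(g, p, q)

# The linear term c * (u - 1) contributes c * (sum(q) - sum(p)) = 0, so D_f = D_g.
print(abs(d_f - d_g) < 1e-12)
```

Here the normalized function works out to $g(u) = (\sqrt{u} - 1)^2$, which is non-negative and vanishes at $u = 1$.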
For example, the Hellinger divergence can be realized by either $f(u) = (\sqrt{u} - 1)^2$ or $f(u) = 2 - 2\sqrt{u}$. Henceforth, we assume $f$ is non-increasing in the range $[0,1]$ and non-decreasing in the range $[1,\infty)$. The next lemma shows that, if we are willing to tolerate an additive approximation, we may make certain assumptions about the derivative of $f$. This is achieved by approximating $f$ by a straight line for very small and very large values.

Lemma 5. Given a bounded $D_f$ with $f$ differentiable (w.l.o.g., $f$ is unimodal and minimized at 1) and $\epsilon \in (0,1)$, let
\[
u_0(\epsilon) = \max\{ u \in (0,1] : f(u)/f(0) \ge 1 - \epsilon,\ f^*(u)/f^*(0) \ge 1 - \epsilon \}
\]
and define $g$:
\[
g(u) = \begin{cases}
f(u) & \text{for } u \in (u_0, 1/u_0) \\
f(0) - u\,(f(0) - f(u_0))/u_0 & \text{for } u \in [0, u_0] \\
u f^*(0) - (f^*(0) - f^*(u_0))/u_0 & \text{for } u \in [1/u_0, \infty)
\end{cases}
\]
Then,
\[
D_g(p,q)(1 - \epsilon) \le D_f(p,q) \le D_g(p,q)
\]
and
\[
\max_u |g'(u)| \le \max(f(0)/u_0,\ f^*(0)) \quad \text{and} \quad \max_u |g^{*\prime}(u)| \le \max(f^*(0)/u_0,\ f(0)).
\]

Proof. Because $f, f^*, g, g^*$ are non-increasing in the range $[0,1]$, for all $u \in [0, u_0]$,
\[
1 \le \frac{g(u)}{f(u)} \le \frac{f(0)}{f(u)} \le \frac{f(0)}{f(u_0)} \quad \text{and} \quad 1 \le \frac{g^*(u)}{f^*(u)} \le \frac{f^*(0)}{f^*(u)} \le \frac{f^*(0)}{f^*(u_0)}. \tag{1}
\]
[...]

[...] $O((a+b+c)n)$ over $[5n/4]$ requires $\Omega(n)$ space. This remains true even if the algorithm may take a constant number of passes over the stream.

The factor $n$ on the right-hand side of Eqn. 2 is only necessary if we wish to prove an $\Omega(n)$ space lower bound and thereby rule out sub-linear space algorithms. In particular, if $n$ is replaced by some $w \le n$ then the lower bound would become $\Omega(w)$. However, the above formulation will be sufficient for the purposes of proving results on the estimation of information divergences.

Proof. The proof is by a reduction from the communication complexity of the Set-Disjointness problem. An instance of this problem consists of two binary strings, $x, y \in \{0,1\}^n$ such that $\sum_i x_i = \sum_i y_i = n/4$. We consider two players, Alice and Bob, such that Alice knows the string $x$ and Bob knows the string $y$. Alice and Bob take turns to send messages to each other with the goal of determining if $x$ and $y$ are disjoint, i.e., $x \cdot y = 0$ (where the inner product is taken over the reals). It is known that determining if $x \cdot y = 0$ with probability at least 3/4 requires $\Omega(n)$ bits to be communicated (Razborov, 1992). However, suppose that there exists a streaming algorithm $\mathcal{A}$ that takes $P$ passes over a stream and uses $W$ working memory to $\alpha$-approximate $d(p,q)$ with probability 3/4. We will show that this algorithm gives rise to a $(2P-1)$-round protocol for Set-Disjointness that only requires $O(PW)$ bits to be communicated and therefore $W = \Omega(n/P)$.

We will assume that $\phi(a/t, (a+c)/t) \ge \phi((a+c)/t, a/t)$. If $\phi(a/t, (a+c)/t) \le \phi((a+c)/t, a/t)$ then the proof follows by reversing the roles of the $p$ and $q$ that we now define. Consider the multi-sets,
\[
S_A(x) = \bigcup_{i \in [n]} \{ a x_i + b(1 - x_i) \text{ copies of } \{\langle p, i \rangle, \langle q, i \rangle\} \} \ \cup \bigcup_{i \in [n/4]} \{ b \text{ copies of } \{\langle p, i+n \rangle, \langle q, i+n \rangle\} \}
\]
\[
S_B(y) = \bigcup_{i \in [n]} \{ c y_i \text{ copies of } \langle q, i \rangle \} \ \cup \bigcup_{i \in [n/4]} \{ c \text{ copies of } \langle p, i+n \rangle \}.
\]
This defines the following frequencies:
\[
m(p)_i = \begin{cases}
a & \text{if } x_i = 1 \text{ and } i \in [n] \\
b & \text{if } x_i = 0 \text{ and } i \in [n] \\
b + c & \text{if } n < i \le 5n/4
\end{cases}
\]
and
\[
m(q)_i = \begin{cases}
a & \text{if } (x_i, y_i) = (1,0) \text{ and } i \in [n] \\
b & \text{if } (x_i, y_i) = (0,0) \text{ and } i \in [n] \\
a + c & \text{if } (x_i, y_i) = (1,1) \text{ and } i \in [n] \\
b + c & \text{if } (x_i, y_i) = (0,1) \text{ and } i \in [n] \\
b & \text{if } n < i \le 5n/4.
\end{cases}
\]
Consequently,
\[
d(p,q) = (x \cdot y)\, \phi(a/t, (a+c)/t) + (n/4 - x \cdot y)\, \phi(b/t, (b+c)/t) + (n/4)\, \phi((b+c)/t, b/t),
\]
[...]

[...] for some $\gamma \in [0, 1/b]$ by Taylor's Theorem. Since $f(1) = f'(1) = 0$ and $f''(t)$ is continuous at $t = 1$ this implies that for sufficiently large $n$, $f''(1 + \gamma) \le f''(1) + 1$ and so,
\[
\phi(b/t, (b+c)/t) \le \frac{f''(1) + 1}{2tb} = \frac{f''(1) + 1}{2 f(2)\, b}\, t^{-1} f(2) \le \frac{8}{\alpha^2 n}\, \phi(a/t, (a+c)/t).
\]
Similarly we can show that for sufficiently large $n$,
\[
\phi((b+c)/t, b/t) \le 8\, \phi(a/t, (a+c)/t) / (\alpha^2 n).
\]
Then, appealing to Theorem 7 we get the required result.
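The frequency bookkeeping in the reduction used to prove Theorem 7 can be replayed on a small instance. The sketch below is illustrative only: it builds the stream from the multi-sets $S_A(x)$ and $S_B(y)$ as stated in that proof, derives the empirical distributions as in Definition 1, and compares $\sum_i \phi(p_i, q_i)$ with the closed form for $d(p,q)$. The function names, the choice of $\phi$ (a Hellinger-style term with $\phi(u,u) = 0$), and the values of $n, a, b, c, x, y$ are assumptions made for the example.

```python
from collections import Counter
import math

def build_stream(x, y, a, b, c):
    """Alice's multi-set S_A(x) followed by Bob's multi-set S_B(y), as (name, index) pairs."""
    n = len(x)
    stream = []
    for i in range(1, n + 1):                      # Alice, coordinates in [n]
        copies = a * x[i - 1] + b * (1 - x[i - 1])
        stream += [("p", i), ("q", i)] * copies
    for i in range(1, n // 4 + 1):                 # Alice, padding coordinates n+1 .. 5n/4
        stream += [("p", i + n), ("q", i + n)] * b
    for i in range(1, n + 1):                      # Bob, coordinates in [n]
        stream += [("q", i)] * (c * y[i - 1])
    for i in range(1, n // 4 + 1):                 # Bob, padding coordinates
        stream += [("p", i + n)] * c
    return stream

def frequencies(stream, name):
    """Per-index counts and total count for one of the two distributions (Definition 1)."""
    counts = Counter(i for who, i in stream if who == name)
    return counts, sum(counts.values())

def decomposable_distance(phi, stream, universe):
    """d(p, q) = sum_i phi(p_i, q_i) over the empirical distributions of the stream."""
    cp, mp = frequencies(stream, "p")
    cq, mq = frequencies(stream, "q")
    return sum(phi(cp[i] / mp, cq[i] / mq) for i in universe)

# Hypothetical phi for the check; any phi with phi(u, u) = 0 would do.
phi = lambda u, v: (math.sqrt(u) - math.sqrt(v)) ** 2

n, a, b, c = 8, 3, 2, 1
x = [1, 1, 0, 0, 0, 0, 0, 0]      # sum(x) = n/4
y = [1, 0, 1, 0, 0, 0, 0, 0]      # sum(y) = n/4, and x . y = 1

stream = build_stream(x, y, a, b, c)
_, t = frequencies(stream, "p")   # common normalizer: here m(p) = m(q) = t
_, t_q = frequencies(stream, "q")
xy = sum(xi * yi for xi, yi in zip(x, y))

lhs = decomposable_distance(phi, stream, range(1, 5 * n // 4 + 1))
rhs = (xy * phi(a / t, (a + c) / t)
       + (n // 4 - xy) * phi(b / t, (b + c) / t)
       + (n // 4) * phi((b + c) / t, b / t))
print(math.isclose(lhs, rhs))
```

Only the term multiplied by $x \cdot y$ distinguishes intersecting inputs from disjoint ones, which is why the theorem requires it to dominate the other two terms.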
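Corollary 9 (Bregman Divergences). Given a Bregman divergence $B_F$, if $F$ is twice differentiable and there exist $\rho, z_0 > 0$ such that,
\[
\forall\, 0 \le z_2 \le z_1 \le z_0,\ \frac{F''(z_1)}{F''(z_2)} \ge \left(\frac{z_1}{z_2}\right)^{\rho} \quad \text{or} \quad \forall\, 0 \le z_2 \le z_1 \le z_0,\ \frac{F''(z_1)}{F''(z_2)} \le \left(\frac{z_2}{z_1}\right)^{\rho}
\]
then no polynomial factor approximation of $B_F$ is possible in $o(n)$ bits of space.

This condition effectively states that $F''(z)$ vanishes or diverges monotonically, and polynomially fast, as $z \to 0$. Note that for $\ell_2^2$, which can be sketched, $F(z) = z^2$ and therefore $F''$ is constant everywhere.

Proof. By the Mean-Value Theorem, for any $t, r \in \mathbb{N}$, there exists $\gamma(r) \in [0,1]$ such that,
\[
\phi(r/t, (r+1)/t) + \phi(r/t + 1/t, r/t) = t^{-1}\big(F'(r/t + 1/t) - F'(r/t)\big) = t^{-2} F''((r + \gamma(r))/t).
\]
Therefore, for any $a, b \in \mathbb{N}$, $c = 1$ and $t = an/4 + bn + n/2$,
\[
\frac{\max\big(\phi(\tfrac{a}{t}, \tfrac{a+c}{t}),\ \phi(\tfrac{a+c}{t}, \tfrac{a}{t})\big)}{\phi(\tfrac{b+c}{t}, \tfrac{b}{t}) + \phi(\tfrac{b}{t}, \tfrac{b+c}{t})} \ge \frac{1}{2}\, \frac{F''((a + \gamma(a))/t)}{F''((b + \gamma(b))/t)}. \tag{3}
\]
If $\forall\, 0 \le z_2 \le z_1 \le z_0$, $F''(z_1)/F''(z_2) \ge (z_1/z_2)^{\rho}$ then set $a = (\alpha^2 n)^{1/\rho}$ and $b = 1$ where $\alpha$ is an arbitrary polynomial in $n$. If $\forall\, 0 \le z_2 \le z_1 \le z_0$, $F''(z_1)/F''(z_2) \le (z_2/z_1)^{\rho}$ then set $a = 1$ and $b = (\alpha n)^{1/\rho}$. In both cases we deduce that the RHS of Eqn. 3 is greater than $\alpha^2 n / 4$. Hence, appealing to Theorem 7, we get the required result.

4 Additive Approximations

In this section we focus on additive approximations. As mentioned earlier, the probability of misclassification using ratio tests is often bounded by $2^{-D_f}$, for certain $D_f$. Hence, an additive $\epsilon$ approximation translates to a multiplicative $2^{\epsilon}$ factor for computing the error probability. Our goal is the characterization of divergences that can be approximated additively. We first present a general algorithmic result based on an extension of a technique first used by Alon [...]

[...] To see this consider the sub-stream consisting of the elements of the form $\langle \cdot, i \rangle$, e.g.,
\[
\langle p, i \rangle, \langle q, i \rangle, \langle q, i \rangle, \langle p, i \rangle, \langle q, i \rangle, \langle p, i \rangle
\]
and expand $E[X(r,s) \mid k = i]$ as follows:
\[
E[X(r,s) \mid k = i] = \gamma \Big( \sum_{j \in [m(q)]} X(2m(q) + j, 3m(p)) + \sum_{j \in [m(p)]} X(2m(q), 2m(p) + j) + \sum_{j \in [m(p)]} X(2m(q), m(p) + j) + \sum_{j \in [m(q)]} X(m(q) + j, m(p)) + \sum_{j \in [m(p)]} X(m(q), j) + \sum_{j \in [m(q)]} X(j, 0) \Big) = \cdots = \frac{2 \phi(p_i, q_i)}{p_i + q_i},
\]
where $\gamma = \frac{1}{m(p)_i\, m(q) + m(p)\, m(q)_i}$. Therefore $E[X(r,s)] = \sum_i \phi(p_i, q_i)$ as required. Furthermore,
\[
|X(r,s)| \le 2 \max\left( \max_{x \in [\frac{r-1}{m}, \frac{r}{m}]} \left| \frac{\partial}{\partial x} \phi(x, s/m) \right|,\ \max_{y \in [\frac{s-1}{m}, \frac{s}{m}]} \left| \frac{\partial}{\partial y} \phi(r/m, y) \right| \right).
\]
Hence, by an application of the Chernoff bound, averaging $O(\epsilon^{-2} \log \delta^{-1})$ independent basic estimators gives an $(\epsilon, \delta)$-additive-approx.

We next prove a lower bound on the space required for additive approximation by any single-pass algorithm. The proof uses a reduction from the one-way communication complexity of the Gap-Hamming problem (Woodruff, 2004). It is widely believed that a similar lower bound exists for multi-round communication (e.g. McGregor, 2007, Question 10 (R. Kumar)) and, if this is the case, it would imply that the lower bound below also applies to algorithms that take a constant number of passes over the data.

Theorem 11. Any $(\epsilon, 1/4)$-additive-approximation of $d(p,q)$ requires $\Omega(\epsilon^{-2})$ bits of space if,
\[
\exists\, a, b > 0,\ \forall x,\ \phi(x, 0) = ax,\ \phi(0, x) = bx,\ \text{and } \phi(x, x) = 0.
\]

Proof. The proof is by a reduction from the communication complexity of the Gap-Hamming problem. An instance of this problem consists of two binary strings, $x, y \in \{0,1\}^n$ such that $\sum_i x_i =$ [...]

[...] For example, $\tau(\epsilon) = O(1)$ for Triangle and $\tau(\epsilon) = O(\epsilon^{-1})$ for Hellinger. The algorithm does not need to know $m(p)$ or $m(q)$ in advance.

Proof. We appeal to Theorem 10 and note that,
\[
\max_{x,y \in [0,1]} \left( \left| \frac{\partial}{\partial x} \phi(x,y) \right| + \left| \frac{\partial}{\partial y} \phi(x,y) \right| \right) = \max_{x,y \in [0,1]} \big( |f(y/x) - (y/x) f'(y/x)| + |f'(y/x)| \big) \le 2 \max_{u \ge 0} \big( |f^{*\prime}(u)| + |f'(u)| \big).
\]
By Lemma 5, we may bound the derivatives of $f$ and $f^*$ in terms of the additive approximation error $\epsilon$. This gives the required result.

We complement Theorem 13 with the following result which follows from Theorems 11 and 12.

Theorem 14. Any $(\epsilon, 1/4)$-additive-approximation of an unbounded $D_f$ requires $\Omega(n)$ bits of space. This applies even if one of the distributions is known to be uniform. Any $(\epsilon, 1/4)$-additive-approximation of a bounded $D_f$ requires $\Omega(\epsilon^{-2})$ bits of space.

4.2 Additive Approximation for Bregman divergences

In this section we prove a partial characterization of the Bregman divergences that can be additively approximated.

Theorem 15. There exists a one-pass, $O(\epsilon^{-2} \log \delta^{-1} (\log n + \log m))$-space, $(\epsilon, \delta)$-additive-approx. of a Bregman divergence if $F$ and $F''$ are bounded in the range $[0,1]$. The algorithm does not need to know $m(p)$ or $m(q)$ in advance.

Proof. We appeal to Theorem 10 and note that,
\[
\max_{x,y \in [0,1]} \left( \left| \frac{\partial}{\partial x} \phi(x,y) \right| + \left| \frac{\partial}{\partial y} \phi(x,y) \right| \right) = \max_{x,y \in [0,1]} \big( |F'(x) - F'(y)| + |x - y|\, F''(y) \big).
\]
We may assume this is constant by convexity of $F$ and the assumptions of the theorem. The result follows.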
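The derivative expression in the proof of Theorem 15 comes from differentiating the per-coordinate Bregman term $\phi(x,y) = F(x) - F(y) - (x-y)F'(y)$, which gives $\partial\phi/\partial x = F'(x) - F'(y)$ and $\partial\phi/\partial y = -(x-y)F''(y)$. The sketch below checks this numerically for the single case $F(z) = z^2$, where $B_F$ reduces to $\ell_2^2$. It is an illustration under that assumption, not the streaming estimator itself, and the helper names, the test point, and the grid are hypothetical.

```python
import math

def bregman_term(F, dF, x, y):
    """Per-coordinate term phi(x, y) = F(x) - F(y) - (x - y) * F'(y)."""
    return F(x) - F(y) - (x - y) * dF(y)

def bregman(F, dF, p, q):
    """Decomposable Bregman divergence B_F(p, q) = sum_i phi(p_i, q_i)."""
    return sum(bregman_term(F, dF, pi, qi) for pi, qi in zip(p, q))

def gradient_bound(dF, d2F, x, y):
    """|d phi/dx| + |d phi/dy| = |F'(x) - F'(y)| + |x - y| * F''(y)."""
    return abs(dF(x) - dF(y)) + abs(x - y) * d2F(y)

# F(z) = z^2, for which B_F is the squared Euclidean distance.
F = lambda z: z * z
dF = lambda z: 2.0 * z
d2F = lambda z: 2.0

p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
b_f = bregman(F, dF, p, q)
l2_squared = sum((pi - qi) ** 2 for pi, qi in zip(p, q))

# Central finite differences of phi at one point, compared with the closed form.
x0, y0, h = 0.7, 0.2, 1e-5
dphi_dx = (bregman_term(F, dF, x0 + h, y0) - bregman_term(F, dF, x0 - h, y0)) / (2 * h)
dphi_dy = (bregman_term(F, dF, x0, y0 + h) - bregman_term(F, dF, x0, y0 - h)) / (2 * h)
numeric = abs(dphi_dx) + abs(dphi_dy)
closed_form = gradient_bound(dF, d2F, x0, y0)

# Largest value of the bound on a grid over [0, 1]^2; for F(z) = z^2 it is 4|x - y|.
grid = [k / 10.0 for k in range(11)]
worst = max(gradient_bound(dF, d2F, x, y) for x in grid for y in grid)
print(math.isclose(b_f, l2_squared), math.isclose(numeric, closed_form, rel_tol=1e-6))
```

For this $F$ the bound stays at most 4 on $[0,1]^2$, consistent with the theorem's requirement that the relevant derivatives be bounded on that range.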
The next theorem follows immediately from Theorem 12.

Theorem 16. If $F(0)$ or $F'(0)$ is unbounded then an $(\epsilon, 1/4)$-additive-approx. of $B_F$ requires $\Omega(n)$ bits of space even if one of the distributions is known to be uniform.

5 Conclusions and Open Questions

We presented a partial characterization of the information divergences that can be multiplicatively approximated in the data stream model. This characterization was based on a general result that [...]

Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation, 11(7), 1493–1517.

Brinkman, B., & Charikar, M. (2003). On the impossibility of dimension reduction in $\ell_1$. In IEEE Symposium on Foundations of Computer Science, (pp. 514–523).

Broder, A. Z., Charikar, M., Frieze, A. M., & Mitzenmacher, M. (2000). Min-wise independent permutations. J. Comput. Syst. Sci., 60(3), 630–659.

Chakrabarti, A., Cormode, G., & McGregor, A. (2007). A near-optimal algorithm for computing the entropy of a stream. In ACM-SIAM Symposium on Discrete Algorithms, (pp. 328–335).

Chakrabarti, A., Khot, S., & Sun, X. (2003). Near-optimal lower bounds on the multi-party communication complexity of set disjointness. In IEEE Conference on Computational Complexity, (pp. 107–117).

Charikar, M., Chen, K., & Farach-Colton, M. (2002). Finding frequent items in data streams. In International Colloquium on Automata, Languages and Programming, (pp. 693–703).

Collins, M., Schapire, R. E., & Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning, 48(1-3), 253–285.

Cormode, G., Datar, M., Indyk, P., & Muthukrishnan, S. (2003). Comparing data streams using hamming norms (how to zero in). IEEE Trans. Knowl. Data Eng., 15(3), 529–540.

Cormode, G., & Ganguly, S. (2007). On estimating frequency moments of data streams. In International Workshop on Randomization and Approximation Techniques in Computer Science.

Cormode, G., & Muthukrishnan, S. (2005). An improved data stream summary: the count-min sketch and its applications. J. Algorithms, 55(1), 58–75.

Cover, T. M., & Thomas, J. A. (1991). Elements of Information Theory. Wiley Series in Telecommunications. New York, NY, USA: John Wiley & Sons.

Csiszar, I. (1991). Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. Ann. Statist., (pp. 2032–2056).

Feigenbaum, J., Kannan, S., Strauss, M., & Viswanathan, M. (2002). An approximate L1 difference algorithm for massive data streams. SIAM Journal on Computing, 32(1), 131–151.

Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28, 337–407.

Guha, S., McGregor, A., & Venkatasubramanian, S. (2006). Streaming and sublinear approximation of entropy and information distances. In ACM-SIAM Symposium on Discrete Algorithms, (pp. 733–742).