[...]ucsd.edu, Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093
Kaushik Sinha, kaushik.sinha@wichita.edu, Department of Electrical Engineering and Computer Science, Wichita State University, 1845 [...]
Download Pdf - The PPT/PDF document "JMLR Workshop and Conference Proceedings..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Dasgupta Sinha

[...] tree, which takes $O(n)$ space and answers queries with an approximation factor $c = 1 + \epsilon$ in $O((6/\epsilon)^d \log n)$ time (Arya et al., 1998).

In some of these results, an exponential dependence on dimension is evident, and indeed this is a familiar blot on the nearest neighbor landscape. One way to mitigate the curse of dimensionality is to consider situations in which data have low intrinsic dimension $d_o$, even if they happen to lie in $\mathbb{R}^d$ for $d \gg d_o$ or in a general metric space.

A common assumption is that the data are drawn from a doubling measure of dimension $d_o$ (or equivalently, have expansion rate $2^{d_o}$); this is defined in Section 4.1 below. Under this condition, Karger and Ruhl (2002) have a scheme that gives exact answers to nearest neighbor queries in time $O(2^{3d_o} \log n)$, using a data structure of size $O(2^{3d_o} n)$. The more recent cover tree algorithm (Beygelzimer et al., 2006), which has been used quite widely, creates a data structure in space $O(n)$ and answers queries in time $O(2^{d_o} \log n)$.

There is also work that combines intrinsic dimension and approximate search. The navigating net (Krauthgamer and Lee, 2004), given data from a metric space of doubling dimension $d_o$, has size $O(2^{O(d_o)} n)$ and gives a $(1+\epsilon)$-approximate answer to queries in time $O(2^{O(d_o)} \log n + (1/\epsilon)^{O(d_o)})$; the crucial advantage here is that doubling dimension is a more general and robust notion than doubling measure.

Despite these and many other results, there are two significant deficiencies in the nearest neighbor literature that have motivated the present paper. First, existing analyses have succeeded at identifying, for a given data structure, highly specific families of data for which efficient exact NN search is possible (for instance, data from doubling measures), but have failed to provide a more general characterization. Second, there remains a class of nearest neighbor data structures that are popular and successful in practice, but that have not been analyzed thoroughly. These structures combine classical k-d tree partitioning with randomization and overlapping cells, and are the subject of this paper.

1.1. Three randomized tree structures for exact NN search

The k-d tree is a partition of $\mathbb{R}^d$ into hyper-rectangular cells, based on a set of data points (Bentley, 1975). The root of the tree is a single cell corresponding to the entire space. A coordinate direction is chosen, and the cell is split at the median of the data along this direction (Figure 1, left). The process is then recursed on the two newly created cells, and continues until all leaf cells contain at most some predetermined number $n_o$ of points. When there are $n$ data points, the depth of the tree is at most about $\log(n/n_o)$.

Given a k-d tree built from data points $S$, there are several ways to answer a nearest neighbor query $q$. The quickest and dirtiest of these is to move $q$ down the tree to its appropriate leaf cell, and then return the nearest neighbor in that cell. This defeatist search takes time just $O(n_o + \log(n/n_o))$, which is $O(\log n)$ for constant $n_o$. The problem is that $q$'s nearest neighbor may well lie in a different cell, for instance when the data happen to be concentrated near cell boundaries. Consequently, the failure probability of this scheme can be unacceptably high.

Over the years, some simple tricks have emerged, from various sources, for reducing the failure probability. These are nicely laid out by Liu et al. (2004), who show experimentally that the resulting algorithms are effective in practice.

[...]

Figure 3: Three types of split. The fractions refer to probability mass. $\alpha$ is some constant, while $\beta$ is chosen uniformly at random from $[1/4, 3/4]$.

[...] unit sphere. But now, three split points are noted: the median $m(C)$ of the data along direction $U$, the $(1/2) - \alpha$ fractile value $l(C)$, and the $(1/2) + \alpha$ fractile value $r(C)$. Here $\alpha$ is a small constant, like 0.05 or 0.1. The idea is to simultaneously entertain a median split

$$\text{left} = \{x : x \cdot U < m(C)\}, \qquad \text{right} = \{x : x \cdot U \geq m(C)\}$$

and an overlapping split (with the middle $2\alpha$ fraction of the data falling on both sides)

$$\text{left} = \{x : x \cdot U < r(C)\}, \qquad \text{right} = \{x : x \cdot U \geq l(C)\}.$$

In the spill tree (Liu et al., 2004), each data point in $S$ is stored in multiple leaves, by following the overlapping splits. A query is then answered defeatist-style, by routing it to a single leaf using median splits.

Both the RP tree and the spill tree have query times of $O(n_o + \log(n/n_o))$, but the latter can be expected to have a lower failure probability, and we will see this in the bounds we obtain. On the other hand, the RP tree requires just linear space, while the size of the spill tree is $O(n^{1/(1 - \lg(1+2\alpha))})$. When $\alpha = 0.05$, for instance, the size is $O(n^{1.159})$.

In view of these tradeoffs, we consider a further variant, which we call the virtual spill tree. It stores each data point in a single leaf, following median splits, and hence has linear size. However, each query is routed to multiple leaves, using overlapping splits, and the return value is its nearest neighbor in the union of these leaves.

The various splits are summarized in Figure 3, and the three trees use them as follows:

                     Routing data        Routing queries
RP tree              Perturbed split     Perturbed split
Spill tree           Overlapping split   Median split
Virtual spill tree   Median split        Overlapping split

One small technicality: if, for instance, there are duplicates among the data points, it might not be possible to achieve a median split, or a split at a desired fractile. We will ignore these discretization problems.

2. A potential function for point configurations

To motivate the potential function, we start by considering what happens when there are just two data points and one query point.

2.1. How random projection affects the relative placement of three points

Consider any $q, x, y \in \mathbb{R}^d$, such that $x$ is closer to $q$ than is $y$; that is, $\|q - x\| \leq \|q - y\|$. Now suppose that a random direction $U$ is chosen from the unit sphere $S^{d-1}$, and that the points are projected onto this direction. What is the probability that $y$ falls between $q$ and $x$ on this line? The following lemma answers this question exactly. An approximate solution, with different proof method, was given earlier by Kleinberg (1997).

Lemma 1. Pick any $q, x, y \in \mathbb{R}^d$ with $\|q - x\| \leq \|q - y\|$. Pick a random unit direction $U$. The probability, over $U$, that $y \cdot U$ falls (strictly) between $q \cdot U$ and $x \cdot U$ is

$$\frac{1}{\pi} \arcsin\left( \frac{\|q - x\|}{\|q - y\|} \sqrt{1 - \left( \frac{(q - x) \cdot (y - x)}{\|q - x\| \, \|y - x\|} \right)^2} \right).$$

Proof. We may assume $U$ is drawn from $N(0, I_d)$, the $d$-dimensional Gaussian with mean zero and unit covariance. This gives the right distribution if we scale $U$ to unit length, but we can skip this last step since it has no effect on the question at hand.

We can also assume, without loss of generality, that $q$ lies at the origin and that $x$ lies along the (positive) $x_1$-axis: that is, $q = 0$ and $x = \|x\| e_1$. It will then be helpful to split the direction $U$ into two pieces, its component $U_1$ in the $x_1$-direction, and the remaining $d - 1$ coordinates $U_R$. Likewise, we will write $y = (y_1, y_R)$.

If $y_R = 0$ then $x$, $y$, and $q$ are collinear, and the projection of $y$ cannot possibly fall between those of $x$ and $q$. Henceforth assume $y_R \neq 0$.

Let $E$ denote the event of interest:

$$E \equiv y \cdot U \text{ falls between } q \cdot U \text{ (that is, } 0\text{) and } x \cdot U \text{ (that is, } \|x\| U_1\text{)}$$
$$\equiv y_R \cdot U_R \text{ falls between } -y_1 U_1 \text{ and } (\|x\| - y_1) U_1.$$

The interval of interest is either $(-y_1 |U_1|, (\|x\| - y_1) |U_1|)$, if $U_1 \geq 0$, or $(-(\|x\| - y_1) |U_1|, y_1 |U_1|)$, if $U_1 < 0$. Now $y_R \cdot U_R$ is independent of $U_1$ and is distributed as $N(0, \|y_R\|^2)$, which is symmetric and thus assigns the same probability mass to the two intervals. Therefore

$$\Pr_U(E) = \Pr_{U_1} \Pr_{U_R}\left( -y_1 |U_1| < y_R \cdot U_R < (\|x\| - y_1) |U_1| \right).$$

Let $Z$ and $Z'$ be independent standard normals $N(0, 1)$. Since $U_1$ is distributed as $Z$ and $y_R \cdot U_R$ is distributed as $\|y_R\| Z'$,

$$\Pr_U(E) = \Pr\left( -y_1 |Z| < \|y_R\| Z' < (\|x\| - y_1) |Z| \right) = \Pr\left( \frac{Z'}{|Z|} \in \left( -\frac{y_1}{\|y_R\|}, \frac{\|x\| - y_1}{\|y_R\|} \right) \right).$$

[...]

In the tree data structures we analyze, most cells contain only a subset of the data $\{x_1, \ldots, x_n\}$. For a cell that contains $m$ of these points, the appropriate variant of $\Phi$ is

$$\Phi_m(q, \{x_1, \ldots, x_n\}) = \frac{1}{m} \sum_{i=2}^{m} \frac{\|q - x_{(1)}\|}{\|q - x_{(i)}\|}.$$

Corollary 4. Pick any points $q, x_1, \ldots, x_n$ and let $S$ denote any subset of the $x_i$ that includes $x_{(1)}$. If $q$ and the points in $S$ are projected to a direction $U$ chosen at random from the unit sphere, then for any $0 < \alpha < 1$, the probability (over $U$) that at least an $\alpha$ fraction of the projected $S$ falls between $q$ and $x_{(1)}$ is upper-bounded by $(1/2\alpha) \, \Phi_{|S|}(q, \{x_1, \ldots, x_n\})$.

Proof. Apply Theorem 3 to $S$, noting that the corresponding value of $\Phi$ is maximized when $S$ consists of the points closest to $q$; and then apply Markov's inequality.
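The closed form in Lemma 1 can be checked numerically. The following Python sketch is illustrative only: the function names, the sample size, and the seed are assumptions made for this example and do not come from the paper. It evaluates the arcsin expression and compares it with a Monte Carlo estimate obtained by projecting onto Gaussian directions, as in the proof.

```python
import math
import random


def _dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def _norm(v):
    return math.sqrt(_dot(v, v))


def between_probability(q, x, y):
    """Closed form of Lemma 1.

    Assumes ||q - x|| <= ||q - y||, x != q and y != x (hypothetical helper).
    """
    qx = [a - b for a, b in zip(q, x)]
    qy = [a - b for a, b in zip(q, y)]
    yx = [a - b for a, b in zip(y, x)]
    cos_angle = _dot(qx, yx) / (_norm(qx) * _norm(yx))
    arg = (_norm(qx) / _norm(qy)) * math.sqrt(max(0.0, 1.0 - cos_angle ** 2))
    # Clamp to guard against rounding slightly above 1.
    return math.asin(min(1.0, arg)) / math.pi


def monte_carlo_estimate(q, x, y, trials=20000, seed=0):
    """Fraction of Gaussian directions U with y.U strictly between q.U and x.U."""
    rng = random.Random(seed)
    d = len(q)
    hits = 0
    for _ in range(trials):
        # Scaling U to unit length would not change the ordering of projections.
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        pq, px, py = _dot(q, u), _dot(x, u), _dot(y, u)
        if min(pq, px) < py < max(pq, px):
            hits += 1
    return hits / trials
```

For instance, with $q$ at the origin, $x = e_1$ and $y = e_2$, the arcsin argument is $1/\sqrt{2}$, so the closed form gives $1/4$; the simulated frequency should agree up to sampling error.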
2.3. Extension to k nearest neighbors

If we are interested in finding the $k$ nearest neighbors, a suitable generalization of $\Phi_m$ is

$$\Phi_{k,m}(q, \{x_1, \ldots, x_n\}) = \frac{1}{m} \sum_{i=k+1}^{m} \frac{(\|q - x_{(1)}\| + \cdots + \|q - x_{(k)}\|)/k}{\|q - x_{(i)}\|}.$$

Theorem 5. Pick any points $q, x_1, \ldots, x_n$ and let $S$ denote a subset of the $x_i$ that includes $x_{(1)}, \ldots, x_{(k)}$. Suppose $q$ and the points in $S$ are projected to a random unit direction $U$. Then, for any $(k-1)/|S| < \alpha \leq 1$, the probability (over $U$) that in the projection, there is some $1 \leq j \leq k$ for which $\geq \alpha m$ points (with $m = |S|$) lie between $x_{(j)}$ and $q$ is at most

$$\frac{k}{2(\alpha - (k-1)/|S|)} \, \Phi_{k,|S|}(q, \{x_1, \ldots, x_n\}).$$

This theorem, and many of the others that follow, are proved in the appendix.

3. Randomized partition trees

We'll now see that the failure probability of the random projection tree is proportional to $\Phi \ln(1/\Phi)$, while that of the two spill trees is proportional to $\Phi$. We start with the second result, since it is the more straightforward of the two.

3.1. Randomized spill trees

In a randomized spill tree, each cell is split along a direction chosen uniformly at random from the unit sphere. Two kinds of splits are simultaneously considered: (1) a split at the median (along the random direction), and (2) an overlapping split with one part containing the bottom $1/2 + \alpha$ fraction of the cell's points, and the other part containing the top $1/2 + \alpha$ fraction, where $0 < \alpha < 1/2$ (recall Figure 3).
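The two kinds of split at a single cell can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the name `spill_splits`, the index-list return value, and the use of rounding to pick the fractile positions are assumptions made here, and ties among projected values are ignored.

```python
import math
import random


def random_unit_direction(d, rng):
    """Uniform direction on the unit sphere, via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    length = math.sqrt(sum(c * c for c in g))
    return [c / length for c in g]


def spill_splits(points, alpha, rng):
    """Compute the median split and the overlapping split of one cell.

    Returns four lists of point indices:
    (median_left, median_right, overlap_left, overlap_right).
    """
    n = len(points)
    u = random_unit_direction(len(points[0]), rng)
    # Sort point indices by their projection onto the random direction.
    order = sorted(range(n), key=lambda i: sum(a * b for a, b in zip(points[i], u)))
    half = n // 2
    lo = round((0.5 - alpha) * n)  # position of the (1/2 - alpha) fractile
    hi = round((0.5 + alpha) * n)  # position of the (1/2 + alpha) fractile
    median_left, median_right = order[:half], order[half:]
    # Bottom (1/2 + alpha) fraction and top (1/2 + alpha) fraction;
    # the middle 2 * alpha fraction lands on both sides.
    overlap_left, overlap_right = order[:hi], order[lo:]
    return median_left, median_right, overlap_left, overlap_right
```

In a spill tree the data would follow the overlapping lists and queries the median threshold; in a virtual spill tree the roles are reversed. The sketch covers only one level, so recursion and leaf handling are left out.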
Randomized trees for NN search

3.3. Could coordinate directions be used?

The tree data structures we have studied make crucial use of random projection for splitting cells. It would not suffice to use coordinate directions, as in k-d trees.

To see this, consider a simple example. Let $q$, the query point, be the origin, and suppose the data points $x_1, \ldots, x_n \in \mathbb{R}^d$ are chosen as follows:

x_1 is the all-ones vector.
Each $x_i$, $i > 1$, is chosen by picking a coordinate at random, setting its value to $M$, and then setting all remaining coordinates to uniform-random numbers in the range $(0, 1)$. Here $M$ is some very large constant.

For large enough $M$, the nearest neighbor of $q$ is $x_1$. By letting $M$ grow further, we can let $\Phi(q, \{x_1, \ldots, x_n\})$ get arbitrarily close to zero, which means that the random projection methods will work well. However, any coordinate projection will create a disastrously large separation between $q$ and $x_1$: on average, a $(1 - 1/d)$ fraction of the data points will fall between them.

4. Bounding $\Phi$

The exact nearest neighbor schemes we analyze have error probabilities related to $\Phi$, which lies in the range $[0, 1]$. The worst case is when all points are equidistant, in which case $\Phi$ is exactly 1, but this is a pathological situation. Is it possible to bound $\Phi$ under simple assumptions on the data?

In this section we study two such assumptions. In each case, query points are arbitrary, but the data are assumed to have been drawn i.i.d. from an underlying distribution.

4.1. Data drawn from a doubling measure

Suppose the data points are drawn from a distribution $\mu$ on $\mathbb{R}^d$ which is a doubling measure: that is, there exists a constant $C > 0$ and a subset $X \subseteq \mathbb{R}^d$ such that

$$\mu(B(x, 2r)) \leq C \cdot \mu(B(x, r)) \quad \text{for all } x \in X \text{ and all } r > 0.$$

Here $B(x, r)$ is the closed Euclidean ball of radius $r$ centered at $x$. To understand this condition, it is helpful to also look at an alternative formulation that is essentially equivalent: there exists a constant $d_o > 0$ and a subset $X \subseteq \mathbb{R}^d$ such that for all $x \in X$, all $r > 0$, and all $\alpha \geq 1$, we have $\mu(B(x, \alpha r)) \leq \alpha^{d_o} \cdot \mu(B(x, r))$. In other words, the probability mass of a ball grows polynomially in the radius. Comparing this to the standard formula for the volume of a ball, we see that the degree of this polynomial, $d_o$ ($= \log_2 C$), can reasonably be thought of as the "dimension" of measure $\mu$.

Theorem 8. Suppose $\mu$ is continuous on $\mathbb{R}^d$ and is a doubling measure of dimension $d_o \geq 2$. Pick any $q \in X$ and draw $x_1, \ldots, x_n \sim \mu$. Pick any $0 < \delta < 1/2$. With probability $\geq 1 - 3\delta$ [...]
[...]

The overall distribution is thus a mixture $\mu = w_1 \mu_1 + \cdots + w_t \mu_t$ whose $j$th component is a Bernoulli product distribution $\mu_j = B(p^{(j)}_1) \times \cdots \times B(p^{(j)}_N)$. Here $B(p)$ is a shorthand for the distribution on $\{0, 1\}$ with expected value $p$. It will simplify things to assume that $0 < p^{(j)}_i < 1/2$; this is not a huge assumption if, say, stopwords have been removed.

For the purposes of bounding $\Phi$, we are interested in the distribution of $d_H(q, X)$, where $X$ is chosen from $\mu$ and $d_H$ denotes Hamming distance. This is a sum of small independent quantities, and it is customary to approximate such sums by a Poisson distribution. In the current context, however, this approximation is rather poor, and we instead use counting arguments to directly bound how rapidly the distribution grows. The results stand in stark contrast to those we obtained for doubling measures, and reveal this to be a substantially more difficult setting for nearest neighbor search. For a doubling measure, the probability mass of a ball $B(q, r)$ doubles whenever $r$ is multiplied by a constant. In our present setting, it doubles whenever $r$ is increased by an additive constant:

Theorem 10. Suppose that all $p^{(j)}_i \in (0, 1/2)$. Let $L_j = \sum_i p^{(j)}_i$ denote the expected number of words in a document from topic $j$, and let $L = \min(L_1, \ldots, L_t)$. Pick any query $q \in \{0, 1\}^N$, and draw $X \sim \mu$. For any $\ell \geq 0$,

$$\frac{\Pr(d_H(q, X) = \ell + 1)}{\Pr(d_H(q, X) = \ell)} \geq \frac{L - \ell/2}{\ell + 1}.$$

Now, fix a particular query $q \in \{0, 1\}^N$, and draw $x_1, \ldots, x_n$ from distribution $\mu$.

Lemma 11. There is an absolute constant $c_o$ for which the following holds. Pick any $0 < \delta < 1$ and any $k \geq 1$, and let $v$ denote the smallest integer for which $\Pr_X(d_H(q, X) \leq v) \geq (8/n) \max(k, \ln 1/\delta)$. Then with probability at least $1 - 3\delta$ over the choice of $x_1, \ldots, x_n$, for any $m \leq n$,

$$\Phi_{k,m}(q, \{x_1, \ldots, x_n\}) \leq 4 \sqrt{\frac{v}{c_o L \log_2(n/m)}}.$$

The implication of this lemma is that for any of the three tree data structures, the failure probability at a single level is roughly $\sqrt{v/L}$. This means that the tree can only be grown to depth $O(\sqrt{L/v})$, and thus the query time is dominated by $n_o = n \cdot 2^{-O(\sqrt{L/v})}$. When $n$ is large, we expect $v$ to be small, and thus the query time improves over exhaustive search by a factor of roughly $2^{-\sqrt{L}}$.

Acknowledgments

We thank the National Science Foundation for support under grant IIS-1162581.

References

N. Ailon and B. Chazelle. The fast Johnson-Lindenstrauss transform and approximate nearest neighbors. SIAM Journal on Computing, 39:302-322, 2009.

[...]
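Appendix B. Proof of Theorem 7

Consider any internal node of the tree that contains $q$ as well as $m$ of the data points, including $x_{(1)}$. What is the probability that the split at that node separates $q$ from $x_{(1)}$? To analyze this, let $F$ denote the fraction of the $m$ points that fall between $q$ and $x_{(1)}$ along the randomly-chosen split direction. Since the split point is chosen at random from an interval of mass 1/2, the probability that it separates $q$ from $x_{(1)}$ is at most $F/(1/2)$. Integrating out $F$, we get

$$\Pr(q \text{ is separated from } x_{(1)}) \leq \int_0^1 \Pr(F = f) \, \frac{f}{1/2} \, df = 2 \int_0^1 \Pr(F > f) \, df \leq 2 \int_0^1 \min\left(1, \frac{\Phi_m}{2f}\right) df$$
$$= 2 \int_0^{\Phi_m/2} df + 2 \int_{\Phi_m/2}^1 \frac{\Phi_m}{2f} \, df = \Phi_m \ln \frac{2e}{\Phi_m},$$

where the second inequality uses Corollary 4.

The lemma follows by taking a union bound over the path that conveys $q$ from root to leaf, in which the number of data points per level shrinks geometrically, by a factor of 3/4 or better.

The same reasoning generalizes to $k$ nearest neighbors. This time, $F$ is defined to be the fraction of the $m$ points that lie between $q$ and the furthest of $x_{(1)}, \ldots, x_{(k)}$ along the random splitting direction. Then $q$ is separated from one of these neighbors only if the split point lies in an interval of mass $F$ on either side of $q$, an event that occurs with probability at most $2F/(1/2)$. Using Theorem 5,

$$\Pr(q \text{ is separated from some } x_{(j)}, 1 \leq j \leq k) \leq \int_0^1 \Pr(F = f) \, \frac{2f}{1/2} \, df = 4 \int_0^1 \Pr(F > f) \, df$$
$$\leq 4 \int_0^1 \min\left(1, \frac{k \Phi_{k,m}}{2(f - (k-1)/m)}\right) df$$
$$\leq 4 \int_0^{(k \Phi_{k,m}/2) + (k-1)/m} df + 4 \int_{(k \Phi_{k,m}/2) + (k-1)/m}^1 \frac{k \Phi_{k,m}}{2(f - (k-1)/m)} \, df$$
$$\leq 2 k \Phi_{k,m} \ln \frac{2e}{k \Phi_{k,m}} + \frac{4(k-1)}{m},$$

and as before, we sum this over a root-to-leaf path in the tree.

Appendix C. Proof of Theorem 8

C.1. The k = 1 case

We will consider a collection of balls $B_o, B_1, B_2, \ldots$ centered at $q$, with geometrically increasing radii $r_o, r_1, r_2, \ldots$, respectively. For $i \geq 1$, we will take $r_i = 2^i r_o$. Thus by the doubling condition, $\mu(B_i) \leq C^i \mu(B_o)$, where $C = 2^{d_o} \geq 4$.

[...]

where the last inequality comes from (*). To lower-bound $2^\ell$, we again use (*) to get $C^\ell \geq m/(2n\mu(B_o))$, whereupon

$$2^\ell \geq \left( \frac{m}{2n\mu(B_o)} \right)^{1/\log_2 C} = \left( \frac{m}{2 \ln(1/\delta)} \right)^{1/\log_2 C}$$

and we're done.

C.2. The k > 1 case

The only big change is in the definition of $r_o$; it is now the radius for which

$$\mu(B_o) = \frac{4}{n} \max\left(k, \ln \frac{1}{\delta}\right).$$

Thus, when $x_1, \ldots, x_n$ are drawn independently at random from $\mu$, the expected number of them that fall in $B_o$ is at least $4k$, and by a multiplicative Chernoff bound is at least $k$ with probability $\geq 1 - \delta$.

The balls $B_1, B_2, \ldots$ are defined as before, and once again, we can conclude that with probability $\geq 1 - 2\delta$, each $B_i$ contains at most $2nC^i\mu(B_o)$ of the data points.

Any point $x_{(i)} \notin B_o$ lies in some annulus $B_j \setminus B_{j-1}$, and its contribution to the summation in $\Phi_{k,m}$ is

$$\frac{(\|q - x_{(1)}\| + \cdots + \|q - x_{(k)}\|)/k}{\|q - x_{(i)}\|} \leq \frac{1}{2^{j-1}}.$$

The relationship (*) and the remainder of the argument are exactly as before.

Appendix D. A useful technical lemma

Lemma 12. Suppose that for some constants $A, B > 0$ and $d_o \geq 1$,

$$F(m) \leq A \left( \frac{B}{m} \right)^{1/d_o}$$

for all $m \geq n_o$. Pick any $0 < \beta < 1$ and define $\ell = \log_{1/\beta}(n/n_o)$. Then:

$$\sum_{i=0}^{\ell} F(\beta^i n) \leq \frac{A d_o}{1 - \beta} \left( \frac{B}{n_o} \right)^{1/d_o}$$

and, if $n_o \geq B(A/2)^{d_o}$,

$$\sum_{i=0}^{\ell} F(\beta^i n) \ln \frac{2e}{F(\beta^i n)} \leq \frac{A d_o}{1 - \beta} \left( \frac{B}{n_o} \right)^{1/d_o} \left( \frac{1}{1 - \beta} \ln \frac{1}{\beta} + \ln \frac{2e}{A} + \frac{1}{d_o} \ln \frac{n_o}{B} \right).$$

[...]

To understand this distribution, we start with a general result about sums of Bernoulli random variables. Notice that the result is exactly correct in the situation where all $p_i = 1/2$.

Lemma 13. Suppose $Z_1, \ldots, Z_N$ are independent, where $Z_i \in \{0, 1\}$ is a Bernoulli random variable with mean $0 < a_i < 1$, and $a_1 \geq a_2 \geq \cdots \geq a_N$. Let $Z = Z_1 + \cdots + Z_N$. Then for any $\ell \geq 0$,

$$\frac{\Pr(Z = \ell + 1)}{\Pr(Z = \ell)} \geq \frac{1}{\ell + 1} \sum_{i=\ell+1}^{N} \frac{a_i}{1 - a_i}.$$

Proof. Define $r_i = a_i/(1 - a_i) \in (0, \infty)$; then $r_1 \geq r_2 \geq \cdots \geq r_N$. Now, for any $\ell \geq 0$,

$$\Pr(Z = \ell) = \sum_{\{i_1, \ldots, i_\ell\} \subset [N]} a_{i_1} a_{i_2} \cdots a_{i_\ell} \prod_{j \notin \{i_1, \ldots, i_\ell\}} (1 - a_j)$$
$$= \prod_{i=1}^{N} (1 - a_i) \sum_{\{i_1, \ldots, i_\ell\} \subset [N]} \frac{a_{i_1}}{1 - a_{i_1}} \frac{a_{i_2}}{1 - a_{i_2}} \cdots \frac{a_{i_\ell}}{1 - a_{i_\ell}}$$
$$= \prod_{i=1}^{N} (1 - a_i) \sum_{\{i_1, \ldots, i_\ell\} \subset [N]} r_{i_1} r_{i_2} \cdots r_{i_\ell}$$

where the summations are over subsets $\{i_1, \ldots, i_\ell\}$ of $\ell$ distinct elements of $[N]$. In the final line, the product of the $(1 - a_i)$ does not depend upon $\ell$ and can be ignored. Let's focus on the summation; call it $S_\ell$. We would like to compare it to $S_{\ell+1}$.

$S_{\ell+1}$ is the sum of $\binom{N}{\ell+1}$ distinct terms, each the product of $\ell + 1$ $r_i$'s. These terms also appear in the quantity $S_\ell (r_1 + \cdots + r_N)$; in fact, each term of $S_{\ell+1}$ appears multiple times, $\ell + 1$ times to be precise. The remaining terms in $S_\ell (r_1 + \cdots + r_N)$ each contain $\ell - 1$ unique elements and one duplicated element. By accounting in this way, we get

$$S_\ell (r_1 + \cdots + r_N) = (\ell + 1) S_{\ell+1} + \sum_{\{i_1, \ldots, i_\ell\} \subset [N]} r_{i_1} r_{i_2} \cdots r_{i_\ell} (r_{i_1} + \cdots + r_{i_\ell})$$
$$\leq (\ell + 1) S_{\ell+1} + S_\ell (r_1 + \cdots + r_\ell)$$

since the $r_i$'s are arranged in decreasing order. Hence

$$\frac{\Pr(Z = \ell + 1)}{\Pr(Z = \ell)} = \frac{S_{\ell+1}}{S_\ell} \geq \frac{1}{\ell + 1} (r_{\ell+1} + \cdots + r_N),$$

as claimed.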
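The ratio bound of Lemma 13 is easy to test on small instances, because the exact distribution of a sum of independent Bernoulli variables can be built up one variable at a time. The Python sketch below is a hedged illustration; `poisson_binomial_pmf` and `lemma13_lower_bound` are names invented for this example.

```python
def poisson_binomial_pmf(a):
    """Exact pmf of Z = Z_1 + ... + Z_N for independent Bernoulli means a[i].

    pmf[l] is Pr(Z = l); computed by adding one variable at a time.
    """
    pmf = [1.0]
    for p in a:
        nxt = [0.0] * (len(pmf) + 1)
        for count, mass in enumerate(pmf):
            nxt[count] += mass * (1.0 - p)   # this variable is 0
            nxt[count + 1] += mass * p       # this variable is 1
        pmf = nxt
    return pmf


def lemma13_lower_bound(a, ell):
    """Right-hand side of Lemma 13: (1/(ell+1)) * sum_{i > ell} a_i / (1 - a_i).

    The ratios r_i = a_i / (1 - a_i) are sorted in decreasing order first,
    so the ell largest ones are dropped from the sum.
    """
    r = sorted((p / (1.0 - p) for p in a), reverse=True)
    return sum(r[ell:]) / (ell + 1)
```

When every mean equals 1/2, both sides reduce to $(N - \ell)/(\ell + 1)$, which matches the remark that the result is exactly correct in that situation.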
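We now apply this result directly to the sum of Bernoulli variables $Z = d_H(q, X)$.

Lemma 14. Suppose that $p_1, \ldots, p_N \in (0, 1/2)$. Pick any query $q \in \{0, 1\}^N$, and draw $X$ from distribution $\mu = B(p_1) \times \cdots \times B(p_N)$. Then for any $\ell \geq 0$,

$$\frac{\Pr(d_H(q, X) = \ell + 1)}{\Pr(d_H(q, X) = \ell)} \geq \frac{L - \ell/2}{\ell + 1},$$

where $L = \sum_i p_i$ is the expected number of words in $X$.

[...]

Suppose that for some $i > k$, point $x_{(i)}$ is at Hamming distance $\ell$ from $q$, that is, $x_{(i)} \in S_\ell$. Then

$$\frac{(\|q - x_{(1)}\| + \cdots + \|q - x_{(k)}\|)/k}{\|q - x_{(i)}\|} \leq \sqrt{\frac{v}{\ell}}$$

since Euclidean distance is the square root of Hamming distance.

In bounding $\Phi_{k,m}$, we need to gauge the range of Hamming distances spanned by $x_{(k+1)}, \ldots, x_{(m)}$. The geometric growth rate of part (c) implies that most points lie at Hamming distance $c_o L$ or greater from $q$. It also means that $d_H(q, x_{(m)}) \geq c_o L \log_2(n/m)$. Thus,

$$\Phi_{k,m}(q, \{x_1, \ldots, x_n\}) = \frac{1}{m} \sum_{i > k} \frac{(\|q - x_{(1)}\| + \cdots + \|q - x_{(k)}\|)/k}{\|q - x_{(i)}\|}$$
$$\leq \frac{1}{m} \sum_{\ell \geq v} |S_\ell \cap \{x_{(1)}, \ldots, x_{(m)}\}| \sqrt{\frac{v}{\ell}} \leq 4 \sqrt{\frac{v}{c_o L \log_2(n/m)}}$$

where the last step follows by lower-bounding $|S_\ell|$ by an increasing geometric series.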
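For concreteness, the potential $\Phi_{k,m}$ bounded above can be computed directly from its definition. The following Python sketch is only an illustration: the name `potential_km` is an assumption, and the function expects all data points to be distinct from the query so that no distance is zero.

```python
import math


def potential_km(q, points, k, m):
    """Phi_{k,m}(q, points): average over the m nearest points of the ratio
    (mean distance to the k nearest neighbors) / (distance to the i-th nearest),
    with the sum running over i = k+1, ..., m.

    Assumes 1 <= k <= m <= len(points) and no point coincides with q.
    """
    dists = sorted(math.dist(q, x) for x in points)
    mean_of_k_nearest = sum(dists[:k]) / k
    # dists[k:m] holds the (k+1)-th through m-th smallest distances.
    return sum(mean_of_k_nearest / d for d in dists[k:m]) / m
```

With binary vectors, `math.dist` returns the square root of the Hamming distance. A query at the origin and points at Hamming distances 1, 2 and 4 therefore give $\Phi_{1,3} = (1/\sqrt{2} + 1/2)/3$.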