
Efficient Evaluation of Probabilistic Advanced Spatial Queries on Existentially Uncertain Data

Man Lung Yiu, Nikos Mamoulis, Xiangyuan Dai, Yufei Tao, and Michail Vaitis

Abstract. We study the problem of answering spatial queries in databases where objects exist with some uncertainty and are associated with an existential probability. The goal of a thresholding probabilistic spatial query is to retrieve the objects that qualify the spatial predicates with probability that exceeds a threshold. Accordingly, a ranking probabilistic spatial query selects the objects with the highest probabilities to qualify the spatial predicates. We propose adaptations of spatial access methods and search algorithms for …

1 INTRODUCTION

CONVENTIONAL spatial databases manage objects located on a thematic map with 100 percent certainty. In real-life cases, however, there may be uncertainty about the existence of spatial objects or events. As an example, consider a satellite image, where interesting objects (e.g., vessels) have been extracted (e.g., by a human expert or an image segmentation tool). Due to low image resolution …

M.L. Yiu is with the Department of Computer Science, Aalborg University, Selma Lagerlofs Vej 300, DK-9220 Aalborg, Denmark. E-mail: mly@cs.aau.dk.
N. Mamoulis and X. Dai are with the Department of Computer Science, …

Authorized licensed use limited to: University of Minnesota. Downloaded on January 14, 2010 at 07:02 from IEEE Xplore. Restrictions apply.
Assuming that the spatial attributes of the objects are indexed by 2D R-trees, we propose search algorithms for probabilistic variants of spatial range queries, NN search, spatial skyline (SS) queries, and reverse NN (RNN) queries. For different variants of R-trees, we derive appropriate lower/upper probabilistic bounds that effectively reduce the search I/O cost. Our search algorithms for NN, SS, and RNN are carefully designed to handle disqualified entries so that removing them is guaranteed not to influence the probabilistic bounds of any potential result.

The rest of this paper is organized as follows: Section 2 provides background on querying spatial objects with uncertain locations and extents. Section 3 defines existentially uncertain data and the query types on them. In Section 4, we study the evaluation of probabilistic spatial queries, both when the objects are indexed primarily on their spatial attributes and when existential probability is treated as an additional dimension. Section 5 addresses probabilistic variants of interesting advanced spatial queries. Section 6 is a comprehensive experimental study of the performance of the proposed methods. Section 7 discusses the case where the existential probabilities of objects are correlated. Finally, Section 8 concludes this paper with a discussion about future work.

2 BACKGROUND AND RELATED WORK

In this section, we review popular spatial query types and show how they can be processed when the spatial objects are indexed by R-trees. In addition, we survey related work on modeling and querying spatial objects of uncertain location and/or extent.

2.1 Spatial Query Processing

The most popular spatial access method is the R-tree [8], which indexes the minimum bounding rectangles (MBRs) of objects. R-trees can efficiently process the main spatial query types, including spatial range queries, NN queries, and spatial joins. Fig. 1 shows a collection $D$ of spatial objects (e.g., points) and an R-tree structure that indexes them. Given a spatial region $W$, a spatial range query retrieves from $D$ the objects that intersect $W$. For instance, consider a range query that asks for all objects within distance 3 from $q$, corresponding to the shaded area in Fig. 1. Starting from the root of the tree, the query is processed by recursively following the entries whose MBRs intersect the query region.

An NN query takes as input a query object $q$ and returns the object in $D$ closest to $q$; Fig. 1 also illustrates the NN of $q$. If $D$ is indexed by an R-tree, then the best-first (BF) algorithm in [9] is the most efficient solution for processing NN queries. A priority queue, which organizes R-tree entries based on the (minimum) distance of their MBRs to $q$, is initialized with the root entries. The top entry $e$ of the queue is then retrieved. If $e$ is a leaf node entry, the corresponding object is returned as the NN (assuming point objects). Otherwise, the node pointed by $e$ is accessed and all its entries are inserted into the queue. To find the NN of $q$ in Fig. 1, BF first inserts the root entries into the queue, together with their distances to $q$. Then, the nearest entry is retrieved from the queue and the objects of the node it points to are inserted into the queue. The next nearest entry in the queue is an object, which is the NN of $q$. In Section 4, we will extend BF for processing probabilistic versions of NN search on existentially uncertain data.

2.2 Locationally Uncertain Spatial Data

Recently, there has been increasing interest in the modeling, indexing, and querying of objects with uncertain location and/or extent. For instance, consider a collection of moving objects whose positions are tracked by GPS devices. Exact locations are unknown due to GPS errors and transmission delays; e.g., if the object is in motion, its location might be outdated when the report reaches the listening server. As a result, the set of possible locations of an object is captured by a probability density function (PDF), which combines the GPS measurement error, the last reported object location, and the object velocity [2]. Fig. 2a exemplifies a locationally uncertain object, modeled by a 2D Gaussian PDF, with the regions of higher probability marked in darker color. According to [10] and [7], an arbitrary PDF can be approximated by a spatial histogram (e.g., 3 × 3 bins in Fig. 2a), where each bin stores the probability that it includes the object, and the bin probabilities sum to 1.

YIU ET AL.: EFFICIENT EVALUATION OF PROBABILISTIC ADVANCED SPATIAL QUERIES ON EXISTENTIALLY UNCERTAIN DATA

Fig. 1. Spatial queries on R-trees.

Fig. 2. Locationally and existentially uncertain objects. (a) Locationally uncertain PDF. (b) PCR of $o$ at $c = 0.2$. (c) NN search. (d) Existentially uncertain object.

Given a locationally uncertain object $o$ and a query range $W$ (see Fig. 2b), the probability that $o$ intersects $W$ is formally defined by

$$P(o,W) = \int_{u \in W} o.\mathit{pdf}(u)\, du,$$

where $o.\mathit{pdf}(u)$ denotes the probability that $o$ coincides with point $u$. Probabilistic threshold range queries [10], [7] retrieve result pairs $\langle o, P(o,W) \rangle$ such that $P(o,W) \ge \tau$, where $\tau$ is a user-specified threshold. The filter-refinement framework is adopted to accelerate their evaluation. An inexpensive filter step quickly determines whether an object $o$ can belong to the result. Only when $o$ may potentially become a result is the refinement step executed to compute the exact $P(o,W)$. In the state-of-the-art method in [7], probabilistic constrained regions (PCRs) are used for the filter step. Given a system parameter $c$, which models a minimum threshold value, the PCR of a 2D object $o$ is precomputed by sliding each axis-parallel line inwards until the swept area over the PDF of $o$ equals $c$. Fig. 2b illustrates the PCR of an object $o$ for $c = 0.2$: $o$ appears in the region to the left of the left line with probability 0.2. Similarly, $o$ appears in the regions to the right of the right line, below the bottom line, and above the top line with probability 0.2 each. To answer the threshold range query $(W, \tau)$, we first compare $W$ with these lines. Since $W$ does not intersect the PCR of $o$ (i.e., it lies above the top line), we can immediately infer that $P(o,W) \le 0.2$, which is below the query threshold $\tau$. Thus, $o$ is discarded during the filter step, saving the expensive computation of the exact probability $P(o,W)$.

Table 1 summarizes the fundamental differences between locationally uncertain objects and existentially uncertain objects. As depicted in Fig. 2d, an existentially uncertain object $x$ has a certain location (i.e., a point), but its existence is associated with a probability $x.p$. The probability that $x$ satisfies a range query $W$ is $x.p$ if $x \in W$, or 0 otherwise. Thus, it can be computed in constant time. One may argue that an existentially uncertain point $x$ with existence probability $x.p$ could be modeled as a locationally uncertain object whose PDF consists of exactly two locations: one point with probability $x.p$, and a point at infinity with probability $1 - x.p$. This model hinders the application of existing techniques for locationally uncertain data [10], [7], because they assume multiple locations with probabilities and a continuous PDF in the space. Consider, for instance, the probabilistic NN search algorithm for locationally uncertain data, proposed in [6].
Given a query point $q$ and a set of locationally uncertain objects, we can derive $f = \min_{o} \mathrm{maxdist}(q,o)$, i.e., the minimum furthest distance of any object from $q$. For instance, in Fig. 2c, one object yields the minimum $f$. Since the PDF of that object integrates to 1 within the circle centered at $q$ with radius $f$, any object $o$ with $\mathrm{mindist}(q,o) > f$ has no chance of being the NN of $q$. For any remaining object $o$ (with $\mathrm{mindist}(q,o) \le f$), its probability of being the NN of $q$ is denoted by $P(o,q)$. Assuming independent PDFs for different objects, [6] defines $P(o,q)$ by integrating, over all radii $r$, the probability that $o$ lies in the hollow ring centered at $q$ with radius $r$, multiplied by the probability that every other object lies outside the concrete circle centered at $q$ with radius $r$. The evaluation of this probability is expensive for arbitrary PDFs, so [6] focuses on basic PDFs and develops efficient computation techniques for $P(o,q)$.

Note that the above probabilistic NN search technique is inapplicable to existentially uncertain data. Fig. 2d depicts a set of existentially uncertain objects, with a spatial configuration similar to that in Fig. 2c. In this case, the same object still yields the minimum $f$ value. However, since its existence probability is not 1, it cannot be used to bound the search space. For instance, an object outside the circle now has nonzero probability of being the NN of $q$. This happens with the probability that it exists while all objects nearer to $q$ do not exist.

Other work on locationally uncertain data includes indexing the trajectory of an object as a cylindrical volume around the tracked polyline (e.g., by a GPS), capturing uncertainty up to a certain distance from the polyline [11]. A similar approach is followed in [3], where recorded trajectories are converted to sequences of locations connected by elliptical volumes. Yu and Mehrotra [5] also model the uncertain locations of spatial objects by (circular) uncertainty regions and discuss how to process simple and aggregate spatial range queries using these fuzzy representations. Ni et al. [4] study the evaluation of spatial joins between two sets of objects whose extents are "floating" according to uncertainty distance bounds. They propose an extension of the R-tree that captures uncertainty in directory node entries and adapt R-tree join techniques to process the join efficiently. Cheng et al. [12], [10] study a problem related to probabilistic spatial range queries. The uncertain data are not spatial, but ordinal 1D values (e.g., temperature values recorded from sensors). Cheng et al. [10] index such uncertain data for efficient evaluation of probabilistic range queries. Cheng et al. [12] classify queries on such data into queries asking for the set of objects satisfying a query predicate and queries asking for a PDF describing the distribution of a query result when it is a single aggregate value (e.g., the sum of values, the maximum value, etc.). Finally, Lazaridis and Mehrotra [13] study the evaluation of queries over uncertain or summarized data, where the user specifies thresholds (precision, recall, laxity) regarding the quality (i.e., accuracy) of the desired result.

110 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 1, JANUARY 2009

TABLE 1
Fundamental Differences between Two Notions of Uncertainty

An object $x$ is existentially uncertain if its existence is described by a probability $x.p$. We refer to $x.p$ as the existential probability (or confidence) of $x$. Since we can have $x.p = 1$, we (trivially) regard an object known with 100 percent certainty as existentially uncertain. This allows us to model object collections that mix uncertain and certain data. On the other hand, $x.p = 0$ corresponds to an object that definitely does not exist, so there is no need to store it in a database. We adopt the existential independence assumption: the confidence values of two different objects are independent of each other. This assumption is reasonable for the applications mentioned in Section 1 (e.g., satellite image extraction and emergency calls). We will relax this assumption in Section 7 and handle existentially uncertain objects whose confidence values are correlated. Fig. 3 shows a collection of existentially uncertain points; next to each point label is its existential probability, enclosed in parentheses.

We are interested in answering spatial queries that take uncertainty into account. Let $D$ be a collection of existentially uncertain objects. We then define probabilistic versions of the basic spatial query types:

Definition 1. A probabilistic spatial range query takes as input a spatial region $W$ and returns all $\langle x, P \rangle$ pairs such that $x \in D$ intersects $W$, with probability $P = x.p$.

Definition 2. A probabilistic NN query takes as input an object $q$ and returns all $\langle x, P \rangle$ pairs such that $x \in D$ is the NN of $q$, with probability

$$P = x.p \cdot \prod_{y \in D:\ \mathrm{dist}(q,y) < \mathrm{dist}(q,x)} (1 - y.p),$$

where $\mathrm{dist}(q,x)$ denotes the distance between $q$ and $x$.

The output of a probabilistic query is a conventional query result coupled with a positive probability that the item satisfies the query. The case of probabilistic range queries is simple: $P = x.p$ for each object $x$ that qualifies the spatial predicate. Consider, for instance, the shaded window $W$ shown in Fig. 3. Two objects lie in $W$, and each is reported with its own confidence as its probability. As with locationally uncertain data, the probability that an object qualifies a spatial range query is independent of the locations and confidences of other objects.

On the other hand, the probability that an object is the NN depends on the locations and probabilities of other objects. Consider again Fig. 3 and assume that we want to find the potential NNs of $q$. The nearest point to $q$ is the actual NN iff it exists, so it is a query result with probability equal to its confidence. For the second nearest point to be the NN of $q$, 1) the nearest point must not exist and 2) the second nearest point must exist; this gives another result. Continuing this way, we can explore the whole set of points in $D$ and assign to each of them a probability of being the NN of $q$.

This NN query example not only shows the search complexity on uncertain data but also reveals that the result of a probabilistic query may be arbitrarily large. For instance, the result of any NN query is as large as $|D|$ if $x.p < 1$ for all $x \in D$. We can define practical versions of probabilistic queries with controlled output either by thresholding out the results that have a low probability to occur or by ranking the results and selecting the most probable ones:

Definition 3. Let $\langle x, P \rangle$ be an output item of a probabilistic spatial query $Q$. The thresholding version of $Q$ takes as additional input a threshold $\tau$ and returns the results for which $P \ge \tau$. The ranking version of $Q$ takes as additional input a positive integer $k$ and returns the $k$ results with the highest $P$.

For example, a thresholding range (window) query on the objects of Fig. 3 returns only the objects in $W$ whose confidence is at least $\tau$, whereas a ranking range query returns the $k$ objects in $W$ with the highest confidences.

4 EVALUATION OF PROBABILISTIC SPATIAL QUERIES

Like spatial queries on exact data, probabilistic spatial queries can be efficiently processed with appropriate access methods. In this section, we explore alternative indexing schemes and propose algorithms for probabilistic queries on them. We focus on the most important spatial query types, namely, range queries and NN queries.

4.1 Algorithms for 2D R-Trees

The most straightforward way to index a set of existentially uncertain spatial data is to create a 2D R-tree on their spatial attribute. The confidences of the spatial objects are stored together with their geometric representation or approximation (for complex objects) at the leaves of the tree. We now study the evaluation of probabilistic queries on top of this indexing scheme.

4.1.1 Range Queries

Probabilistic range queries can be easily processed in two steps. First, a standard depth-first search algorithm is applied on the R-tree to retrieve the objects that qualify the spatial predicate of the query. Second, each retrieved object $x$ is reported with probability $P = x.p$. If the query is a thresholding query, the threshold $\tau$ is used to filter out objects with $x.p < \tau$. If it is a ranking query, a priority queue maintains the $k$ results with the highest $x.p$ during the search and outputs them at the end of query processing.

4.1.2 NN Search

NN search is more complex than range search, because the probability that an object qualifies the query depends on the locations and confidences of other objects.

1. Especially for thresholding range queries with very large thresholds, a viable alternative could be to use an index on the objects' probabilities to efficiently access the objects and then filter them using the spatial query predicate.

Fig. 3. NN search example.
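A brute-force sketch can make Definitions 1 to 3 concrete. The following Python code is a minimal illustration under stated assumptions, not the index-based evaluation studied in this section. It assumes point objects held in an in-memory list, Euclidean distance, distinct distances to $q$, and the existential independence assumption. The names `UncertainPoint`, `range_probabilities`, `nn_probabilities`, `thresholding`, and `ranking` are hypothetical, and the sample points are invented rather than taken from Fig. 3.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainPoint:
    """Existentially uncertain point: a certain location and a confidence p in (0, 1]."""
    name: str
    x: float
    y: float
    p: float


def range_probabilities(points, window):
    """Definition 1: every point inside the window qualifies with P = p."""
    (x_lo, y_lo), (x_hi, y_hi) = window
    return [(o.name, o.p) for o in points
            if x_lo <= o.x <= x_hi and y_lo <= o.y <= y_hi]


def nn_probabilities(points, q):
    """Definition 2: P(x) = x.p * prod(1 - y.p) over all y nearer to q than x.

    Assumes distinct distances to q; ties are not handled in this sketch.
    """
    ordered = sorted(points, key=lambda o: math.dist(q, (o.x, o.y)))
    result = []
    none_nearer = 1.0  # probability that none of the nearer points exists
    for o in ordered:
        result.append((o.name, none_nearer * o.p))
        none_nearer *= 1.0 - o.p
    return result


def thresholding(pairs, tau):
    """Definition 3, thresholding version: keep the results with P >= tau."""
    return [(name, P) for name, P in pairs if P >= tau]


def ranking(pairs, k):
    """Definition 3, ranking version: the k results with the highest P."""
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical data on a line through q = (0, 0); not the points of Fig. 3.
    pts = [UncertainPoint("o1", 1, 0, 0.1), UncertainPoint("o2", 2, 0, 0.1),
           UncertainPoint("o3", 3, 0, 0.2), UncertainPoint("o4", 4, 0, 0.5)]
    nn = nn_probabilities(pts, (0.0, 0.0))
    print(nn)                     # P is approximately 0.1, 0.09, 0.162, 0.324
    print(thresholding(nn, 0.3))  # only o4 qualifies
    print(ranking(nn, 2))         # o4, then o3
```

Sorting every object by distance is exactly the cost that the R-tree algorithms below try to avoid. The sketch only fixes the probability formulas, so the index-based versions can be checked against it on small inputs.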
Algorithm 1 elegantly and efficiently computes the probability $P$ that each object $x \in D$ is the NN of $q$.

Algorithm 1. Probabilistic NN on a 2D R-tree
PNN2D(Query point q, 2D R-tree R on D)
1: p_first ← 1;  ▹ Prob. of no object before x being the NN
2: while p_first > 0 and more objects in R do
3:   x ← next NN of q in R;
4:   P ← p_first · x.p;
5:   output ⟨x, P⟩;
6:   p_first ← p_first · (1 − x.p);

Algorithm PNN2D applies BF NN search [9] on the R-tree to incrementally retrieve the NNs of $q$, without considering confidences. It also incrementally maintains a variable $p_{first}$, which captures the probability that no object retrieved before the current object $x$ is the actual NN. $p_{first}$ is equal to $\prod_y (1 - y.p)$ over all objects $y$ seen before $x$. Thus, the probability that $x$ is the NN of $q$ is $P = x.p \cdot p_{first}$. In the example of Fig. 3, PNN2D gradually computes $P$ for the points in ascending order of their distance to $q$. Note that all objects of $D$ in this example are retrieved and inserted into the response set. In other words, PNN2D does not terminate until it finds an object $x$ with $x.p = 1$; if no such object exists, all objects have a positive probability of being the NN.

Thresholding and ranking. As discussed in Section 3, the user may want to restrict the response set by thresholding or ranking. Algorithm 2 is the thresholding version of PNN2D, which returns only the objects with $P \ge \tau$. The only differences from the nonthresholding version are the termination condition at line 2 and the filtering of results with $P < \tau$ (line 5). As soon as $p_{first} < \tau$, we know that the next objects, even with 100 percent confidence, cannot be the NN of $q$ with probability at least $\tau$, so we can safely terminate. For example, assume that we wish to retrieve the points in Fig. 3 that are the NN of $q$ with probability at least $\tau$. First, the nearest point is retrieved; it is filtered out at line 5, and $p_{first}$ is updated. Then, we retrieve the second nearest point (also disqualified) and update $p_{first}$ again. The third nearest point is retrieved and is also disqualified. The next object satisfies $P \ge \tau$, so it is output. Then, $p_{first}$ is updated and we retrieve one more point (disqualified). Finally, $p_{first} < \tau$ and the algorithm terminates, having produced only one result.

Algorithm 2. Probabilistic NN on a 2D R-tree with thresholding
PTNN2D(Query point q, 2D R-tree R on D, Threshold τ)
1: p_first ← 1;  ▹ Prob. of no object before x being the NN
2: while p_first ≥ τ and more objects in R do
3:   x ← next NN of q in R;
4:   P ← p_first · x.p;
5:   if P ≥ τ then output ⟨x, P⟩;
6:   p_first ← p_first · (1 − x.p);

PRNN2D (Algorithm 3), the ranking version of PNN2D, maintains a heap $H$ of the $k$ objects with the largest $P$ found so far. Let $P_k$ be the $k$-th highest probability in $H$. As soon as $p_{first} \le P_k$, we know that the next objects, even with 100 percent confidence, cannot be in the set of the $k$ most probable NNs of $q$, so we can safely terminate. For example, assume that we wish to retrieve the point with the highest probability of being the NN of $q$ in Fig. 3 ($k = 1$). PRNN2D progressively maintains the object with the highest $P$. After each of the first four object accesses, $P_k$ becomes 0.1, 0.1, 0.162, and 0.324. The algorithm terminates after the fourth loop, when $p_{first} \le P_k$. This indicates that the next object can have probability at most $p_{first} \le P_k$, so the fourth retrieved object has the highest chance among all objects of being the NN of $q$.

Algorithm 3. Probabilistic NN on a 2D R-tree with ranking
PRNN2D(Query point q, 2D R-tree R on D, Integer k)
1: p_first ← 1;  ▹ Prob. of no object before x being the NN
2: H ← ∅;  ▹ heap of k objects with highest P
3: P_k ← 0;  ▹ probability of the k-th object in H
4: while p_first > P_k and more objects in R do
5:   x ← next NN of q in R;
6:   P ← p_first · x.p;
7:   if P > P_k then
8:     update H to include ⟨x, P⟩; update P_k to the k-th probability in H;
9:   p_first ← p_first · (1 − x.p);
10: output H;

4.2 Query Evaluation Using Augmented R-Trees

We can enhance the efficiency of the probabilistic search algorithms by augmenting the R-tree directory node MBRs with some statistical information. A simple and intuitive method is to store with each directory node entry $e$ a value $e.p$: the maximum $x.p$ over all objects $x$ indexed under $e$. This value can be used to prune R-tree nodes while processing thresholding or ranking queries. Similar augmentation techniques are proposed in [4] and [10] for locationally uncertain data. Table 2 summarizes the conditions for pruning R-tree entries (and the corresponding subtrees) that do not point to any results, during thresholding and ranking range or NN queries. For range queries, we can directly prune an entry $e$ when 1) $e$ does not intersect the query range or 2) its $e.p$ satisfies the condition in the table. On the other hand, for NN search, a disqualified entry cannot be directly pruned. The confidences of the objects in the subtree it points to may be needed for computing the probabilities of objects that are farther from $q$ but have high enough probabilities to be included in the result.

TABLE 2
Checking Disqualified Entries in Augmented 2D R-Trees
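The early-termination logic of PTNN2D and PRNN2D can be separated from the R-tree, and a short sketch shows it in isolation. In this minimal Python sketch, which is an assumption and not the authors' code, the incremental BF NN search is replaced by any iterable that yields `(name, confidence)` pairs in ascending distance to $q$. The functions `ptnn` and `prnn` and the sample confidences are hypothetical. The confidences are chosen so that the running maximum probability reproduces the values 0.1, 0.1, 0.162, and 0.324 quoted in the ranking example; they are not read from Fig. 3. Each function also reports how many objects it consumed, which makes the early termination visible.

```python
import heapq
from typing import Iterable, List, Tuple

# (name, confidence) pairs in ascending distance to q; stands in for BF NN search.
Stream = Iterable[Tuple[str, float]]


def ptnn(stream: Stream, tau: float) -> Tuple[List[Tuple[str, float]], int]:
    """Thresholding probabilistic NN in the style of PTNN2D (Algorithm 2)."""
    p_first = 1.0  # probability that no object retrieved so far exists
    results, accessed = [], 0
    it = iter(stream)
    while p_first >= tau:          # otherwise no later object can reach tau
        item = next(it, None)
        if item is None:
            break
        accessed += 1
        name, p = item
        P = p_first * p            # x exists and no nearer object exists
        if P >= tau:
            results.append((name, P))
        p_first *= 1.0 - p
    return results, accessed


def prnn(stream: Stream, k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Ranking probabilistic NN in the style of PRNN2D (Algorithm 3)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    p_first = 1.0
    heap: List[Tuple[float, str]] = []  # min-heap holding the k best (P, name)
    accessed = 0
    it = iter(stream)
    while True:
        p_k = heap[0][0] if len(heap) == k else 0.0  # k-th highest P so far
        if p_first <= p_k:         # later objects cannot beat the k-th result
            break
        item = next(it, None)
        if item is None:
            break
        accessed += 1
        name, p = item
        P = p_first * p
        if len(heap) < k:
            heapq.heappush(heap, (P, name))
        elif P > heap[0][0]:
            heapq.heapreplace(heap, (P, name))
        p_first *= 1.0 - p
    best = sorted(heap, reverse=True)
    return [(name, P) for P, name in best], accessed


if __name__ == "__main__":
    objs = [("o1", 0.1), ("o2", 0.1), ("o3", 0.2), ("o4", 0.5), ("o5", 0.3)]
    print(ptnn(objs, 0.3))  # o4 is the only result; 5 objects are retrieved
    print(prnn(objs, 1))    # o4 is the most probable NN; 4 objects are retrieved
```

With `tau = 0.3`, the thresholding run follows the same pattern as the example in the text: three disqualified objects, one result, one more disqualified object, then termination. Pulling objects lazily from an iterator, rather than materializing a sorted list, mirrors why the condition is checked before each retrieval.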
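Let us assume for a moment that, for each nonleaf entry $e$, we know the exact number $|e|$ of objects in its subtree. Algorithm 4 is the thresholding NN procedure for the augmented 2D R-tree. BF is extended as follows. If a nonleaf entry $e$ is deheaped with $p_{first} \cdot e.p < \tau$, the node it points to is not immediately loaded (as in PTNN2D); instead, $e$ is inserted into a list $L$ of disqualified entries. For objects $x$ retrieved later from the BF heap, we use the entries in $L$ to compute lower and upper bounds $P_{lb}(x)$ and $P_{ub}(x)$ for the probability of $x$. If $P_{lb}(x) \ge \tau$, we know that $x$ is definitely a result. If $P_{ub}(x) < \tau$, we know that $x$ is definitely not a result. Otherwise, if $P_{lb}(x) < \tau \le P_{ub}(x)$ (lines 6-12), we must refine the probability range of $x$. For this purpose, we pick the entry $e \in L$ with the minimum $\mathrm{mindist}(q,e)$. Observe that entries with $\mathrm{mindist}(q,e) \ge \mathrm{dist}(q,x)$ cannot contribute to the probability of $x$. Since $P_{lb}(x) < \tau \le P_{ub}(x)$ holds at line 6, the entry selected at line 7 must satisfy $\mathrm{mindist}(q,e) < \mathrm{dist}(q,x)$. If $e$ is an object, then $e$ must be nearer to $q$ than $x$, and we update $p_{first}$ with the confidence of $e$. Otherwise, since the confidence of $e$ does not directly affect $p_{first}$, we access its child node and insert all entries of that node into $L$. In either case, the probability range $[P_{lb}(x), P_{ub}(x)]$ shrinks. The process is repeated while the range contains $\tau$.

Algorithm 4. Probabilistic NN on an augmented 2D R-tree with thresholding
PTNN2Daug(Query point q, Augmented 2D R-tree R on D, Threshold τ)
1: p_first ← 1;  ▹ Prob. of no object before x being the NN
2: L ← ∅;  ▹ list of disqualified entries
3: while p_first ≥ τ and more objects in R do
4:   x ← next NN of q in R;  ▹ during BF-search, each nonleaf entry e with p_first · e.p < τ is removed from the BF heap and inserted into L
5:   compute P_lb(x) and P_ub(x) by using p_first and L;
6:   while P_lb(x) < τ ≤ P_ub(x) do
7:     pick the entry e ∈ L with the smallest mindist(q, e); remove e from L;
8:     if e is an object then
9:       p_first ← p_first · (1 − e.p);  ▹ e is an object closer to q than x
10:    else
11:      read node pointed by e and insert all entries of the node into L;
12:    compute P_lb(x) and P_ub(x) by using p_first and L;
13:  if P_lb(x) ≥ τ then output ⟨x, P⟩;
14:  p_first ← p_first · (1 − x.p);

It remains to clarify how $P_{lb}(x)$ and $P_{ub}(x)$ are computed for an object $x$. Note that $L$ only contains entries whose minimum distance to $q$ is smaller than $\mathrm{dist}(q,x)$. For an entry $e$ in the list, the confidence of each object in its subtree lies in the range $(0, e.p]$. In addition, at least one object in $e$ has confidence exactly $e.p$. $P_{lb}(x)$ corresponds to the case where all objects under all entries in $L$ are closer to $q$ than $x$ is and all have the maximum possible confidence. $P_{ub}(x)$ corresponds to the case where the objects of every entry whose maximum distance from $q$ exceeds $\mathrm{dist}(q,x)$ are farther from $q$ than $x$, and every other entry contains only one object with confidence $e.p$ (for all other objects under $e$, the confidence converges to 0):

$$P_{lb}(x) = x.p \cdot p_{first} \cdot \prod_{e \in L:\ \mathrm{mindist}(q,e) < \mathrm{dist}(q,x)} (1 - e.p)^{|e|} \tag{1}$$

$$P_{ub}(x) = x.p \cdot p_{first} \cdot \prod_{e \in L:\ \mathrm{maxdist}(q,e) < \mathrm{dist}(q,x)} (1 - e.p) \tag{2}$$

So far, we have assumed that the number $|e|$ of objects in the subtree of each nonleaf entry $e$ is known (e.g., this information is augmented, or the tree is packed). We can still apply the algorithm when this information is not known, by using an upper bound for $|e|$ derived from the level $\ell$ of the entry (leaves are at level 0) and the maximum R-tree node fanout $M$. This upper bound replaces $|e|$ in (1).

Let us now illustrate PTNN2Daug by an example. Consider the augmented R-tree of Fig. 4, which indexes the point set of Fig. 3, and assume that we want to find the points that are the NN of $q$ with probability at least $\tau$. First, the entries of the root are enheaped in the BF heap. Next, the nearest entry is dequeued. Since it is disqualified ($p_{first} \cdot e.p < \tau$), it is inserted into the list $L$. Then, the next entry is dequeued, and its objects are enheaped in the BF queue. The nearest object is dequeued. Using (1) and (2), we derive a probability range for it from $p_{first}$ and $L$; the object is disqualified because $P_{ub} < \tau$. Then, $p_{first}$ is updated and we retrieve the next object. Since its $P_{lb} \ge \tau$, it is a result. Next, $p_{first}$ is updated, and the next entry retrieved from the priority queue of the BF algorithm is a nonleaf entry. We do not access the node it points to, since we know that, for each object indexed under it, $p_{first} \cdot e.p < \tau$; instead, the entry is inserted into $L$. Next, the last object is dequeued and discarded because $P_{ub} < \tau$. Now the BF heap becomes empty and the algorithm terminates. Note that PTNN2D accesses all nodes of the tree in this example, whereas PTNN2Daug saves two leaf node accesses.

2. Throughout this paper, we use $\mathrm{dist}(q,x)$ to denote the distance between two points $q$ and $x$, and $\mathrm{mindist}(q,e)$ ($\mathrm{maxdist}(q,e)$) to denote the minimum (maximum) possible distance between $q$ and any data point indexed by the subtree pointed by $e$.

Fig. 4. Example of augmented 2D R-tree.

Algorithm 5. Probabilistic NN on an augmented 2D R-tree with ranking
PRNN2Daug(Query point q, Augmented 2D R-tree R on D, Integer k)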
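1: p_first ← 1;  ▹ Prob. of no object before x being the NN
2: L ← ∅;  ▹ list of disqualified entries
3: H ← ∅;  ▹ heap of objects, organized by …
4: P_k ← 0;  ▹ … of the k-th object in H
5: while p_first > P_k and more objects in R do
6:   x ← next NN of q in R;  ▹ during BF-search, each nonleaf entry e with p_first · e.p ≤ P_k is removed from the BF heap and inserted into L
7:   compute P_lb(x) and P_ub(x) by using p_first and L;
8:   …
9:     pick the entry e ∈ L with the smallest mindist(q, e); remove e from L;
10:    if e is an object then
11:      p_first ← p_first · (1 − e.p);  ▹ e is an object closer to q than x
12:      for all o ∈ H such that dist(q, e) < dist(q, o) do
13:        p_first(o) ← p_first(o) · (1 − e.p);
14:    else
15:      read node pointed by e and insert all entries of the node into L;
16:    compute P_lb(x) and P_ub(x) by using p_first and L;
17:  …
18:  enheap ⟨x, P⟩ into H; p_first(x) ← p_first;
19:  if … is changed or … is changed then
20:    recompute, for each o ∈ H, its probability bounds by using p_first(o) and L;
21:    …
22:    remove entries …
23:  p_first ← p_first · (1 − x.p);
24: while H contains more than k objects do
25:  apply lines 9-16;
26:  apply lines 20-22;
27:  remove from L every entry e with mindist(q, e) ≥ dist(q, o) for all o ∈ H;

Ranking NN retrieval on the augmented R-tree is performed by Algorithm 5. PRNN2Daug differs from the thresholding NN algorithm in several ways. A heap $H$ is employed to organize the objects by their probabilities, and $P_k$ denotes the $k$-th highest probability in the heap. More complicated update techniques are needed, because accessing the entries of $L$ affects the order of objects in $H$. Each object $o$ in $H$ keeps $p_{first}(o)$, the value of $p_{first}$ when $o$ was enheaped (line 18). At lines 12-13, $p_{first}(o)$ is updated for each object $o$ in $H$ whenever an object no farther from $q$ than $o$ is found. The $p_{first}(o)$ value is then used to update the probability bounds of $o$ and, potentially, the order of objects in $H$ at lines 20-21. Note that $H$ may store more than $k$ entries; however, entries that cannot be among the $k$ results are removed from $H$. The algorithm does not need to access any more objects from the BF heap as soon as $p_{first} \le P_k$. If $H$ has more than $k$ objects at that point, we need to refine the probability ranges of the objects in $H$ (by processing the entries in $L$) until we have the best $k$ objects. In this phase, entries $e$ with $\mathrm{mindist}(q,e) \ge \mathrm{dist}(q,o)$ for every object $o$ in $H$ are removed from $L$, because such entries cannot be used to refine the probability ranges of the objects in $H$.

4.3 Query Evaluation Using 3D R-Trees

An alternative method for indexing existentially uncertain data is to model the confidences of objects as an additional dimension and use a 3D R-tree to index the objects. Then, apart from the spatial dimensions, each nonleaf entry $e$ in the tree has a range $[e.p_{\min}, e.p_{\max}]$ within which the existential probabilities of all objects in its subtree fall. Fig. 5 illustrates the differences between the augmented 2D R-tree and the 3D R-tree. Fig. 5a depicts the structure of the augmented 2D R-tree for a set of points. The insertion algorithm [14] aims at grouping the points into leaf nodes such that their MBR areas are minimized. As such, the first (nonleaf) entry points to a leaf node containing one group of points, whereas the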
second entry points to a leaf node containing the other points. The spatial ranges and the augmented probability $e.p$ of these two entries in the augmented 2D R-tree are listed in Fig. 5c. Note that each entry consists of six values (including its child node pointer). Fig. 5b shows the structure of the 3D R-tree for the same set of points. The insertion optimizes the bounding rectangles of nodes in three dimensions: the two spatial dimensions and the probability dimension. Hence, the points are grouped differently: one entry points to a leaf node containing one group of points, whereas the other entry points to a leaf node containing the remaining points. The values stored in these entries of the 3D R-tree are also listed in Fig. 5c. Each entry now consists of seven values (including its child node pointer), so the fanout of the 3D R-tree is slightly smaller than that of the augmented 2D R-tree.

Fig. 5. Structures of different R-tree variants. (a) Augmented 2D R-tree. (b) 3D R-tree. (c) Comparison between the two trees.

The methods for processing probabilistic range and NN queries over the augmented 2D R-tree (Section 4.2) are applicable to the 3D R-tree, since each tree entry still stores an $e.p_{\max}$ value. In particular, for the NN query, we can use both $e.p_{\min}$ and $e.p_{\max}$ to derive tighter probability ranges:

$$P_{lb}(x) = x.p \cdot p_{first} \cdot \prod_{e \in L:\ \mathrm{mindist}(q,e) < \mathrm{dist}(q,x)} (1 - e.p_{\max})^{|e|} \tag{3}$$
$$P_{ub}(x) = x.p \cdot p_{first} \cdot \prod_{e \in L:\ \mathrm{maxdist}(q,e) < \mathrm{dist}(q,x)} (1 - e.p_{\min})^{|e|} \tag{4}$$

If the exact number of objects in the subtree pointed by $e$ is not known, we can use the fanout $M$ and the minimum node utilization ($0.4M$ for R*-trees) to replace $|e|$ by an upper bound in (3) and by a lower bound in (4).

Interestingly, the 3D R-tree does not necessarily outperform the augmented 2D R-tree. A careful examination of (3) and (4) reveals that these probability bounds are determined by both the spatial and the probabilistic intervals of the entries. Even though the $e.p_{\min}$ values in the 3D R-tree help tighten the bounds, this effect is counteracted by the larger spatial bounding rectangles in the tree. Thus, more (disqualified) entries satisfy the $\mathrm{mindist}(q,e) < \mathrm{dist}(q,x)$ condition in (3), and fewer satisfy the $\mathrm{maxdist}(q,e) < \mathrm{dist}(q,x)$ condition in (4). Hence, the final probability bounds for the 3D R-tree may indeed become looser. In addition, the 3D R-tree has a slightly smaller fanout, which may lead to more page accesses.

In this section, we discuss probabilistic variants of spatial skyline (SS) queries [15] and reverse nearest neighbor (RNN) queries [16], because of their applications in spatial decision support systems. For each query type, we first present its background, then define its probabilistic variant, and finally develop query algorithms for the thresholding and ranking versions.

5.1 Spatial Skyline Queries

Given a set $Q$ of query points (e.g., user locations) and two points $p$ and $p'$ (e.g., two facilities), $p$ spatially dominates $p'$ when every query point in $Q$ is at least as close to $p$ as to $p'$, and at least one is strictly closer:

$$p \prec p' \iff \forall q \in Q:\ \mathrm{dist}(q,p) \le \mathrm{dist}(q,p') \ \wedge\ \exists q \in Q:\ \mathrm{dist}(q,p) < \mathrm{dist}(q,p') \tag{5}$$

Given a point dataset $D$, its spatial skyline [15] (with respect to $Q$) contains the objects that are not spatially dominated by any other object in $D$. As an example, consider the distances of the stations from a group of two users in Fig. 6a. The SS contains three of these stations. The main application of SS queries is to discover the facilities that are not farther than other facilities for all the users.

To ease our discussion, we first introduce some notation. The SS query is formulated in a feature space in which each dimension captures the distance to a query point. Given a set $Q$ of query points, a spatial location (i.e., data point) $p$ (or an MBR $e$) can be mapped to a point (or an MBR) in a $|Q|$-dimensional space, where the $i$-th dimension captures the distances of the points from the $i$-th query point. Table 3 illustrates the mapping of a data point or an MBR (corresponding to a nonleaf R-tree entry, assuming that the data points are indexed by an R-tree) to this feature space. As a shorthand notation, we use $p \prec p'$ to mean that $p$ spatially dominates $p'$. Let $e^-$ and $e^+$ be the lower and upper bound corners of an MBR $e$ in the feature space, respectively (see Fig. 6b). If $e^+ \prec e'^-$, each point in $e$ must spatially dominate all points in $e'$. On the other hand, if only $e^- \prec e'^+$ holds, only some point in $e$ may spatially dominate some points in $e'$.

With this mapping technique, Papadias et al. [17] propose an R-tree-based algorithm for computing the dynamic skyline in the feature space. The idea is to apply the BF search algorithm [9] on the R-tree to visit the entries starting from the origin of the feature space, in ascending order of the value

$$\mathrm{mindist}_Q(e) = \sum_{q \in Q} \mathrm{mindist}(q,e). \tag{6}$$

Papadias et al. [17] proved that a point must be discovered earlier than the points it dominates (if any). Hence, a point is reported as a result if it is not dominated by any examined point. We now adapt this algorithm to the probabilistic SS query.

Probabilistic SS query and its properties. For existentially uncertain data, a point $x$ is a query result with probability

$$P(x) = x.p \cdot \prod_{y \in D:\ y \prec x} (1 - y.p), \tag{7}$$

which corresponds to the case that $x$ exists and the points dominating $x$ do not exist. A probabilistic SS query takes as input a set $Q$ of query points and returns all $\langle x, P \rangle$ pairs such that $x \in D$ belongs to the SS of $Q$ with probability $P$. For instance, for a point $x$ in Fig. 6a that no point dominates, we derive $P(x) = x.p$. The probabilities of the other points can be computed in a similar way.

In Section 4.1.2, we used a single variable $p_{first}$ to incrementally compute the upper bound probability for the remaining objects to be examined. This technique is inapplicable to the SS query, since the points visited in ascending order of their distance from the origin do not necessarily influence the points that will be visited next. For instance, the existence of a point in Fig. 6a does not influence the probability that the next point visited (which is farther from the origin) is in the skyline. However, it does influence the probability of any point that it dominates. In general, given a set $S$ of already examined points, visited in order of their distance to the origin, an upper bound $p_{first}(x,S)$ of the probability that point $x$ is in the skyline with respect to $S$ can be computed by

$$p_{first}(x,S) = \prod_{y \in S:\ y \prec x} (1 - y.p). \tag{8}$$

For an MBR $e$, the upper bound probability $p_{first}(e,S)$ for any point in $e$ to be in the skyline can be computed as follows:

$$p_{first}(e,S) = \prod_{y \in S:\ y \prec e^-} (1 - y.p), \tag{9}$$

where $y \prec e^-$ means that $y$ dominates every point in $e$.

Fig. 6. Feature space defined by the distances from query points. (a) A point set. (b) Dominance relationship.

TABLE 3
Mapping from the Original Space to the Feature Space

Next, we discuss how the thresholding and ranking versions of the query are evaluated on a 2D R-tree.

Thresholding and ranking. Assume that we want to find the points with probability at least $\tau$ to be in the skyline. Algorithm 6 describes the procedure to retrieve these points from a 2D R-tree. At line 3, objects are incrementally retrieved from the tree in increasing order of their $\mathrm{mindist}_Q$ value, which is defined in (6). Set $S$ stores the objects examined so far and is used to derive the probability $p_{first}$ of the remaining objects (using (8)). This derivation of $p_{first}(x,S)$ is correct because Papadias et al. [17] proved that all the points dominating $x$ must have been examined before $x$ (and stored in $S$). When $P = x.p \cdot p_{first}(x,S) \ge \tau$, $x$ is reported as a result. If $p_{first}(x,S) < \tau$, any remaining point $x'$ dominated by $x$ must also be dominated by at least the same subset of points in $S$, so that $p_{first}(x',S) \le p_{first}(x,S) < \tau$. Thus, $x$ is inserted into $S$ only when $p_{first}(x,S) \ge \tau$. Following the same logic, we can optimize line 3 of the algorithm by removing nonleaf entries $e$ with $p_{first}(e,S) < \tau$ from the BF heap.

Algorithm 6. Probabilistic SS on a 2D R-tree with thresholding
PTSKY2D(Query set Q, 2D R-tree R on D, Threshold τ)
1: S ← ∅;  ▹ set of examined objects
2: while more objects in R do
3:   x ← next point with minimum mindist_Q(x);  ▹ during BF-search, nonleaf entries e with p_first(e, S) < τ are removed from the BF heap
4:   P ← x.p · p_first(x, S);
5:   if P ≥ τ then
6:     output ⟨x, P⟩;
7:   if p_first(x, S) ≥ τ then
8:     insert x into S;

The threshold-based retrieval of Algorithm 6 can be extended to retrieve the $k$ points with the highest probability to be in the skyline (i.e., the ranking probabilistic variant of the query). The general idea is to maintain a heap $H$ with the $k$ highest probabilities found so far. In addition, we replace the fixed threshold $\tau$ by a floating bound $P_k$, which indicates the $k$-th highest probability in $H$. If $P(x)$ is found to be greater than $P_k$, then the result heap and the bound are updated. As $P_k$ increases, (unnecessary) objects $x$ with $p_{first}(x,S) < P_k$ are removed from $S$ to save space.

Extensions for augmented R-trees. As discussed in Section 4.2, augmented R-trees can be used to improve query efficiency. Algorithm 7 generalizes Algorithm 4 to utilize information from an augmented 2D R-tree. During the BF-search at line 4, each nonleaf entry $e$ with $p_{first}(e,S) \cdot e.p < \tau$ is removed from the BF heap, because it cannot contain any results. If the removed entry has $p_{first}(e,S)$ at least $\tau$, then it may influence the remaining points and is inserted into the list $L$ for further processing. At line 5, the lower and upper bound probabilities of a point $x$ are computed from $L$ using (10) and (11), respectively:

$$P_{lb}(x) = x.p \cdot p_{first}(x,S) \cdot \prod_{e \in L:\ e^- \prec x} (1 - e.p)^{|e|} \tag{10}$$

$$P_{ub}(x) = x.p \cdot p_{first}(x,S) \cdot \prod_{e \in L:\ e^+ \prec x} (1 - e.p) \tag{11}$$

If $P_{lb}(x) < \tau \le P_{ub}(x)$, we need to refine the probability range of $x$ (lines 6-13). After that, $x$ is reported as a result if $P_{lb}(x) \ge \tau$. If $p_{first}(x,S) \ge \tau$, $x$ is inserted into $S$, because it influences the probabilities of other points that may end up in the result.

Algorithm 7. Probabilistic SS on an augmented 2D R-tree with thresholding
PTSKY2Daug(Query set Q, Augmented 2D R-tree R on D, Threshold τ)
1: S ← ∅;  ▹ set of examined objects
2: L ← ∅;  ▹ list of disqualified entries
3: while more objects in R do
4:   x ← next point with minimum mindist_Q(x);  ▹ during BF-search, each nonleaf entry e with p_first(e, S) · e.p < τ is removed from the BF heap, and inserted into L if p_first(e, S) ≥ τ
5:   compute P_lb(x) and P_ub(x) by using p_first(x, S) and L;
6:   while P_lb(x) < τ ≤ P_ub(x) do
7:     pick the entry e ∈ L with the smallest mindist_Q(e); remove e from L;
8:     if p_first(e, S) ≥ τ then  ▹ e may influence the probability of potential results
9:       if e is an object then
10:        insert e into S;
11:      else
12:        read node pointed by e and insert all entries of the node into L;
13:    compute P_lb(x) and P_ub(x) by using p_first(x, S) and L;
14:  if P_lb(x) ≥ τ then
15:    output ⟨x, P⟩;
16:  if p_first(x, S) ≥ τ then
17:    insert x into S;

Similarly, we can generalize the algorithm to evaluate ranking SS queries on an augmented 2D R-tree. For 3D R-trees, (12) and (13) are applied to compute the $P_{lb}$ and $P_{ub}$ values of a point $x$, respectively. If the exact number of objects in the subtree pointed by $e$ is not known, we can use the fanout $M$ and the minimum node utilization ($0.4M$) to replace $|e|$ by an upper bound in (10) and (12), and by a lower bound in (13):
5.2 Reverse Nearest Neighbor Queries
Given a point dataset $D$ and a query point $q$, an RNN query [16] retrieves the objects in $D$ that have $q$ as their NN. This query has applications in decision support and resource allocation. Stanoi et al. [16] and Tao et al. [18] developed R-tree-based algorithms for RNN queries. In this section, we extend the geometric partitioning method in [16] to solve probabilistic versions of this problem.

According to [16], an RNN query can be answered in two steps. In the filter step, the 2D data space (shown in Fig. 7a) is divided into six equal sectors around the query point $q$. The NN of $q$ in each sector (if any) is included into the candidate set. Stanoi et al. [16] proved that the candidate set is a superset of the result set. During the refinement step, each candidate is verified by retrieving its NN. A candidate is reported as a result if its NN is $q$; otherwise, the candidate is a false hit and it is discarded.

Probabilistic RNN query and its properties. For existentially uncertain data, a point $x$ belongs to the RNN set of $q$ with probability

$$P_{RNN}(q,x) = P(x) \cdot \prod_{y \in D,\ dist(x,y) < dist(x,q)} \bigl(1 - P(y)\bigr), \qquad (14)$$

which corresponds to the case that $x$ exists and all points $y$ with $dist(x,y) < dist(x,q)$ do not exist. For instance, in Fig. 7a, a point that has another point $y$ closer to it than $q$ has its probability reduced by the factor $1 - P(y)$, while a point that is closer to $q$ than to any other point has an RNN probability equal to its existential probability. The probability of other points can be computed in a similar way.

Similarly to the skyline query and unlike the NN query of Section 4.1.2, we cannot define an order of visiting the points around $q$ such that the upper bound probability of the remaining points to be in the RNN result can be maintained by incrementally updating a single $\hat{P}$ value. To see this, suppose that we first examine a point in Fig. 7a: it only influences the probabilities of the points that are closer to it than to $q$, not those of the other points. The example also demonstrates that the upper bound probability of a point can be computed by using the examined points. Given a set $S$ of (examined) points, the upper bound probability of a point $x$ with respect to $S$ is defined as

$$\hat{P}_{RNN}(q,x) = P(x) \cdot \prod_{y \in S,\ dist(x,y) < dist(x,q)} \bigl(1 - P(y)\bigr). \qquad (15)$$

Geometric properties of RNNs can be exploited to derive the upper bound probability of remaining points in a specific sector (see Fig. 7a). Stanoi et al. [16] proved that, if two points $x$ and $y$ are in the same sector and $y$ is closer to $q$ than $x$ is, then $x$ must be closer to $y$ than to $q$. Based on this property, a natural solution for the query is to retrieve the points in ascending order of their distances from $q$. For each sector $i$, its value $\hat{P}_i$ is used as the upper bound probability of any remaining point in sector $i$; $\hat{P}_i$ is set to 1 initially, and it is multiplied by the factor $1 - P(x)$ when a new point $x$ is discovered in sector $i$.

We observe that introducing additional sectors may help derive tighter probability bounds for unexamined points. Consider the 12-sector partitioning shown in Fig. 7b. When a point is discovered in sector $i$, it is used to update $\hat{P}$ for the sectors (i.e., $i-1$, $i$, and $i+1$) that are within (maximum) 60 degrees angular range from it. Conversely, the probability bound of a sector receives contributions from the points within a 90-degree angular range (the sector itself and its two neighbors). Recall that, for the original 6-sector partitioning in [16], a sector is only affected by the points within 60 degrees angular range (i.e., the points in the sector itself). In general, given a positive integer $m$, in the $6m$-sector partitioning scheme, $2m - 1$ sectors need to be examined per visited point. The more sectors we have, the tighter the probability bounds derived for (unexamined points in) the sectors, and the earlier unqualified sectors can be pruned. On the other hand, the computational overhead of updating the probability bounds of the sectors is proportional to $m$. In Section 6, we determine an appropriate number of sectors that achieves significant I/O cost reduction and adds little computational overhead for updating the probability bounds of the sectors. Next, we discuss how this partitioning scheme can be used to evaluate probabilistic RNN queries.

Thresholding and ranking. Algorithm 8 shows how thresholding RNN queries are evaluated on a 2D R-tree. The system parameter $N$ specifies the number of sectors to be used. First, the space is divided into $N$ equal sectors around $q$, and their probability bounds $\hat{P}_i$ are set to 1. The algorithm maintains candidate objects (i.e., potential results) in a set $C$ and delays computing the actual probability of a candidate until all objects influencing it have been examined. Examined objects are stored in the set $S$, and they are used to compute upper bound probabilities for candidate objects. Both sets are initially empty. At line 6, we apply BF search [9] to incrementally retrieve the next NN (i.e., the object $x$) of $q$ from the tree. Suppose $i$ denotes the sector containing $x$. If the upper bound $\hat{P}_i$ is greater than the threshold $\rho$, then a tighter upper bound probability $\hat{P}_{RNN}(q,x)$ is computed by examining the objects in $S$ (see (15)). When the above probability is at least $\rho$, the object $x$ is inserted into $C$. After
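$x$ is inserted into $S$, $\hat{P}_j$ is updated for each sector $j$ within 60 degrees angular range from $x$. In turn, $x$ is used to update the $\hat{P}_{RNN}(q,o)$ value of every candidate $o$ in $C$, and the candidates whose bound drops below $\rho$ are removed from $C$. If the last deheaped distance (from the BF heap) is at least $2 \cdot dist(q,o)$ for a candidate object $o$, then no entry in the BF heap can affect the probability of $o$: an object $y$ influences $o$ only if $dist(o,y) < dist(o,q)$, which by the triangle inequality implies $dist(q,y) < 2 \cdot dist(q,o)$. At line 17, we compute the actual probability $P_{RNN}(q,o)$ and report a result when $P_{RNN}(q,o) \ge \rho$. The loop (lines 5-20) continues while some sectors may contain potential results to be discovered or $C$ is not empty.

YIU ET AL.: EFFICIENT EVALUATION OF PROBABILISTIC ADVANCED SPATIAL QUERIES ON EXISTENTIALLY UNCERTAIN DATA

Fig. 7. RNN query example. (a) Six-sector partitioning. (b) Twelve-sector partitioning.

Algorithm 8. Probabilistic RNN on a 2D R-tree with thresholding
PTRNN2D(Query point $q$, 2D R-tree $T$ on $D$, Threshold $\rho$, $N$: number of sectors (system parameter))
1: divide the space into $N$ equal sectors around $q$;
2: for all sectors $i$ do $\hat{P}_i := 1$; // upper prob. bound of remaining objects in sector $i$
3: $S := \emptyset$; // examined objects
4: $C := \emptyset$; // candidate objects
5: while some sector $i$ has $\hat{P}_i \ge \rho$ or $C \neq \emptyset$ do
6:   $x :=$ next NN of $q$ from $T$; $d := dist(q,x)$;
7:   let $i$ be the sector of $x$;
8:   if $\hat{P}_i \ge \rho$ and $\hat{P}_{RNN}(q,x) \ge \rho$ then // apply cheap filter first, and then expensive filter
9:     $C := C \cup \{x\}$;
10:  $S := S \cup \{x\}$;
11:  for all sectors $j$ such that $j$ is within (maximum) 60 degrees angular range from $x$ do
12:    $\hat{P}_j := \hat{P}_j \cdot (1 - P(x))$;
13:  for all $o \in C$, $o \neq x$, such that $dist(x,o) < dist(q,o)$ do
14:    $\hat{P}_{RNN}(q,o) := \hat{P}_{RNN}(q,o) \cdot (1 - P(x))$;
15:  remove objects $o$ with $\hat{P}_{RNN}(q,o) < \rho$ from $C$; // filter false hits
16:  for all $o \in C$ such that $d \ge 2 \cdot dist(q,o)$ do // entries in BF heap cannot affect the probability of $o$
17:    compute $P_{RNN}(q,o)$;
18:    if $P_{RNN}(q,o) \ge \rho$ then
19:      output $\langle o, P_{RNN}(q,o) \rangle$;
20:    remove $o$ from $C$;

Algorithm 9. Probabilistic RNN on an augmented 2D R-tree with thresholding
PTRNN2Daug(Query point $q$, Augmented 2D R-tree $T$ on $D$, Threshold $\rho$, $N$: number of sectors (system parameter))
1: divide the space into $N$ equal sectors around $q$;
2: for all sectors $i$ do $\hat{P}_i := 1$; // upper prob. bound of remaining objects in sector $i$
3: $S := \emptyset$; $C := \emptyset$;
4: $L := \emptyset$; // disqualified entries
5: while there are more objects in $T$ do
6:   $x :=$ next NN of $q$; $d := dist(q,x)$; // during BF-search, each nonleaf entry intersecting only sector(s) with $\hat{P}_i < \rho$ is removed from BF heap and inserted into $L$
7:   apply lines 7-15 of Algorithm 8;
8:   for all $o \in C$ such that $d \ge 2 \cdot dist(q,o)$ do // entries in BF heap cannot affect the probability of $o$
9:     while $\hat{P}_{RNN}(q,o) \ge \rho$ and $\min_{e \in L} mindist(e,o) < dist(q,o)$ do // refinement step
10:      remove the entry $e$ with the smallest $mindist(e,o)$ from $L$;
11:      if $e$ is an object then
12:        apply lines 11-15 of Algorithm 8, but by replacing $x$ with $e$;
13:      else
14:        read the node pointed by $e$; insert all entries of the node into $L$;
15:    compute $P_{RNN}(q,o)$;
16:    if $P_{RNN}(q,o) \ge \rho$ then
17:      output $\langle o, P_{RNN}(q,o) \rangle$;
18:    remove $o$ from $C$;
19: for all $o \in C$ do // verify remaining candidates in $C$
20:   apply lines 9-18 of this algorithm;

The above algorithm can be extended to retrieve the ranked RNNs from the 2D R-tree. It maintains a heap of the $k$ objects with the highest probabilities found so far. In addition, we replace the fixed threshold $\rho$ by a floating bound $\rho_k$, which indicates the $k$th highest probability found so far. At lines 18-19, if $P_{RNN}(q,o)$ is greater than $\rho_k$, then the result heap and the bound are updated. Besides, the above thresholding algorithm can be adapted to augmented 2D R-trees and 3D R-trees, as shown in Algorithm 9. At line 6, each nonleaf entry intersecting only sector(s) with $\hat{P}_i < \rho$ is removed from the BF heap because it cannot contain any results. However, such entries may affect the probability of other points, so they are inserted into $L$. Lines 9-14 compute the actual probability of a candidate object $o$ by refining its $\hat{P}_{RNN}(q,o)$ with the entries in $L$. For this, we check whether the upper bound probability $\hat{P}_{RNN}(q,o)$ is at least $\rho$ and $o$ is closer to some entries in $L$ than to $q$. If so, the entry $e$ closest to $o$ is removed from $L$. In case $e$ points to a tree node, the node is accessed and all its entries are inserted into $L$. Otherwise, $e$ is an object, and it is used to update the set $S$, the $\hat{P}_{RNN}$ values of candidate objects, and the $\hat{P}_i$ values of sectors. At lines 19-20, the remaining candidates in $C$ are verified by accessing the entries in $L$ that may influence their probabilities.

3. Adaptations of ranking algorithms for RNN queries are omitted due to space constraints.

6 EXPERIMENTAL EVALUATION
In this section, we evaluate the efficiency of the proposed techniques. We compare the performances of five indexes and their corresponding algorithms for thresholding and ranking versions of range queries, NN search, skyline queries, and RNN retrieval. The five indexes are:
1) a simple 2D R-tree,
2) a 2D R-tree, where each nonleaf entry is augmented with the maximum existential probability of the objects in its subtree,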
3) a 2D R-tree, where each nonleaf entry is additionally augmented with $n$ (i.e., the number of objects in the subtree indexed by it), denoted by AUGCOUNT,
4) a 3D R-tree, and
5) a 3D R-tree, where each nonleaf entry is augmented with $n$, denoted by 3DCOUNT.

For indexes 4 and 5, all (spatial and probability) dimensions are normalized to the same domain interval. Note that index 1 captures minimum information in nonleaf entries and occupies the least space, whereas index 5 is at the other end (its entries capture maximum information and the index occupies the most space).

All algorithms were implemented in C++. Experiments were run on a PC with a Pentium D CPU of 2.8 GHz. The page size of indexes was set to 1 Kbyte; similar relative performance of the above methods was also observed for other page sizes (up to 8 Kbytes). No memory buffers are used for caching disk pages between different queries, so the number of node accesses directly reflects the I/O cost. In each experiment, the measured I/O cost is the average I/O cost of 100 queries with the same parameter values (but with different locations randomly chosen from the dataset). For range queries, NN search, and RNN retrieval, the I/O time is over 90 percent of the total execution cost, so the CPU time is not reported.

6.1 Description of Data
For our experiments, we used various real datasets of different sizes and object distributions, described in Table 4. The datasets TG and SF are obtained from [19], while the other datasets are obtained from the R-tree Portal. Due to the lack of a real spatial dataset with objects having existential probabilities, we generated probabilities for the objects using the following methodology. First, we select anchor points randomly on the map, following the data distribution. These points model locations around which there is large certainty for the existence of data (e.g., they could be antennas of receivers close to which information is accurate). For each point $x$ in the dataset, we 1) find the closest anchor and 2) assign $x$ an existential probability proportional to a function of its distance from that anchor, controlled by a skew parameter.
The resulting distribution of probabilities around the anchors is Zipfian. The probabilities are normalized with respect to the maximum probability, which corresponds to the anchor point. A default skew value is used unless stated otherwise; experiments on different skew values can be found in our preliminary work [20].

6.2 Experimental Results
Table 4 shows the performances of the five indexes for thresholding and ranking NN queries on different datasets. We fix the values of $\rho$ and $k$ for thresholding and ranking NN queries, respectively. Observe that the augmented and 3D R-trees perform better than the 2D R-tree even though they are larger in size. Algorithms 4 and 5 manage to prune a large number of nodes that do not contain query results and that are otherwise visited in the simple 2D R-tree index. The cost of the 2D R-tree variants (i.e., indexes 1-3) does not change much with the database size. The I/O costs of the 3D R-tree variants increase slowly as the database size increases. This is because 3D R-trees group entries using both spatial and probability dimensions, but the query algorithms mainly search for objects based on spatial dimensions.

TABLE 4. I/O Cost of Thresholding/Ranking NN on Different Data Sets

4. A small parameter value is necessary in order to observe differences between the indexes. Larger values are tested in a subsequent experiment.

In subsequent experiments, we compare the performance of the indexes on the SF dataset, with default values of $\rho$ and $k$ for thresholding and ranking queries, respectively. Fig. 8 shows the I/O performance of the indexes for thresholding and ranking queries. Augmented and 3D R-trees perform much better than the simple 2D R-tree for all tested values of $\rho$ and $k$; in some settings, fewer than five accesses are required to find the query result when using the four advanced indexes and Algorithms 4 and 5. When comparing these indexes, we observe that augmenting nonleaf entries with the count $n$ is not a good idea; estimating $n$ from the fanout and the minimum node utilization is accurate enough. Thus, the extra space (translated to extra accesses) required for augmenting entries with $n$ does not pay off. In addition, the augmented R-tree performs better than the 3D R-tree. First, the 3D R-tree occupies more space (the capacity of each nonleaf node is smaller) and results in more accesses, since the extra space is not compensated by tighter probability bounds (see (3) and (4)). Second, since the 3D R-tree groups entries into nodes using the existential probabilities as well as the spatial dimensions, it does not achieve as good a partitioning as the one using the spatial dimensions only, whereas search is performed primarily using the spatial dimensions.

Fig. 8. NN queries on the SF dataset. (a) Thresholding queries. (b) Ranking queries.

Next, we examine the performances of range queries on the indexes. The extent of the query window (in each dimension) has a default value of 5 percent of the domain length. Figs. 9a and 9b show the cost of thresholding and ranking queries as a function of $\rho$ and $k$, respectively. Except for the simple 2D R-tree, all indexes follow similar trends as in probabilistic NN queries. The cost of range queries on the 2D R-tree is independent of $\rho$ and $k$, as all points within the spatial range are retrieved. Observe that, for very small $\rho$, the augmented and 3D indexes may perform worse than the 2D R-tree because 1) they prune no or very few directory entries (few entries have a maximum probability below $\rho$) and 2) they are larger in size than the simple 2D R-tree. Similarly, $\rho_k$ decreases with $k$, affecting the costs of the advanced methods. The 3D R-tree performs worse than the augmented 2D R-tree also for range queries. Fig. 9c shows the cost of thresholding queries as a function of the query window extent; the costs of all methods increase with it.

We proceed to compare the performances of SS queries on the indexes. For each query, a set of query points is randomly generated in a query window such that the window follows the data distribution. The default values of these parameters are 6 percent and 5 percent of the domain range, respectively. Fig. 10 shows the I/O-CPU time breakdown of thresholding and ranking queries as a function of $\rho$ and $k$, respectively. Each page fault is charged 10 ms of I/O time. Observe that the method based on the augmented 2D R-tree outperforms its competitors for a wide range of parameters. In terms of I/O, the trends are similar to the ones in Fig. 8. However, the CPU time of augmented and 3D trees becomes high at low $\rho$ and high $k$. Fig. 11 plots the cost of the indexes by varying the number of query points. In general, as the number of query points grows, a point is spatially dominated by fewer points, and thus the probability of the point to be in the skyline increases. Thus, more points need to be examined by thresholding queries, and their I/O cost increases rapidly. On the other hand, $\rho_k$ increases with the number of query points, strengthening the pruning power of the advanced indexes. Thus, the cost of ranking queries increases at a slower rate.

Fig. 9. Range queries on the SF dataset. (a) Thresholding queries versus $\rho$. (b) Ranking queries versus $k$. (c) Thresholding queries versus the query window extent.
Fig. 10. SS queries on the SF dataset. (a) Thresholding queries versus $\rho$. (b) Ranking queries versus $k$.
Fig. 11. SS queries on the SF dataset, varying the number of query points. (a) Thresholding queries. (b) Ranking queries.
Fig. 12. RNN queries on the SF dataset, varying the number of sectors. (a) Thresholding queries. (b) Ranking queries.
Fig. 13. RNN queries on the SF dataset, using the 24-sector partitioning. (a) Thresholding queries versus $\rho$. (b) Ranking queries versus $k$.

Finally, we study the performance of the indexes for RNN queries. Fig. 12 shows the effect of the number of sectors on performance. When more sectors are used, tighter probability bounds are derived for the sectors, and hence, the algorithm terminates faster. In particular, the 96-sector partitioning achieves a substantial cost reduction over the basic 6-sector partitioning for both thresholding and ranking queries. Observe that the cost starts converging to its final value with as few as 24 partitions. Fig. 13
plots the cost of the methods as a function of $\rho$ and $k$, respectively, when using the 24-sector partitioning. For thresholding queries, the performance gap between the 2D R-tree and the other indexes widens as $\rho$ increases because of the increased pruning power of the advanced indexes. On the other hand, the cost differences among the indexes are not sensitive to the value of $k$.

7 RELAXING THE INDEPENDENCE ASSUMPTION
Our analysis so far assumes that the existential probabilities of objects are independent. This assumption is valid in a large number of applications (e.g., those mentioned in Section 1); hence, our solutions have significant value in practice. However, there are also other applications where the existential probabilities of different objects are correlated. For example, consider a collection of sensors distributed in a forest for detecting wildfire. When a sensor detects smoke, sensors in its neighborhood are likely to sense it as well. A thorough solution in this scenario falls out of the scope of this paper. Nevertheless, in the sequel, we point out the direction toward extending the proposed algorithms and indexing schemes to support correlated existential probabilities.

We now elaborate how to evaluate the thresholding probabilistic NN query using a simple 2D R-tree in the correlated probability model. Instead of Definition 2, we define the probability of an object $x$ to be the NN of $q$ as

$$P_{NN}(q,x) = P\Bigl(x \wedge \bigwedge_{y:\ dist(q,y) < dist(q,x)} \neg y\Bigr),$$

where $P(\cdot)$ denotes the joint probability that $x$ exists (i.e., the event $x$) and all objects closer to $q$ than $x$ do not exist (i.e., the events $\neg y$). This probability can be computed from a Bayesian network modeling dependent probabilities among the objects. For any set $S$ of objects, the value $P\bigl(x \wedge \bigwedge_{y \in S:\ dist(q,y) < dist(q,x)} \neg y\bigr)$ serves as an upper bound of $P_{NN}(q,x)$, regardless of how the probabilities are correlated, since it is the probability of a weaker event. Based on this property, we modify Algorithm 2 as follows: 1) we maintain the set $S$ of visited objects, 2) at line 7, we insert the object $x$ into $S$, and 3) at line 4, we compute the above upper bound with respect to $S$. The above idea works also for SS ((7), Algorithm 6) and RNN ((14), Algorithm 8), after replacing each multiplication by a conjunction, each factor $P(x)$ by the event $x$, each factor $1 - P(y)$ by the event $\neg y$, and the final probability by the joint probability of the resulting conjunction.

Extensions of other R-tree solutions (e.g., 2D augmented trees and 3D trees) raise nontrivial research issues, due to the fact that: 1) the number of possible joint probabilities is enormous (i.e., exponential in the data cardinality) and 2) it remains unclear how to augment a nonleaf entry to effectively capture the joint probabilities of the objects in its subtree. In the future, we will develop efficient solutions for augmented trees that are applicable to the correlated probability model.

8 CONCLUSION
In this paper, we have presented the interesting problem of evaluating spatial queries for existentially uncertain data. Variants of common spatial queries, like range and NN search, have probabilistic versions for this data model. We proposed algorithms for these probabilistic versions and several extensions of spatial access methods (i.e., R-trees) where these algorithms are applied. In addition, we discussed how complex spatial queries such as SS queries and RNN queries can be processed in our framework. Finally, we conducted extensive experiments to evaluate the search algorithms and the corresponding spatial indexes. In most of the tested cases, the data structure that performs best is a 2D R-tree, where nonleaf entries are augmented with the maximum existential probabilities of the subtrees they point at. In the future, we plan to study in detail more advanced query types and extend our methods to apply to data that are both existentially and locationally uncertain, as well as to results of fuzzy classifiers [1].

ACKNOWLEDGMENTS
This work was supported by Grant HKU 7149/07E from Hong Kong RGC. The work of Yufei Tao was supported by Grants CUHK 1202/06 and CUHK 4161/07 from Hong Kong RGC. A preliminary version of this work appeared in [20].

REFERENCES
[1] Advances in Remote Sensing and GIS Analysis, P.M. Atkinson and N.J. Tate, eds. John Wiley & Sons, 1999.
[2] O. Wolfson, A.P. Sistla, S. Chamberlain, and Y. Yesha, “Updating and Querying Databases that Track Mobile Units,” Distributed and Parallel Databases, vol. 7, no. 3, pp. 257-387, 1999.
[3] D. Pfoser and C.S. Jensen, “Capturing the Uncertainty of Moving-Object Representations,” Proc. Sixth Int’l Symp. Spatial Databases.
[4] J. Ni, C.V. Ravishankar, and B. Bhanu, “Probabilistic Spatial Database Operations,” Proc. Eighth Int’l Symp. Spatial and Temporal Databases (SSTD).
[5] X. Yu and S. Mehrotra, “Capturing Uncertainty in Spatial Queries over Imprecise Data,” Proc. 14th Int’l Conf. Database and Expert Systems Applications (DEXA).
[6] R. Cheng, D.V. Kalashnikov, and S. Prabhakar, “Querying Imprecise Data in Moving Object Environments,” IEEE Trans. Knowledge and Data Eng., vol. 16, no. 9, pp. 1112-1127, Sept. 2004.
[7] Y. Tao, X. Xiao, and R. Cheng, “Range Search on Multidimensional Uncertain Data,” ACM Trans. Database Systems, vol. 32, no. 3, p. 15, 2007.
[8] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial Searching,” Proc. ACM SIGMOD.
[9] G.R. Hjaltason and H. Samet, “Distance Browsing in Spatial Databases,” ACM Trans. Database Systems, vol. 24, no. 2, pp. 265-318, 1999.
[10] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, and J.S. Vitter, “Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data,” Proc. 30th Int’l Conf. Very Large Data Bases.
[11] G. Trajcevski, O. Wolfson, F. Zhang, and S. Chamberlain, “The Geometry of Uncertainty in Moving Objects Databases,” Proc. Eighth Int’l Conf. Extending Database Technology (EDBT).
[12] R. Cheng, D.V. Kalashnikov, and S. Prabhakar, “Evaluating Probabilistic Queries over Imprecise Data,” Proc. ACM SIGMOD.
[13] I. Lazaridis and S. Mehrotra, “Approximate Selection Queries over Imprecise Data,” Proc. 20th Int’l Conf. Data Eng. (ICDE).
[14] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, “The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles,” Proc. ACM SIGMOD.
[15] M. Sharifzadeh and C. Shahabi, “The Spatial Skyline Queries,” Proc. 32nd Int’l Conf. Very Large Data Bases (VLDB).
[16] I. Stanoi, D. Agrawal, and A. Abbadi, “Reverse Nearest Neighbor Queries for Dynamic Databases,” Proc. ACM SIGMOD Workshop Research Issues in Data Mining and Knowledge Discovery (DMKD).
[17] D. Papadias, Y. Tao, G. Fu, and B. Seeger, “Progressive Skyline Computation in Database Systems,” ACM Trans. Database Systems, vol. 30, no. 1, pp. 41-82, 2005.
[18] Y. Tao, D. Papadias, and X. Lian, “Reverse kNN Search in Arbitrary Dimensionality,” Proc. 30th Int’l Conf. Very Large Data Bases (VLDB).
[19] T. Brinkhoff, “A Framework for Generating Network-Based Moving Objects,” GeoInformatica, vol. 6, no. 2, pp. 153-180, 2002.
[20] X. Dai, M.L. Yiu, N. Mamoulis, Y. Tao, and M. Vaitis, “Probabilistic Spatial Queries on Existentially Uncertain Data,” Proc. Ninth Int’l Symp. Spatial and Temporal Databases (SSTD).

Man Lung Yiu received the bachelor’s degree in computer engineering and the PhD degree in computer science from the University of Hong Kong in 2002 and 2006, respectively. He is currently an assistant professor in the Department of Computer Science, Aalborg University, Aalborg, Denmark. His research interests include the management of complex data, in particular query processing topics on spatiotemporal data and multidimensional data.

Nikos Mamoulis received the diploma in computer engineering and informatics from the University of Patras, Greece, in 1995 and the PhD degree in computer science from the Hong Kong University of Science and Technology in 2000. He is currently an associate professor in the Department of Computer Science, University of Hong Kong, which he joined in 2001. In the past, he has worked as a research and development engineer in the Computer Technology Institute, Patras, and as a postdoctoral researcher at the Centrum voor Wiskunde en Informatica (CWI), The Netherlands. His research interests include management and mining of complex data types. He has served on the program committees of more than 40 international conferences and workshops on data management and data mining. He was the general chair of SSDBM 2008 and a coorganizer of SSTDM 2006. He is an editorial board member of the Geoinformatica Journal and a field editor of the Encyclopedia of Geographic Information Systems.

Xiangyuan Dai received the bachelor of engineering degree from the University of Science and Technology of China in 2004 and the MPhil degree in computer science from the University of Hong Kong in 2006. He is currently with the Department of Computer Science, University of Hong Kong. His research interests include query processing problems on spatial data.

Yufei Tao is engaged in research on database systems. He is particularly interested in index structures and query algorithms on multidimensional data, and has published primarily on temporal databases, spatial databases, and privacy preservation. He received the Hong Kong young scientist award in 2002. He has served on the program committees of most prestigious database conferences, such as SIGMOD, VLDB, and ICDE, and is currently an associate editor of Transactions on Database Systems (TODS). He joined the Department of Computer Science and Engineering, Chinese University of Hong Kong, Hong Kong, in September 2006. Before that, he held positions at Carnegie Mellon University and the City University of Hong Kong. He is a member of the ACM.

Michail Vaitis received the engineering diploma and the PhD degree in computer engineering and informatics from the University of Patras, Greece, in 1992 and 2001, respectively. Since 2003, he has been a faculty member in the Department of Geography, University of the Aegean, Mytilene, Greece, where he is currently an assistant professor. In the past, he worked for 5 years at the Research Academic Computer Technology Institute (RA-CTI), Greece, on hypertext and database systems. His research interests include geographical databases, spatial data infrastructures, geographic hypermedia, and the geo-spatial semantic web. He is a member of the ACM and the Technical Chamber of Greece.