Slide 1: Probabilistic Data Management
Chapter 5: Probabilistic Query Answering (3)
Slide 2: Objectives
In this chapter, you will:
Learn the definition and query processing techniques of a probabilistic query type
Probabilistic Reverse Nearest Neighbor Query
Slide 3: Recall: Probabilistic Query Types
Uncertain/probabilistic database
Probabilistic Spatial Query
  Probabilistic range query
  Probabilistic k-nearest neighbor query
  Probabilistic group nearest neighbor (PGNN) query
  Probabilistic reverse k-nearest neighbor query
  Probabilistic spatial join / similarity join
Probabilistic Preference Query
  Probabilistic top-k query (or ranked query)
  Probabilistic skyline query
  Probabilistic reverse skyline query
Slide 4: Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases
Very Large Data Bases Journal (VLDBJ), 2009
Slide 5: Outline
Introduction
Related Work
Problem Definition
PRNN Query Processing
Experimental Evaluation
Summary
Slide 6: Reverse Nearest Neighbor Query (RNN)
Rescue tasks in oceans
In an emergency, a ship asks its nearest ship for help
A rescue ship needs to monitor those ships that have it as their nearest neighbor
In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)
Slide 7: Introduction
Reverse Nearest Neighbor Query (RNN)
Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor
[Figure: query object q and data objects o1 to o5]
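The definition above can be checked directly on exact (certain) points. The following is a minimal brute-force sketch, assuming Euclidean distance and points stored as coordinate tuples; the function name `rnn_brute_force` is illustrative and does not come from the slides. Ties are resolved in favor of q.

```python
import math

def rnn_brute_force(q, points):
    """Return the points that have q as their nearest neighbor.

    A point o qualifies if no other data point is strictly closer
    to o than q is. Cost is O(N^2) distance computations.
    """
    result = []
    for i, o in enumerate(points):
        d_q = math.dist(o, q)
        if all(math.dist(o, p) >= d_q
               for j, p in enumerate(points) if j != i):
            result.append(o)
    return result

points = [(1, 0), (5, 0), (5.5, 0), (-2, 0)]
print(rnn_brute_force((0, 0), points))
```

Here (5, 0) and (5.5, 0) are closer to each other than to q, so neither of them is an RNN of q.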
Slide 8: RNN Processing on Certain Data Points
TPL Approach [VLDB'04]
[Figure: query q and data objects o1 to o5, with one RNN candidate and its pruning region]
Slide 9: RNN Processing on Certain Data Points
TPL Approach [VLDB'04]
[Figure: query q and data objects o1 to o5, with two RNN candidates and the pruning region]
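The idea behind the candidate and pruning-region labels can be sketched on a plain list of points. This is a simplified filter-and-refine illustration in the spirit of the TPL approach, not the published algorithm: it assumes Euclidean distance and omits the R-tree entirely, and the name `rnn_filter_refine` is hypothetical. A point p that is strictly closer to a candidate o than to q lies on o's side of the perpendicular bisector of q and o, so q cannot be the nearest neighbor of p.

```python
import math

def rnn_filter_refine(q, points):
    """Filter-and-refine RNN search on exact points (no index)."""
    # Filter step: visit points in ascending distance from q.
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], q))
    candidates = []
    for i in order:
        p = points[i]
        d_pq = math.dist(p, q)
        # p falls in the pruning region of a candidate o
        # if p is strictly closer to o than to q.
        if not any(math.dist(p, points[c]) < d_pq for c in candidates):
            candidates.append(i)
    # Refinement step: verify each candidate against all other points.
    result = []
    for c in candidates:
        o = points[c]
        d_oq = math.dist(o, q)
        if all(math.dist(o, points[j]) >= d_oq
               for j in range(len(points)) if j != c):
            result.append(o)
    return result

print(rnn_filter_refine((0, 0), [(1, 0), (5, 0), (5.5, 0), (-2, 0)]))
```

Pruned points are never false dismissals, because each of them has a data point strictly closer than q; the refinement step removes the false positives that survive the filter.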
Slide 10: Probabilistic Reverse Nearest Neighbor Query (PRNN)
Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of ships, the reported positions of ships are imprecise
Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently
Slide 11: Another Application of PRNN
Mixed-reality game
Each player tends to shoot his/her nearest neighbor
A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor
Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects
Slide 12: RNN Queries in Uncertain Databases
Slide 13: PRNN Definition
Probabilistic Reverse Nearest Neighbor (PRNN) Queries
Slide 14: A Straightforward Method
For every uncertain object o in the database
  Sequentially scan all the objects in the database
  Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q
  If P_PRNN(q, o) is greater than or equal to the probabilistic threshold α, then o is an answer; otherwise, o is discarded
Analysis
  Complexity: O(N^2), where N is the database size
  The computation of the probability P_PRNN(q, o) is very costly
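The straightforward method can be sketched as follows. This is a minimal illustration under assumptions that are not stated on the slide: each uncertain object is given as a list of equally likely sample positions, objects are mutually independent, and distances are Euclidean. The names `prnn_probability` and `prnn_linear_scan` are hypothetical.

```python
import math

def prnn_probability(q, o_samples, others):
    """Estimate the probability that q is the nearest neighbor of o.

    For each possible position of o, multiply, over every other object p,
    the fraction of p's samples that are at least as far from o as q is.
    """
    total = 0.0
    for o_pos in o_samples:
        d_q = math.dist(o_pos, q)
        prob = 1.0
        for p_samples in others:
            farther = sum(1 for p_pos in p_samples
                          if math.dist(o_pos, p_pos) >= d_q)
            prob *= farther / len(p_samples)
            if prob == 0.0:
                break  # this position of o can no longer contribute
        total += prob
    return total / len(o_samples)

def prnn_linear_scan(q, db, alpha):
    """Return the names of objects whose PRNN probability is >= alpha."""
    answers = []
    for name, o_samples in db.items():
        others = [s for other, s in db.items() if other != name]
        if prnn_probability(q, o_samples, others) >= alpha:
            answers.append(name)
    return answers

db = {"a": [(1, 0), (3, 0)], "b": [(4, 0), (-10, 0)]}
print(prnn_linear_scan((0, 0), db, alpha=0.6))
```

Every object is compared against every other object, which gives the quadratic cost noted in the analysis, and the inner loops over samples illustrate why each probability computation is expensive.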
Slide 15: Pruning Techniques
Geometric Pruning (GP)
GP_0 method
  The object distribution in the uncertainty region can be either known or unknown
  Prune those data objects that definitely cannot be RNNs of q
GP_β method (β ∈ (0, 1])
  The object distribution in the uncertainty region is known and pre-computation is allowed
  Prune those objects with PRNN probability smaller than β
Slide 16: Heuristics of GP_0 Method
Data objects always reside within their uncertainty regions
[Figure: conservative pruning region (CPR)]
Slide 17: Heuristics of GP_0 Method (cont.)
No false dismissals are introduced with the hypersphere approximation
[Figure: candidate o]
Slide 18: Conditions of GP_0 Method
Pruning Conditions
  dist(P, q) - dist(P, C_o) > r_o
  mindist(P, D) ≥ r_p
In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned
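The first pruning condition can be illustrated with a point-level membership test. The sketch below assumes Euclidean distance and a hyperspherical uncertainty region for the candidate o with center C_o and radius r_o; the function names are hypothetical. The second function is not the slide's mindist test: it is a conservative sufficient check for a hyperspherical object p, derived from the triangle inequality, and it may reject some objects that are in fact fully contained in the region.

```python
import math

def in_cpr(x, q, c_o, r_o):
    """True if point x satisfies dist(x, q) - dist(x, C_o) > r_o."""
    return math.dist(x, q) - math.dist(x, c_o) > r_o

def ball_in_cpr_sufficient(c_p, r_p, q, c_o, r_o):
    """Sufficient (not necessary) test that the ball (c_p, r_p) lies in the region.

    Any point of the ball is at least dist(c_p, q) - r_p from q and
    at most dist(c_p, c_o) + r_p from C_o.
    """
    return math.dist(c_p, q) - math.dist(c_p, c_o) > r_o + 2 * r_p

def boundary_positions(c_o, r_o, steps=360):
    """2-D helper: positions of o on the boundary of its uncertainty circle."""
    return [(c_o[0] + r_o * math.cos(2 * math.pi * k / steps),
             c_o[1] + r_o * math.sin(2 * math.pi * k / steps))
            for k in range(steps)]

q, c_o, r_o = (0.0, 0.0), (8.0, 0.0), 1.0
x = (10.0, 0.0)
if in_cpr(x, q, c_o, r_o):
    # Wherever o is located in its region, it is closer to x than q is,
    # so q cannot be the nearest neighbor of x.
    assert all(math.dist(x, o_pos) < math.dist(x, q)
               for o_pos in boundary_positions(c_o, r_o))
```

The membership test holds for every possible position of o because dist(x, o') is at most dist(x, C_o) + r_o, which is why no knowledge of the object distribution inside the uncertainty region is needed.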
Slide 19: Heuristics of GP_β Method (β ∈ (0, 1])
GP_β prunes those objects with PRNN probability smaller than β (< α)
[Figure: candidate o; p can be pruned by GP_β]
Slide 20: Refinement Phase
After applying the geometric pruning methods, we obtain a candidate set
For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q
Slide 21: PRNN Query Processing
Maintain a multidimensional index structure over the uncertain database // indexing phase
For each PRNN query
  Apply geometric pruning methods during the index traversal // pruning phase
  Refine candidates and return the answer set // refinement phase
Slide 22: PRNN Query Processing
Index uncertain data with an R-tree
Slide 23: PRNN Query Procedure
Traverse the R-tree index by maintaining a minimum heap (keyed by the minimum distance from the query point to the node)
For each node/object N_i we encounter
  Check whether or not N_i can be pruned by the GP methods
  If the answer is no, then we either further check the children of node N_i, or add it to the PRNN candidate set S_cand in case N_i is an object
After the index traversal, we refine the candidates in S_cand by calculating their actual PRNN probabilities
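The traversal described above can be sketched with a small hand-built tree in place of a real R-tree. Everything below is an illustrative assumption: the `Node` class, the `can_prune` callback (standing in for the GP methods, and given the candidates found so far), and the use of bounding boxes for both nodes and uncertain objects. The refinement step is left out.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    lo: tuple                    # lower corner of the bounding box
    hi: tuple                    # upper corner of the bounding box
    children: list = field(default_factory=list)  # empty for an object
    name: str = ""

def mindist(q, node):
    """Minimum Euclidean distance from point q to the node's bounding box."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, node.lo, node.hi)))

def prnn_traverse(root, q, can_prune):
    """Best-first traversal; returns the candidate set S_cand."""
    tie = itertools.count()      # tie-breaker so nodes are never compared
    heap = [(mindist(q, root), next(tie), root)]
    s_cand = []
    while heap:
        _, _, n = heapq.heappop(heap)
        if can_prune(n, s_cand):
            continue             # node/object discarded by pruning
        if n.children:
            for c in n.children:
                heapq.heappush(heap, (mindist(q, c), next(tie), c))
        else:
            s_cand.append(n)     # N_i is an object: keep as candidate
    return s_cand

a = Node((1, 1), (2, 2), name="A")
b = Node((8, 8), (9, 9), name="B")
c = Node((3, 0), (4, 1), name="C")
root = Node((1, 0), (9, 9), children=[
    Node((1, 0), (4, 2), children=[a, c], name="N1"),
    Node((8, 8), (9, 9), children=[b], name="N2"),
])
no_pruning = lambda node, cands: False
print([n.name for n in prnn_traverse(root, (0, 0), no_pruning)])
```

Because the heap is keyed by the minimum distance to the query point, objects near q become candidates first, which gives later pruning checks more candidates to work with.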
Slide 24: PRNN Query Processing (cont'd)
Slide 25: Experimental Evaluation
Experimental Settings
Real data sets: LB, MG, TCB, and CAR
Synthetic data sets:
  Generate the center location C_o of uncertain object o in a data space [0, 1,000]^d
  Produce the radius r_o ∈ [r_min, r_max] for the uncertainty region UR(o)
  Four types of data sets: lUrU, lUrG, lSrU, and lSrG
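A generator for such synthetic objects might look like the sketch below. The slide does not spell out the naming scheme, so the reading used here is an assumption: `l` is taken to be the distribution of center locations (U for uniform, S for skewed) and `r` the distribution of radii (U for uniform, G for Gaussian). The power-transform skew and the Gaussian mean and standard deviation are hypothetical placeholders, not the settings used in the experiments.

```python
import random

def generate_objects(n, d, r_min, r_max, loc="U", rad="U", seed=0, skew=2.0):
    """Generate n uncertain objects as (center, radius) pairs in [0, 1000]^d."""
    rng = random.Random(seed)
    objects = []
    for _ in range(n):
        if loc == "U":
            center = tuple(rng.uniform(0, 1000) for _ in range(d))
        else:
            # Assumed skew: concentrate centers toward the origin.
            center = tuple(1000 * rng.random() ** skew for _ in range(d))
        if rad == "U":
            radius = rng.uniform(r_min, r_max)
        else:
            # Assumed Gaussian parameters, clipped to [r_min, r_max].
            mu = (r_min + r_max) / 2
            sigma = (r_max - r_min) / 5
            radius = min(max(rng.gauss(mu, sigma), r_min), r_max)
        objects.append((center, radius))
    return objects

data_sets = {
    f"l{loc}r{rad}": generate_objects(1000, 3, 0, 5, loc=loc, rad=rad)
    for loc in "US" for rad in "UG"
}
print(sorted(data_sets))
```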
Competitors:
  Linear scan (worse than ours by 5-9 orders of magnitude)
  Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
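The naïve pruning condition used by the competitor can be written down directly. The sketch below assumes that the candidate o and the node/object e are both represented by axis-aligned bounding boxes given as (lower corner, upper corner) and that q is a point; the function names are hypothetical.

```python
import math

def mindist_point_box(q, lo, hi):
    """Minimum Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))

def maxdist_box_box(lo1, hi1, lo2, hi2):
    """Maximum Euclidean distance between any two points of the two boxes."""
    return math.sqrt(sum(max(h1 - l2, h2 - l1) ** 2
                         for l1, h1, l2, h2 in zip(lo1, hi1, lo2, hi2)))

def naive_prune(q, o_box, e_box):
    """True if every position in e is closer to every position in o than to q."""
    (o_lo, o_hi), (e_lo, e_hi) = o_box, e_box
    return maxdist_box_box(o_lo, o_hi, e_lo, e_hi) < mindist_point_box(q, e_lo, e_hi)

q = (0, 0)
o_box = ((8, 8), (9, 9))
print(naive_prune(q, o_box, ((9, 9), (10, 10))))   # e far from q, next to o
print(naive_prune(q, o_box, ((1, 1), (2, 2))))     # e close to q
```

The test compares only the two extreme distances, so it is cheap but coarse: it ignores the actual shape of the region in which e is closer to o than to q.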
Slide 26: Performance vs. β
Data size N = 100K, dimensionality d = 3, radius range [r_min, r_max] = [0, 5], and probabilistic threshold α = 1
Slide 27: Summary
We formulate the problem of probabilistic queries over uncertain databases
We propose effective pruning methods to reduce the search space of probabilistic queries
We integrate pruning methods into an efficient query procedure
We verify the efficiency of our proposed approaches through extensive experiments