Probabilistic Data Management

Uploaded On 2020-08-29

Presentation Transcript

Slide1

Probabilistic Data Management

Chapter 5: Probabilistic Query Answering (3)

Slide2

Objectives

In this chapter, you will:

Learn the definition and query processing techniques of a probabilistic query type

Probabilistic Reverse Nearest Neighbor Query

Slide3

Recall: Probabilistic Query Types

Uncertain/probabilistic database

Probabilistic range query

Probabilistic k-nearest neighbor query

Probabilistic group nearest neighbor (PGNN) query

Probabilistic reverse k-nearest neighbor query

Probabilistic spatial join / similarity join

Probabilistic top-k query (or ranked query)

Probabilistic skyline query

Probabilistic reverse skyline query

[Figure labels: Probabilistic Preference Query; Probabilistic Spatial Query]

Slide4

Probabilistic Reverse Nearest Neighbor Queries in Uncertain Databases

Very Large Data Bases Journal (VLDBJ), 2009

Slide5

Outline

Introduction

Related Work

Problem Definition

PRNN Query Processing

Experimental Evaluation

Summary

Slide6

Reverse Nearest Neighbor Query (RNN)

Rescue tasks in oceans

In an emergency, a ship asks its nearest ship for help

A rescue ship needs to monitor those ships that have it as their nearest neighbor

In other words, the rescue ship needs to obtain its reverse nearest neighbors (RNNs)

Slide7

Introduction

Reverse Nearest Neighbor Query (RNN)

Given a database D and a query object q, an RNN query retrieves those data objects o ∈ D that have q as their nearest neighbor

[Figure: query object q and data objects o1, o2, o3, o4, o5]
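The definition above can be illustrated with a minimal brute-force sketch on certain (exact) data points. The function name `rnn` and the strict-inequality tie handling are assumptions made for this example; the slides do not say how ties are broken.

```python
import math

def rnn(points, q):
    """Return the points whose nearest neighbor is q (brute force).

    Assumption for this sketch: q must be strictly closer to o than
    every other data point, so ties are not counted as RNNs.
    """
    result = []
    for i, o in enumerate(points):
        d_q = math.dist(o, q)  # distance from o to the query object
        # o is an RNN of q if no other data point is at least as close to o.
        if all(d_q < math.dist(o, p) for j, p in enumerate(points) if j != i):
            result.append(o)
    return result

points = [(1, 0), (5, 0), (5.5, 0), (0, 2)]
print(rnn(points, (0, 0)))
```

Each data point is compared against all others, so this baseline costs O(N^2) distance computations.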

Slide8

RNN Processing on Certain Data Points

TPL Approach [VLDB'04]

[Figure: query point q and data points o1 to o5, showing one RNN candidate and its pruning region]

Slide9

RNN Processing on Certain Data Points (cont.)

TPL Approach [VLDB'04]

[Figure: query point q and data points o1 to o5, showing two RNN candidates and the pruning region]

Slide10

Probabilistic Reverse Nearest Neighbor Query (PRNN)

Due to the limited accuracy of positioning devices (e.g., GPS) or the movement of the ships, the reported positions of ships are imprecise

Therefore, it is important to answer RNN queries over uncertain data effectively and efficiently

Slide11

Another Application of PRNN

Mixed-reality game

Each player tends to shoot his/her nearest neighbor

A query player needs to monitor those players (RNNs) who have himself/herself as their nearest neighbor

Due to the movement of players, their positions can be imprecise and uncertain, so the RNN query is conducted on uncertain objects

Slide12

RNN Queries in Uncertain Databases

Slide13

PRNN Definition

Probabilistic Reverse Nearest Neighbor (PRNN) Queries

Slide14

A Straightforward Method

For every uncertain object o in the database

Sequentially scan all the objects in the database

Calculate the PRNN probability, P_PRNN(q, o), that o is an RNN of q

If P_PRNN(q, o) is greater than or equal to the probabilistic threshold α, then o is an answer; otherwise, o is discarded

Analysis

Complexity: O(N^2), where N is the database size

The computation of the probability P_PRNN(q, o) is very costly
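The straightforward method can be sketched as follows. This is only an illustration under stated assumptions: each uncertain object is modeled as a (center, radius) ball with a uniform distribution inside it, and the PRNN probability is estimated by Monte Carlo sampling rather than by the exact computation the slides refer to. The names `sample_in_ball`, `prnn_probability`, and `prnn_linear_scan` are hypothetical.

```python
import math
import random

def sample_in_ball(center, radius, rng):
    """Draw a point uniformly from the ball (center, radius) by rejection sampling."""
    while True:
        offset = [rng.uniform(-radius, radius) for _ in center]
        if sum(x * x for x in offset) <= radius * radius:
            return tuple(c + x for c, x in zip(center, offset))

def prnn_probability(q, target, objects, samples=2000, seed=0):
    """Monte Carlo estimate of the probability that objects[target] is an RNN of q."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible position for every uncertain object.
        pos = [sample_in_ball(c, r, rng) for c, r in objects]
        o = pos[target]
        d_q = math.dist(o, q)
        # In this sampled world, o is an RNN of q if q is its nearest neighbor.
        if all(d_q < math.dist(o, p) for j, p in enumerate(pos) if j != target):
            hits += 1
    return hits / samples

def prnn_linear_scan(q, objects, alpha, samples=2000):
    """Return indices of objects whose estimated PRNN probability is >= alpha."""
    return [i for i in range(len(objects))
            if prnn_probability(q, i, objects, samples) >= alpha]

objects = [((1, 0), 0.1), ((10, 0), 0.1), ((11, 0), 0.1)]
print(prnn_linear_scan((0, 0), objects, alpha=0.5))
```

Every object is checked against every other object in each sampled world, which mirrors the quadratic cost noted in the analysis and motivates the pruning techniques that follow.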

Slide15

Pruning Techniques

Geometric Pruning (GP)

GP0 method

The object distribution in the uncertainty region can be either known or unknown

Prune those data objects that definitely cannot be RNNs of q

GPβ method (β ∈ (0, 1])

The object distribution in the uncertainty region is known and pre-computation is allowed

Prune those objects with a PRNN probability smaller than β

Slide16

Heuristics of GP0 Method

Data objects always reside within their uncertainty regions

[Figure label: conservative pruning region (CPR)]

Slide17

Heuristics of GP0 Method (cont.)

No false dismissals are introduced with the hypersphere approximation

[Figure label: candidate o]

Slide18

Conditions of GP0 Method

Pruning Conditions

dist(P, q) - dist(P, C_o) > r_o

mindist(P, D) ≥ r_p

In other words, if object p is fully contained in the pruning region CPR'(q, o), then p can be safely pruned
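A hedged sketch of a containment test in the spirit of these conditions: the first function checks whether a single point satisfies the first pruning condition above. The second function replaces the slide's boundary-distance condition with a simpler triangle-inequality bound, which is an assumption of this example (it is more conservative than the slide's test, but still introduces no false dismissals). All function and parameter names are hypothetical.

```python
import math

def point_in_pruning_region(x, q, c_o, r_o):
    """First condition: x is closer to every point of ball (c_o, r_o) than to q."""
    return math.dist(x, q) - math.dist(x, c_o) > r_o

def can_prune(c_p, r_p, q, c_o, r_o):
    """Conservative check that the whole ball (c_p, r_p) lies in the pruning region.

    For any point x in the ball, dist(x, q) >= dist(c_p, q) - r_p and
    dist(x, c_o) <= dist(c_p, c_o) + r_p, so the inequality below implies
    point_in_pruning_region(x, q, c_o, r_o) for every such x.
    """
    return math.dist(c_p, q) - math.dist(c_p, c_o) > r_o + 2 * r_p

q = (0, 0)
c_o, r_o = (10, 0), 1.0  # candidate o
print(can_prune((12, 0), 0.5, q, c_o, r_o))  # p lies behind o as seen from q
print(can_prune((1, 0), 0.5, q, c_o, r_o))   # p lies next to q
```

When `can_prune` holds, every possible position of p is closer to every possible position of o than to q, so q can never be the nearest neighbor of p.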

Slide19

Heuristics of GPβ Method (β ∈ (0, 1])

GPβ prunes those objects with a PRNN probability smaller than β (< α)

[Figure labels: candidate o; p can be pruned by GPβ]

Slide20

Refinement Phase

After applying geometric pruning methods, we can obtain a candidate set

For each candidate o, we retrieve those uncertain objects p' intersecting with PR and compute the probability that o is an RNN of q

Slide21

PRNN Query Processing

Maintain a multidimensional index structure over the uncertain database // indexing phase

For each PRNN query

Apply geometric pruning methods during the index traversal // pruning phase

Refine candidates and return the answer set // refinement phase

Slide22

PRNN Query Processing

Index uncertain data with an R-tree

Slide23

PRNN Query Procedure

Traverse the R-tree index by maintaining a minimum heap (keyed by the minimum distance from the query point to the node)

For each node/object N_i we encounter

Check whether or not N_i can be pruned by GP methods

If the answer is no, then we either further check the children of node N_i, or add it to a PRNN candidate set S_cand in case N_i is an object

After the index traversal, we refine candidates in S_cand by calculating their actual PRNN probabilities
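The traversal above might be sketched as follows. Several simplifications are assumptions of this example and not part of the slides: the index is a toy hierarchy of bounding balls instead of an R-tree with rectangles, objects are (center, radius) tuples, and pruning uses a conservative triangle-inequality test against the candidates found so far. The refinement step is omitted.

```python
import heapq
import math
from itertools import count

def can_prune(c_p, r_p, q, c_o, r_o):
    # Conservative test (assumed here): every point of ball (c_p, r_p) is
    # closer to every point of ball (c_o, r_o) than to q.
    return math.dist(c_p, q) - math.dist(c_p, c_o) > r_o + 2 * r_p

class Node:
    """Toy index node: a bounding ball over child nodes or (center, radius) objects."""
    def __init__(self, children, is_leaf):
        self.children = children
        balls = children if is_leaf else [(c.center, c.radius) for c in children]
        dim = len(balls[0][0])
        self.center = tuple(sum(b[0][k] for b in balls) / len(balls) for k in range(dim))
        self.radius = max(math.dist(self.center, c) + r for c, r in balls)

def ball_of(entry):
    return (entry.center, entry.radius) if isinstance(entry, Node) else entry

def mindist(q, ball):
    center, radius = ball
    return max(0.0, math.dist(q, center) - radius)

def prnn_candidates(root, q):
    """Best-first traversal that returns the unpruned objects (the candidate set)."""
    tie = count()  # tie-breaker so heap entries never compare nodes directly
    heap = [(mindist(q, ball_of(root)), next(tie), root)]
    candidates = []
    while heap:
        _, _, entry = heapq.heappop(heap)
        center, radius = ball_of(entry)
        # Pruning phase: skip entries fully inside some candidate's pruning region.
        if any(can_prune(center, radius, q, c_o, r_o) for c_o, r_o in candidates):
            continue
        if isinstance(entry, Node):
            for child in entry.children:
                heapq.heappush(heap, (mindist(q, ball_of(child)), next(tie), child))
        else:
            candidates.append(entry)
    return candidates

leaf1 = Node([((1, 0), 0.1), ((0, 2), 0.1)], is_leaf=True)
leaf2 = Node([((20, 0), 0.1), ((21, 1), 0.1)], is_leaf=True)
root = Node([leaf1, leaf2], is_leaf=False)
print(prnn_candidates(root, (0, 0)))
```

Visiting entries in ascending order of minimum distance tends to find nearby candidates early, so that farther nodes and objects can be pruned against them. The returned candidates would still need their actual PRNN probabilities computed in the refinement phase.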

Slide24

PRNN Query Processing (cont'd)

Slide25

Experimental Evaluation

Experimental Settings

Real data sets: LB, MG, TCB, and CAR

Synthetic data sets:

Generate center location C_o of uncertain object o in a data space [0, 1,000]^d

Produce radius r_o ∈ [r_min, r_max] for uncertainty region UR(o)

Four types of data sets: lUrU, lUrG, lSrU, and lSrG

Competitors:

Linear scan (worse than ours by 5-9 orders of magnitude)

Naïve pruning (pruning condition: given a PRNN candidate o, a node/object e can be pruned if maxdist(o, e) < mindist(q, e))
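The synthetic data generation can be sketched for one of the four data set types. This example assumes that lUrU stands for uniformly distributed center locations and uniformly distributed radii; the slides do not spell out the naming scheme, so that reading, like the function name `generate_lUrU`, is an assumption.

```python
import random

def generate_lUrU(n, d, r_min, r_max, extent=1000.0, seed=0):
    """Generate n uncertain objects as (center, radius) pairs.

    Centers are uniform in [0, extent]^d and radii are uniform in
    [r_min, r_max] (assumed meaning of the lUrU data set type).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        center = tuple(rng.uniform(0.0, extent) for _ in range(d))
        radius = rng.uniform(r_min, r_max)
        data.append((center, radius))
    return data

data = generate_lUrU(n=5, d=3, r_min=0.0, r_max=5.0)
print(len(data), len(data[0][0]))
```

The other three types would swap in a different distribution for the centers or the radii.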

Slide26

Performance vs. β

Data size N = 100K, dimensionality d = 3, radius range [r_min, r_max] = [0, 5], and probabilistic threshold α = 1

Slide27

Summary

We formulate the problem of probabilistic queries over uncertain databases

We propose effective pruning methods to reduce the search space of probabilistic queries

We integrate pruning methods into an efficient query procedure

We verify the efficiency of our proposed approaches through extensive experiments
