Processing Transitive Nearest-Neighbor Queries in Multi-Cha


Slide1

Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments

Xiao Zhang¹, Wang-Chien Lee¹, Prasenjit Mitra¹ ², Baihua Zheng³
¹ Department of Computer Science and Engineering, The Pennsylvania State University
² College of Information Science and Technology, The Pennsylvania State University
³ School of Information Systems, Singapore Management University

EDBT, Nantes, France, 03/28/2008

Slide2

Background
Problem Analysis
New TNN Algorithms
Optimization
Experiments
Conclusions & Future Work

Outline

Slide3

What is TNN?
S is a set of banks
R is a set of restaurants
TNN distance = 5 + 1 = 6

Background – TNN

Slide4

What is TNN?
Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) ∈ S×R such that ∀(s’, r’) ∈ S×R, dis(p, s) + dis(s, r) ≤ dis(p, s’) + dis(s’, r’), where dis(p, s) is the Euclidean distance between p and s.

Background – TNN

First proposed by Zheng, Lee and Lee [1].

[1] B. Zheng, K. C. Lee, and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006.
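The definition above can be checked directly on small inputs. The following is a minimal brute-force sketch, not one of the algorithms from the talk: it enumerates every pair in S×R and keeps the one with the smallest transitive distance. The function names and the sample coordinates are illustrative assumptions, chosen so that the result matches the 5 + 1 = 6 example.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def transitive_distance(p, s, r):
    """dis(p, s) + dis(s, r), the quantity that TNN minimises."""
    return dis(p, s) + dis(s, r)

def tnn_brute_force(p, S, R):
    """Return the pair (s, r) in S x R with the minimum transitive distance."""
    return min(product(S, R),
               key=lambda pair: transitive_distance(p, pair[0], pair[1]))

# Hypothetical data: banks (S) and restaurants (R) around a query point p.
p = (0.0, 0.0)
banks = [(3.0, 4.0), (6.0, 0.0)]
restaurants = [(3.0, 5.0), (10.0, 0.0)]

s, r = tnn_brute_force(p, banks, restaurants)
tnn_distance = transitive_distance(p, s, r)
print(s, r, tnn_distance)
```

The cost is |S|·|R| distance evaluations over data that must all be downloaded first, which is why the index-based algorithms on the following slides aim to restrict the search range.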

Slide5

The server holds all the data and broadcasts it as radio signals over channels.
Mobile clients (cell phones and PDAs) tune in to the broadcast channels, download the necessary data, and process queries locally.

Background - broadcast

Broadcast vs. on-demand:
Supports simultaneous access by an arbitrary number of mobile devices
Efficient use of limited bandwidth
Light workload on the server side

Slide6

Assumption:
Zheng, Lee and Lee assumed a single broadcast channel.
Based on existing technology (dual-mode, dual-standby cell phones), we assume multiple channels.
A mobile client can access information in multiple channels simultaneously.
Challenges:
How to utilize the parallel processing ability of mobile clients to facilitate query processing?
How to reduce access time?
How to reduce energy consumption?

Background - motivation

Slide7

1. We developed two new algorithms for TNN queries in multi-channel access environments.
2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) so that our new algorithms efficiently reduce search cost.
3. We proposed an optimization technique to reduce energy consumption.

Our contributions:

Slide8

1. Two broadcast channels, one for S and one for R
2. 2-dimensional points
3. Air indexing: R-tree [2]
4. Broadcast in depth-first order, to avoid back-tracking
5. (1, m) interleaving [3]
6. Performance metrics (in # of pages): access time and tune-in time

Background – settings

[2] A. Guttman. R-trees: a dynamic index structure for spatial searching. SIGMOD ’84.

[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.
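To make the (1, m) interleaving setting concrete, the sketch below builds one broadcast cycle under the usual reading of the scheme in [3]: the data is split into m segments and the complete index is broadcast before each segment. The function name, the page labels, and the even split are illustrative assumptions, not details taken from the slides.

```python
def interleave_1_m(index_pages, data_pages, m):
    """Build one broadcast cycle: the full index precedes each of m data segments."""
    if m < 1:
        raise ValueError("m must be at least 1")
    segment_len = -(-len(data_pages) // m)  # ceiling division
    cycle = []
    for i in range(m):
        segment = data_pages[i * segment_len:(i + 1) * segment_len]
        if not segment:
            break  # fewer data pages than segments
        cycle.extend(index_pages)  # a client tuning in waits at most one segment for an index
        cycle.extend(segment)
    return cycle

# Hypothetical cycle: a two-page index and six data pages, with m = 3.
cycle = interleave_1_m(["I1", "I2"], ["d1", "d2", "d3", "d4", "d5", "d6"], 3)
print(cycle)
```

A larger m shortens the wait for the next index copy but lengthens the whole cycle, which is the trade-off behind the access-time metric.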

Slide9

Problem Analysis

Randomly choose ANY pair of objects (s’, r’) and use its transitive distance as the search range.

The range is guaranteed to enclose the answer pair (s, r).

Slide10

Theorem [1]: the transitive distance determined by any pair of objects (s, r) ∈ S×R is an upper bound on the TNN distance.
General idea for answering TNN queries:
Estimate: find a search range around the query point p by searching the index.
Filter: discard unqualified data objects within the search range determined earlier, to find the pair of objects with the minimum transitive distance.

Problem Analysis
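The estimate-and-filter idea can be sketched in a few lines. If d is the transitive distance of any seed pair, the answer pair (s, r) satisfies dis(p, s) ≤ d and, by the triangle inequality, dis(p, r) ≤ dis(p, s) + dis(s, r) ≤ d, so both objects lie in the circle of radius d around p. The code below is an in-memory illustration with linear scans in place of the air index; the function names and sample points are assumptions.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def estimate_range(p, s_seed, r_seed):
    """Estimate step: the transitive distance of any seed pair is a valid search radius."""
    return dis(p, s_seed) + dis(s_seed, r_seed)

def filter_step(p, S, R, radius):
    """Filter step: keep only objects inside the circle, then pick the best pair."""
    s_candidates = [s for s in S if dis(p, s) <= radius]
    r_candidates = [r for r in R if dis(p, r) <= radius]
    best = None
    for s in s_candidates:
        for r in r_candidates:
            d = dis(p, s) + dis(s, r)
            if best is None or d < best[0]:
                best = (d, s, r)
    return best

# Hypothetical data; the seed pair is deliberately not the optimal one.
p = (0.0, 0.0)
S = [(3.0, 4.0), (6.0, 0.0), (20.0, 20.0)]
R = [(3.0, 5.0), (10.0, 0.0), (-30.0, 0.0)]

radius = estimate_range(p, S[1], R[1])
best = filter_step(p, S, R, radius)
print(radius, best)
```

A poor seed pair still gives a correct answer but a larger circle, so more pages must be downloaded in the filter step. That is the motivation for choosing the seed pair carefully.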

Slide11

Deficiencies of existing algorithms:
Approximate-TNN-Search:
  Uses an equation to estimate the search range in the first step
  Search range may be too large or too small
Window-Based-TNN-Search:
  Two sequential NN searches in the estimation step
  Search range estimation is done in sequential order
  Large access time

Problem Analysis

Slide12

Algo 1: Double-NN-Search
Issue two NN queries in the estimation step: p’s NN in S, and p’s NN in R
(s1, r2)

New TNN algorithms – algo1
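A rough sketch of the Double-NN estimation step follows. Two threads stand in for the two broadcast channels, and a linear scan stands in for the R-tree NN search; both are simplifying assumptions, as are the function names and sample points. The point of the sketch is only that the two NN searches are independent, so they can run at the same time, and that their results together give a search range.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest(p, points):
    """Linear-scan NN search (a stand-in for NN search on the broadcast R-tree)."""
    return min(points, key=lambda q: dis(p, q))

def double_nn_estimate(p, S, R):
    """Run the two NN searches concurrently and derive the search range from them."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        s_future = pool.submit(nearest, p, S)  # "channel 1"
        r_future = pool.submit(nearest, p, R)  # "channel 2"
        s, r = s_future.result(), r_future.result()
    return s, r, dis(p, s) + dis(s, r)

# Hypothetical data.
p = (0.0, 0.0)
S = [(2.0, 0.0), (0.0, 9.0)]
R = [(0.0, -3.0), (2.0, 8.0)]

s_nn, r_nn, search_range = double_nn_estimate(p, S, R)
print(s_nn, r_nn, search_range)
```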

Slide13

Hybrid-NN-Search
Increases interaction between the two channels
Uses the result of the finished NN search to guide the unfinished one, in order to reduce the search range
Uses new distance metrics to perform branch-and-bound
Treats the TNN distance as a whole

New TNN Algorithms – algo2

Slide14

NN search in Channel 1 finishes first
Already found s = p.NN(S)
Looking for r2 instead of r1

New TNN Algorithms – algo 2

Slide15

NN search in Channel 2 finishes first
Already found r = p.NN(R)
Looking for s2 instead of s1
Use new criteria when searching the index
Need new distance metrics for branch-and-bound

New TNN Algorithms – algo 2

Slide16

MinTransDist: lower bound for the transitive distance from p, via an MBR, to r.
MinMaxTransDist: upper bound for the transitive distance from p, via an MBR, to r.
Details are given in the paper.

New TNN Algorithms – algo 2

Slide17

Algorithm description:
Case 1: While neither NN search has finished, follow the Double-NN algorithm.
Case 2: If the NN search in Channel 1 (dataset S) finishes first, let s = p.NN(S), use s as the new query point, and perform the NN search on the remaining portion of the R-tree for dataset R.
Case 3: If the NN search in Channel 2 (dataset R) finishes first, change the distance metrics: use MinTransDist and MinMaxTransDist to perform branch-and-bound, and find an s that minimizes the transitive distance.

New TNN Algorithms - Hybrid

Slide18

Updating and pruning strategy
Use a queue to keep potential MBRs, sorted by arrival time
Case 2 (s = p.NN(S) finishes first):
  Switch the NN query point to s
  Initial upper bound update: if there is an intermediate result r’, update the upper bound with dis(p, s) + dis(s, r’)
  Scan the queue of MBRs and use the distance metrics of traditional NN queries

New TNN Algorithms - Hybrid
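Case 2 can be illustrated with the classic point-to-rectangle MinDist. Once s is fixed, dis(p, s) + MinDist(s, M) is a lower bound on the transitive distance through s to any object inside an MBR M of dataset R, so M can be dropped when that bound exceeds the current upper bound. This is a sketch under that reading of the slide; the tuple layout (xmin, ymin, xmax, ymax), the function names, and the sample values are assumptions.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mindist(q, rect):
    """Smallest distance from point q to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def case2_update_and_prune(p, s, queue, r_intermediate=None):
    """Update the upper bound from an intermediate r', then prune queued MBRs of R."""
    upper = math.inf
    if r_intermediate is not None:
        upper = dis(p, s) + dis(s, r_intermediate)
    # Arrival order of the queue is preserved for the surviving MBRs.
    kept = [m for m in queue if dis(p, s) + mindist(s, m) <= upper]
    return upper, kept

# Hypothetical state when the search for s = p.NN(S) finishes first.
p, s = (0.0, 0.0), (3.0, 4.0)
r_prime = (3.0, 7.0)
mbr_queue = [(2.0, 5.0, 4.0, 6.0), (10.0, 10.0, 12.0, 12.0), (3.0, 8.0, 5.0, 9.0)]

upper, kept = case2_update_and_prune(p, s, mbr_queue, r_prime)
print(upper, kept)
```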

Slide19

Updating and pruning strategy (cont.)
Case 3 (r = p.NN(R) finishes first):
  If there is an intermediate result s’, use dis(p, s’) + dis(s’, r) as the new upper bound
  Then scan all the MBRs in the queue and use z = min over Mi ∈ MBR_queue of MinMaxTransDist(p, Mi, r) to update the upper bound
  During traversal, use MinMaxTransDist to update the upper bound and MinTransDist for pruning

New TNN Algorithms - Hybrid
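The Case 3 bookkeeping can be sketched independently of how the two metrics are computed. In the code below each queue entry already carries its MinTransDist and MinMaxTransDist values; that tuple layout, the function name, and the numbers are illustrative assumptions.

```python
import math

def case3_update_and_prune(queue, upper=math.inf):
    """queue: list of (mbr_id, min_trans_dist, min_max_trans_dist) in arrival order.

    First tighten the upper bound with z = min MinMaxTransDist over the queue,
    then drop every MBR whose MinTransDist (a lower bound) exceeds that bound.
    """
    for _, _, min_max_trans in queue:
        upper = min(upper, min_max_trans)
    kept = [entry for entry in queue if entry[1] <= upper]
    return upper, kept

# Hypothetical queue; 7.5 plays the role of dis(p, s') + dis(s', r)
# for an intermediate result s'.
mbr_queue = [("M1", 7.0, 9.5), ("M2", 4.0, 6.0), ("M3", 6.5, 8.0)]
upper, kept = case3_update_and_prune(mbr_queue, upper=7.5)
print(upper, kept)
```

In this example M2 tightens the bound to 6.0, after which M1 and M3 cannot contain a better s and are discarded without being downloaded.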

Slide20

Example for pruning:

New TNN Algorithms - Hybrid

Slide21

Goal: reduce energy consumption
Analysis:
  Previous algorithms minimize the search range in the Estimate Step by issuing an “exact” search
  Energy consumption in the Filter Step is low
  Energy consumption in the Estimate Step is high
Approach: use an “approximate” search in the Estimate Step to save energy in this step

Optimization

Slide22

Approximate Search:
Relax the pruning condition
Use the ratio of the overlapping area to estimate the probability
Compare the ratio with a threshold α

Optimization
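The slide does not say which areas are compared, so the following is only one plausible reading: the ratio is area(MBR ∩ search circle) / area(MBR), taken as a rough estimate of the chance that the MBR holds an object inside the current search range, and the MBR is visited only when that ratio exceeds α. The grid-sampling area estimate, the function names, and the parameters are all assumptions made for illustration.

```python
import math

def overlap_ratio(center, radius, rect, n=50):
    """Estimate area(rect ∩ circle) / area(rect) by sampling an n x n grid of cell centres."""
    xmin, ymin, xmax, ymax = rect
    inside = 0
    for i in range(n):
        for j in range(n):
            x = xmin + (i + 0.5) * (xmax - xmin) / n
            y = ymin + (j + 0.5) * (ymax - ymin) / n
            if math.hypot(x - center[0], y - center[1]) <= radius:
                inside += 1
    return inside / (n * n)

def should_visit(center, radius, rect, alpha):
    """Relaxed pruning: visit the MBR only if the overlap ratio exceeds alpha."""
    return overlap_ratio(center, radius, rect) > alpha

# Hypothetical search circle and MBRs.
center = (0.0, 0.0)
ratio_inside = overlap_ratio(center, 10.0, (-1.0, -1.0, 1.0, 1.0))
ratio_disjoint = overlap_ratio(center, 10.0, (20.0, 20.0, 22.0, 22.0))
ratio_partial = overlap_ratio(center, 1.0, (0.0, 0.0, 2.0, 2.0))
print(ratio_inside, ratio_disjoint, ratio_partial)
```

With α = 0 every MBR with a positive overlap is visited, which corresponds to the exact search; raising α skips MBRs that overlap the search range only slightly, trading accuracy of the estimate for fewer downloaded pages.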

Slide23

How to determine α? Factors:
R-tree height and node depth
  Use a small α at the root and a large α at the leaves
Difference in densities of the two datasets involved
  Use a small α, or 0, on the dataset with the smaller density

Optimization

[Figure: α axis from 0 (exact search) to 1 (approximate search)]

Slide24

Dataset 1:
39,000 * 39,000 square region
Densities: 10^-7.0, 10^-6.6, 10^-6.2, 10^-5.8, 10^-5.4, 10^-5.0, 10^-4.6, 10^-4.2
# of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969
Dataset 2:
39,000 * 39,000 square region
# of points: 2,000 – 30,000 in increments of 2,000

Performance Evaluation - settings
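The point counts of Dataset 1 follow from the densities: each count is the density multiplied by the area of the region, rounded to the nearest integer. The short check below recomputes them; the variable names are illustrative.

```python
# Area of the 39,000 x 39,000 square region.
side = 39_000
area = side * side

# Density exponents listed for Dataset 1 (density = 10 ** exponent points per unit area).
exponents = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]

# Number of points = density * area, rounded to the nearest integer.
counts = [round(area * 10 ** e) for e in exponents]
print(counts)
```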

Slide25

R-tree as air index
Broadcast in depth-first order
STR packing algorithm [4]
(1, m) interleaving [3]
1,000 query points generated for each of the experiments

Performance Evaluation - settings

Parameter        Size
Index pointer    2 bytes
Coordinate       4 bytes
Data content     1k bytes
Page capacity    64 – 512 bytes

[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.

[4] S. Leutenegger, M. Lopez, and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997.

Slide26

Algorithms with exact search:
Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based
1.8 ≥ size(S) / size(R) ≥ 1 / 40

Performance Evaluation

Slide27

Algorithms with exact search:
Tune-in time: when 0.01 ≤ size(S) / size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time

Performance Evaluation

Slide28

ANN (approximate NN search) vs. eNN (exact NN search)
Improvement in tune-in time ranges from 11% to 20%

Performance Evaluation

Slide29

Hybrid algorithm with ANN:

Performance Evaluation

Slide30

Double-NN and Hybrid-NN effectively reduce access time
Cases in which our algorithms reduce tune-in time are stated and discussed
The optimization technique effectively reduces the tune-in time of all three algorithms

Conclusions

Slide31

Generalized TNN queries in broadcast environments:
  More than 2 datasets are involved
  Visiting order not specified
  Complete route query
Using the new distance metrics in disk-based environments

Future Work

Slide32

Any questions?

Thank you!

Slide33

Def 1 (MinTransDist): Given two points p and r, and an MBR MS, MinTransDist(p, MS, r) finds a point s on MS such that MinTransDist(p, MS, r) = dis(p, s) + dis(s, r), and for any point s’ ∈ MS with s’ ≠ s, dis(p, s’) + dis(s’, r) ≥ MinTransDist(p, MS, r).

New TNN Algorithms – distance metrics (backup slides)
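The slides defer the computation of MinTransDist to the paper, so the code below is a numerical sketch under stated assumptions rather than the authors' method. It treats the MBR as a filled axis-aligned rectangle (xmin, ymin, xmax, ymax). If p or r lies inside the rectangle the minimum is simply dis(p, r). Otherwise the minimum is attained on the boundary, and since dis(p, s) + dis(s, r) is convex in s, each side can be minimised by ternary search. All function names are hypothetical.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def contains(rect, q):
    """True if point q lies inside or on the rectangle."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax

def min_on_segment(p, a, b, r, iters=100):
    """Minimise dis(p, s) + dis(s, r) over s on segment ab by ternary search."""
    def f(t):
        s = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        return dis(p, s) + dis(s, r)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2  # convexity: a minimiser lies in [lo, m2]
        else:
            lo = m1  # convexity: a minimiser lies in [m1, hi]
    return f((lo + hi) / 2)

def min_trans_dist(p, rect, r):
    """Lower bound on dis(p, s) + dis(s, r) for any s inside the rectangle."""
    if contains(rect, p) or contains(rect, r):
        return dis(p, r)
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return min(min_on_segment(p, corners[i], corners[(i + 1) % 4], r)
               for i in range(4))

# Hypothetical example: the MBR lies above the segment from p to r.
p, r = (0.0, 0.0), (4.0, 0.0)
lower = min_trans_dist(p, (1.0, 1.0, 3.0, 2.0), r)
print(lower)
```

In the example the best detour touches the bottom side of the rectangle at (2, 1), which by the reflection argument gives a value of √20.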

Slide34

Def 2 (MaxDist): Given two points p and r, and a line segment ℓ, MaxDist(p, ℓ, r) = max over i = 1, 2 of {dis(p, vi) + dis(vi, r)}, where vi (i = 1, 2) are the two end points of ℓ.
MaxDist(p, ℓ, r) gives a tight upper bound for all the transitive distances from p, via any point on ℓ, to r.

New TNN Algorithms – distance metrics (backup slides)


Slide35

Def 3 (MinMaxTransDist): Given two points p and r, and an MBR MS, MinMaxTransDist(p, MS, r) = min over 1 ≤ i ≤ 4 of {MaxDist(p, ℓi, r)}, where ℓi (1 ≤ i ≤ 4) are the four sides of MBR MS.
Lemma: Given a starting point p, an ending point r, and an MBR MS enclosing a point dataset S, ∃ s ∈ S such that dis(p, s) + dis(s, r) ≤ MinMaxTransDist(p, MS, r).

New TNN Algorithms – distance metrics (backup slides)
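Definitions 2 and 3 translate almost directly into code. The sketch below assumes an axis-aligned MBR given as (xmin, ymin, xmax, ymax); the function names are illustrative. The lemma is consistent with the usual MBR property that every side touches at least one data object: the transitive distance is convex along a side, so that object's value cannot exceed the larger of the two endpoint values.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def max_dist(p, segment, r):
    """Def 2: the larger of the transitive distances through the two end points."""
    v1, v2 = segment
    return max(dis(p, v1) + dis(v1, r), dis(p, v2) + dis(v2, r))

def min_max_trans_dist(p, rect, r):
    """Def 3: the minimum of MaxDist over the four sides of the MBR."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)

# Hypothetical example: the bottom side of the MBR gives the smallest MaxDist.
p, r = (0.0, 0.0), (4.0, 0.0)
upper = min_max_trans_dist(p, (1.0, 1.0, 3.0, 2.0), r)
print(upper)
```

For this rectangle the bottom side yields √2 + √10, while the other three sides each include a top corner and yield √5 + √13, so the bottom side determines the bound.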

