# Processing Transitive Nearest-Neighbor Queries in Multi-Cha

Processing Transitive Nearest-Neighbor Queries in Multi-Channel Access Environments

Xiao Zhang¹, Wang-Chien Lee¹, Prasenjit Mitra¹,², Baihua Zheng³

¹ Department of Computer Science and Engineering, ² College of Information Science and Technology, The Pennsylvania State University

³ School of Information Systems, Singapore Management University

EDBT, Nantes, France, 03/28/2008

Slide 2: Outline

- Background
- Problem Analysis
- New TNN Algorithms
- Optimization
- Experiments
- Conclusions & Future Work

Slide 3: Background – TNN

What is TNN?

- S is a set of banks
- R is a set of restaurants
- TNN distance = 5 + 1 = 6

Slide 4: Background – TNN

What is TNN?

Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) such that

$$\forall (s', r') \in S \times R:\quad \mathrm{dis}(p, s) + \mathrm{dis}(s, r) \le \mathrm{dis}(p, s') + \mathrm{dis}(s', r'),$$

where dis(p, s) is the Euclidean distance between p and s.

First proposed by Zheng, Lee and Lee [1].

[1] B. Zheng, K. C. Lee, and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006.
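The definition above can be checked directly by enumerating every pair in S × R. The following is a minimal brute-force sketch, not one of the algorithms from the talk; the function names and the sample coordinates are assumptions chosen so that the answer reproduces the 5 + 1 = 6 example.

```python
import math
from itertools import product

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def tnn_brute_force(p, S, R):
    """Return (s, r, distance) minimizing dis(p, s) + dis(s, r) over S x R."""
    s, r = min(product(S, R), key=lambda sr: dis(p, sr[0]) + dis(sr[0], sr[1]))
    return s, r, dis(p, s) + dis(s, r)

# Hypothetical data: S holds banks, R holds restaurants, p is the query point.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0)]
R = [(3.0, 5.0), (0.0, 1.0)]
result = tnn_brute_force(p, S, R)
print(result)
```

This enumeration costs |S| · |R| distance evaluations and needs both datasets in full, which is exactly what a broadcast client wants to avoid.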

Slide 5: Background - broadcast

- The server holds all the data and broadcasts it as radio signals over channels.
- Mobile clients (cell phones and PDAs) tune in to the broadcast channels, download the necessary data, and process queries.

Broadcast vs. on-demand:

- Supports simultaneous access by an arbitrary number of mobile devices
- Efficient use of limited bandwidth
- Light workload on the server side

Slide 6: Background - motivation

Assumption:

- Zheng, Lee and Lee assumed a single broadcast channel.
- Based on existing technology (dual-mode, dual-standby cell phones), we assume multiple channels.
- A mobile client can access information in multiple channels simultaneously.

Challenges:

- How can the parallel processing ability of mobile clients be used to facilitate query processing?
- How can access time be reduced?
- How can energy consumption be reduced?

Slide 7: Our contributions

1. We developed two new algorithms for TNN queries in multi-channel access environments.
2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) that let the new algorithms reduce search cost efficiently.
3. We proposed an optimization technique to reduce energy consumption.

Slide 8: Background – settings

1. Two broadcast channels, one for S and one for R
2. Two-dimensional points
3. Air indexing: R-tree [2]
4. Broadcast in depth-first order, to avoid back-tracking
5. (1, m) interleaving [3]
6. Performance metrics (in number of pages): access time and tune-in time

[2] A. Guttman. R-trees: a dynamic index structure for spatial searching. SIGMOD '84.

[3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.
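As a rough illustration of item 5, under (1, m) interleaving the complete index is repeated in front of each of m data segments within one broadcast cycle. The sketch below lays out such a cycle; the function name and the page labels are hypothetical, and the index pages merely stand in for R-tree nodes listed in depth-first order.

```python
def one_m_cycle(index_pages, data_pages, m):
    """Lay out one broadcast cycle under (1, m) interleaving.

    The complete index is repeated before each of the m data segments.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    seg_len = -(-len(data_pages) // m)  # ceiling division
    cycle = []
    for i in range(m):
        cycle.extend(index_pages)  # full copy of the index
        cycle.extend(data_pages[i * seg_len:(i + 1) * seg_len])  # i-th data segment
    return cycle

# Hypothetical page labels for one channel.
index_pages = ["I1", "I2"]
data_pages = ["D1", "D2", "D3", "D4"]
cycle = one_m_cycle(index_pages, data_pages, m=2)
print(cycle)
```

A larger m shortens the wait for the next copy of the index but lengthens the whole cycle, which is the trade-off behind the access-time metric.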

Slide 9: Problem Analysis

- Randomly choose ANY pair of objects (s', r') and use its transitive distance as the search range.
- The search range is guaranteed to enclose the answer pair (s, r).

Slide 10: Problem Analysis

Theorem [1]: the transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance.

General idea for answering TNN queries:

- Estimate: find a search range around the query point p by searching the index.
- Filter: filter out unqualified data objects within the search range determined earlier to find the pair of objects with the minimum transitive distance.
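The estimate-and-filter idea can be sketched in a few lines. This is an in-memory illustration only, assuming plain Python lists instead of a broadcast R-tree; the function name `estimate_filter_tnn` and the seed pair are hypothetical. By the triangle inequality, both objects of the answer pair lie within the estimated range of p, so discarding everything outside that range is safe.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def estimate_filter_tnn(p, S, R, seed_pair):
    """Return (distance, s, r) using any seed pair to bound the search range."""
    s0, r0 = seed_pair
    # Estimate: the transitive distance of any pair bounds the TNN distance from above.
    search_range = dis(p, s0) + dis(s0, r0)
    # Filter: objects farther than the search range from p cannot be in the answer pair.
    S_in = [s for s in S if dis(p, s) <= search_range]
    R_in = [r for r in R if dis(p, r) <= search_range]
    return min((dis(p, s) + dis(s, r), s, r) for s in S_in for r in R_in)

# Hypothetical data; the seed pair is deliberately a poor one.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0), (50.0, 50.0)]
R = [(3.0, 5.0), (0.0, 1.0), (60.0, 60.0)]
answer = estimate_filter_tnn(p, S, R, seed_pair=((10.0, 0.0), (0.0, 1.0)))
print(answer)
```

A tighter seed pair shrinks the filtered sets, which is why the algorithms in the talk spend effort on the Estimate step.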

Slide 11: Problem Analysis

Deficiencies of existing algorithms:

- Approximate-TNN-Search:
  - Uses an equation to estimate the search range in the first step
  - The search range may be too large or too small
- Window-Based-TNN-Search:
  - Two sequential NN searches in the estimation step
  - Search range estimation is done in sequential order
  - Large access time

Slide 12: New TNN Algorithms – algo 1

Algo 1: Double-NN-Search

- Issue two NN queries in the estimation step: p's NN in S, and p's NN in R
- (s1, r2)
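A minimal sketch of the Double-NN estimation step follows, assuming in-memory point lists in place of the two broadcast channels. The names `nearest` and `double_nn_estimate` are hypothetical, and the two NN queries are simply evaluated one after the other here, whereas on a multi-channel client they would proceed at the same time.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest(q, points):
    """Nearest neighbor of q in a list of points (linear scan)."""
    return min(points, key=lambda x: dis(q, x))

def double_nn_estimate(p, S, R):
    """Estimate step: two independent NN queries, both issued from p."""
    s = nearest(p, S)  # NN search on the channel carrying S
    r = nearest(p, R)  # NN search on the channel carrying R; does not wait for s
    return s, r, dis(p, s) + dis(s, r)  # the pair and the search range it yields

# Hypothetical data.
p = (0.0, 0.0)
S = [(3.0, 4.0), (10.0, 0.0)]
R = [(3.0, 5.0), (0.0, 1.0)]
s, r, search_range = double_nn_estimate(p, S, R)
print(s, r, search_range)
```

In this example the estimated range (about 9.24) is larger than the true TNN distance of 6, so the Filter step is still needed to find the answer pair.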

Slide 13: New TNN Algorithms – algo 2

Hybrid-NN-Search

- Increases interaction between the two channels
- Uses the result of the finished NN search to guide the unfinished one, in order to reduce the search range
- Uses new distance metrics to perform branch-and-bound
- Treats the TNN distance as a whole

Slide 14: New TNN Algorithms – algo 2

NN search in Channel 1 finishes first:

- Already found s = p.NN(S)
- Looking for r2 instead of r1

Slide 15: New TNN Algorithms – algo 2

NN search in Channel 2 finishes first:

- Already found r = p.NN(R)
- Looking for s2 instead of s1
- Use new criteria when searching the index
- Need new distance metrics for branch-and-bound

Slide 16: New TNN Algorithms – algo 2

- MinTransDist: lower bound on the transitive distance from p, through an MBR, to r.
- MinMaxTransDist: upper bound on the transitive distance from p, through an MBR, to r.
- Details are given in the paper.

Slide 17: New TNN Algorithms - Hybrid

Algorithm description:

- Case 1: While neither of the two NN searches has finished, follow the Double-NN algorithm.
- Case 2: If the NN search in Channel 1 (dataset S) finishes first, let s = p.NN(S), use s as the new query point, and perform the NN search on the remaining portion of the R-tree for dataset R.
- Case 3: If the NN search in Channel 2 (dataset R) finishes first, change the distance metrics: use MinTransDist and MinMaxTransDist to perform branch-and-bound, and find an s that minimizes the transitive distance.

Slide 18: New TNN Algorithms - Hybrid

Updating and pruning strategy

- Use a queue to keep potential MBRs, sorted by arrival time.
- Case 2 (s = p.NN(S) finishes first):
  - Switch the NN query point to s.
  - Initial upper bound update: if there is an intermediate result r', update the upper bound with dis(p, s) + dis(s, r').
  - Scan the queue of MBRs and use the distance metrics of traditional NN queries.
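The Case 2 update can be sketched as follows, assuming that the "distance metrics of traditional NN queries" means the usual point-to-rectangle MINDIST and that each MBR is a tuple (xmin, ymin, xmax, ymax). The function names and the sample queue are hypothetical; this is an illustration of the pruning rule, not the authors' implementation.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mindist(q, mbr):
    """Smallest distance from point q to the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)

def case2_update_and_prune(p, s, r_intermediate, mbr_queue):
    """Switch the query point to s; return the upper bound and the surviving MBRs of R."""
    if r_intermediate is None:
        # No intermediate result yet: nothing can be pruned.
        return math.inf, list(mbr_queue)
    nn_radius = dis(s, r_intermediate)       # best distance from s found so far
    upper_bound = dis(p, s) + nn_radius      # dis(p, s) + dis(s, r')
    # Keep arrival order; drop MBRs that cannot contain a point closer to s than r'.
    survivors = [m for m in mbr_queue if mindist(s, m) <= nn_radius]
    return upper_bound, survivors

# Hypothetical state at the moment the NN search on S finishes.
p, s, r_prime = (0.0, 0.0), (3.0, 4.0), (3.0, 6.0)
queue = [(2.0, 5.0, 4.0, 7.0), (10.0, 10.0, 12.0, 12.0)]
upper_bound, survivors = case2_update_and_prune(p, s, r_prime, queue)
print(upper_bound, survivors)
```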

Slide 19: New TNN Algorithms - Hybrid

Updating and pruning strategy (cont.)

- Case 3 (r = p.NN(R) finishes first):
  - If there is an intermediate result s', use dis(p, s') + dis(s', r) as the new upper bound.
  - Then scan all the MBRs in the queue and use $z = \min_{M_i \in \text{MBR\_queue}} \{\mathrm{MinMaxTransDist}(p, M_i, r)\}$ to update the upper bound.
  - During traversal, use MinMaxTransDist to update the upper bound and MinTransDist for pruning.

Slide 20: New TNN Algorithms - Hybrid

Example for pruning:

Slide 21: Optimization

Goal: reduce energy consumption

Analysis:

- Previous algorithms minimize the search range in the Estimate step by issuing an "exact" search
- Energy consumption in the Filter step is low
- Energy consumption in the Estimate step is high

Approach: use an "approximate" search in the Estimate step to save energy in this step

Slide 22: Optimization

Approximate search:

- Relax the pruning condition
- Use the ratio of overlapping area to estimate the probability
- Compare the ratio with a threshold α

Slide 23: Optimization

How to determine α? Factors:

- R-tree height and node depth: use a small α at the root and a large α at the leaves
- Difference in densities of the two datasets involved: use a small α, or 0, on the dataset with the smaller density

α ranges from 0 (exact search) to 1 (approximate search).

Slide 24: Performance Evaluation - settings

Dataset 1:

- 39,000 × 39,000 square region
- Densities: $10^{-7.0}$, $10^{-6.6}$, $10^{-6.2}$, $10^{-5.8}$, $10^{-5.4}$, $10^{-5.0}$, $10^{-4.6}$, $10^{-4.2}$
- Number of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969

Dataset 2:

- 39,000 × 39,000 square region
- Number of points: 2,000 – 30,000 in increments of 2,000
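The point counts for Dataset 1 appear to be the density multiplied by the area of the region, rounded to the nearest integer. The short check below works under that assumption; the variable names are illustrative.

```python
side = 39_000  # side length of the square region
exponents = (7.0, 6.6, 6.2, 5.8, 5.4, 5.0, 4.6, 4.2)

# Number of points = density * area, rounded to the nearest integer.
counts = [round(10 ** -e * side * side) for e in exponents]
print(counts)
```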

Slide 25: Performance Evaluation - settings

- R-tree as air index
- Broadcast in depth-first order
- STR packing algorithm [3]
- (1, m) interleaving [2]
- 1,000 query points generated for each of the experiments

| Parameter | Size |
| --- | --- |
| Index pointer | 2 bytes |
| Coordinate | 4 bytes |
| Data content | 1k bytes |
| Page capacity | 64 – 512 bytes |

[3] S. Leutenegger, M. Lopez, and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997.

[2] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997.

Slide 26: Performance Evaluation

Algorithms with exact search:

- Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based
- 1/40 ≤ size(S)/size(R) ≤ 1.8

Slide 27: Performance Evaluation

Algorithms with exact search:

- Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time

Slide 28: Performance Evaluation

ANN vs. eNN:

- Improvement in tune-in time ranges from 11% to 20%

Slide 29: Performance Evaluation

Hybrid algorithm with ANN:

Slide 30: Conclusions

- Double-NN and Hybrid-NN effectively reduce access time
- Cases in which our algorithms reduce tune-in time are stated and discussed
- The optimization technique effectively reduces the tune-in time of all three algorithms

Slide 31: Future Work

- Generalized TNN queries in broadcast environments:
  - More than 2 datasets are involved
  - Visiting order not specified
  - Complete route query
- Using the new distance metrics in disk-based environments

Slide 32

Any questions?

Thank you!

Slide 33: New TNN Algorithms – distance metrics (backup slides)

Def 1 (MinTransDist): Given two points p and r, and an MBR $M_S$, MinTransDist(p, $M_S$, r) finds a point s on $M_S$ such that

$$\mathrm{MinTransDist}(p, M_S, r) = \mathrm{dis}(p, s) + \mathrm{dis}(s, r)$$

and, for any point $s' \ne s$, $s' \in M_S$,

$$\mathrm{dis}(p, s') + \mathrm{dis}(s', r) \ge \mathrm{MinTransDist}(p, M_S, r).$$
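One way to evaluate Def 1 is sketched below. It is a reconstruction under stated assumptions, not the computation from the paper: the MBR is treated as a filled, axis-parallel rectangle (xmin, ymin, xmax, ymax), and all function names are hypothetical. The sketch relies on dis(p, s) + dis(s, r) being convex in s: if p or r lies inside the rectangle the minimum is dis(p, r); otherwise the minimum is attained on one of the four sides, where the unconstrained optimum on the side's supporting line (found by reflecting r across that line) is clamped to the side.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def _inside(q, mbr):
    xmin, ymin, xmax, ymax = mbr
    return xmin <= q[0] <= xmax and ymin <= q[1] <= ymax

def _side_min(p, r, axis, c, lo, hi):
    """Min of dis(p, s) + dis(s, r) over the side where coordinate `axis` equals c
    and the other coordinate ranges over [lo, hi]."""
    other = 1 - axis
    hp, hr = abs(p[axis] - c), abs(r[axis] - c)  # distances of p and r to the line
    t = hp / (hp + hr) if hp + hr > 0 else 0.0
    x = p[other] + t * (r[other] - p[other])     # optimum on the whole line
    x = min(max(x, lo), hi)                      # clamp to the side (valid by convexity)
    s = [0.0, 0.0]
    s[axis], s[other] = c, x
    return dis(p, s) + dis(s, r)

def min_trans_dist(p, mbr, r):
    """Lower bound on dis(p, s) + dis(s, r) for any point s inside the MBR."""
    xmin, ymin, xmax, ymax = mbr
    if _inside(p, mbr) or _inside(r, mbr):
        return dis(p, r)
    return min(
        _side_min(p, r, 1, ymin, xmin, xmax),  # bottom side
        _side_min(p, r, 1, ymax, xmin, xmax),  # top side
        _side_min(p, r, 0, xmin, ymin, ymax),  # left side
        _side_min(p, r, 0, xmax, ymin, ymax),  # right side
    )

# Hypothetical example: the MBR lies above the segment from p to r.
lower = min_trans_dist((0.0, 0.0), (1.0, 1.0, 3.0, 2.0), (4.0, 0.0))
print(lower)
```

When the segment from p to r crosses the rectangle, the same code returns dis(p, r), since the crossing point on a side is itself a valid s.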

Slide 34: New TNN Algorithms – distance metrics (backup slides)

Def 2 (MaxDist): Given two points p and r, and a line segment ℓ,

$$\mathrm{MaxDist}(p, \ell, r) = \max_{i=1,2} \{\mathrm{dis}(p, v_i) + \mathrm{dis}(v_i, r)\},$$

where $v_i$ (i = 1, 2) are the two end points of ℓ.

MaxDist(p, ℓ, r) gives a tight upper bound on the transitive distance from p, through any point on ℓ, to r.

Slide 35: New TNN Algorithms – distance metrics (backup slides)

Def 3 (MinMaxTransDist): Given two points p and r, and an MBR $M_S$,

$$\mathrm{MinMaxTransDist}(p, M_S, r) = \min_{1 \le i \le 4} \{\mathrm{MaxDist}(p, \ell_i, r)\},$$

where $\ell_i$ ($1 \le i \le 4$) are the four sides of MBR $M_S$.

Lemma: Given a starting point p, an ending point r, and an MBR $M_S$ enclosing a point dataset S, $\exists s \in S$ such that

$$\mathrm{dis}(p, s) + \mathrm{dis}(s, r) \le \mathrm{MinMaxTransDist}(p, M_S, r).$$
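MaxDist and MinMaxTransDist translate almost literally into code. The sketch below assumes an MBR is given as (xmin, ymin, xmax, ymax); the function names and the sample dataset are hypothetical. The final lines check the lemma on one example: because an MBR is minimal, every side touches at least one data point, and that point's transitive distance cannot exceed MaxDist of its side.

```python
import math

def dis(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def max_dist(p, segment, r):
    """MaxDist: the larger transitive distance through the two end points of a segment."""
    return max(dis(p, v) + dis(v, r) for v in segment)

def min_max_trans_dist(p, mbr, r):
    """MinMaxTransDist: the smallest MaxDist over the four sides of the MBR."""
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    return min(max_dist(p, side, r) for side in sides)

# Hypothetical dataset whose MBR is (1, 1, 3, 2): each side touches one point.
p, r = (0.0, 0.0), (4.0, 0.0)
S = [(1.0, 1.5), (2.0, 1.0), (3.0, 1.2), (2.5, 2.0)]
mbr = (1.0, 1.0, 3.0, 2.0)
upper = min_max_trans_dist(p, mbr, r)
best_in_S = min(dis(p, s) + dis(s, r) for s in S)
print(best_in_S, upper)
```

In this example the best transitive distance in S is about 4.47, below the bound of about 4.58, as the lemma requires.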
