in Large Networks. Minhao Jiang†, Ada Wai-Chee Fu‡, Raymond Chi-Wing Wong†. † The Hong Kong University of Science and Technology, ‡ The Chinese University of Hong Kong.


Slide1

Exact Top-k Nearest Keyword Search in Large Networks

Minhao Jiang†, Ada Wai-Chee Fu‡, Raymond Chi-Wing Wong†

† The Hong Kong University of Science and Technology

‡ The Chinese University of Hong Kong

Prepared by Minhao Jiang

Presented by Minhao Jiang

Slide2

Motivation

Social network: In DBLP, which researchers study “database” and are closely related to my supervisor?

Road network: In Melbourne, where are the nearest cinemas to my hotel that are showing “3D” movies?

Slide3

Problem

Given a weighted undirected graph G(V, E), where each vertex contains a set of keywords:

k-Nearest Keyword Search, k-NK(q, w, k): what are the k nearest vertices to vertex q that contain keyword w?

e.g. k-NK(v2, w0, 3) = {v2, v0, v6}

[Figure: an undirected graph with unit-weight edges]

Slide4

Outline

1. Existing Algorithms

2. Our Algorithm

3. Experiments

4. Conclusion

Slide5

1.1 Naive Algorithm

Dijkstra-like search: too slow
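A Dijkstra-like search for k-NK can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `knk_dijkstra`, the adjacency-list layout, and the toy path graph are all assumptions made for the example (the toy graph is not the one drawn on the slide). The search expands vertices in order of distance from q and stops once k vertices containing w have been settled, which is why it is exact but can touch a large part of the graph.

```python
import heapq

def knk_dijkstra(adj, keywords, q, w, k):
    """Return up to k (distance, vertex) pairs nearest to q whose keyword set contains w."""
    dist = {q: 0}
    heap = [(0, q)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if w in keywords.get(u, ()):
            result.append((d, u))  # vertices are settled in non-decreasing distance order
        for v, weight in adj.get(u, ()):
            nd = d + weight
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return result

# Hypothetical toy graph: a path a - b - c - d with unit weights.
adj = {
    "a": [("b", 1)],
    "b": [("a", 1), ("c", 1)],
    "c": [("b", 1), ("d", 1)],
    "d": [("c", 1)],
}
keywords = {"a": {"db"}, "b": {"ml"}, "c": {"db"}, "d": {"db"}}
top2 = knk_dijkstra(adj, keywords, "b", "db", 2)
```

On this toy input, `top2` holds the two vertices at distance 1 from `b` that contain `db`.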

k-NK(v2, w0, 3) = {v2, v0, v6} (optimal solution)

Slide6

1.2 Existing Index-based Algorithms

All existing index-based algorithms: efficient, but cannot return the optimal solution.

1. The PMI algorithm (WWW’12) creates the following index. [Figure: PMI index]

k-NK_PMI(v2, w0, 3) = {v2, v1, v0}, which is not correct.

2. The pivot algorithm (VLDB’13) creates the following index. [Figure: pivot index]

k-NK_pivot(v2, w0, 3) = {v2, v6, v1}, which is not correct.

Optimal solution: k-NK(v2, w0, 3) = {v2, v0, v6}

Slide7

2. Our Algorithm

two-hop labeling index (state-of-the-art distance querying technique [SODA 02, VLDB 13, 14, SIGMOD 12, 13])

+ keyword-aware index (proposed in this paper)

The first index-based exact algorithm that is also efficient

Slide8

2.1 Background Knowledge: Two-hop Labeling Index

1. Each vertex v has a label set L(v) = {(v→x1, d1), (v→x2, d2), (v→x3, d3), …}

2. For any u and v, dist(u, v) = min (d1 + d2), where (v→x, d1) ∈ L(v) and (u→x, d2) ∈ L(u) share the same vertex x
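The distance query over two label sets can be sketched as a merge of two sorted lists. This is a minimal sketch under assumptions: labels are stored as lists of `(hub, distance)` pairs sorted by hub id, and the name `two_hop_dist` is made up for the example. The sample labels are the L(v1) and L(v6) sets given on this slide, with vertex vi written as the integer i.

```python
def two_hop_dist(label_u, label_v):
    """Minimum d1 + d2 over hubs shared by both labels (each sorted by hub id)."""
    i = j = 0
    best = float("inf")
    while i < len(label_u) and j < len(label_v):
        hub_u, d1 = label_u[i]
        hub_v, d2 = label_v[j]
        if hub_u == hub_v:
            best = min(best, d1 + d2)  # common hub: candidate path u -> hub -> v
            i += 1
            j += 1
        elif hub_u < hub_v:
            i += 1
        else:
            j += 1
    return best

# L(v1) and L(v6) from the slide, as (hub, distance) pairs.
L_v1 = [(0, 1), (1, 0)]
L_v6 = [(0, 2), (2, 1), (6, 0)]
d = two_hop_dist(L_v1, L_v6)
```

Each label is scanned once, so the cost is linear in |L(u)| + |L(v)|. If the two labels share no hub the function returns infinity.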

e.g. L(v1) = {(v1→v0, 1), (v1→v1, 0)}, L(v6) = {(v6→v0, 2), (v6→v2, 1), (v6→v6, 0)}

dist(v1, v6) = 1 + 2 = 3 (by a linear scan on each of L(v1) and L(v6))

Slide9

2.2 Forward Search (FS) Component

Step 1: For each vertex vi containing the query keyword w, find dist(q, vi)

Step 2: Maintain the k nearest vi to q

Efficient when w is infrequent

Slide10

2.3 Forward Backward Search (FBS) Component

(q→xi, di) ∈ L(q) ⇔ (xi→q, di) ∈ LB(xi)

Step 1: scan (q→xi, di) in L(q)

Step 2: for each xi

(a) scan (xi→yij, dij) in LB(xi)

(b) find the k shortest (xi→yij, dij) such that yij contains w (by KT index)

(c) maintain the best-known answers (priority queue)

Efficient when w is frequent
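The two FBS steps can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the names `build_backward` and `knk_fbs`, the dictionary layout, and the toy labels (a hand-built 2-hop labeling of the unit-weight path a - b - c - d) are all made up for the example. Step 2(b) is done here with a plain scan over a distance-sorted LB(xi) rather than with the KT index.

```python
import heapq

def build_backward(labels):
    """LB(x): all (d, v) such that (x, d) is in L(v), sorted by distance."""
    lb = {}
    for v, entries in labels.items():
        for hub, d in entries:
            lb.setdefault(hub, []).append((d, v))
    for entries in lb.values():
        entries.sort()
    return lb

def knk_fbs(labels, lb, keywords, q, w, k):
    best = {}  # best-known distance from q to each vertex containing w
    for hub, d1 in labels[q]:                 # Step 1: scan L(q)
        found = 0
        for d2, y in lb.get(hub, ()):         # Step 2(a): scan LB(hub)
            if found == k:
                break                         # Step 2(b): only the k shortest matches per hub
            if w in keywords.get(y, ()):
                found += 1
                if d1 + d2 < best.get(y, float("inf")):
                    best[y] = d1 + d2         # Step 2(c): keep the best-known answer
    return heapq.nsmallest(k, ((d, y) for y, d in best.items()))

# Hypothetical 2-hop labels for the path a - b - c - d (unit weights), as (hub, distance).
labels = {
    "a": [("a", 0), ("b", 1)],
    "b": [("b", 0)],
    "c": [("b", 1), ("c", 0)],
    "d": [("b", 2), ("c", 1), ("d", 0)],
}
keywords = {"a": {"db"}, "b": {"ml"}, "c": {"db"}, "d": {"db"}}
lb = build_backward(labels)
answer = knk_fbs(labels, lb, keywords, "d", "db", 2)
```

Keeping only the k shortest matching entries per hub is safe: if a vertex is cut off at some hub, that hub already supplies k matching vertices that are at least as close.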

Slide11


2.3 KT index

Step 2(b): for each xi, find the k shortest (xi→yij, dij) in LB(xi) such that yij contains w.

Naive method: O(|LB(xi)|), a linear scan

KT index: O(k log(|LB(xi)|/k))

(1) Sort the entries (xi→yij, dij) by dij, and build a binary tree forest

(2) Index the keywords of all yij components in all entries in LB(xi) by their hash value (stored in each tree node)
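The idea of pruning by hash values stored in tree nodes can be sketched as follows. This is a simplified stand-in, not the KT index itself: it uses a single binary tree over entries sorted in ascending distance instead of a forest, and the class name `KTSketch`, the 32-bit signature width, and the CRC-based hash `h` are assumptions for illustration. Each node stores the bitwise OR of the keyword hashes beneath it, a subtree is skipped when its signature shares no bit with h(w), and leaves are verified against the real keyword set because different keywords may hash to the same bit.

```python
import zlib

BITS = 32  # assumed signature width

def h(word):
    """Map a keyword to a one-bit signature (illustrative hash choice)."""
    return 1 << (zlib.crc32(word.encode()) % BITS)

class KTSketch:
    def __init__(self, entries, keywords):
        self.entries = sorted(entries)  # (distance, vertex), ascending distance
        self.keywords = keywords
        self.n = len(self.entries)
        self.sig = [0] * (4 * max(self.n, 1))
        if self.n:
            self._build(1, 0, self.n - 1)

    def _build(self, node, lo, hi):
        if lo == hi:
            s = 0
            for word in self.keywords.get(self.entries[lo][1], ()):
                s |= h(word)
        else:
            mid = (lo + hi) // 2
            s = self._build(2 * node, lo, mid) | self._build(2 * node + 1, mid + 1, hi)
        self.sig[node] = s
        return s

    def k_nearest(self, w, k):
        out = []
        if self.n:
            self._search(1, 0, self.n - 1, w, h(w), k, out)
        return out

    def _search(self, node, lo, hi, w, hw, k, out):
        if len(out) >= k or not (self.sig[node] & hw):
            return  # enough answers, or no entry below can contain w
        if lo == hi:
            d, y = self.entries[lo]
            if w in self.keywords.get(y, ()):  # verify: signatures can collide
                out.append((d, y))
            return
        mid = (lo + hi) // 2
        self._search(2 * node, lo, mid, w, hw, k, out)      # nearer half first
        self._search(2 * node + 1, mid + 1, hi, w, hw, k, out)

# Hypothetical LB(x) and keyword sets.
keywords = {"a": {"db"}, "b": {"ml"}, "c": {"db"}, "d": {"db"}}
kt = KTSketch([(0, "b"), (1, "a"), (1, "c"), (2, "d")], keywords)
nearest = kt.k_nearest("db", 2)
```

Visiting the nearer half first means results come out in ascending distance, so the search can stop as soon as k entries are collected.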

e.g. when LB(xi) = {(xi→y0, d0), …, (xi→y12, d12)}

[Figure: KT index built over LB(xi)]

Slide12

2.4 FS-FBS Algorithm

Combine FS and FBS:

1. If the query keyword is frequent, use the FBS method.

2. If the query keyword is not frequent, use the FS method.

Slide13

2.5 Extension

Disk-based setting

Multiple keyword query

Dynamic update

Slide14

3. Experiments

Datasets: millions of vertices

Slide15

3.1 Querying Efficiency

PMI: WWW’12 index-based algorithm

pivot-gs: VLDB’13 index-based algorithm

FS-FBS: our exact algorithm

Dijkstra: naive exact algorithm

HR (hit rate): % of reported vertices that are in the optimal solution.

S-ρ (Spearman’s rho): correlation between the reported ranking and the optimal ranking.

A value of 1.00 means the output is the optimal solution.

Existing index-based algorithms are inaccurate.

Our exact algorithm is as efficient as existing index-based algorithms.

Slide16

3.2 Indexing Cost

Index size: comparable with existing index-based algorithms

Indexing time: acceptable

Slide17

4. Conclusion

Our method can handle k-NK queries in large networks.

We propose the first index-based algorithm returning the optimal solution.

Our method is as efficient as the best-known index-based algorithms (which return non-optimal answers).

Slide18

END

Slide19

2.3 Forward Backward Search (FBS) Algorithm

How to obtain the k shortest (xi→yij, dij) in LB(xi) such that yij contains w?

1. Sort (xi→yij, dij) by non-ascending dij in each LB(xi), so that the k shortest (xi→yij, dij) with yij containing w are at the end of LB(xi)

2. Hierarchy: e.g. when LB(x) = {(x→y0, d0), …, (x→y12, d12)}

Project each keyword to a hash value: e.g. h(w) = 00010000

h[8..11] = h(w1) bitwiseOR h(w2) bitwiseOR h(w3) …, where wi is in y8, y9, y10 or y11

If h[8..11] bitwiseAND h(w) = 0, w is not contained in y8, y9, y10 and y11, and we check h[0..7]; otherwise, we check h[10..11]

Slide20

2.3 Forward Backward Search (FBS) Algorithm

How to obtain the k shortest (xi→yij, dij) in LB(xi) such that yij contains w?

1. Sort (xi→yij, dij) by non-ascending dij

2. Hierarchy

3. Store the hierarchy in an array: e.g. [8..11] is in a[19]; compact storage without loss of efficiency in searching

One FBS time complexity: [formula not preserved in this transcript], where |L| is the size of the 2-hop index and |doc(V)| is the total number of keywords in the graph

Slide21

2.5 Adapt to Disk-based Setting

Keyword-related backward index for each keyword w: LB → Lw

Partition each Lw into a high index and a low index

e.g. when w is contained in v1, v3 and v4 [Figure]

Slide22

2.5 Adapt to Multiple Keywords Query

1. Trivial in FS

2. Same hierarchy in FBS

3. Modify the recursive search by:

Disjunctive/OR: if h[8..11] bitwiseAND (h(w1) bitwiseOR h(w2) …) = 0, w is not contained in y8, y9, y10 and y11, and we check h[0..7]; otherwise, we check h[10..11]

Conjunctive/AND: if h[8..11] bitwiseAND (h(w1) bitwiseOR h(w2) …) < h[8..11], w is not contained in y8, y9, y10 and y11, and we check h[0..7]; otherwise, we check h[10..11]

Slide23

2.5 Adapt to Dynamic Update

1. Keyword update: trivial in FS

2. Keyword update of the hierarchy in FBS: when keyword w is inserted into / removed from vertex v, each LB(u) that contains (u→v, d) should update its hierarchy by reconstructing the hash values from the root to v

3. Structure update:

3.1 Update the 2-hop index by existing algorithms

3.2 Update keyword-related information accordingly
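The keyword update of the hierarchy can be sketched as recomputing hash values along one root-to-leaf path. This is a minimal sketch under assumptions: the hierarchy is stored as an array-based complete binary tree whose node i has children 2i and 2i+1, and the names `build_signatures` and `update_leaf` are hypothetical. Because bitwise OR cannot be undone, a removal also recomputes each ancestor from its two children rather than clearing bits directly.

```python
def build_signatures(leaf_sigs):
    """Array-based binary tree: internal node i stores tree[2*i] | tree[2*i + 1]."""
    size = 1
    while size < len(leaf_sigs):
        size *= 2
    tree = [0] * (2 * size)
    tree[size:size + len(leaf_sigs)] = leaf_sigs
    for i in range(size - 1, 0, -1):
        tree[i] = tree[2 * i] | tree[2 * i + 1]
    return tree, size

def update_leaf(tree, size, pos, new_sig):
    """Set the signature of leaf `pos`, then rebuild every ancestor up to the root."""
    i = size + pos
    tree[i] = new_sig
    i //= 2
    while i >= 1:
        tree[i] = tree[2 * i] | tree[2 * i + 1]
        i //= 2

# Hypothetical leaf signatures for three entries of one LB(u).
tree, size = build_signatures([0b0001, 0b0010, 0b0100])
root_before = tree[1]
update_leaf(tree, size, 2, 0)          # a keyword is removed from the third entry
root_after_removal = tree[1]
update_leaf(tree, size, 0, 0b1001)     # a keyword is inserted into the first entry
root_after_insert = tree[1]
```

Each update touches one node per tree level, so its cost is logarithmic in the number of entries of that LB(u).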
