Improving Search for Emerging Applications


Uploaded On 2015-11-22




Presentation Transcript


Improving Search for Emerging Applications

* Some techniques currently being licensed to Bimaple

Chen Li, UC Irvine

Overview of my UC Irvine Research
- Text research (main focus of this talk)
- Data-intensive computing: ASTERIX project

Vertical Search
- Search on a specific segment of online content
- Different from general Web search engines

Approach 1: Database built-in full-text search
Example (Oracle):

    SELECT SCORE(1) score, id, name
    FROM Products
    WHERE CONTAINS(doc, 'iphone', 1) > 0
    ORDER BY SCORE(1) DESC;

Limitations:
- Speed
- Ranking

Approach 2: Open-source packages

Approach 3: Home-grown search engines

Recent Trend 1: More Mobile Applications
- Fat fingers …

Recent Trend 2: More Location-Based Services

So: New requirements for Vertical Search
- Find results faster → Instant search
- Deal with errors → Fuzzy search
- Be aware of the location → Location-based search

Demos
- http://psearch.ics.uci.edu: search on the UCI directory
- http://ipubmed.ics.uci.edu: search on more than 21 million MEDLINE publications
- http://www.omniplaces.com/: location-based search on 17 million geospatial objects

Search on People Directories
psearch.ics.uci.edu

Search on Publications
ipubmed.ics.uci.edu

Search on Business Listings
www.omniplaces.com

Our Focus: Instant Search in Vertical Domains
Server applications (enterprises), e.g., e-commerce systems
Powerful features:
- Efficient
- Full text
- Fuzzy search
- Location-based search
- …

“Instant Search”
- Search as you type
- Type-ahead search
- Autocompletion
- …

Benefits:
- Save user time
- Suggestions
- Save 2-3 seconds (Google Instant)
- Mobile devices

Google Instant

Instant Search Classification
Query prediction
- Example: Google Instant
- Relies on query logs and user profiles
- “Fires” the most likely prediction
Searching directly on the data
- Example: PSearch@UCI
- Does not rely on query logs

Challenges
Performance: < 100 ms end to end (server processing, network, JavaScript, etc.)
Requirement for high query throughput:
- 20 queries per second (QPS) → at most 50 ms/query
- 100 QPS → at most 10 ms/query
Other challenges: ranking, space requirements, …
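The per-query budgets above come from dividing one second by the target query rate. A minimal sketch, assuming queries are processed one after another on a single worker; the function name and the `workers` parameter (idealized, perfectly parallel workers) are hypothetical additions, not figures from the talk.

```python
def max_ms_per_query(qps, workers=1):
    """Average processing-time budget per query, in milliseconds.

    Idealized model (an assumption): `workers` identical workers share the
    load evenly, and each handles its queries strictly one after another.
    """
    return 1000.0 * workers / qps


print(max_ms_per_query(20))   # 50.0, matching "20 QPS -> 50 ms/query"
print(max_ms_per_query(100))  # 10.0, matching "100 QPS -> 10 ms/query"
```

Under this model, meeting 100 QPS with a 40 ms query would need at least four fully parallel workers; real servers add queuing and contention, so the budget is an upper bound.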

Next: two features
- Fuzzy search: find results with approximate keywords
- Full-text search: find results containing the query keywords (not necessarily adjacent)

Edit Distance
ed(s1, s2) = minimum number of operations (insertion, deletion, substitution) to change s1 into s2

s1: venkatsubramanian
s2: wenkatsubramanian
ed(s1, s2) = 1
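The definition above is the Levenshtein distance, which is usually computed with dynamic programming. A minimal sketch that keeps one row of the DP table at a time; the function name is illustrative.

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions, and substitutions turning s1 into s2."""
    prev = list(range(len(s2) + 1))  # distances from the empty prefix of s1
    for i, a in enumerate(s1, start=1):
        cur = [i]                    # turning s1[:i] into "" takes i deletions
        for j, b in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute, or match for free
        prev = cur
    return prev[-1]


print(edit_distance("venkatsubramanian", "wenkatsubramanian"))  # 1
```

Each cell depends only on its left, upper, and upper-left neighbors, so the cost is O(|s1|·|s2|) time and O(|s2|) extra space.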

Problem Setting
Data
- R: a set of records
- W: a set of distinct words
Query
- Q = {p1, p2, …, pl}: a set of prefixes
- δ: edit-distance threshold
Query result
- R_Q: the set of records that contain every query prefix or a similar form of it (within edit distance δ)
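As a reference point for the problem setting, R_Q can be computed by brute force. This sketch assumes that a record word "has a similar form" of a query prefix p when some prefix of the word is within edit distance δ of p. It scans every record and every word prefix, which is exactly the cost the index-based techniques in this talk are meant to avoid. All names are hypothetical.

```python
def edit_distance(s1, s2):
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        cur = [i]
        for j, b in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def prefix_edit_distance(p, word):
    """Smallest edit distance between p and any prefix of word."""
    return min(edit_distance(p, word[:k]) for k in range(len(word) + 1))


def query_result(records, prefixes, delta):
    """R_Q: IDs of records in which every query prefix has a similar word prefix."""
    return [rid for rid, text in records.items()
            if all(any(prefix_edit_distance(p, w) <= delta
                       for w in text.lower().split())
                   for p in prefixes)]


records = {1: "venkatasubramanian", 2: "carey", 3: "jain", 4: "nicolau", 5: "smith"}
print(query_result(records, ["wenkatsubra"], 2))  # [1]
```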

Feature 1: Fuzzy Search

Formulation
Query: wenkatsubra
Record strings: venkatasubramanian, carey, jain, nicolau, smith
Find strings with a prefix similar to a query keyword. Do it incrementally!

Fuzzy search using grams
[Figure: the query string "universal" is split into 2-grams; the inverted lists of these grams, each in ascending order, are merged.]
Find elements whose occurrences ≥ T
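The gram-based approach can be sketched as follows. This is a minimal illustration, not the Flamingo implementation. It assumes whole-string edit distance and distinct 2-grams, and it derives the threshold T from the standard count filter, which the slide does not spell out: each edit operation can destroy at most q of the query's grams, so a string within distance δ shares at least |G(query)| - q·δ distinct grams with the query. The names (`gram_search`, `build_gram_index`) are hypothetical.

```python
import heapq
from itertools import groupby


def edit_distance(s1, s2):
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        cur = [i]
        for j, b in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def grams(s, q=2):
    """Distinct q-grams (substrings of length q) of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def build_gram_index(strings, q=2):
    """Inverted lists: gram -> string IDs containing it, in ascending order."""
    index = {}
    for sid, s in enumerate(strings):
        for g in grams(s, q):
            index.setdefault(g, []).append(sid)
    return index


def gram_search(strings, index, query, delta, q=2):
    qgrams = grams(query, q)
    t = len(qgrams) - q * delta  # count-filter threshold T
    if t <= 0:
        candidates = range(len(strings))  # the filter cannot prune anything
    else:
        # Merge the ascending lists, then count how often each ID occurs.
        merged = heapq.merge(*(index.get(g, []) for g in qgrams))
        candidates = [sid for sid, run in groupby(merged)
                      if sum(1 for _ in run) >= t]
    # The filter only yields candidates; verify them with the real distance.
    return [strings[sid] for sid in candidates
            if edit_distance(query, strings[sid]) <= delta]


strings = ["universal", "universe", "unversal", "diversity"]
print(gram_search(strings, build_gram_index(strings), "universal", 1))
# ['universal', 'unversal']
```

Here "universe" passes the count filter (6 shared grams, T = 6) but fails verification (distance 2), which shows why a verification step is still needed after merging.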

The Flamingo Package
http://flamingo.ics.uci.edu/

Observation
Strings = {exam, example, exemplar, exempt, sample}
Edit-distance threshold δ = 2

Q' = exampl
Prefix     Distance
exam       2
examp      1
exampl     0
example    1
exemp      2
exempt     2
exempl     1
exempla    2
sampl      2

Q = example (Q' followed by e)
Prefix     Distance   Derived from the Q' entry
examp      2          examp (1): delete e
exampl     1          exampl (0): delete e
example    0          exampl (0): match e
exempl     2          exempl (1): delete e
exempla    2          exempl (1): replace e with a
sample     2          sampl (2): match e
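The two tables can be checked by brute force: enumerate every prefix of every string and keep those within δ of the query. This sketch (hypothetical function names) is only a reference for the numbers; it recomputes everything from scratch for each query, which is the work the incremental method avoids.

```python
def edit_distance(s1, s2):
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        cur = [i]
        for j, b in enumerate(s2, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def similar_prefixes(strings, query, delta):
    """Every prefix of every string within edit distance delta of query -> distance."""
    result = {}
    for s in strings:
        for k in range(len(s) + 1):
            d = edit_distance(query, s[:k])
            if d <= delta:
                result[s[:k]] = d
    return result


STRINGS = ["exam", "example", "exemplar", "exempt", "sample"]
print(similar_prefixes(STRINGS, "example", 2))
```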

Trie Indexing
Computing the set of active nodes Φ_Q
- Initialization
- Incremental step

[Figure: a trie over exam, example, exemplar, exempt, sample; the active nodes for Q = example are marked with their edit distances.]

Active nodes for Q = example:
Prefix     Distance
examp      2
exampl     1
example    0
exempl     2
exempla    2
sample     2

Initialization: Q = ε
Initialize Φ_ε with all nodes within a depth of δ (δ = 2 here); each node's distance is its depth.
Prefix     Distance
ε          0
e          1
ex         2
s          1
sa         2

[Figure: the same trie with these nodes marked.]

Incremental Algorithm: Overview
Access their leaf nodes as answers.
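The overall flow (initialize Φ_ε, extend the active-node set one typed character at a time, then collect leaf descendants as answers) can be sketched as follows. This is an illustrative variant, not necessarily the exact published step. Each new active node is derived from a previous active node by deleting, matching, or substituting the new character; descendants reached by inserting trie characters are then added, keeping the minimum distance per node, so the distances stay exact. Class and function names are hypothetical.

```python
class TrieNode:
    """One trie node; `prefix` is the string spelled on the path from the root."""

    def __init__(self, prefix=""):
        self.prefix = prefix
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if a whole record string ends here


def build_trie(words):
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(node.prefix + ch)
            node = node.children[ch]
        node.is_word = True
    return root


def _relax(active, node, dist):
    """Keep the smallest known edit distance for `node`."""
    if node not in active or dist < active[node]:
        active[node] = dist


def _add_insertions(base, delta):
    """Add descendants reached by inserting trie characters (+1 per level)."""
    active = dict(base)
    for node, dist in base.items():
        stack = [(node, dist)]
        while stack:
            cur, d = stack.pop()
            if d >= delta:
                continue
            for child in cur.children.values():
                _relax(active, child, d + 1)
                stack.append((child, d + 1))
    return active


def initial_active_nodes(root, delta):
    """Phi for the empty query: every node within depth delta, distance = depth."""
    return _add_insertions({root: 0}, delta)


def extend_active_nodes(active, ch, delta):
    """Phi for Q = Q' + ch, computed only from Phi for Q'."""
    base = {}
    for node, dist in active.items():
        if dist + 1 <= delta:
            _relax(base, node, dist + 1)          # delete ch from the query
        for c, child in node.children.items():
            cost = dist if c == ch else dist + 1  # match ch, or substitute it
            if cost <= delta:
                _relax(base, child, cost)
    return _add_insertions(base, delta)


def fuzzy_prefix_search(root, query, delta):
    """Process the query character by character; return active prefixes and answers."""
    active = initial_active_nodes(root, delta)
    for ch in query:
        active = extend_active_nodes(active, ch, delta)
    answers, stack = set(), list(active)
    while stack:  # leaf (word) descendants of the active nodes
        node = stack.pop()
        if node.is_word:
            answers.add(node.prefix)
        stack.extend(node.children.values())
    return {n.prefix: d for n, d in active.items()}, sorted(answers)


WORDS = ["exam", "example", "exemplar", "exempt", "sample"]
prefixes, answers = fuzzy_prefix_search(build_trie(WORDS), "example", 2)
print(answers)  # ['example', 'exemplar', 'sample']
```

Because Φ for Q is built only from Φ for Q', each keystroke reuses the previous result instead of rescanning the whole trie; the active prefixes produced for "exampl" and "example" match the tables on the Observation slide.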

Feature 2: Full-text search
- Find answers with the query keywords
- Not necessarily adjacent

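Multi-Prefix Intersection
Q = vldb li

ID   Record
1    Li data…
2    data…
3    data Lin…
4    Lu Lin Luis…
5    Liu…
6    VLDB Lin data…
7    VLDB…
8    Li VLDB…

[Figure: a trie over the words data, li, lin, liu, lu, luis, vldb, with the inverted list of record IDs at each word's leaf (data: 1 2 3 6; li: 1 8; lin: 3 4 6; liu: 5; lu: 4; luis: 4; vldb: 6 7 8). Answers: records 6 and 8.]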

Multi-Prefix Intersection: Method 1
(Same records and trie as on the previous slide.)
Q = vldb li
- li: union of the inverted lists below node "li" = 1 3 4 5 6 8
- vldb: 6 7 8
- Intersection: 6 8

Space cost: inverted index
Time cost: union + intersection
More efficient intersection approaches…
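Method 1 can be sketched with a sorted word list standing in for the trie: the words starting with a prefix form one contiguous range of that list, just as they form the leaves below one trie node. Record texts are the visible keywords from the slide (lowercased, with the "…" tails omitted); all names are hypothetical.

```python
from bisect import bisect_left

# Visible keywords of the example records (lowercased; the "…" tails are omitted).
RECORDS = {
    1: "li data", 2: "data", 3: "data lin", 4: "lu lin luis",
    5: "liu", 6: "vldb lin data", 7: "vldb", 8: "li vldb",
}


def build_inverted_index(records):
    """word -> ascending list of IDs of the records containing the word."""
    index = {}
    for rid in sorted(records):
        for word in set(records[rid].split()):
            index.setdefault(word, []).append(rid)
    return index


def prefix_union(index, words, prefix):
    """Union of the inverted lists of all words in sorted `words` starting with `prefix`."""
    ids = set()
    i = bisect_left(words, prefix)
    while i < len(words) and words[i].startswith(prefix):
        ids.update(index[words[i]])
        i += 1
    return sorted(ids)


def multi_prefix_search(records, query):
    """Method 1: one union per query prefix, then intersect the unions."""
    index = build_inverted_index(records)
    words = sorted(index)
    result = None
    for prefix in query.lower().split():
        ids = set(prefix_union(index, words, prefix))
        result = ids if result is None else result & ids
    return sorted(result or [])


print(multi_prefix_search(RECORDS, "vldb li"))  # [6, 8]
```

The cost shows up in `prefix_union`: a short prefix such as "l" touches the lists of many words before any intersection happens, which is the motivation for the forward-list method.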

Multi-Prefix Intersection: Method 2
(Same records as above.) Keyword IDs follow the lexicographic order of the words: data = 1, li = 2, lin = 3, liu = 4, lu = 5, luis = 6, vldb = 7. Each trie node stores the range of keyword IDs below it, e.g., root: [1, 7]; l: [2, 6]; li: [2, 4]; lu: [5, 6]; vldb: [7, 7].

Forward lists (sorted keyword IDs of each record):
ID   Forward list
1    1 2
2    1
3    1 3
4    3 5 6
5    4
6    1 3 7
7    7
8    2 7

For Q = vldb li: read each record on the list of vldb (6 7 8), and verify/probe its forward list for a keyword ID in the range of li, [2, 4]:
- 6 (VLDB Lin data…): contains 3 → answer
- 7 (VLDB…): no ID in [2, 4]
- 8 (Li VLDB…): contains 2 → answer

Space cost: inverted + forward index
Time cost: probing forward lists
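Method 2 can be sketched in the same way. Assumptions: keyword IDs are assigned in lexicographic order, a binary search over the sorted vocabulary replaces the trie lookup for a prefix's ID range, and the prefix with the shortest union list is the one read, while every other prefix is verified by probing forward lists. The names are hypothetical.

```python
from bisect import bisect_left

# Visible keywords of the example records (lowercased; the "…" tails are omitted).
RECORDS = {
    1: "li data", 2: "data", 3: "data lin", 4: "lu lin luis",
    5: "liu", 6: "vldb lin data", 7: "vldb", 8: "li vldb",
}


def build_indexes(records):
    """Sorted vocabulary, forward lists (record -> sorted keyword IDs), inverted lists."""
    words = sorted({w for text in records.values() for w in text.split()})
    word_id = {w: i + 1 for i, w in enumerate(words)}  # data = 1, ..., vldb = 7
    forward = {rid: sorted({word_id[w] for w in text.split()})
               for rid, text in records.items()}
    inverted = {}
    for rid in sorted(records):
        for wid in forward[rid]:
            inverted.setdefault(wid, []).append(rid)
    return words, forward, inverted


def id_range(words, prefix):
    """Inclusive range (lo, hi) of keyword IDs whose words start with `prefix`."""
    start = bisect_left(words, prefix)
    end = start
    while end < len(words) and words[end].startswith(prefix):
        end += 1
    return start + 1, end  # lo > hi means no word has this prefix


def probe(forward_list, lo, hi):
    """Does the sorted forward list contain some keyword ID in [lo, hi]?"""
    i = bisect_left(forward_list, lo)
    return i < len(forward_list) and forward_list[i] <= hi


def forward_list_search(records, query):
    """Method 2: read the shortest union list, probe forward lists for the other prefixes."""
    words, forward, inverted = build_indexes(records)
    ranges = [id_range(words, p) for p in query.lower().split()]
    if not ranges:
        return []
    unions = [sorted({rid for wid in range(lo, hi + 1) for rid in inverted[wid]})
              for lo, hi in ranges]
    best = min(range(len(ranges)), key=lambda k: len(unions[k]))
    others = [r for k, r in enumerate(ranges) if k != best]
    return [rid for rid in unions[best]
            if all(probe(forward[rid], lo, hi) for lo, hi in others)]


print(forward_list_search(RECORDS, "vldb li"))  # [6, 8]
```

Each probe is a binary search in one short forward list, so the work grows with the length of the shortest union list rather than with the total size of the unions, at the price of storing the extra forward index.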

Experimental Results
Computing similar prefixes

Multi-prefix intersection

Time Scalability

Index Scalability

Research on data-intensive computing
- http://asterix.ics.uci.edu
- http://cherry.ics.uci.edu/

Thank you!