Advanced Query Processing
Dr. Susan Gauch
Query Term Weights

The vector space model matches queries to documents with the inner product / cosine similarity measure:
Query vector · Document vector (inner product)
Normalized_q_vector, Normalized_doc_vector
Sum over all terms (i ∈ t in the vector space) of nwt_q_i * nwt_d_ij

We implement this with:
For all terms i with non-zero query weight
  For all documents j that contain term i
    Sum(nwt_d_ij)
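The nested loop above can be sketched with a toy inverted index and an accumulator dictionary. This is a minimal illustration, not the course's actual data structures; the postings layout (term → {doc id: nwt_d_ij}) and the document ids are assumptions.

```python
# Toy inverted index: term -> {doc_id: normalized document term weight nwt_d_ij}
# (assumed layout for illustration)
index = {
    "dog": {1: 0.8, 2: 0.3},
    "cat": {1: 0.2, 3: 0.9},
}

def score(query_terms, index):
    """Sum nwt_d_ij into an accumulator for every document that
    contains a query term (all query weights implicitly 1)."""
    acc = {}
    for term in query_terms:                           # all terms i with non-zero query weight
        for doc, nwt in index.get(term, {}).items():   # all documents j that contain term i
            acc[doc] = acc.get(doc, 0.0) + nwt
    return acc

print(score(["dog", "cat"], index))
```

Note that only documents containing at least one query term ever get an accumulator bucket, which is what makes the inverted-index formulation efficient.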
Query Term Weights

Where did the query term weights go? Essentially, we assume that all query terms are weighted 1.

If a term occurs twice in a query, e.g., "dog cat dog", we process "dog" twice and add the postings for "dog" twice, so we effectively have a q_wt of 2 for "dog".

Can do this more efficiently by preprocessing the query using a…. HASHTABLE! — used to count the term frequencies in the query:
  dog (2), cat (1)
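In Python, the hashtable counting step is a one-liner with `collections.Counter` (a dict subclass, i.e., a hashtable):

```python
from collections import Counter

# Count query term frequencies with a hashtable
q_wt = Counter("dog cat dog".split())
print(q_wt)   # Counter({'dog': 2, 'cat': 1})
```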
Query Term Weights – Simple Implementation
For all terms i with non-zero query weight
  For all documents j that contain term i
    Sum(q_wt_i * nwt_d_ij)
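Combining the hashtable preprocessing with the accumulator loop gives a weighted version of the earlier sketch; again, the index layout and document ids are illustrative assumptions.

```python
from collections import Counter

# Toy inverted index: term -> {doc_id: nwt_d_ij} (assumed layout)
index = {
    "dog": {1: 0.8, 2: 0.3},
    "cat": {1: 0.2, 3: 0.9},
}

def score(query, index):
    """Accumulate q_wt_i * nwt_d_ij for each document."""
    q_wt = Counter(query.split())                      # hashtable of query term frequencies
    acc = {}
    for term, w in q_wt.items():                       # terms i with non-zero query weight
        for doc, nwt in index.get(term, {}).items():   # documents j that contain term i
            acc[doc] = acc.get(doc, 0.0) + w * nwt
    return acc

print(score("dog cat dog", index))
```

Because "dog" has q_wt 2, its postings now count double instead of being processed twice, which is the efficiency gain the slide describes.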
Query Term Weights – Proper Implementation
Can change the query syntax to allow users to specify weights:
  Dog (2) Cat (1)
  Dog 0.7 Cat 0.3
Needs better query parsing
Can be tied to interface controls (sliders)
However, users are poor at selecting weights and often get worse retrieval, not better, so this is infrequently implemented
Query Term Weights – Document Similarity

Where are query term weights actually used? When trying to locate "similar" documents.

Consider: how do you find the documents most similar to document d_k?

Applications: plagiarism detection, document clustering/classification (unsupervised/supervised learning)

Simple implementation:
  Treat d_k as a query
  The top results are the most similar documents
Document Similarity

For all terms i with non-zero weight in d_k
  For all documents j that contain term i
    Sum(nwt_d_ik * nwt_d_ij)

What is the weight d_ik? The tf*idf of term i in document k.
We would need to store this, or start with the document and calculate it on the fly using the idf stored in the dict file.

Efficiency:
  Linear in the number of terms
  Very slow for long documents
  Instead: calculate tf*idf for all terms in document k, sort, and use the top n weighted terms (n ~ 10 .. 50)
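The top-n pruning step can be sketched as follows. The term frequencies and idf values below are made-up illustrations, and `top_n_terms` is a hypothetical helper name, not part of any real system.

```python
from collections import Counter

# Assumed inputs for illustration: term frequencies of document k,
# and a dict-file-style idf table.
tf_k = Counter({"retrieval": 12, "the": 40, "vector": 7, "query": 9})
idf = {"retrieval": 2.3, "the": 0.01, "vector": 1.9, "query": 1.5}

def top_n_terms(tf, idf, n=10):
    """Compute tf*idf for every term in the document, sort by weight,
    and keep only the top n terms to use as the 'query'."""
    weighted = {t: f * idf.get(t, 0.0) for t, f in tf.items()}
    return sorted(weighted, key=weighted.get, reverse=True)[:n]

print(top_n_terms(tf_k, idf, n=2))
```

Note how the very frequent term "the" drops out despite its high tf, because its low idf gives it a tiny weight; this is why tf*idf, not raw tf, drives the pruning.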
~Boolean Queries

The vector space model merely sums the weights of the query terms in each document, so the top document may not contain all of the query terms.

How do we implement quasi-Boolean retrieval?
  "+canine feline -teeth"
  Results must contain "canine", may contain "feline", and must not contain "teeth"

We need to expand the accumulator buckets to keep track of the number of required terms contributing to the weights and the number of excluded terms.
~Boolean Queries

Accumulator fields: Total, Num-Required, Num-Excluded

For regular terms (no + or -):
  Just add to Total (nothing new)
For required terms (+):
  Add to Total
  Add to Num-Required
~Boolean Queries

For excluded terms (-):
  Subtract from Total
  Add to Num-Excluded

Presenting results:
  First (only) show results where Num_required in the accumulator == Num_required in the query && Num_excluded == 0
  Sort by weight
  Can expand the results shown by later showing groups of results with:
    High weights, but missing 1 or more required terms
    High weights, but including 1 or more excluded terms
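The expanded accumulator can be sketched as below. The query representation (term, flag) and the postings layout are assumptions for illustration only.

```python
# Toy inverted index: term -> {doc_id: nwt_d_ij} (assumed layout)
index = {
    "canine": {1: 0.6, 2: 0.4},
    "feline": {1: 0.3, 3: 0.7},
    "teeth":  {2: 0.5},
}

def boolean_score(query, index):
    """Quasi-Boolean retrieval. query is a list of (term, flag) pairs,
    where flag is '+' (required), '-' (excluded), or '' (regular)."""
    acc = {}  # doc -> [Total, Num-Required, Num-Excluded]
    n_required = sum(1 for _, flag in query if flag == "+")
    for term, flag in query:
        for doc, nwt in index.get(term, {}).items():
            entry = acc.setdefault(doc, [0.0, 0, 0])
            if flag == "-":
                entry[0] -= nwt        # excluded term: subtract from Total
                entry[2] += 1          # bump Num-Excluded
            else:
                entry[0] += nwt        # regular or required: add to Total
                if flag == "+":
                    entry[1] += 1      # bump Num-Required
    # First (only) show results with all required terms and no excluded terms
    hits = [(doc, e[0]) for doc, e in acc.items()
            if e[1] == n_required and e[2] == 0]
    return sorted(hits, key=lambda h: h[1], reverse=True)

# "+canine feline -teeth"
print(boolean_score([("canine", "+"), ("feline", ""), ("teeth", "-")], index))
```

Relaxing the filter in the final list comprehension (e.g., allowing Num-Required < n_required) produces the secondary result groups the slide describes.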