Generating Query Substitutions


2016-07-22

Description

Alicia Wood. What is the problem to be solved? Problem: imperfect description of need. Search engine not able to retrieve documents matching query. Need accurate and related query substitutions.




Presentation text content in Generating Query Substitutions

Slide1

Generating Query Substitutions

Alicia Wood

Slide2

What is the problem to be solved?

Slide3

Problem

Imperfect description of need

Search engine not able to retrieve documents matching query

Need accurate and related query substitutions

Slide4

Problem (cont.)

Given a query

Want to generate a modified (related) query

Improvements (specification)

Neutral (spelling change, synonym)

Loss of original meaning (generalization)

Slide5

Who cares about this problem and why?

Slide6

Who cares?

User typing the query

Want correct results with imperfect query

Slide7

What have others done to solve this problem and why is this inadequate?

Slide8

Previous Work

Relevance/Pseudo relevance feedback

Query term deletion

Substituting query terms with related terms

Latent Semantic Indexing (LSI)

Slide9

Relevance/Pseudo relevance feedback

Submit query for initial retrieval

Process resulting documents

Modify the query by expanding with additional terms from documents

Perform second retrieval with modified query

Can cause query drift

Computationally expensive
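The feedback loop above can be sketched in a few lines of Python. This is a toy illustration only, not the method of any particular system: the term-count scoring, the function names (`retrieve`, `expand_query`, `pseudo_relevance_feedback`) and the parameters `k` and `n_new_terms` are all assumptions made for the example.

```python
from collections import Counter


def retrieve(query_terms, docs, k):
    """Toy retrieval (assumed scoring): rank documents by query-term occurrences."""
    scored = []
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        score = sum(tokens.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def expand_query(query_terms, docs, top_docs, n_new_terms):
    """Add the most frequent non-query terms found in the top-ranked documents."""
    counts = Counter()
    for doc_id in top_docs:
        for token in docs[doc_id].lower().split():
            if token not in query_terms:
                counts[token] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked[:n_new_terms]]


def pseudo_relevance_feedback(query, docs, k=2, n_new_terms=2):
    """Initial retrieval, query expansion, then a second retrieval."""
    terms = query.lower().split()
    top_docs = retrieve(terms, docs, k)          # first retrieval
    expanded = expand_query(terms, docs, top_docs, n_new_terms)
    return expanded, retrieve(expanded, docs, k)  # second retrieval


docs = {
    "d1": "jaguar car dealer",
    "d2": "jaguar car price",
    "d3": "apple pie",
}
expanded, results = pseudo_relevance_feedback("jaguar", docs)
```

The sketch shows both drawbacks listed on the slide: every query needs two retrieval passes, and the added terms come from whatever the first pass returned, so an ambiguous query can drift toward the wrong sense.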

Slide10

Query term deletion

Loss of specificity from original query

Slide11

Substituting query terms

Relies on an initial retrieval

Slide12

Latent Semantic Indexing (LSI)

Identify patterns in relationships between terms and concepts in an unstructured collection of text

Computationally expensive

Slide13

What is the proposed solution to the problem?

Slide14

Solution

Query modification based on pre-computed query and phrase similarity

Ranking proposed queries

Similar queries/phrases derived from user query sessions

Learned models used to re-rank

Based on similarity of new query to original query

Slide15

Contributions

Identification of a new source of data to identify similar queries and phrases

The definition of a scheme for scoring query suggestions

An algorithm to combine query and phrase suggestions

Finds highly and broadly relevant phrases

Identification of features that are predictive of highly relevant query suggestions

Slide16

Classes of Suggestion Relevance

1. Precise rewriting

Matches user's intent, preserves core meaning

automobile insurance <-> automotive insurance

2. Approximate rewriting

Direct close relationship to topic, scope narrowed or broadened

apple music player <-> ipod shuffle

3. Possible rewriting

Categorical relationship to initial query, complementary product but distinct

eye glasses <-> contact lenses

4. Clear mismatch

No clear relationship

jaguar xj6 <-> os x jaguar

Slide17

Classes of Rewriting

Specific Rewriting (1+2)

Closely related query

Highly relevant

Broad Rewriting (1+2+3)

Query expansion

Relevant to user interests

Slide18

Substitutables

Initial query -> generate relevant queries

Replace query as whole or phrases

Segment query into phrases

Find query pairs where one segment has changed

(britney spears) (mp3s) -> (britney spears) (lyrics)

Pair Independence Hypothesis Likelihood Ratio

High value = strong dependence between two terms
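A minimal sketch of the two steps on this slide: finding query pairs in which exactly one segment changed, then scoring each (old, new) segment pair with a log-likelihood ratio. The slide only names the test, so the binomial form used here (in the style of Dunning's likelihood ratio) is an assumption, as are the function names and the premise that queries arrive already segmented into phrases.

```python
import math
from collections import Counter


def one_segment_change(q1_segments, q2_segments):
    """Return (old, new) if the queries differ in exactly one aligned segment, else None."""
    if len(q1_segments) != len(q2_segments):
        return None
    diffs = [(a, b) for a, b in zip(q1_segments, q2_segments) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def _log_l(p, k, n):
    """Binomial log-likelihood of k successes in n trials, skipping zero-count terms."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def log_likelihood_ratio(k1, n1, k2, n2):
    """Compare P(new | old) = k1/n1 with P(new | not old) = k2/n2; needs n1, n2 > 0."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # pooled rate under the independence hypothesis
    return 2.0 * (_log_l(p1, k1, n1) + _log_l(p2, k2, n2)
                  - _log_l(p, k1, n1) - _log_l(p, k2, n2))


def score_substitutions(query_pairs):
    """Score every observed single-segment substitution in (q1, q2) session pairs."""
    pair_counts = Counter()
    for q1, q2 in query_pairs:
        change = one_segment_change(q1, q2)
        if change is not None:
            pair_counts[change] += 1
    total = sum(pair_counts.values())
    old_counts, new_counts = Counter(), Counter()
    for (old, new), count in pair_counts.items():
        old_counts[old] += count
        new_counts[new] += count
    scores = {}
    for (old, new), k1 in pair_counts.items():
        n1 = old_counts[old]
        k2 = new_counts[new] - k1
        n2 = total - n1
        if n2 == 0:
            continue  # no other substitutions to compare against
        scores[(old, new)] = log_likelihood_ratio(k1, n1, k2, n2)
    return scores


session_pairs = (
    [(("britney spears", "mp3s"), ("britney spears", "lyrics"))] * 3
    + [(("eminem", "mp3s"), ("eminem", "lyrics"))]
    + [(("cheap", "flights"), ("cheap", "hotels"))] * 3
    + [(("apple", "pie"), ("cherry", "cake"))]  # two segments changed: ignored
)
scores = score_substitutions(session_pairs)
```

The ratio is zero when the new segment is equally likely with or without the old one and grows as the two rates diverge. In this form it does not distinguish positive from negative dependence, so a real pipeline would probably also check that k1/n1 exceeds k2/n2.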

Slide19

Validation

1000 initial queries

Generate single suggestion (qj) for each

Evaluate accuracy of approaches

Train machine learned classifier

Evaluate ability to produce higher quality suggestions

Word distance, normalized edit distance, number of substitutions

Suggestions criteria:

Some words from initial query

Modifications shouldn't be made at start of query
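The slide names its similarity measures without defining them, so the following is only one plausible reading: word distance as word-level Levenshtein distance, and normalized edit distance as character-level Levenshtein distance divided by the length of the longer query. The function names and the two boolean checks, which mirror the suggestion criteria above, are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggestion_features(original, suggestion):
    """Compute assumed versions of the similarity features for one suggestion."""
    o_words, s_words = original.split(), suggestion.split()
    longest = max(len(original), len(suggestion)) or 1  # avoid dividing by zero
    return {
        "word_distance": levenshtein(o_words, s_words),
        "normalized_edit_distance": levenshtein(original, suggestion) / longest,
        # Criterion: the suggestion keeps some words from the initial query.
        "shares_words": bool(set(o_words) & set(s_words)),
        # Criterion: the start of the query is left unmodified.
        "first_word_unchanged": bool(o_words and s_words
                                     and o_words[0] == s_words[0]),
    }


features = suggestion_features("automobile insurance", "automotive insurance")
```

Features of this kind could be given to the machine learned classifier as input. The pair used here changes the first word, so it would fail the second criterion even though it is the slide's own example of a precise rewriting.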

Slide20

Future Work

Build semantic classifier

Predict semantic class of rewriting

Take inspiration from machine translation techniques

Introduce language model

Avoid producing nonsensical queries


