D3: Passage Retrieval


Uploaded by briana-ranney, 2017-09-08



Presentation Transcript


D3: Passage Retrieval

Group 3

Chad Mills

Esad Suskic

Wee Teck Tan

Outline

System and Data

Document Retrieval

Passage Retrieval

Results

Conclusion

System and Data

System: Indri (http://www.lemurproject.org/)

Data:

Development: TREC 2004

Testing: TREC 2004, TREC 2005

Document Retrieval

Baseline:

Remove “?”

Add Target String

MAP: 0.307
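The baseline formulation above can be sketched as a small helper (illustrative only; the actual system issues these queries to Indri, and the function name is not from the slides):

```python
# Sketch of the baseline query formulation: strip "?" and prepend
# the target string, emitting Indri's #combine query syntax.

def baseline_query(question: str, target: str) -> str:
    cleaned = question.replace("?", "")
    terms = f"{target} {cleaned}".split()
    return "#combine( " + " ".join(terms) + " )"

print(baseline_query("When was the Hubble telescope launched?",
                     "Hubble telescope"))
```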

Document Retrieval

Attempted Improvement 1:

Settings from Baseline

Rewrite "When was…" questions as "[target] was [last word] on" queries

MAP: 0.301

Best so far: 0.307

Document Retrieval

Attempted Improvement 2:

Settings from Baseline

Remove "Wh" Words

Remove Stop Words

Replace Pronouns with Target String

MAP: 0.319

Best so far: 0.307

"Wh" / Stop Words: What, Who, Where, Why, How many, How often, How long, Which, How did, Does, is, the, a, an, of, was, as

Pronouns: he, she, it, its, they, their, his
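A minimal sketch of the Improvement 2 rewrite (word lists taken from the slide; the function name is illustrative, and multiword entries like "How many" are split into single tokens here for simplicity):

```python
# Illustrative Improvement 2 rewrite: drop "Wh"/stop words and
# substitute the target string for pronouns.

STOP = {"what", "who", "where", "why", "which", "how", "many", "often",
        "long", "did", "does", "is", "the", "a", "an", "of", "was", "as"}
PRONOUNS = {"he", "she", "it", "its", "they", "their", "his"}

def rewrite_query(question: str, target: str) -> str:
    out = []
    for word in question.replace("?", "").split():
        w = word.lower()
        if w in PRONOUNS:
            out.append(target)   # pronoun -> target string
        elif w not in STOP:
            out.append(word)     # keep content words
    return " ".join(out)

print(rewrite_query("When was it launched?", "Hubble"))  # → When Hubble launched
```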

Document Retrieval

Attempted Improvement 3:

Settings from Improvement 2

Index Stemmed (Krovetz Stemmer)

MAP: 0.336

Best so far: 0.319

Document Retrieval

Attempted Improvement 4:

Settings from Improvement 3

Remove Punctuation

Remove Non-Alphanumeric Characters

MAP: 0.374

Best so far: 0.336

Document Retrieval

Attempted Improvement 5:

Settings from Improvement 4

Remove Duplicate Words

MAP: 0.377

Best so far: 0.374
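The Improvement 4/5 cleanup steps can be sketched together (function name and regex are illustrative, not from the slides):

```python
import re

# Sketch of Improvements 4 and 5: strip non-alphanumeric characters,
# then drop duplicate words while preserving the original order.

def clean_query(query: str) -> str:
    words = re.sub(r"[^0-9A-Za-z ]+", " ", query).split()
    seen, out = set(), []
    for w in words:
        if w.lower() not in seen:
            seen.add(w.lower())
            out.append(w)
    return " ".join(out)

print(clean_query("Hubble's telescope, the Hubble telescope!"))
# → Hubble s telescope the
```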

Passage Retrieval

Baseline:

Out-of-the-box Indri

Same Question Formulation

Changed "#combine(" to "#combine[passageX:Y]("

Passage Window, Top 20, No Re-ranking

X=40, Y=20: Strict MRR 0.126, Lenient MRR 0.337

X=200, Y=100: Strict MRR 0.414, Lenient MRR 0.537
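The baseline's only change to the document query is this operator swap: Indri's `#combine[passageX:Y]` scores fixed-width passages of X terms whose starting points are Y terms apart. A sketch of the string rewrite (helper name is illustrative):

```python
# Sketch of the passage-query rewrite: replace the document-level
# #combine( with Indri's passage operator #combine[passageX:Y](.

def to_passage_query(doc_query: str, x: int = 200, y: int = 100) -> str:
    return doc_query.replace("#combine(", f"#combine[passage{x}:{y}](", 1)

print(to_passage_query("#combine( hubble telescope launched )"))
# → #combine[passage200:100]( hubble telescope launched )
```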

Passage Retrieval

Attempted Re-ranking

Mallet MaxEnt Classifier

Training Set: TREC 2004

80% Train : 20% Dev

Split by Target to Avoid Cheating

e.g. Questions 1.* all in either Train or Dev

Labels:

+ Passage has Correct Answer

- Passage doesn't have Answer
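The target-grouped split can be sketched as follows (the question-ID layout and function name are assumptions for illustration): all questions sharing a target, e.g. 1.1 and 1.2, land on the same side, so the classifier never trains on a dev target.

```python
import random

# Sketch of the 80/20 split by target: group question IDs by their
# target prefix, then assign whole targets to train or dev.

def split_by_target(question_ids, dev_frac=0.2, seed=0):
    targets = sorted({qid.split(".")[0] for qid in question_ids})
    rng = random.Random(seed)
    rng.shuffle(targets)
    n_dev = max(1, int(len(targets) * dev_frac))
    dev_targets = set(targets[:n_dev])
    train = [q for q in question_ids if q.split(".")[0] not in dev_targets]
    dev = [q for q in question_ids if q.split(".")[0] in dev_targets]
    return train, dev

train, dev = split_by_target(["1.1", "1.2", "2.1", "2.2", "3.1"])
```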

Passage Retrieval

Features used:

For both Passage and Question+Target:

unigram, bigram, trigram

POS tags – unigram, bigram, trigram

Question/Passage Correspondence:

# of Overlapping Terms (and bigrams)

Distance between Overlapping Terms

Tried Top 20 Passages from Indri, and Expanding to Top 200 Passages
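The correspondence features can be sketched like this (simplified: the slides also use POS n-grams and distances between overlapping terms, omitted here; names are illustrative):

```python
# Sketch of the overlap features: counts of shared unigrams and
# bigrams between the question+target text and a candidate passage.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_features(question: str, passage: str) -> dict:
    q, p = question.lower().split(), passage.lower().split()
    return {
        "unigram_overlap": len(ngrams(q, 1) & ngrams(p, 1)),
        "bigram_overlap": len(ngrams(q, 2) & ngrams(p, 2)),
    }

print(overlap_features("when was hubble launched",
                       "hubble was launched in 1990"))
```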

Passage Retrieval

Result:

all attempts

were worse than before

Example confusion matrix:

       +      -
  +   16    267
  -   37    620

Many negative examples, 67-69% accurate on all feature combinations tried
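The reported accuracy follows from the matrix diagonal (cell labels below assume rows are gold labels; either orientation gives the same diagonal):

```python
# Accuracy from the example confusion matrix on the slide:
# 16 true positives, 620 true negatives, 267 + 37 errors.
tp, fn = 16, 267
fp, tn = 37, 620
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(round(accuracy, 3))  # → 0.677, matching the reported 67-69%
```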

Passage Re-Ranking

Indri was very good to start with. E.g. Q10.1:

Indri Ranking:

Indri Rank | Has Answer
1 | Yes
2 | No
3 | Yes
4 | Yes
5 | No

Our Re-Ranking:

Our Rank | Has Answer | P(Yes) | P(No) | Indri Rank
1 | No  | 0.076 | 0.924 | 5
2 | No  | 0.027 | 0.973 | 18
3 | Yes | 0.014 | 0.986 | 8
4 | No  | 0.011 | 0.989 | 75
5 | Yes | 0.007 | 0.993 | 14

Our first 2 were wrong, only 1 of Indri’s top 5 in our top 5

If completely replacing rank, must be very good

Many low confidence scores (e.g. 7.6% P(Yes) was best)

Slight edit to Indri ranking less bad, but no good system found

E.g. bump high-confidence Yes to top of list, leave others in Indri order
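That "slight edit" strategy can be sketched as follows (the 0.9 threshold and function name are illustrative, not from the slides):

```python
# Sketch of the conservative re-ranking: keep Indri's order, but move
# passages the classifier marks Yes with high confidence to the top.

def adjust_ranking(passages, threshold=0.9):
    """passages: list of (passage_id, p_yes) in Indri order."""
    confident = [p for p in passages if p[1] >= threshold]
    rest = [p for p in passages if p[1] < threshold]
    return confident + rest  # both halves keep Indri's relative order

ranked = adjust_ranking([("d1", 0.2), ("d2", 0.95), ("d3", 0.5)])
print([pid for pid, _ in ranked])  # → ['d2', 'd1', 'd3']
```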

Results

TREC 2004: MAP 0.377, Strict MRR 0.414, Lenient MRR 0.537

TREC 2005: MAP 0.316, Strict MRR 0.366, Lenient MRR 0.543
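For reference, Mean Reciprocal Rank, the metric reported above, averages one over the rank of the first correct passage per question (zero when none is correct):

```python
# Sketch of Mean Reciprocal Rank (MRR).

def mrr(first_correct_ranks):
    """first_correct_ranks: rank of the first correct passage per
    question, or None when no returned passage is correct."""
    scores = [1.0 / r if r else 0.0 for r in first_correct_ranks]
    return sum(scores) / len(scores)

print(mrr([1, 2, None, 4]))  # → (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```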

References

Fang – "A Re-examination of Query Expansion Using Lexical Resources"

Tellex – "Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering"

Conclusions

Cleaned Input

Small Targeted Stop Word List

Minimal Setting

Indri Performs PR Well OOTB

Re-ranking Implementation Needs to be Really Good

Feature Selection didn't Help

Slight Adjustment Instead of Whole Different Ranking Might Help