Presentation Transcript

Slide1

Query Chain Focused Summarization

Tal Baumel, Rafi Cohen, Michael Elhadad

Jan 2014

Slide2

Generic Summarization

Generic Extractive Multi-doc Summarization:

Given a set of documents Di

Identify a set of sentences Sj s.t.:

|Sj| < L

The “central information” in Di is captured by Sj

Sj does not contain redundant information

Representative methods:

KLSum

LexRank

Key concepts:

Centrality, Redundancy
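As a concrete illustration of the KLSum idea named above, here is a minimal sketch: greedily add the sentence that keeps the summary's unigram distribution closest (in KL divergence) to that of the full document set, under a word budget. The whitespace tokenization, smoothing constant, and budget value are illustrative assumptions, not the method's exact formulation.

```python
import math
from collections import Counter

def unigram_dist(tokens, vocab, eps=1e-6):
    """Smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts[w] + eps) / total for w in vocab}

def kl(p, q):
    """KL divergence KL(p || q) over a shared vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def klsum(sentences, budget=100):
    """Greedy KLSum-style selection: repeatedly add the sentence that keeps
    the summary distribution closest to the document-set distribution."""
    doc_tokens = [t for s in sentences for t in s.lower().split()]
    vocab = set(doc_tokens)
    p_docs = unigram_dist(doc_tokens, vocab)
    summary, summary_tokens = [], []
    while True:
        best, best_div = None, float("inf")
        for s in sentences:
            if s in summary:
                continue
            cand = summary_tokens + s.lower().split()
            if len(cand) > budget:          # enforce |Sj| < L (in words here)
                continue
            div = kl(p_docs, unigram_dist(cand, vocab))
            if div < best_div:
                best, best_div = s, div
        if best is None:
            break
        summary.append(best)
        summary_tokens += best.lower().split()
    return summary
```

LexRank, by contrast, scores sentences by eigenvector centrality over a sentence-similarity graph rather than by distribution matching.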

Slide3

Update Summarization

Given a set of documents split into A = {ai} / B = {bj}, defined as background / new sets

Select a set of sentences Sk s.t.:

|Sk| < L

Sk captures the central information in B

Sk does not repeat information conveyed by A

Key concepts: centrality, redundancy, novelty
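One simple way to realize this setting (an assumption for illustration, not the slide's prescribed method) is to rank sentences of B by a centrality proxy while penalizing overlap with the background set A:

```python
from collections import Counter

def update_summary(background_a, new_b, budget=5, penalty=1.0):
    """Toy update summarization: rank sentences of B by word frequency in B
    (a centrality proxy) minus word overlap with A (a redundancy penalty)."""
    a_words = {w for s in background_a for w in s.lower().split()}
    b_freq = Counter(w for s in new_b for w in s.lower().split())

    def score(sent):
        toks = sent.lower().split()
        centrality = sum(b_freq[t] for t in toks) / max(len(toks), 1)
        redundancy = sum(t in a_words for t in toks) / max(len(toks), 1)
        return centrality - penalty * redundancy

    return sorted(new_b, key=score, reverse=True)[:budget]
```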

Slide4

Query-Focused Summarization

Given a set of documents Di and a query Q

Select a set of sentences Sj s.t.:

|Sj| < L

Sj captures the information in Di relevant to Q

Sj does not contain redundant information

Key concepts: relevance, redundancy

Slide5

Query-Chain Focused Summarization

We define a new task to clarify the distinctions among key concepts:

Relevance

Novelty

Contrast

Similarity

Redundancy

The task is also useful for

Exploratory Search

Slide6

QCFS Task

Given a set of topic-related documents Di and a chain of queries qj

Output a chain of summaries {Sjk} s.t.:

|Sjk| < L

Sjk is relevant to qj

Sjk does not contain information in Slk for l < j
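A minimal sketch of the QCFS definition above, using plain term overlap as a stand-in for a real relevance model: for each query qj in the chain, pick sentences relevant to qj whose content is not already covered by the summaries produced for earlier queries. The overlap scoring and per-query sentence budget are illustrative assumptions.

```python
def qcfs(documents, query_chain, budget=3):
    """For each query in the chain, pick up to `budget` sentences relevant
    to the query and not already covered by earlier summaries."""
    sentences = [s for d in documents for s in d]
    covered = set()            # words already conveyed by previous summaries
    summaries = []
    for query in query_chain:
        q_terms = set(query.lower().split())

        def relevance(sent):
            toks = set(sent.lower().split())
            overlap = len(toks & q_terms)              # relevant to q_j
            novelty = len(toks - covered) / max(len(toks), 1)
            return overlap * novelty                   # not in S_l for l < j

        ranked = sorted(sentences, key=relevance, reverse=True)
        summary = [s for s in ranked if relevance(s) > 0][:budget]
        summaries.append(summary)
        for s in summary:
            covered.update(s.lower().split())
    return summaries
```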

Slide7

Query Chains

Query Chains are observed in query logs:

PubMed search log mining

Extract query chains (length 3) from the same session / with related terms (manually)

Query chain evolution may correspond to:

Zoom in (asthma → atopic dermatitis)

Query reformulation (respiratory problem → pneumonia)

Focus change (asthma → cancer)

Slide8

Query Chains vs. Novelty Detection

TREC Novelty Detection Task (2005)

Task 1

: Given a set of documents for the topic, identify all relevant and novel sentences.

Task 2

: Given the relevant sentences in all documents, identify all novel sentences.

Task 3

: Given the relevant and novel sentences in the first 5 docs only, find the relevant and novel sentences in the remaining docs.

Task 4

: Given the relevant sentences from all documents and the novel sentences from the first 5 docs, find the novel sentences in the remaining docs.

Slide9

Novelty Detection Task

Create 50 topics:

Compose topic (textual description)

Select 25 relevant docs from News collection

Sort docs chronologically

Mark relevant sentences

Among relevant sentences, mark novel ones (not covered in previous relevant sentences).

28 “events” topics / 22 “opinion” topics

Slide10

TREC Novelty – Dataset Analysis

Select parts of documents (not full docs).

Relevant rate: events: 25% / opinion: 15%

Consecutive sentences: 85% / 65%

Relevant agreement: 68% / 50%

Novelty rate: 38% / 42%

Novelty agreement: 45% / 29%

Slide11

TREC Novelty Methods

Relevance = Similarity to Topic.

Novelty = Dissimilarity to past sentences.

Methods:

tf.idf and Okapi (BM25) ranking with a threshold for retrieval

Topic expansion

Sentence expansion

Named entities as features

Coreference resolution

Named-entity normalization (entity linking)

Results:

High recall / Low precision

Almost no distinction between relevant and novel sentences
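A sketch of the baseline recipe above: tf.idf similarity to the topic with a threshold for relevance, and dissimilarity to previously accepted relevant sentences for novelty. The threshold values and the use of scikit-learn's TfidfVectorizer are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relevant_and_novel(topic, sentences, rel_threshold=0.1, nov_threshold=0.8):
    """Mark each sentence as relevant (tf.idf similarity to the topic above a
    threshold) and novel (max similarity to earlier relevant sentences below
    a threshold)."""
    vec = TfidfVectorizer()
    mats = vec.fit_transform([topic] + sentences)
    topic_vec, sent_vecs = mats[0], mats[1:]

    results, seen = [], []
    for i, sent in enumerate(sentences):
        rel = cosine_similarity(topic_vec, sent_vecs[i])[0, 0] >= rel_threshold
        nov = False
        if rel:
            if seen:
                past = cosine_similarity(sent_vecs[i], sent_vecs[seen]).max()
            else:
                past = 0.0
            nov = past < nov_threshold
            seen.append(i)
        results.append((sent, rel, nov))
    return results
```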

Slide12

QCFS and Contrast

QCFS is different from Query-Focused Summarization:

When generating S2, we must take S1 into account.

QCFS is different from Update Summarization:

The A/B split is not observed.

QCFS is different from Novelty Detection:

Chronology is not relevant.

Key concepts:

Query Relevance

Query Distinctiveness (how qi+1 contrasts with qi)

Slide13

Contrastive IR

CWS: A Comparative Web Search System

Sun et al., WWW 2006

Given two queries q1 and q2

Rank a set of “contrastive pairs” (p1, p2)

where p1 and p2 are snippets of relevant docs.

Method:

Retrieve relevant snippets SR1 = {p1i} and SR2 = {p2j}

Score each pair by a·R(p1, q1) + b·R(p2, q2) + c·T(p1, p2, q1, q2)

T(p1, p2, q1, q2) = x·Sim(url1, url2) + (1 - x)·Sim(p1\q1, p2\q2)

Greedy ranking of pairs:

Rank all pairs (p1, p2) by score and take the top pair.

Remove the top p1 and p2 from all remaining pairs, then iterate.

Cluster pairs into comparative clusters

Extract terms from comparative clusters.
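A sketch of the pair-scoring and greedy ranking steps described above. R and Sim are replaced by a simple word-overlap measure, the URL-similarity term is omitted, and the weights a, b, c, x are illustrative defaults rather than the values used by Sun et al.

```python
def word_overlap(x, y):
    """Jaccard word overlap as a stand-in for R and Sim."""
    xs, ys = set(x.lower().split()), set(y.lower().split())
    return len(xs & ys) / max(len(xs | ys), 1)

def greedy_contrastive_pairs(snips1, snips2, q1, q2,
                             a=1.0, b=1.0, c=1.0, x=0.5):
    """Score all (p1, p2) snippet pairs with a*R(p1,q1) + b*R(p2,q2) + c*T(...)
    and greedily emit the top pair, removing its snippets each round."""
    def T(p1, p2):
        # The x*Sim(url1, url2) term is omitted (URLs are not modeled here),
        # keeping only the (1-x)*Sim(p1\q1, p2\q2) component.
        strip = lambda p, q: " ".join(w for w in p.split()
                                      if w.lower() not in q.lower().split())
        return (1 - x) * word_overlap(strip(p1, q1), strip(p2, q2))

    def score(p1, p2):
        return a * word_overlap(p1, q1) + b * word_overlap(p2, q2) + c * T(p1, p2)

    pairs = []
    s1, s2 = list(snips1), list(snips2)
    while s1 and s2:
        p1, p2 = max(((p, q) for p in s1 for q in s2), key=lambda pq: score(*pq))
        pairs.append((p1, p2))
        s1.remove(p1)
        s2.remove(p2)
    return pairs
```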

Slide14

Document Clustering

A Hierarchical Monothetic Document Clustering Algorithm for Summarization and Browsing Search Results

Kummamuru et al., WWW 2004

Desirable properties of clustering:

Coverage

Compactness

Sibling distinctiveness

Reach time

Incremental algorithm:

Decide on width n of tree (# children / node)

Nodes are represented by “concepts” (terms)

Rank concepts by score and add them under the current node

Score(Sak, cj) = a·ScoreC(Sak-1, cj) + b·ScoreD(Sak-1, cj)

ScoreC = document coverage

ScoreD = sibling distinctiveness
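The concept-ranking step can be sketched as follows: each candidate concept (term) is scored by a weighted combination of document coverage (ScoreC) and distinctiveness from already-chosen sibling concepts (ScoreD). The Jaccard-style measures and default weights are assumptions for illustration.

```python
def rank_concepts(docs, candidate_concepts, siblings, a=1.0, b=1.0):
    """Rank candidate concepts by a*ScoreC (document coverage) +
    b*ScoreD (distinctiveness from sibling concepts).
    `docs` is a list of token sets; concepts and siblings are terms."""
    def covers(concept):
        return {i for i, d in enumerate(docs) if concept in d}

    sibling_cover = set()
    for s in siblings:
        sibling_cover |= covers(s)

    def score(concept):
        cov = covers(concept)
        score_c = len(cov) / max(len(docs), 1)                 # coverage
        overlap = len(cov & sibling_cover) / max(len(cov), 1)  # shared docs
        score_d = 1.0 - overlap                                # distinctiveness
        return a * score_c + b * score_d

    return sorted(candidate_concepts, key=score, reverse=True)
```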