DAISY - PowerPoint Presentation

olivia-moreira
Uploaded On 2016-03-14


Presentation Transcript

Slide1

DAISY: Dutch lAnguage Investigation of Summarization technologY

Katholieke Universiteit Leuven

Rijksuniversiteit Groningen

Q-go

Slide2

DAISY on one slide

Segmentation

Rhetorical classification

Sentence compression

Sentence generation

Multi-document summarization: detect differences

Improving question answering, e.g. e-mail answering

Summarization of web content

Slide3

Overview

Report of our current progress in:

Corpus building and preprocessing

Segmentation

Sentence generation

Slide4

Corpus Building and Preprocessing

Target: corpus of questions, short texts and web pages about the same topic

Freely available: UWV (questions & answer texts), SVB (questions)

Available for internal use: KLM (questions, answer texts, web pages)

Todo: web pages SVB

ABN AMRO (committed, not delivered)

Slide5

Corpus Building and Preprocessing

POS-tagged and parsed: KLM and UWV

SVB corpus: in progress

Coreference resolution: in progress

Slide6

Segmentation

Find main content in web page

Smaller segments

Can be obtained from HTML structure

<H#>, <P>, <BR>, <UL>, ...

Hierarchical

Will be refined in relation to rhetorical roles

Slide7
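As a rough illustration of the Segmentation slide above (smaller segments obtained from HTML structure, hierarchical), here is a minimal sketch using Python's standard `html.parser`. The class name `StructureSegmenter`, the choice of boundary tags and the "heading path" representation are assumptions made for this example, not the project's implementation.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

HEADING = re.compile(r"h([1-6])$")      # <H#> tags carry the hierarchy
BOUNDARIES = {"p", "br", "ul", "li"}    # assumed segment boundaries

Segment = Tuple[Tuple[str, ...], str]   # (path of enclosing headings, text)


class StructureSegmenter(HTMLParser):
    """Splits a page into text segments labelled with their heading path."""

    def __init__(self) -> None:
        super().__init__()
        self.path: List[Tuple[int, str]] = []   # currently open (level, title)
        self.segments: List[Segment] = []
        self._buf: List[str] = []
        self._heading_level = 0                 # > 0 while inside a heading

    def _take_text(self) -> str:
        text = " ".join("".join(self._buf).split())
        self._buf = []
        return text

    def flush(self) -> None:
        text = self._take_text()
        if text:
            self.segments.append((tuple(t for _, t in self.path), text))

    def handle_starttag(self, tag, attrs):
        match = HEADING.match(tag)
        if match:
            self.flush()
            self._heading_level = int(match.group(1))
        elif tag in BOUNDARIES:
            self.flush()

    def handle_endtag(self, tag):
        if HEADING.match(tag) and self._heading_level:
            title = self._take_text()
            # A new heading closes all headings at the same or a deeper level.
            while self.path and self.path[-1][0] >= self._heading_level:
                self.path.pop()
            self.path.append((self._heading_level, title))
            self._heading_level = 0
        elif tag in BOUNDARIES:
            self.flush()

    def handle_data(self, data):
        self._buf.append(data)


def segment(html: str) -> List[Segment]:
    parser = StructureSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()   # emit any trailing text
    return parser.segments


EXAMPLE = ("<h1>Baggage</h1><p>Hand baggage is free.</p>"
           "<h2>Extra bags</h2><p>Extra bags cost a fee.<br>Pay online.</p>")
```

Calling `segment(EXAMPLE)` yields three segments; the last two carry the path `("Baggage", "Extra bags")`, which is one simple way to keep the hierarchy available for later refinement.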

Segmentation

Slide8

Segmentation

Slide9

Segmentation

Search for block with highest density of text

Slide10
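The slide does not say how text density is measured. One possible reading, sketched below under that assumption: score each line of HTML by the fraction of its characters that are visible text, then pick the contiguous run of lines that maximizes the summed score above a threshold (Kadane's maximum-subarray algorithm). The function names and the 0.5 threshold are hypothetical.

```python
import re
from typing import Tuple

TAG = re.compile(r"<[^>]*>")


def line_density(line: str) -> float:
    """Fraction of a line's characters that are text rather than markup."""
    stripped = line.strip()
    if not stripped:
        return 0.0
    text = TAG.sub("", stripped).strip()
    return len(text) / len(stripped)


def densest_block(html: str, threshold: float = 0.5) -> Tuple[int, int]:
    """Return (start, end) line indices, end exclusive, of the contiguous
    run of lines with the largest summed (density - threshold)."""
    best_sum, best = 0.0, (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, line in enumerate(html.splitlines()):
        score = line_density(line) - threshold
        if cur_sum <= 0:
            # A non-positive running sum cannot help: start a new run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


PAGE = """<div id="nav"><a href="/">Home</a><a href="/faq">FAQ</a></div>
<div id="main">
<p>You can change your booking online until 24 hours before departure.</p>
<p>Changes may involve a fee, depending on the fare conditions.</p>
</div>
<div id="footer"><a href="/terms">Terms</a></div>"""
```

On `PAGE`, `densest_block` selects the two paragraph lines and leaves out the navigation and footer lines, whose characters are mostly markup. A selection like this cuts through the enclosing tags, so it still needs to be extended to well-formed markup.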

Segmentation

Slide11

Segmentation

Additional heuristics to extend the selection:

Find closing tags for all tags that were opened in the selection

Include all text delimited by known tag patterns occurring just before and after the selection

Take the smallest enclosing DIV block

Slide12
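The last of the segmentation heuristics above (take the smallest enclosing DIV block) can be sketched as follows. This is an illustration only: it assumes the selection is given as character offsets into the HTML string, matches DIV tags with a regular expression instead of a full parser, and uses hypothetical function names.

```python
import re
from typing import List, Optional, Tuple

DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)

Span = Tuple[int, int]   # (start, end) character offsets, end exclusive


def div_spans(html: str) -> List[Span]:
    """Spans of every DIV element that has a matching closing tag."""
    stack: List[int] = []
    spans: List[Span] = []
    for match in DIV_TAG.finditer(html):
        if match.group(1):                 # closing </div>
            if stack:
                spans.append((stack.pop(), match.end()))
        else:                              # opening <div ...>
            stack.append(match.start())
    return spans


def smallest_enclosing_div(html: str, sel_start: int, sel_end: int) -> Optional[Span]:
    """Smallest DIV span that fully contains the selection, or None."""
    enclosing = [(s, e) for s, e in div_spans(html)
                 if s <= sel_start and sel_end <= e]
    return min(enclosing, key=lambda span: span[1] - span[0], default=None)


HTML = ('<div id="page"><div id="menu">Menu</div>'
        '<div id="main"><h1>Title</h1><p>Body text.</p></div></div>')
SEL_START = HTML.index("<p>")
SEL_END = HTML.index("</p>") + len("</p>")
```

For the selection covering only the paragraph, the result is the span of the `main` DIV, so the heading next to the paragraph is pulled into the selection while the menu stays out.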

Sentence generation

Specification of abstract dependency trees

Specify grammatical relations between lexical items and constituents dominating over lexical items

Alpino dependency trees without adjacency information

More variation through underspecification in lexical items, handling of particles

Slide13

Sentence generation

Initial implementation of the generator:

Chart generator (Kay, 1996)

Top-down guidance through expected dependency relations

Generates a substantial part of the input created from the Alpino test suites

Included in recent Alpino versions

Further work: optimization (time and space)

Slide14

Sentence generation

Selecting the most fluent sentence through fluency ranking:

N-gram language model

Log-linear model

Experiments with Velldal (2007) and parse disambiguation feature templates

Need more insight about feature overlap

Experiment with more feature templates

Slide14
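To make the n-gram part of the fluency ranking slide above concrete, here is a toy sketch: a bigram language model with add-one smoothing that ranks candidate realizations by log-probability. The class `BigramLM`, the three-sentence training corpus and the smoothing choice are assumptions for illustration; the slides specify neither the model order nor the smoothing, and the log-linear model is not covered here.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: Iterable[Sequence[str]]) -> None:
        self.context: Counter = Counter()
        self.bigram: Counter = Counter()
        vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            vocab.update(tokens[1:])            # every predictable token
            self.context.update(tokens[:-1])    # every conditioning token
            self.bigram.update(zip(tokens, tokens[1:]))
        if not vocab:
            raise ValueError("training corpus is empty")
        self.vocab_size = len(vocab)

    def logprob(self, sentence: Sequence[str]) -> float:
        tokens = ["<s>"] + list(sentence) + ["</s>"]
        return sum(
            math.log((self.bigram[(prev, word)] + 1)
                     / (self.context[prev] + self.vocab_size))
            for prev, word in zip(tokens, tokens[1:])
        )


def rank_by_fluency(lm: BigramLM, candidates: Iterable[Sequence[str]]) -> List[Sequence[str]]:
    """Candidates sorted from most to least probable under the model."""
    return sorted(candidates, key=lm.logprob, reverse=True)


# Toy data, invented for this example.
CORPUS = [s.split() for s in (
    "de man leest een boek",
    "de vrouw leest een krant",
    "de man koopt een krant",
)]
CANDIDATES = [s.split() for s in (
    "een krant de man leest",
    "de man leest een krant",
    "leest de man een krant",
)]
```

Because all candidates realize the same input, they contain the same words and have the same length, so their log-probabilities can be compared directly without length normalization.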

Sentence generation

Evaluation:

Corpus sentences used as a reference for the most fluent realization

Fairly strict, since there can be multiple fluent sentences

Where is the ceiling?

More annotated material!

FLAN: FLuency ANnotator (web application)

Slide16
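Why the evaluation above is "fairly strict" shows up in a small sketch of exact-match scoring against the corpus sentence. The function names and the two example sentences are hypothetical; the slides do not state which metric was used.

```python
from typing import Sequence


def normalize(sentence: str) -> str:
    """Lower-case and collapse whitespace before comparing."""
    return " ".join(sentence.lower().split())


def exact_match_rate(realizations: Sequence[str], references: Sequence[str]) -> float:
    """Share of top-ranked realizations identical to the corpus sentence."""
    if not references or len(realizations) != len(references):
        raise ValueError("need equally long, non-empty lists")
    hits = sum(normalize(out) == normalize(ref)
               for out, ref in zip(realizations, references))
    return hits / len(references)


# Invented example: the first realization is fluent Dutch but differs
# from the corpus sentence, so exact match counts it as an error.
REFERENCES = ["de man leest vandaag een boek", "de vrouw koopt een krant"]
REALIZATIONS = ["vandaag leest de man een boek", "de vrouw koopt een krant"]
```

Here `exact_match_rate(REALIZATIONS, REFERENCES)` is 0.5 even though both outputs are acceptable sentences, which is the kind of gap that additional fluency annotation is meant to close.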

Thanks!