
Joint Parsing and Alignment with Weakly Synchronized Grammars - PowerPoint Presentation

Uploaded On 2016-07-24


Presentation Transcript

Slide1

Joint Parsing and Alignment with Weakly Synchronized Grammars

David Burkett, John Blitzer, & Dan Klein

Slide2

Statistical MT Training Pipeline

1) Align sentence pairs (GIZA++)

2) Parse English sentences (Berkeley parser)

Parse Foreign sentences

3) Extract rules (Galley et al. 2006)

4) Tune discriminative parameters

Example: 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book)

read the book in the office

Joint model for (1) & (2)

Slide3

Data Setting for Joint Models

English WSJ: (EN; ) (EN; ) (EN; ) ...

Chinese CTB: (中文; ) (中文; ) ... (中文; )

Parallel, Aligned CTB: (EN, 中文; ) (EN, 中文; ) (EN, 中文; ) ...

Unlabeled parallel text: (EN; 中文) (EN; 中文) (EN; 中文) ...

Slide4

Word alignment grids

Chinese: 在 (at) 办公室 (office) 里 (in) 读了 (read) 书 (book)

English: read the book in the office

Slide5
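The word alignment grid for the example sentence can be sketched as a small boolean matrix. This is only an illustration: the linking rule used here (a Chinese word links to an English word that equals its gloss) is an assumption made for the sketch, and `build_grid` and `render` are hypothetical helper names. Real aligners such as GIZA++ learn the links from data.

```python
# Chinese tokens paired with the English glosses shown on the slide.
glossed_source = [("在", "at"), ("办公室", "office"), ("里", "in"),
                  ("读了", "read"), ("书", "book")]
target = "read the book in the office".split()

def build_grid(glossed, target_words):
    """Return grid[i][j] == True when source word i links to target word j.

    Assumption for this sketch only: a source word links to every target
    word that is identical to its gloss.
    """
    return [[gloss == word for word in target_words] for _, gloss in glossed]

def render(glossed, target_words, grid):
    """Draw the grid as text: one row per source word, one column per target word."""
    lines = [" ".join(f"{word:>6}" for word in target_words)]
    for (_, gloss), row in zip(glossed, grid):
        marks = ["X" if linked else "." for linked in row]
        lines.append(" ".join(f"{mark:>6}" for mark in marks) + f"  <- {gloss}")
    return "\n".join(lines)

grid = build_grid(glossed_source, target)
print(render(glossed_source, target, grid))
```

Under this rule 在 (at) stays unaligned and both occurrences of "the" have no Chinese counterpart, which is the kind of gap an alignment grid makes visible.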

Syntactic Correspondences

EN

中文

Build a model

Slide6

Correspondence via Synchronous Grammars

Slide7

Synchronous Derivation

Slide8

Synchronous Derivation

Slide9

Weakly Synchronized Example

Slide10

Weakly Synchronized Example

Separate PCFGs

Slide11

Weakly Synchronized Example

ITG alignment

Slide12

Weakly Synchronized Example

Points for synchronization, but not required

Slide13

Correspondence Model & Feature Types

Feature type 1: Word Alignment (example: 办公室 and "office")

Feature type 2: Monolingual Parser (example: an English PP over "in the office")

Feature type 3: Correspondence (example: an English PP paired with a Chinese PP)

[HBDK09]

Slide14
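The model formula on the correspondence model slide did not survive in the transcript. A log-linear form consistent with the three feature types would look roughly as follows. All symbols here are assumptions chosen for illustration: $s, s'$ are the English and Chinese sentences, $t, t'$ their parse trees, $a$ the word alignment, $\theta$ the weights, and $\phi$ the feature vector.

```latex
P_\theta(t, a, t' \mid s, s')
  = \frac{1}{Z_\theta(s, s')}
    \exp\!\big( \theta^\top \phi(t, a, t', s, s') \big),
\qquad
Z_\theta(s, s') = \sum_{\hat{t},\, \hat{a},\, \hat{t}'}
    \exp\!\big( \theta^\top \phi(\hat{t}, \hat{a}, \hat{t}', s, s') \big)
```

In this reading, $\phi$ collects word alignment features (on $a$), monolingual parser features (on $t$ and on $t'$), and correspondence features that look at nodes of $t$ and $t'$ together with $a$.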

Estimating

Set the parameters to maximize the log-likelihood of the correct parses & alignments

A normalizer makes the distribution sum to 1

Slide15
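The estimation step described above (maximize the log-likelihood of the correct analysis under a normalized model) can be sketched on a toy problem. This is a minimal sketch, not the authors' training code: the candidate list, feature values, step size, and function names are all hypothetical, and the real model sums over exponentially many parse and alignment combinations rather than three candidates.

```python
import math

def probabilities(theta, candidates):
    """Log-linear distribution: p(c) = exp(theta . features(c)) / Z."""
    scores = [sum(t * f for t, f in zip(theta, feats)) for feats in candidates]
    top = max(scores)                       # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)                        # the normalizer
    return [w / z for w in weights]

def log_likelihood(theta, candidates, gold):
    """Log-probability of the correct (gold) candidate."""
    return math.log(probabilities(theta, candidates)[gold])

def gradient(theta, candidates, gold):
    """Gradient of the log-likelihood: observed minus expected feature values."""
    probs = probabilities(theta, candidates)
    expected = [sum(p * feats[k] for p, feats in zip(probs, candidates))
                for k in range(len(theta))]
    return [candidates[gold][k] - expected[k] for k in range(len(theta))]

# Hypothetical toy data: three candidate analyses with two features each.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gold = 2
theta = [0.0, 0.0]
before = log_likelihood(theta, candidates, gold)
for _ in range(50):                         # plain gradient ascent
    grad = gradient(theta, candidates, gold)
    theta = [t + 0.5 * g for t, g in zip(theta, grad)]
after = log_likelihood(theta, candidates, gold)
print(before, after)
```

The gradient needs expected feature values under the model, which is why the cost of computing those expectations becomes the central issue for the full joint model.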

Computing

Correspondence features tie pieces together (example: an English PP and a Chinese PP)

Exact computation is intractable

The individual models (English parser, Chinese parser, word alignment) have polynomial-time dynamic programming algorithms

Slide16

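Approximating: Mean Field

Exploit tractability in individual models

Factored approximation: one factor each for the English parse, the Chinese parse, and the alignment

Algorithm

Initialize each factor separately

Iterate: set each factor in turn to minimize the divergence between the factored approximation and the joint model

Slide17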
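The mean field procedure on the approximation slide (initialize the factors separately, then update each in turn) can be sketched on a tiny two-variable model. This is a sketch under stated assumptions, not the authors' implementation: the scores `a`, `b`, `c` and the function names are hypothetical, each variable has only two values, and the divergence is the usual KL(q || p). In the full model the factors range over trees and alignments, and the expectations inside each update come from dynamic programs rather than explicit sums.

```python
import math

def normalize(log_scores):
    """Turn log-scores into a probability distribution."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

def mean_field(a, b, c, iterations=20):
    """Fit q(x) * q(y) to p(x, y) proportional to exp(a[x] + b[y] + c[x][y])."""
    # Initialize each factor separately, ignoring the coupling scores c.
    qx, qy = normalize(a), normalize(b)
    for _ in range(iterations):
        # Each update minimizes KL(q || p) with the other factor held fixed.
        qx = normalize([a[x] + sum(qy[y] * c[x][y] for y in range(len(b)))
                        for x in range(len(a))])
        qy = normalize([b[y] + sum(qx[x] * c[x][y] for x in range(len(a)))
                        for y in range(len(b))])
    return qx, qy

def kl_to_joint(qx, qy, a, b, c):
    """KL(q || p) by brute-force enumeration; feasible only for tiny models."""
    flat = [a[x] + b[y] + c[x][y] for x in range(len(a)) for y in range(len(b))]
    top = max(flat)
    log_z = top + math.log(sum(math.exp(s - top) for s in flat))
    kl = 0.0
    for x in range(len(a)):
        for y in range(len(b)):
            q = qx[x] * qy[y]
            if q > 0.0:
                kl += q * (math.log(q) - (a[x] + b[y] + c[x][y] - log_z))
    return kl

# Hypothetical scores: the coupling c rewards the two variables for agreeing.
a, b = [0.0, 0.5], [0.2, 0.0]
c = [[1.0, 0.0], [0.0, 1.0]]
start = kl_to_joint(normalize(a), normalize(b), a, b, c)
qx, qy = mean_field(a, b, c)
end = kl_to_joint(qx, qy, a, b, c)
print(start, end)
```

Because every update is a coordinate-wise minimization, the divergence never increases from one iteration to the next, although the procedure may stop at a local optimum.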

Large scale inference

We can approximate in polynomial time, but . . .

Sum over possible alignments is an O(n^6) algorithm.

But computers are fast, right?

Medium-length sentences are 50 words long

Small translation data sets are 250,000 sentences

~4 quadrillion operations

(See [BBK10, HBDK09] for speedup details)

Slide18
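The "~4 quadrillion operations" figure on the large scale inference slide follows from a sixth-power cost per sentence pair, which matches the known O(n^6) cost of summing over ITG alignments. A quick check, with constant factors ignored (the function name is illustrative):

```python
def alignment_operations(sentence_length, num_sentences, exponent=6):
    """Rough operation count: num_sentences * sentence_length ** exponent."""
    return num_sentences * sentence_length ** exponent

total = alignment_operations(50, 250_000)
print(f"{total:,} operations (~{total / 1e15:.1f} quadrillion)")
# prints "3,906,250,000,000,000 operations (~3.9 quadrillion)"
```

This counts a single pass over the data, so any iterative training procedure multiplies it further.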

Quantitative Results: Parsing

Slide19

Quantitative Results: Parsing

85.7%

83.6%

Slide20

Quantitative Results: Parsing

81.2%

84.5%

Slide21

Incorrect English PP Attachment

Slide22

Corrected English PP Attachment

Slide23

Quantitative Results: Translation

69.5%

85.0%

BLEU improvement from 29.4 to 30.6

79.5%

Slide24

Better Translations with Bilingual Adaptation

Reference

At this point the cause of the plane collision is still unclear. The local caa will launch an investigation into this .

Baseline (GIZA++)

The cause of planes is still not clear yet, local civil aviation department will investigate this .

Source

目前导致飞机相撞的原因尚不清楚,当地民航部门将对此展开调查

Gloss

Currently cause plane crash DE reason yet not clear , local civil aeronautics bureau will toward open investigations

Bilingual Adaptation Model

The cause of plane collision remained unclear, local civil aviation departments will launch an investigation .

Slide25

Thanks