
CS460/626 : Natural Language - PowerPoint Presentation

Uploaded On 2016-05-01




Presentation Transcript


CS460/626: Natural Language Processing/Speech, NLP and the Web (Lecture 29: CYK; Inside Probability; Parse Tree Construction)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
24th March, 2011

CYK Parsing

Shared Sub-Problems: Example

CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules, i.e., that it be in Chomsky Normal Form (CNF). All rules are of the form:
A → B C or A → a
What does the tree look like?
What if my CFG isn't in CNF?
A → B C
D → w
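The CNF condition can be checked mechanically. The following is a small Python sketch, not taken from the lecture. The rule encoding and the convention that non-terminals are upper-case and terminals lower-case are assumptions made for this illustration. Applied to the toy grammar used in the CYK illustration, the check flags NP → NNS. That is a unary rule between two non-terminals, which strict CNF does not allow.

```python
def is_nonterminal(symbol: str) -> bool:
    # Assumed convention for this sketch: non-terminals are written in upper case.
    return symbol.isupper()


def cnf_violations(rules):
    """Return every rule that is neither A -> B C (two non-terminals) nor A -> a (one terminal)."""
    bad = []
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(is_nonterminal(s) for s in rhs)
        lexical = len(rhs) == 1 and not is_nonterminal(rhs[0])
        if not (binary or lexical):  # also catches epsilon rules, whose rhs is empty
            bad.append((lhs, rhs))
    return bad


toy_rules = [
    ("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("NP", ("NNS",)),
    ("NP", ("NP", "PP")), ("PP", ("P", "NP")), ("VP", ("VP", "PP")),
    ("VP", ("VBD", "NP")), ("DT", ("the",)), ("NN", ("gunman",)),
    ("NN", ("building",)), ("VBD", ("sprayed",)), ("NNS", ("bullets",)),
]
print(cnf_violations(toy_rules))  # [('NP', ('NNS',))]
```

CYK implementations commonly handle such unary rules with an extra closure step in each chart cell.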

CKY Algorithm

Illustrating CYK [Cocke, Younger, Kasami] Algo
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
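As a sketch, the grammar above can be written as Python data, with a check that each non-terminal's rule probabilities sum to 1, as a PCFG requires. The rule P → with (probability 1.0) is an assumption. The slide does not list it, but the example parse trees later in the lecture put P 1.0 over "with".

```python
import math
from collections import defaultdict

# (lhs, rhs) -> rule probability, copied from the slide.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumption: not listed on the slide
}


def lhs_totals(grammar):
    """Sum the probabilities of all expansions of each non-terminal."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in grammar.items():
        totals[lhs] += prob
    return dict(totals)


totals = lhs_totals(PCFG)
for lhs, total in totals.items():
    # The expansions of each non-terminal must form a probability distribution.
    print(lhs, "ok" if math.isclose(total, 1.0) else f"sums to {total}")
```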

CYK: Start with (0,1)
Sentence, with fencepost positions between the words:
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .
Chart rows: From 0-6; columns: To 1-7.
(0,1) "The": DT

CYK: Keep filling diagonals
(1,2) "gunman": NN

CYK: Try getting higher level structures
(0,2) "The gunman": NP (NP → DT NN over (0,1) and (1,2))

CYK: Diagonal continues
(2,3) "sprayed": VBD

CYK (cont…)
(0,3) and (1,3): empty

CYK (cont…)
(3,4) "the": DT

CYK (cont…)
(0,4), (1,4) and (2,4): empty
(4,5) "building": NN

CYK: starts filling the 5th column
(3,5) "the building": NP (NP → DT NN over (3,4) and (4,5))

CYK (cont…)
(2,5) "sprayed the building": VP (VP → VBD NP over (2,3) and (3,5))

CYK (cont…)
(1,5): empty

CYK: S found, but NO termination!
(0,5) "The gunman sprayed the building": S (S → NP VP over (0,2) and (2,5))
This S does not span the whole sentence, so the algorithm continues.

CYK (cont…)
(5,6) "with": P

CYK (cont…)
(0,6), (1,6), (2,6), (3,6) and (4,6): empty

CYK: Control moves to last column
(6,7) "bullets": NNS, and NP (NP → NNS)

CYK (cont…)
(5,7) "with bullets": PP (PP → P NP over (5,6) and (6,7))

CYK (cont…)
(4,7): empty
(3,7) "the building with bullets": NP (NP → NP PP over (3,5) and (5,7))

CYK (cont…)
(2,7) "sprayed the building with bullets": VP, derivable both by VP → VBD NP over (2,3) and (3,7) and by VP → VP PP over (2,5) and (5,7)

CYK: filling the last column
(1,7): empty

CYK: terminates with S in (0,7)
(0,7): S (S → NP VP over (0,2) and (2,7)), which covers the whole sentence.
Final chart ("-" marks an empty cell):
From \ To   1     2     3     4     5     6     7
0           DT    NP    -     -     S     -     S
1                 NN    -     -     -     -     -
2                       VBD   -     VP    -     VP
3                             DT    NP    -     NP
4                                   NN    -     -
5                                         P     PP
6                                               NNS, NP
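The chart-filling walkthrough above can be sketched as a CYK recognizer in Python. It visits cells column by column, and bottom-up within each column, as the slides do, using fencepost indices 0-7. The sketch makes two assumptions. First, "with" is tagged P, as in the later parse trees. Second, the non-CNF rule NP → NNS is handled by an extra unary step in each cell. The names `LEXICON`, `BINARY`, `UNARY` and `cyk_chart` are illustrative.

```python
from collections import defaultdict

LEXICON = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"},
           "sprayed": {"VBD"}, "bullets": {"NNS"}, "with": {"P"}}  # "with" -> P assumed
BINARY = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("NP", "PP"): {"NP"},
          ("P", "NP"): {"PP"}, ("VP", "PP"): {"VP"}, ("VBD", "NP"): {"VP"}}
UNARY = {"NNS": {"NP"}}  # NP -> NNS, the one rule outside strict CNF


def add_unary(cell):
    """Apply unary rules inside one cell until nothing new is added."""
    changed = True
    while changed:
        changed = False
        for child in list(cell):
            for parent in UNARY.get(child, set()):
                if parent not in cell:
                    cell.add(parent)
                    changed = True


def cyk_chart(words):
    """chart[(i, j)] = labels that can derive words[i:j] (fencepost indices)."""
    n = len(words)
    chart = defaultdict(set)
    for j in range(1, n + 1):                  # column by column ("To" = j)
        chart[(j - 1, j)] |= LEXICON[words[j - 1].lower()]
        add_unary(chart[(j - 1, j)])
        for i in range(j - 2, -1, -1):         # bottom-up within the column
            for k in range(i + 1, j):          # every split point
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((left, right), set())
            add_unary(chart[(i, j)])
    return chart


words = "The gunman sprayed the building with bullets".split()
chart = cyk_chart(words)
print("S" in chart[(0, 5)], "S" in chart[(0, 7)])  # True True
```

An S in (0,5) alone does not end the search. Only an S in (0,n), which spans the whole sentence, means the sentence is accepted.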

CYK: Extracting the Parse Tree
The parse tree is obtained by keeping back pointers.
S (0-7)
  NP (0-2)
    DT (0-1) The
    NN (1-2) gunman
  VP (2-7)
    VBD (2-3) sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4) the
        NN (4-5) building
      PP (5-7)
        P (5-6) with
        NP (6-7)
          NNS (6-7) bullets
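Extending the recognizer with back pointers gives a sketch of the tree extraction above. Each (label, i, j) remembers how it was first built. Because split points are tried left to right, VP(2-7) keeps VBD(2-3) + NP(3-7), which reproduces the tree on this slide. Keeping only the first back pointer is an assumption of this sketch. The same chart also supports VP(2-7) → VP(2-5) PP(5-7), so a different policy would return the other parse. As before, "with" → P is assumed, and the function names are illustrative.

```python
# A single-tag lexicon and one parent per child pair suffice for this toy grammar.
LEXICON = {"the": "DT", "gunman": "NN", "building": "NN",
           "sprayed": "VBD", "bullets": "NNS", "with": "P"}  # "with" -> P assumed
BINARY = {("NP", "VP"): "S", ("DT", "NN"): "NP", ("NP", "PP"): "NP",
          ("P", "NP"): "PP", ("VP", "PP"): "VP", ("VBD", "NP"): "VP"}
UNARY = {"NNS": "NP"}  # NP -> NNS


def cyk_with_backpointers(words):
    """back[(label, i, j)] records how label was first built over words[i:j]."""
    n = len(words)
    back = {}

    def add_unary(i, j):
        for child, parent in UNARY.items():
            if (child, i, j) in back and (parent, i, j) not in back:
                back[(parent, i, j)] = ("unary", child)

    for j in range(1, n + 1):
        back[(LEXICON[words[j - 1].lower()], j - 1, j)] = ("word", words[j - 1])
        add_unary(j - 1, j)
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):                     # splits tried left to right
                for (left, right), parent in BINARY.items():
                    if ((left, i, k) in back and (right, k, j) in back
                            and (parent, i, j) not in back):
                        back[(parent, i, j)] = ("split", k, left, right)
            add_unary(i, j)
    return back


def build(label, i, j, back):
    """Follow back pointers from (label, i, j) and return a bracketed tree."""
    how = back[(label, i, j)]
    if how[0] == "word":
        return f"({label} {how[1]})"
    if how[0] == "unary":
        return f"({label} {build(how[1], i, j, back)})"
    _, k, left, right = how
    return f"({label} {build(left, i, k, back)} {build(right, k, j, back)})"


words = "The gunman sprayed the building with bullets".split()
back = cyk_with_backpointers(words)
print(build("S", 0, len(words), back))
# (S (NP (DT The) (NN gunman)) (VP (VBD sprayed) (NP (NP (DT the) (NN building)) (PP (P with) (NP (NNS bullets))))))
```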

Probabilistic parse tree construction

Interesting Probabilities
The gunman sprayed the building with bullets (words 1-7)
[Figure: $N^1$ at the root of the sentence; an NP node over "the building" (words 4-5)]
Inside probabilities: what is the probability of having an NP at this position such that it derives "the building"?
Outside probabilities: what is the probability of starting from $N^1$ and deriving "The gunman sprayed", an NP, and "with bullets"?

Interesting Probabilities
Random variables to be considered:
The non-terminal being expanded, e.g., NP.
The word span covered by the non-terminal, e.g., (4,5) refers to the words "the building".
While calculating probabilities, consider:
The rule to be used for expansion, e.g., NP → DT NN.
The probabilities associated with the RHS non-terminals, e.g., the inside/outside probabilities of the DT subtree and of the NN subtree.

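Outside Probability $\alpha_j(p,q)$:
The probability of beginning with $N^1$ and generating the non-terminal $N^j_{pq}$ and all the words outside $w_p \dots w_q$:
$$\alpha_j(p,q) = P(w_{1(p-1)},\, N^j_{pq},\, w_{(q+1)m} \mid G)$$
[Figure: $N^1$ at the root spans $w_1 \dots w_{p-1}\; w_p \dots w_q\; w_{q+1} \dots w_m$; $N^j$ spans $w_p \dots w_q$]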

Inside Probabilities $\beta_j(p,q)$:
The probability of generating the words $w_p \dots w_q$ starting with the non-terminal $N^j_{pq}$:
$$\beta_j(p,q) = P(w_{pq} \mid N^j_{pq}, G)$$
[Figure: $N^1$ at the root spans $w_1 \dots w_m$; $N^j$ spans $w_p \dots w_q$]

Outside & Inside Probabilities: example
The gunman sprayed the building with bullets (words 1-7)
[Figure: $N^1$ at the root of the sentence; an NP node over "the building" (words 4-5)]

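Calculating Inside Probabilities $\beta_j(p,q)$
Base case:
$$\beta_j(k,k) = P(w_k \mid N^j_{kk}, G) = P(N^j \to w_k \mid G)$$
The base case is used for rules that derive the words (terminals) directly.
E.g., suppose $N^j$ = NN is being considered and NN → building is one of the rules, with probability 0.5. Since "building" is word 5, $\beta_{NN}(5,5) = 0.5$.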

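Induction Step: Assuming Grammar in Chomsky Normal Form
Induction step:
$$\beta_j(p,q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p,d)\, \beta_s(d+1,q)$$
[Figure: $N^j$ expands to $N^r N^s$; $N^r$ spans $w_p \dots w_d$ and $N^s$ spans $w_{d+1} \dots w_q$]
Consider the different splits of the words, indicated by d. E.g., for "the huge building", split here for d=2 or d=3.
Consider the different non-terminals that can be used in the rule: NP → DT NN and NP → DT NNS are available options. Sum over all of these.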
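The following is a Python sketch of the induction step, offered as an illustration rather than code from the lecture. Word positions are 1-based as on these slides. Spans are filled from short to long, so $\beta_r(p,d)$ and $\beta_s(d+1,q)$ are always available when $\beta_j(p,q)$ is computed. The sketch makes two assumptions. First, P → with has probability 1.0; it appears in the example parse trees but not on the grammar slide. Second, the non-CNF rule NP → NNS is applied once per cell after the binary rules. That is adequate only because this grammar has no chains of unary rules.

```python
from collections import defaultdict

# Toy PCFG from the lecture. "P -> with" (1.0) is an assumption taken from the
# example parse trees; the grammar slide does not list it.
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
          ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4}
UNARY = {("NP", "NNS"): 0.3}  # non-CNF rule; applied once per cell (no unary chains)
LEXICAL = {("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
           ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0, ("P", "with"): 1.0}


def inside(words):
    """beta[(j, p, q)] = P(w_p..w_q | N^j spans p..q), with 1-based positions."""
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):            # base case: beta_j(k, k)
        for (tag, word), prob in LEXICAL.items():
            if word == w.lower():
                beta[(tag, k, k)] += prob
    for span in range(1, m + 1):                      # short spans before long ones
        for p in range(1, m - span + 2):
            q = p + span - 1
            for d in range(p, q):                     # induction: sum over splits d
                for (j, r, s), prob in BINARY.items():
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
            for (j, child), prob in UNARY.items():
                beta[(j, p, q)] += prob * beta[(child, p, q)]
    return beta


words = "The gunman sprayed the building with bullets".split()
beta = inside(words)
print(round(beta[("NP", 1, 2)], 6), round(beta[("S", 1, 7)], 6))  # 0.25 0.006
```

The first value agrees with the hand calculation for "The gunman" (0.5 * 1.0 * 0.5 = 0.25).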

The Bottom-Up Approach
The idea of induction: consider "the gunman".
Base cases: apply unary rules
DT → the, Prob = 1.0
NN → gunman, Prob = 0.5
Induction:
Prob that an NP covers these 2 words
= P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
= 0.5 * 1.0 * 0.5 = 0.25
[Figure: NP (0.5) over DT (1.0) "The" and NN (0.5) "gunman"]

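Parse Triangle
A parse triangle is constructed for calculating $\beta_j(p,q)$.
Probability of a sentence using $\beta_j(p,q)$:
$$P(w_{1m} \mid G) = P(N^1 \Rightarrow^* w_{1m} \mid G) = P(w_{1m} \mid N^1_{1m}, G) = \beta_1(1,m)$$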
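The following worked instance applies this to the example sentence. The values are computed here from the lecture's grammar, taking $N^1 = S$ and $m = 7$, and assuming P → with has probability 1.0 as in the example parse trees. The two terms for VP over words 3-7 are the rules VP → VP PP (split d = 5) and VP → VBD NP (split d = 3):

$$\beta_{VP}(3,7) = 0.6 \cdot \beta_{VP}(3,5)\,\beta_{PP}(6,7) + 0.4 \cdot \beta_{VBD}(3,3)\,\beta_{NP}(4,7) = 0.6 \cdot 0.1 \cdot 0.3 + 0.4 \cdot 1.0 \cdot 0.015 = 0.018 + 0.006 = 0.024$$

$$P(w_{17} \mid G) = \beta_S(1,7) = 1.0 \cdot \beta_{NP}(1,2)\,\beta_{VP}(3,7) = 1.0 \cdot 0.25 \cdot 0.024 = 0.006$$

The sub-results used are $\beta_{VP}(3,5) = 0.4 \cdot 1.0 \cdot 0.25 = 0.1$, $\beta_{PP}(6,7) = 1.0 \cdot 1.0 \cdot 0.3 = 0.3$ and $\beta_{NP}(4,7) = 0.2 \cdot 0.25 \cdot 0.3 = 0.015$.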

Parse Triangle
Words: The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
[Figure: triangular chart over the seven words, with rows labelled "from" and columns labelled "to"]
Fill the diagonals with $\beta_j(k,k)$.

Parse Triangle
Words: The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
[Figure: the same triangle, with rows and columns numbered 1-7]
Calculate the remaining cells using the induction formula.

Example Parse t1
(each number is the probability of the rule that expands that node)
S 1.0
  NP 0.5
    DT 1.0  The
    NN 0.5  gunman
  VP 0.6
    VP 0.4
      VBD 1.0  sprayed
      NP 0.5
        DT 1.0  the
        NN 0.5  building
    PP 1.0
      P 1.0  with
      NP 0.3
        NNS 1.0  bullets
The rule used here is VP → VP PP.
The gunman sprayed the building with bullets.

Another Parse t2
(each number is the probability of the rule that expands that node)
S 1.0
  NP 0.5
    DT 1.0  The
    NN 0.5  gunman
  VP 0.4
    VBD 1.0  sprayed
    NP 0.2
      NP 0.5
        DT 1.0  the
        NN 0.5  building
      PP 1.0
        P 1.0  with
        NP 0.3
          NNS 1.0  bullets
The rule used here is VP → VBD NP.
The gunman sprayed the building with bullets.
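As a sketch, the probability of each tree is the product of the rule probabilities at its nodes. The nested-tuple tree encoding below is an assumption of this illustration. The numbers are the ones annotated on t1 and t2.

```python
# A node is (label, probability of the rule expanding it, children);
# a child that is a plain string is a word.
def tree_prob(tree):
    """P(t): the product of the probabilities of all rules used in t."""
    _label, prob, children = tree
    result = prob
    for child in children:
        if not isinstance(child, str):
            result *= tree_prob(child)
    return result


NP_GUNMAN = ("NP", 0.5, [("DT", 1.0, ["The"]), ("NN", 0.5, ["gunman"])])
NP_BUILDING = ("NP", 0.5, [("DT", 1.0, ["the"]), ("NN", 0.5, ["building"])])
PP_BULLETS = ("PP", 1.0, [("P", 1.0, ["with"]),
                          ("NP", 0.3, [("NNS", 1.0, ["bullets"])])])
VBD_SPRAYED = ("VBD", 1.0, ["sprayed"])

# t1: PP attached to the VP (VP -> VP PP); t2: PP attached to the NP (NP -> NP PP).
t1 = ("S", 1.0, [NP_GUNMAN, ("VP", 0.6, [("VP", 0.4, [VBD_SPRAYED, NP_BUILDING]), PP_BULLETS])])
t2 = ("S", 1.0, [NP_GUNMAN, ("VP", 0.4, [VBD_SPRAYED, ("NP", 0.2, [NP_BUILDING, PP_BULLETS])])])

p1, p2 = tree_prob(t1), tree_prob(t2)
print(round(p1, 6), round(p2, 6), round(p1 + p2, 6))  # 0.0045 0.0015 0.006
```

Under this grammar, t1 and t2 are the only parses of the sentence. Their probabilities therefore add up to the sentence probability, 0.0045 + 0.0015 = 0.006, and t1 (VP → VP PP) is the more probable of the two.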

Parse Triangle
Words: The (1) gunman (2) sprayed (3) the (4) building (5) with (6) bullets (7)
[Figure: parse triangle with rows and columns numbered 1-7]

Different Parses
Consider:
Different splitting points, e.g., the 5th and 3rd positions.
Different rules for VP expansion, e.g., VP → VP PP and VP → VBD NP.
Different parses for the VP "sprayed the building with bullets" can be constructed this way.

The Viterbi-like Algorithm for PCFGs
Very similar to the calculation of inside probabilities $\beta_i(p,q)$.
Instead of summing over all ways of constructing the parse for $w_{pq}$, choose only the best way (the maximum-probability one).

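Calculation of $\delta_i(p,q)$
Example: $\delta_{VP}(3,7)$ for "sprayed the building with bullets"
[Figure: the two candidate VP subtrees, VP → VP PP and VP → VBD NP]
VP → VP PP: 0.6 * $\delta_{VP}(3,5)$ * $\delta_{PP}(6,7)$ = 0.6 * 0.1 * 0.3 = 0.018 (this rule is chosen)
VP → VBD NP: 0.4 * $\delta_{VBD}(3,3)$ * $\delta_{NP}(4,7)$ = 0.4 * 1.0 * 0.015 = 0.006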

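Viterbi-like Algorithm
Base case:
$$\delta_i(p,p) = P(N^i \to w_p)$$
Induction:
$$\delta_i(p,q) = \max_{j,k,\; p \le r < q} P(N^i \to N^j N^k)\, \delta_j(p,r)\, \delta_k(r+1,q)$$
$\psi_i(p,q)$ stores:
the RHS of the rule selected;
the position of splitting.
Example: $\psi_{VP}(3,7)$ stores VP, PP and split position = 5, because VP → VP PP is the rule used.
Backtracing: start from $\delta_1(1,7)$ and $\psi_1(1,7)$ and backtrace.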

Example
$\psi_1(1,7)$ records S → NP VP and split position 2.
$\psi_{NP}(1,2)$ records NP → DT NN and split position 1.
$\psi_{VP}(3,7)$ records VP → VP PP and split position 5.
[Figure: S → NP VP over "The gunman sprayed the building with bullets" (words 1-7): NP over words 1-2, VP over words 3-7]

Example
[Figure: the tree recovered by backtracing: S → NP VP; NP → DT NN over "The gunman" (words 1-2); VP → VP PP over "sprayed the building with bullets" (words 3-7)]
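The backtracing example above can be reproduced with a Python sketch of the Viterbi-like algorithm. It is the inside computation with max in place of sum, plus a table `psi` of back pointers that records the chosen RHS and split position. The tuple layout of `psi` entries, the function names, and P → with (1.0) are assumptions of this illustration. NP → NNS is handled as a unary step once per cell.

```python
# Toy PCFG from the lecture; "P -> with 1.0" is assumed from the example trees.
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
          ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4}
UNARY = {("NP", "NNS"): 0.3}  # applied once per cell: assumes no unary chains
LEXICAL = {("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
           ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0, ("P", "with"): 1.0}


def viterbi(words):
    """delta[(i, p, q)]: best probability of N^i over w_p..w_q (1-based);
    psi[(i, p, q)]: how that best subtree was built."""
    m = len(words)
    delta, psi = {}, {}

    def relax(key, prob, how):
        # Keep only the maximum-probability way of building each (label, p, q).
        if prob > delta.get(key, 0.0):
            delta[key], psi[key] = prob, how

    for p, w in enumerate(words, start=1):            # base case
        for (tag, word), prob in LEXICAL.items():
            if word == w.lower():
                relax((tag, p, p), prob, ("word", w))
    for span in range(1, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for r in range(p, q):                       # split: (p, r) and (r + 1, q)
                for (i, j, k), prob in BINARY.items():
                    score = prob * delta.get((j, p, r), 0.0) * delta.get((k, r + 1, q), 0.0)
                    relax((i, p, q), score, ("binary", j, k, r))
            for (i, child), prob in UNARY.items():
                relax((i, p, q), prob * delta.get((child, p, q), 0.0), ("unary", child))
    return delta, psi


def backtrace(label, p, q, psi):
    """Rebuild the best subtree for (label, p, q) from the stored back pointers."""
    kind, *rest = psi[(label, p, q)]
    if kind == "word":
        return f"({label} {rest[0]})"
    if kind == "unary":
        return f"({label} {backtrace(rest[0], p, q, psi)})"
    left, right, r = rest
    return f"({label} {backtrace(left, p, r, psi)} {backtrace(right, r + 1, q, psi)})"


words = "The gunman sprayed the building with bullets".split()
delta, psi = viterbi(words)
print(psi[("VP", 3, 7)])                 # ('binary', 'VP', 'PP', 5)
print(round(delta[("S", 1, 7)], 6))      # 0.0045
print(backtrace("S", 1, 7, psi))
```

The recorded entries match the example. $\psi_{VP}(3,7)$ holds VP, PP and split 5, and backtracing from S over (1,7) returns t1, with probability 0.0045.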

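Grammar Induction
Annotated corpora such as the Penn Treebank are used. Counts are used as follows:
$$P(N^j \to \zeta) = \frac{C(N^j \to \zeta)}{\sum_{\gamma} C(N^j \to \gamma)}$$
Sample training data:
NP → DT NN: The boy
NP → DT NNS: Those cars
NP → NNS: Bears
NP → DT NN: That book
NP → PRP: She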
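The following is a minimal sketch of the relative-frequency estimate applied to the five sample NPs above. Encoding each tree as just the rule it uses is a simplification made for this illustration.

```python
from collections import Counter

# The five sample NP trees from the slide, reduced to the rule each one uses.
samples = [
    ("NP", ("DT", "NN")),   # The boy
    ("NP", ("DT", "NNS")),  # Those cars
    ("NP", ("NNS",)),       # Bears
    ("NP", ("DT", "NN")),   # That book
    ("NP", ("PRP",)),       # She
]


def relative_frequency(rules):
    """P(A -> beta) = count(A -> beta) / count(A -> anything)."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}


probs = relative_frequency(samples)
print(probs[("NP", ("DT", "NN"))])  # 0.4
```

NP → DT NN occurs twice among five NPs, which gives 2/5 = 0.4. Each of the other three expansions gets 1/5.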

Grammar Induction for Unannotated Corpora: EM Algorithm
Start with initial estimates for the rule probabilities.
Expectation phase:
Compute the probability of each parse of a sentence according to the current estimates of the rule probabilities.
Compute the expectation of how often each rule is used (summing the probabilities computed in the previous step over the parses that use the rule).
Maximization phase:
Refine the rule probabilities so that the training corpus likelihood increases.
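The loop above can be sketched for a single sentence whose parses can be listed explicitly. Here those are the two parses t1 and t2 from earlier, each encoded as the multiset of rules it uses. This enumeration is an illustrative simplification. Practical PCFG training (the inside-outside algorithm) obtains the same expected rule counts from inside and outside probabilities without listing parses. P → with (1.0) is again assumed.

```python
from collections import Counter, defaultdict

# Rules shared by both parses of "The gunman sprayed the building with bullets".
SHARED = [("S", "NP VP"), ("NP", "DT NN"), ("DT", "the"), ("NN", "gunman"),
          ("VBD", "sprayed"), ("NP", "DT NN"), ("DT", "the"), ("NN", "building"),
          ("PP", "P NP"), ("P", "with"), ("NP", "NNS"), ("NNS", "bullets")]
# Each parse is the multiset of rules it uses (t1: PP on the VP, t2: PP on the NP).
PARSES = {
    "t1": Counter(SHARED + [("VP", "VP PP"), ("VP", "VBD NP")]),
    "t2": Counter(SHARED + [("VP", "VBD NP"), ("NP", "NP PP")]),
}
# Initial estimates: the lecture's grammar ("P -> with" assumed to be 1.0).
INITIAL = {("S", "NP VP"): 1.0, ("NP", "DT NN"): 0.5, ("NP", "NNS"): 0.3,
           ("NP", "NP PP"): 0.2, ("PP", "P NP"): 1.0, ("VP", "VP PP"): 0.6,
           ("VP", "VBD NP"): 0.4, ("DT", "the"): 1.0, ("NN", "gunman"): 0.5,
           ("NN", "building"): 0.5, ("VBD", "sprayed"): 1.0,
           ("NNS", "bullets"): 1.0, ("P", "with"): 1.0}


def parse_prob(rules, probs):
    """Probability of one parse: product of its rule probabilities."""
    result = 1.0
    for rule, count in rules.items():
        result *= probs[rule] ** count
    return result


def em_step(parses, probs):
    """One EM iteration; returns (new rule probabilities, likelihood under the old ones)."""
    joint = {name: parse_prob(rules, probs) for name, rules in parses.items()}
    likelihood = sum(joint.values())
    expected = defaultdict(float)                  # E-step: expected rule counts
    for name, rules in parses.items():
        posterior = joint[name] / likelihood
        for rule, count in rules.items():
            expected[rule] += posterior * count
    lhs_total = defaultdict(float)                 # M-step: renormalise per LHS
    for (lhs, _rhs), count in expected.items():
        lhs_total[lhs] += count
    new_probs = {rule: c / lhs_total[rule[0]] for rule, c in expected.items()}
    return new_probs, likelihood


probs_1, likelihood_0 = em_step(PARSES, INITIAL)
_, likelihood_1 = em_step(PARSES, probs_1)
print(round(probs_1[("VP", "VP PP")], 4), likelihood_0 < likelihood_1)  # 0.4286 True
```

With the initial estimates, t1 and t2 get posterior weights 0.75 and 0.25. VP → VBD NP is used once in each parse, so its expected count is 1. The re-estimated probability of VP → VP PP is therefore 0.75 / 1.75 = 3/7 ≈ 0.4286, and the likelihood of the sentence rises above its initial value of 0.006.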

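Outside Probabilities $\alpha_j(p,q)$
Base case:
$$\alpha_1(1,m) = 1, \qquad \alpha_j(1,m) = 0 \ \text{for}\ j \ne 1$$
Inductive step for calculating $\alpha_j(p,q)$:
$$\alpha_j(p,q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p,e)\, P(N^f \to N^j N^g)\, \beta_g(q+1,e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e,q)\, P(N^f \to N^g N^j)\, \beta_g(e,p-1)$$
[Figure: $N^1$ spans $w_1 \dots w_m$; a parent $N^f_{pe}$ expands to $N^j_{pq}$ (over $w_p \dots w_q$) and $N^g_{(q+1)e}$ (over $w_{q+1} \dots w_e$)]
Summation over f, g and e. The second term covers the case where $N^j$ is the right child.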
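The following Python sketch computes outside probabilities from the inside table. It processes spans from long to short, and each parent cell passes outside probability down to its two children. This "push" order is an equivalent rearrangement of the slide's sum over f, g and e, since every term $\alpha_f \cdot P(N^f \to \dots) \cdot \beta_g$ appears exactly once. It is not necessarily how the lecture computes the sum. The assumptions are as before: P → with is 1.0, and NP → NNS is the only unary rule, with no chains.

```python
from collections import defaultdict

# Toy PCFG from the lecture ("P -> with 1.0" assumed from the example trees).
BINARY = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 0.5, ("NP", "NP", "PP"): 0.2,
          ("PP", "P", "NP"): 1.0, ("VP", "VP", "PP"): 0.6, ("VP", "VBD", "NP"): 0.4}
UNARY = {("NP", "NNS"): 0.3}  # assumed to form no chains
LEXICAL = {("DT", "the"): 1.0, ("NN", "gunman"): 0.5, ("NN", "building"): 0.5,
           ("VBD", "sprayed"): 1.0, ("NNS", "bullets"): 1.0, ("P", "with"): 1.0}


def inside(words):
    """beta[(j, p, q)] with 1-based positions, short spans first."""
    m = len(words)
    beta = defaultdict(float)
    for k, w in enumerate(words, start=1):
        for (tag, word), prob in LEXICAL.items():
            if word == w.lower():
                beta[(tag, k, k)] += prob
    for span in range(1, m + 1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for d in range(p, q):
                for (j, r, s), prob in BINARY.items():
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
            for (j, child), prob in UNARY.items():
                beta[(j, p, q)] += prob * beta[(child, p, q)]
    return beta


def outside(words, beta, start="S"):
    """alpha[(j, p, q)]: parents push outside mass to their children, long spans first."""
    m = len(words)
    alpha = defaultdict(float)
    alpha[(start, 1, m)] = 1.0                         # base case
    for span in range(m, 0, -1):
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (parent, child), prob in UNARY.items():  # same-cell unary parent first
                alpha[(child, p, q)] += alpha[(parent, p, q)] * prob
            for (f, left, right), prob in BINARY.items():
                a = alpha[(f, p, q)]
                if a == 0.0:
                    continue
                for d in range(p, q):
                    alpha[(left, p, d)] += a * prob * beta[(right, d + 1, q)]
                    alpha[(right, d + 1, q)] += a * prob * beta[(left, p, d)]
    return alpha


words = "The gunman sprayed the building with bullets".split()
beta = inside(words)
alpha = outside(words, beta)
print(round(alpha[("VP", 3, 5)] * beta[("VP", 3, 5)], 6),
      round(alpha[("NP", 4, 7)] * beta[("NP", 4, 7)], 6))  # 0.0045 0.0015
```

The two printed products are the probabilities of t1 and t2. VP over words 3-5 occurs only in t1, and NP over words 4-7 occurs only in t2.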

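Probability of a Sentence
The joint probability of a sentence $w_{1m}$ and of there being a constituent spanning words $w_p$ to $w_q$ is given as:
$$P(w_{1m}, N_{pq} \mid G) = \sum_j \alpha_j(p,q)\, \beta_j(p,q)$$
[Figure: The gunman sprayed the building with bullets (words 1-7), with $N^1$ at the root and an NP node below it]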
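The following worked instance is computed here from the lecture's grammar, with P → with assumed to have probability 1.0. Take the NP over "the building" ($p = 4$, $q = 5$). NP is the only non-terminal with non-zero inside probability on that span. It is the right child of VP over words 3-5 in t1, and the left child of NP over words 4-7 in t2. Starting from $\alpha_S(1,7) = 1$, the outside probabilities needed are $\alpha_{VP}(3,7) = 1 \cdot 1.0 \cdot \beta_{NP}(1,2) = 0.25$, $\alpha_{VP}(3,5) = 0.25 \cdot 0.6 \cdot \beta_{PP}(6,7) = 0.045$ and $\alpha_{NP}(4,7) = 0.25 \cdot 0.4 \cdot \beta_{VBD}(3,3) = 0.1$. Then:

$$\alpha_{NP}(4,5) = \alpha_{VP}(3,5)\,P(VP \to VBD\ NP)\,\beta_{VBD}(3,3) + \alpha_{NP}(4,7)\,P(NP \to NP\ PP)\,\beta_{PP}(6,7) = 0.045 \cdot 0.4 \cdot 1.0 + 0.1 \cdot 0.2 \cdot 0.3 = 0.024$$

$$P(w_{17}, N_{45} \mid G) = \alpha_{NP}(4,5)\,\beta_{NP}(4,5) = 0.024 \cdot 0.25 = 0.006$$

This equals $\beta_S(1,7) = 0.006$, because both parses of the sentence contain a constituent over words 4-5.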