
CS460/626 : Natural Language - PowerPoint Presentation

Uploaded On 2015-10-04



Presentation Transcript

Slide1

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29: CYK; Inside Probability; Parse Tree construction)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay
22nd March, 2011

Slide2

Penn POS Tags

[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]

John wrote those words in the Book of Proverbs.

Slide3

Penn Treebank

(S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))))))

John wrote those words in the Book of Proverbs.

Slide4

PSG Parse Tree

Official trading in the shares will start in Paris on Nov 6.

[Figure: phrase-structure (PSG) parse tree of the sentence above; node labels include S, NP, VP, AP, PP, N, A, P, V and Aux.]

Slide5

Penn POS Tags

[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]

Official trading in the shares will start in Paris on Nov 6.

Slide6
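The bracketed word/TAG lines follow a simple convention: each token carries its tag after the last slash, and square brackets group tokens into (noun-phrase) chunks. A minimal reader for that notation, as a sketch that assumes brackets and tokens are separated by whitespace; `parse_tagged` is a hypothetical helper, not part of any Treebank tool.

```python
def parse_tagged(text):
    """Read Penn-style 'word/TAG' text with '[ ... ]' chunk brackets.

    Returns (tokens, chunks): tokens is a list of (word, tag) pairs and
    chunks is a list of (start, end) token spans, end exclusive.
    Assumes '[' and ']' are separate whitespace-delimited items.
    """
    tokens, chunks = [], []
    start = None
    for item in text.split():
        if item == "[":
            start = len(tokens)                   # chunk opens before next token
        elif item == "]":
            chunks.append((start, len(tokens)))   # chunk closes after last token
            start = None
        else:
            word, _, tag = item.rpartition("/")   # tag follows the LAST slash
            tokens.append((word, tag))
    return tokens, chunks


line = ("[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD "
        "start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]")
tokens, chunks = parse_tagged(line)
print(chunks)  # [(0, 2), (3, 5), (8, 9), (10, 12)]
```

Splitting on the last slash keeps tokens such as "Nov." intact and would also survive words that themselves contain a slash.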

Penn POS Tag Set
Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PRP
Proper Noun: NNP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP

Slide7

CYK Parsing
(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)

Slide8

Shared Sub-Problems
Observation: ambiguous parses still share sub-trees
We don’t want to redo work that’s already been done
Unfortunately, naïve backtracking leads to duplicate work

Slide9

Shared Sub-Problems: Example

Slide10

Efficient Parsing
Dynamic programming to the rescue!
Intuition: store partial results in tables, thereby:
Avoiding repeated work on shared sub-problems
Efficiently storing ambiguous structures with shared sub-parts
Two algorithms:
CKY: roughly, bottom-up
Earley: roughly, top-down

Slide11

CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form
All rules of the form:
A → B C (two non-terminals on the right-hand side)
D → w (a single terminal word on the right-hand side)
What does the tree look like?
What if my CFG isn’t in CNF?

Slide12

CKY Parsing with Arbitrary CFGs
Problem: my grammar has rules like VP → NP PP PP
Can’t apply CKY!
Solution: rewrite grammar into CNF
Introduce new intermediate non-terminals into the grammar
What does this mean?
= weak equivalence
The rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
But the resulting derivations (trees) are different
Example: A → B C D is rewritten as A → X D and X → B C
(where X is a symbol that doesn’t occur anywhere else in the grammar)

Slide13
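The A → B C D rewrite can be mechanized by repeatedly folding the two leftmost right-hand-side symbols into a fresh non-terminal. A minimal sketch of just this binarization step, assuming rules are given as (lhs, rhs-list) pairs; full CNF conversion also removes ε-rules and unit rules, which this sketch does not attempt, and the fresh names X1, X2, … are a hypothetical scheme assumed not to clash with existing symbols.

```python
def binarize(rules):
    """Left-binarize rules: A -> B C D becomes X1 -> B C and A -> X1 D.

    rules: list of (lhs, rhs) pairs, rhs a list of symbols.
    Fresh symbols X1, X2, ... are assumed not to occur in the grammar.
    Rules with one or two right-hand-side symbols pass through unchanged.
    """
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            out.append((fresh, rhs[:2]))   # X -> (two leftmost symbols)
            rhs = [fresh] + rhs[2:]        # replace them by X and continue
        out.append((lhs, rhs))
    return out


cnf = binarize([("A", ["B", "C", "D"]), ("VP", ["NP", "PP", "PP"])])
for rule in cnf:
    print(rule)
```

The new symbols X1 and X2 are exactly the "additional non-terminal nodes" that make the rewritten grammar only weakly equivalent: the same strings are accepted, but the trees contain nodes the original grammar never had.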

CKY Parsing: Intuition
Consider the rule D → w
Terminal (word) forms a constituent
Trivial to apply
Consider the rule A → B C
If there is an A somewhere in the input, then there must be a B followed by a C in the input
First, precisely define span [i, j]
If A spans from i to j in the input, then there must be some k such that i < k < j
Easy to apply: we just need to try different values for k
[Figure: span [i, j] split at position k]

Slide14

CKY Parsing: Table
Any constituent can conceivably span [i, j] for all 0 ≤ i < j ≤ N, where N = length of input string
We need an N × N table to keep track of all spans…
But we only need half of the table
Semantics of table: cell [i, j] contains A iff A spans i to j in the input string
Of course, must be allowed by the grammar!

Slide15

CKY Parsing: Table-Filling
In order for A to span [i, j]:
A → B C is a rule in the grammar, and
There must be a B in [i, k] and a C in [k, j] for some i < k < j
Operationally: to apply rule A → B C, look for a B in [i, k] and a C in [k, j]
In the table: look left in the row and down in the column

Slide16

CKY Algorithm

[Figure: CKY algorithm pseudocode]

Slide17
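A minimal CKY recognizer sketch, using the gunman grammar from the Probabilistic Context Free Grammars slide with its probabilities dropped. Two assumptions are worth stating: P → with is added because the example parse trees use it, and, since NP → NNS is a unary rule (not CNF), each cell gets a small unary-closure step, which is why NP and NNS share cell (6,7) in the walk-through. The table and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding of the gunman grammar, probabilities dropped.
# P -> with is an assumption: the example trees use it.
BINARY = [("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
          ("PP", "P", "NP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP")]
UNARY = [("NP", "NNS")]  # not CNF, so each cell gets a unary-closure step
LEXICON = {"the": {"DT"}, "gunman": {"NN"}, "building": {"NN"},
           "sprayed": {"VBD"}, "with": {"P"}, "bullets": {"NNS"}}


def unary_closure(cell):
    """Add A to the cell whenever A -> B is a rule and B is already in it."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in UNARY:
            if rhs in cell and lhs not in cell:
                cell.add(lhs)
                changed = True


def cky_recognize(words):
    """Return a table mapping span (i, j) to the set of labels spanning it."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):                      # column by column
        table[(j - 1, j)] |= LEXICON.get(words[j - 1].lower(), set())
        unary_closure(table[(j - 1, j)])
        for i in range(j - 2, -1, -1):             # bottom-up in column j
            for k in range(i + 1, j):              # every split i < k < j
                for lhs, b, c in BINARY:           # B in [i,k], C in [k,j]
                    if b in table[(i, k)] and c in table[(k, j)]:
                        table[(i, j)].add(lhs)
            unary_closure(table[(i, j)])
    return table


words = "The gunman sprayed the building with bullets".split()
table = cky_recognize(words)
print("S" in table[(0, len(words))])  # True: S spans the whole input
```

The three nested loops over j, i and k are the source of the cubic running time; the innermost loop over rules adds the grammar-size factor.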

CKY Parsing: Recognize or Parse
Is this really a parser?
Recognizer to parser: add backpointers!

Slide18

CKY: Algorithmic Complexity
What’s the asymptotic complexity of CKY?
O(n³) for an input of n words (times a factor that grows with the size of the grammar)

Slide19
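Where the cube comes from: for every span (i, j) CKY tries every split point k with i < k < j, and each (i, k, j) triple is examined once for every binary rule. A small sketch (the function name is hypothetical) that counts those triples and compares them with the closed form C(n + 1, 3) = (n + 1) n (n − 1) / 6:

```python
from math import comb


def count_split_points(n):
    """Count the (i, k, j) triples with 0 <= i < k < j <= n that CKY visits."""
    return sum(1 for j in range(1, n + 1)
                 for i in range(j)
                 for k in range(i + 1, j))


for n in (7, 14, 28):
    print(n, count_split_points(n), comb(n + 1, 3))
# 7 56 56
# 14 455 455
# 28 3654 3654
```

Doubling the sentence length multiplies the count by roughly 8, the signature of cubic growth; the 7-word example sentence already needs 56 split checks per binary rule.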

CKY: Analysis
Since it’s bottom-up, CKY populates the table with a lot of “phantom constituents”
Spans that are constituents, but cannot really occur in the context in which they are suggested
Conversion of grammar to CNF adds additional non-terminal nodes
Leads to weak equivalence with respect to the original grammar
Additional non-terminal nodes are not (linguistically) meaningful, but can be cleaned up with post-processing
Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?
Yes: Earley Parsing

Slide20

Penn Treebank

( (S (NP-SBJ (NP Official trading) (PP in (NP the shares)))
     (VP will
         (VP start
             (PP-LOC in (NP Paris))
             (PP-TMP on (NP (NP Nov 6)))))))

Official trading in the shares will start in Paris on Nov 6.

Slide21

Probabilistic Context Free Grammars
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
P → with 1.0

Slide22
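In a proper PCFG the probabilities of all rules sharing a left-hand side sum to 1 (NP: 0.5 + 0.3 + 0.2, VP: 0.6 + 0.4, NN: 0.5 + 0.5). A small check of the slide's grammar, as a sketch assuming the (lhs, rhs, probability) encoding below; P → with 1.0 is included on the assumption that the example parse trees rely on it.

```python
from collections import defaultdict
from math import isclose

# The slide's PCFG as (lhs, rhs, probability) triples (hypothetical encoding).
# P -> with 1.0 is assumed from the example parse trees.
PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5), ("NP", ("NNS",), 0.3), ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6), ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5), ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0), ("NNS", ("bullets",), 1.0),
    ("P", ("with",), 1.0),
]


def lhs_totals(rules):
    """Sum the rule probabilities for each left-hand side."""
    totals = defaultdict(float)
    for lhs, _rhs, p in rules:
        totals[lhs] += p
    return dict(totals)


totals = lhs_totals(PCFG)
print(all(isclose(t, 1.0) for t in totals.values()))  # True
```

`isclose` is used rather than `==` because sums such as 0.5 + 0.3 + 0.2 are computed in floating point.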

Example Parse t1

The gunman sprayed the building with bullets.

(Each node is labelled with the probability of the rule that expands it.)

(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.6 (VP 0.4 (VBD 1.0 sprayed) (NP 0.5 (DT 1.0 the) (NN 0.5 building))) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

Slide23

Another Parse t2

The gunman sprayed the building with bullets.

(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.4 (VBD 1.0 sprayed) (NP 0.2 (NP 0.5 (DT 1.0 the) (NN 0.5 building)) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015

Slide24
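P(t) is the product of the probabilities of every rule used in t. A short sketch that recomputes P(t1) and P(t2) from the two trees, using a hypothetical nested-tuple encoding (label, rule probability, children…) in which words are plain strings.

```python
from math import prod

# Trees as (label, rule_probability, child, ...); words are plain strings.
T1 = ("S", 1.0,
      ("NP", 0.5, ("DT", 1.0, "The"), ("NN", 0.5, "gunman")),
      ("VP", 0.6,
       ("VP", 0.4, ("VBD", 1.0, "sprayed"),
        ("NP", 0.5, ("DT", 1.0, "the"), ("NN", 0.5, "building"))),
       ("PP", 1.0, ("P", 1.0, "with"), ("NP", 0.3, ("NNS", 1.0, "bullets")))))

T2 = ("S", 1.0,
      ("NP", 0.5, ("DT", 1.0, "The"), ("NN", 0.5, "gunman")),
      ("VP", 0.4, ("VBD", 1.0, "sprayed"),
       ("NP", 0.2,
        ("NP", 0.5, ("DT", 1.0, "the"), ("NN", 0.5, "building")),
        ("PP", 1.0, ("P", 1.0, "with"), ("NP", 0.3, ("NNS", 1.0, "bullets"))))))


def tree_prob(tree):
    """P(t) = product of the probabilities of all rules used in t."""
    if isinstance(tree, str):        # a word: no rule applied below it
        return 1.0
    _label, p, *children = tree
    return p * prod(tree_prob(child) for child in children)


p1, p2 = tree_prob(T1), tree_prob(T2)
print(round(p1, 6), round(p2, 6))
```

Multiplying the fourteen factors gives P(t1) = 0.0045 and P(t2) = 0.0015, so this PCFG prefers t1, where the PP "with bullets" attaches to the VP.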

Illustrating the CYK [Cocke, Younger, Kasami] Algorithm
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
P → with 1.0

Slide25

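CYK: Start with (0,1)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

Rows give the start position i of a span and columns its end position j; “-” marks a cell that holds no constituent (or cannot, since j ≤ i), and a blank cell has not been filled yet.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  |     |     |     |     |     |
1       | -   |     |     |     |     |     |
2       | -   | -   |     |     |     |     |
3       | -   | -   | -   |     |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide26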
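The walk-through fills the chart column by column and, inside each column, from the diagonal cell upwards, so that the cells (i, k) and (k, j) needed for (i, j) are always ready first. A tiny sketch (the function name is hypothetical) that lists this order for the 7-word sentence:

```python
def fill_order(n):
    """Return the cells (i, j), 0 <= i < j <= n, in the walk-through's order:
    column by column (j = 1..n), bottom row first within each column."""
    return [(i, j) for j in range(1, n + 1) for i in range(j - 1, -1, -1)]


order = fill_order(7)
print(order[:6])   # [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (0, 3)]
print(len(order))  # 28
```

The 15th cell in this order, (0, 5), is where the walk-through first finds an S; the run ends at (0, 7).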

CYK: Keep filling diagonals

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  |     |     |     |     |     |
1       | -   | NN  |     |     |     |     |
2       | -   | -   |     |     |     |     |
3       | -   | -   | -   |     |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide27

CYK: Try getting higher level structures

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  |     |     |     |     |
1       | -   | NN  |     |     |     |     |
2       | -   | -   |     |     |     |     |
3       | -   | -   | -   |     |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide28

CYK: Diagonal continues

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  |     |     |     |     |
1       | -   | NN  |     |     |     |     |
2       | -   | -   | VBD |     |     |     |
3       | -   | -   | -   |     |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide29

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   |     |     |     |
1       | -   | NN  | -   |     |     |     |
2       | -   | -   | VBD |     |     |     |
3       | -   | -   | -   |     |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide30

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   |     |     |     |
1       | -   | NN  | -   |     |     |     |
2       | -   | -   | VBD |     |     |     |
3       | -   | -   | -   | DT  |     |     |
4       | -   | -   | -   | -   |     |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide31

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   |     |     |
1       | -   | NN  | -   | -   |     |     |
2       | -   | -   | VBD | -   |     |     |
3       | -   | -   | -   | DT  |     |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide32

CYK: starts filling the 5th column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   |     |     |
1       | -   | NN  | -   | -   |     |     |
2       | -   | -   | VBD | -   |     |     |
3       | -   | -   | -   | DT  | NP  |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide33

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   |     |     |
1       | -   | NN  | -   | -   |     |     |
2       | -   | -   | VBD | -   | VP  |     |
3       | -   | -   | -   | DT  | NP  |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide34

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   |     |     |
1       | -   | NN  | -   | -   | -   |     |
2       | -   | -   | VBD | -   | VP  |     |
3       | -   | -   | -   | DT  | NP  |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

Slide35

CYK: S found, but NO termination!

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   |     |
1       | -   | NN  | -   | -   | -   |     |
2       | -   | -   | VBD | -   | VP  |     |
3       | -   | -   | -   | DT  | NP  |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   |     |
6       | -   | -   | -   | -   | -   | -   |

The S in cell (0,5) spans only “The gunman sprayed the building”, not the whole input, so the algorithm continues; the sentence is accepted only if S ends up in (0,7).

Slide36

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   |     |
1       | -   | NN  | -   | -   | -   |     |
2       | -   | -   | VBD | -   | VP  |     |
3       | -   | -   | -   | DT  | NP  |     |
4       | -   | -   | -   | -   | NN  |     |
5       | -   | -   | -   | -   | -   | P   |
6       | -   | -   | -   | -   | -   | -   |

Slide37

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   |
2       | -   | -   | VBD | -   | VP  | -   |
3       | -   | -   | -   | DT  | NP  | -   |
4       | -   | -   | -   | -   | NN  | -   |
5       | -   | -   | -   | -   | -   | P   |
6       | -   | -   | -   | -   | -   | -   |

Slide38

CYK: Control moves to last column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   |
2       | -   | -   | VBD | -   | VP  | -   |
3       | -   | -   | -   | DT  | NP  | -   |
4       | -   | -   | -   | -   | NN  | -   |
5       | -   | -   | -   | -   | -   | P   |
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide39

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   |
2       | -   | -   | VBD | -   | VP  | -   |
3       | -   | -   | -   | DT  | NP  | -   |
4       | -   | -   | -   | -   | NN  | -   |
5       | -   | -   | -   | -   | -   | P   | PP
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide40

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   |
2       | -   | -   | VBD | -   | VP  | -   |
3       | -   | -   | -   | DT  | NP  | -   | NP
4       | -   | -   | -   | -   | NN  | -   | -
5       | -   | -   | -   | -   | -   | P   | PP
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide41

CYK (cont…)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   |
2       | -   | -   | VBD | -   | VP  | -   | VP
3       | -   | -   | -   | DT  | NP  | -   | NP
4       | -   | -   | -   | -   | NN  | -   | -
5       | -   | -   | -   | -   | -   | P   | PP
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide42

CYK: filling the last column

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   |
1       | -   | NN  | -   | -   | -   | -   | -
2       | -   | -   | VBD | -   | VP  | -   | VP
3       | -   | -   | -   | DT  | NP  | -   | NP
4       | -   | -   | -   | -   | NN  | -   | -
5       | -   | -   | -   | -   | -   | P   | PP
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide43

CYK: terminates with S in (0,7)

0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7.

From\To | 1   | 2   | 3   | 4   | 5   | 6   | 7
0       | DT  | NP  | -   | -   | S   | -   | S
1       | -   | NN  | -   | -   | -   | -   | -
2       | -   | -   | VBD | -   | VP  | -   | VP
3       | -   | -   | -   | DT  | NP  | -   | NP
4       | -   | -   | -   | -   | NN  | -   | -
5       | -   | -   | -   | -   | -   | P   | PP
6       | -   | -   | -   | -   | -   | -   | NP, NNS

Slide44

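CYK: Extracting the Parse Tree
The parse tree is obtained by keeping back pointers. Following them down from S (0-7) gives:

S (0-7) → NP (0-2) VP (2-7)
NP (0-2) → DT (0-1) NN (1-2)
DT (0-1) → The
NN (1-2) → gunman
VP (2-7) → VBD (2-3) NP (3-7)
VBD (2-3) → sprayed
NP (3-7) → NP (3-5) PP (5-7)
NP (3-5) → DT (3-4) NN (4-5)
DT (3-4) → the
NN (4-5) → building
PP (5-7) → P (5-6) NP (6-7)
P (5-6) → with
NP (6-7) → NNS (6-7)
NNS (6-7) → bullets

This is parse t2 from the earlier slide: the PP “with bullets” attaches to the object NP.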
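Turning the recognizer into a parser means recording, for every entry, how it was built. A minimal sketch of that back-pointer idea on the slides' PCFG: each chart entry stores its rule probability plus the child labels and spans, and a recursive walk from S over (0,7) rebuilds every tree while multiplying the rule probabilities. The grammar encoding, the helper names, P → with 1.0, and the single unary pass (enough here because NP → NNS is the only unary rule) are all assumptions of this sketch.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical encoding of the slides' PCFG. P -> with 1.0 is an assumption
# (the example trees use it; the grammar listing omits it).
BINARY = {("NP", "VP"): [("S", 1.0)], ("DT", "NN"): [("NP", 0.5)],
          ("NP", "PP"): [("NP", 0.2)], ("P", "NP"): [("PP", 1.0)],
          ("VP", "PP"): [("VP", 0.6)], ("VBD", "NP"): [("VP", 0.4)]}
UNARY = {"NNS": [("NP", 0.3)]}
LEXICON = {"the": [("DT", 1.0)], "gunman": [("NN", 0.5)],
           "building": [("NN", 0.5)], "sprayed": [("VBD", 1.0)],
           "with": [("P", 1.0)], "bullets": [("NNS", 1.0)]}


def cky_with_backpointers(words):
    """chart[(i, j)][A] lists every way A was built over span (i, j) as
    (rule probability, word) or (rule probability, (child, start, end), ...)."""
    chart = defaultdict(lambda: defaultdict(list))
    for j in range(1, len(words) + 1):
        for tag, p in LEXICON[words[j - 1].lower()]:
            chart[(j - 1, j)][tag].append((p, words[j - 1]))
        for i in range(j - 1, -1, -1):            # bottom-up in column j
            cell = chart[(i, j)]
            for k in range(i + 1, j):
                for b in list(chart[(i, k)]):
                    for c in list(chart[(k, j)]):
                        for a, p in BINARY.get((b, c), []):
                            cell[a].append((p, (b, i, k), (c, k, j)))
            for b in list(cell):                  # one unary pass suffices here
                for a, p in UNARY.get(b, []):
                    cell[a].append((p, (b, i, j)))
    return chart


def trees(chart, label, i, j):
    """Yield (probability, bracketed tree) for every analysis of label over (i, j)."""
    for p, *kids in chart[(i, j)][label]:
        if isinstance(kids[0], str):              # back pointer to a word
            yield p, f"({label} {kids[0]})"
            continue
        options = [list(trees(chart, b, s, e)) for b, s, e in kids]
        for combo in product(*options):           # every mix of child analyses
            yield (p * prod(q for q, _ in combo),
                   f"({label} " + " ".join(t for _, t in combo) + ")")


words = "The gunman sprayed the building with bullets".split()
chart = cky_with_backpointers(words)
parses = sorted(trees(chart, "S", 0, len(words)), reverse=True)
for p, tree in parses:
    print(round(p, 6), tree)
```

Cell VP (2-7) holds two back pointers, so two trees come out: t1 (PP attached to the VP, probability 0.0045) and t2 (PP attached to the object NP, probability 0.0015), the latter being the tree drawn on the extraction slide. Enumerating every tree is exponential in the worst case; keeping only the best back pointer per label and cell avoids that.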