Slide1
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 29: CYK; Inside Probability; Parse Tree construction)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
22nd March, 2011
Slide2
Penn POS Tags
[ John/NNP ] wrote/VBD [ those/DT words/NNS ] in/IN [ the/DT Book/NN ] of/IN [ Proverbs/NNS ]
John wrote those words in the Book of Proverbs.
Slide3
Penn Treebank
(S
  (NP-SBJ
    (NP John))
  (VP wrote
    (NP those words)
    (PP-LOC in
      (NP
        (NP-TTL
          (NP the Book)
          (PP of
            (NP Proverbs)))))))
John wrote those words in the Book of Proverbs.
Slide4
PSG Parse Tree
Official trading in the shares will start in Paris on Nov 6.
[Figure: PSG parse tree for the sentence, with node labels S, NP, VP, AP, PP, N, A, P, V, Aux]
Slide5
Penn POS Tags
[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]
Official trading in the shares will start in Paris on Nov 6.
Slide6
Penn POS Tag Set
Adjective: JJ
Adverb: RB
Cardinal Number: CD
Determiner: DT
Preposition: IN
Coordinating Conjunction: CC
Subordinating Conjunction: IN
Singular Noun: NN
Plural Noun: NNS
Personal Pronoun: PRP
Proper Noun: NNP
Verb base form: VB
Modal verb: MD
Verb (3sg Pres): VBZ
Wh-determiner: WDT
Wh-pronoun: WP
Slide7
CYK Parsing
(some slides borrowed from Jimmy Lin’s “Syntactic Parsing with CFGs”)
Slide8
Shared Sub-Problems
Observation: ambiguous parses still share sub-trees
We don’t want to redo work that’s already been done
Unfortunately, naïve backtracking leads to duplicate work
Slide9
Shared Sub-Problems: Example
Slide10
Efficient Parsing
Dynamic programming to the rescue!
Intuition: store partial results in tables, thereby:
  Avoiding repeated work on shared sub-problems
  Efficiently storing ambiguous structures with shared sub-parts
Two algorithms:
  CKY: roughly, bottom-up
  Earley: roughly, top-down
Slide11
CKY Parsing: CNF
CKY parsing requires that the grammar consist of ε-free, binary rules = Chomsky Normal Form
All rules of the form: A → B C or A → a
What does the tree look like?
What if my CFG isn’t in CNF?
Example rules: A → B C, D → w
Slide12
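The CNF condition (every rule is either A → B C or A → a) can be checked mechanically. The following is a minimal sketch; the function name is_cnf_rule, the rule encoding as (lhs, rhs) tuples, and the two violating rules are illustrative assumptions, not part of the lecture.

```python
def is_cnf_rule(lhs, rhs, nonterminals):
    """Return True if lhs -> rhs has the form A -> B C or A -> a."""
    if len(rhs) == 2:
        # Binary rule: both right-hand symbols must be non-terminals.
        return all(sym in nonterminals for sym in rhs)
    if len(rhs) == 1:
        # Lexical rule: the single right-hand symbol must be a terminal.
        return rhs[0] not in nonterminals
    # Empty (epsilon) rules and rules longer than two symbols are not CNF.
    return False


nonterminals = {"A", "B", "C", "D"}
rules = [
    ("A", ("B", "C")),       # binary rule from the slide: allowed
    ("D", ("w",)),           # lexical rule from the slide: allowed
    ("A", ("B", "C", "D")),  # hypothetical: three right-hand symbols
    ("A", ("B",)),           # hypothetical: unary non-terminal rule
]
violations = [r for r in rules if not is_cnf_rule(r[0], r[1], nonterminals)]
print(violations)
```

Only the two hypothetical rules are reported; they are the kinds of rule that have to be rewritten before CKY can be applied.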
CKY Parsing with Arbitrary CFGs
Problem: my grammar has rules like VP → NP PP PP
  Can’t apply CKY!
Solution: rewrite grammar into CNF
  Introduce new intermediate non-terminals into the grammar
  Example: A → B C D becomes A → X D and X → B C
  (Where X is a symbol that doesn’t occur anywhere else in the grammar)
What does this mean?
  = weak equivalence
  The rewritten grammar accepts (and rejects) the same set of strings as the original grammar…
  But the resulting derivations (trees) are different
Slide13
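The rewriting of long rules with fresh intermediate non-terminals can be sketched as follows. This is only an illustration under stated assumptions: the function name binarize and the fresh-symbol scheme X1, X2, ... are hypothetical, the fresh symbols are assumed not to occur in the grammar, and unary or ε rules would need separate treatment that is not shown here.

```python
from itertools import count


def binarize(rules):
    """Split rules with more than two right-hand symbols into binary rules.

    `rules` is a list of (lhs, rhs_tuple) pairs. Fresh symbols X1, X2, ...
    are assumed not to occur anywhere else in the grammar.
    """
    fresh = count(1)
    out = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            new_sym = f"X{next(fresh)}"
            out.append((new_sym, rhs[:2]))   # X -> B C
            rhs = (new_sym,) + rhs[2:]       # A -> X D ...
        out.append((lhs, rhs))
    return out


cnf_rules = binarize([("A", ("B", "C", "D")), ("VP", ("NP", "PP", "PP"))])
for lhs, rhs in cnf_rules:
    print(lhs, "->", " ".join(rhs))
```

The rewritten grammar derives the same strings, but trees built with it contain the extra X nodes, which is the weak equivalence mentioned on the slide.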
CKY Parsing: Intuition
Consider the rule D → w
  Terminal (word) forms a constituent
  Trivial to apply
Consider the rule A → B C
  If there is an A somewhere in the input then there must be a B followed by a C in the input
  First, precisely define span [ i, j ]
  If A spans from i to j in the input then there must be some k such that i < k < j, with B spanning [ i, k ] and C spanning [ k, j ]
  Easy to apply: we just need to try different values for k
Slide14
CKY Parsing: Table
Any constituent can conceivably span [ i, j ] for all 0 ≤ i < j ≤ N, where N = length of input string
  We need an N × N table to keep track of all spans…
  But we only need half of the table
Semantics of table: cell [ i, j ] contains A iff A spans i to j in the input string
  Of course, must be allowed by the grammar!
Slide15
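To make the "half of the table" concrete, the spans [ i, j ] with 0 ≤ i < j ≤ N can be enumerated directly. The sketch below is an illustration, not the lecture's code; the helper name spans is hypothetical, and the visiting order (column by column, bottom-up within a column) is one possible choice that keeps shorter spans ahead of the longer spans that depend on them.

```python
def spans(n):
    """Yield every span (i, j) with 0 <= i < j <= n, column by column."""
    for j in range(1, n + 1):           # right edge of the span
        for i in range(j - 1, -1, -1):  # left edge, from the diagonal upwards
            yield (i, j)


words = "The gunman sprayed the building with bullets".split()
cells = list(spans(len(words)))
print(len(cells))   # number of cells in the upper triangle
print(cells[:6])    # first few spans in visiting order
```

For an input of N words this gives N(N+1)/2 cells, which is the upper triangle of the table.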
CKY Parsing: Table-Filling
In order for A to span [ i, j ]:
  A → B C is a rule in the grammar, and
  There must be a B in [ i, k ] and a C in [ k, j ] for some i < k < j
Operationally: To apply rule A → B C, look for a B in [ i, k ] and a C in [ k, j ]
In the table: look left in the row and down in the column
Slide16
CKY Algorithm
Slide17
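The table-filling procedure can be sketched as a recognizer. This is a minimal sketch under stated assumptions rather than the lecture's own code: it uses the toy gunman grammar from the PCFG slides (probabilities dropped), adds a lexical entry P → with that appears only in the parse-tree figures, and handles the single unary rule NP → NNS at the lexical level only. The names BINARY, UNARY, LEXICON and cky_recognize are illustrative.

```python
from collections import defaultdict

# (B, C) -> set of A such that A -> B C is a rule
BINARY = {
    ("NP", "VP"): {"S"},
    ("DT", "NN"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("P", "NP"): {"PP"},
    ("VP", "PP"): {"VP"},
    ("VBD", "NP"): {"VP"},
}
UNARY = {"NNS": {"NP"}}  # NP -> NNS, applied only above a word's tag
LEXICON = {
    "the": {"DT"}, "gunman": {"NN"}, "building": {"NN"},
    "sprayed": {"VBD"}, "bullets": {"NNS"},
    "with": {"P"},  # assumption: taken from the parse-tree figures
}


def cky_recognize(words):
    """Fill the CKY table; table[i, j] holds the labels spanning i..j."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Diagonal cell: tags of word j, plus unary parents of those tags.
        tags = set(LEXICON.get(words[j - 1].lower(), ()))
        for tag in list(tags):
            tags |= UNARY.get(tag, set())
        table[j - 1, j] = tags
        # Longer spans ending at j, trying every split point i < k < j.
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for b in table[i, k]:
                    for c in table[k, j]:
                        table[i, j] |= BINARY.get((b, c), set())
    return table


words = "The gunman sprayed the building with bullets".split()
table = cky_recognize(words)
print("S" in table[0, len(words)])  # accepted if S spans the whole input
```

The three nested loops over j, i and k are where the cubic running time in the input length comes from.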
CKY Parsing: Recognize or Parse
Is this really a parser?
Recognizer to parser: add backpointers!
Slide18
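CKY: Algorithmic Complexity
What’s the asymptotic complexity of CKY?
O(n³), where n is the length of the input: the table has O(n²) cells and each cell tries O(n) split points k
Slide19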
CKY: Analysis
Since it’s bottom up, CKY populates the table with a lot of “phantom constituents”
  Spans that are constituents, but cannot really occur in the context in which they are suggested
Conversion of grammar to CNF adds additional non-terminal nodes
  Leads to weak equivalence wrt original grammar
  Additional non-terminal nodes not (linguistically) meaningful: but can be cleaned up with post processing
Is there a parsing algorithm for arbitrary CFGs that combines dynamic programming and top-down control?
  Yes: Earley Parsing
Slide20
Penn Treebank
( (S
    (NP-SBJ
      (NP Official trading)
      (PP in
        (NP the shares)))
    (VP will
      (VP start
        (PP-LOC in
          (NP Paris))
        (PP-TMP on
          (NP
            (NP Nov 6)))))))
Official trading in the shares will start in Paris on Nov 6.
Slide21
Probabilistic Context Free Grammars
S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
Slide22
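In a PCFG the probabilities of all rules that share a left-hand side should sum to 1. The sketch below checks this for the grammar listed on the Probabilistic Context Free Grammars slide; the variable names PCFG, totals and is_proper are illustrative, and only the rules that appear in that list are included.

```python
from collections import defaultdict
import math

# (left-hand side, right-hand side, probability), copied from the slide
PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 0.5),
    ("NP", ("NNS",), 0.3),
    ("NP", ("NP", "PP"), 0.2),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("VP", "PP"), 0.6),
    ("VP", ("VBD", "NP"), 0.4),
    ("DT", ("the",), 1.0),
    ("NN", ("gunman",), 0.5),
    ("NN", ("building",), 0.5),
    ("VBD", ("sprayed",), 1.0),
    ("NNS", ("bullets",), 1.0),
]

# Sum the probabilities of the rules for each left-hand side.
totals = defaultdict(float)
for lhs, _rhs, prob in PCFG:
    totals[lhs] += prob

# Compare with a tolerance, since the sums are floating-point values.
is_proper = all(math.isclose(total, 1.0) for total in totals.values())
print(dict(totals), is_proper)
```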
Example Parse t1
The gunman sprayed the building with bullets.
(S 1.0
  (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
  (VP 0.6
    (VP 0.4
      (VBD 1.0 sprayed)
      (NP 0.5 (DT 1.0 the) (NN 0.5 building)))
    (PP 1.0
      (P 1.0 with)
      (NP 0.3 (NNS 1.0 bullets)))))
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045
Slide23
Another Parse t2
The gunman sprayed the building with bullets.
(S 1.0
  (NP 0.5 (DT 1.0 The) (NN 0.5 gunman))
  (VP 0.4
    (VBD 1.0 sprayed)
    (NP 0.2
      (NP 0.5 (DT 1.0 the) (NN 0.5 building))
      (PP 1.0
        (P 1.0 with)
        (NP 0.3 (NNS 1.0 bullets))))))
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
Slide24
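The probability of a parse is the product of the probabilities of the rules used in it. The following sketch recomputes that product for the two parses of the gunman sentence. It is an illustration under stated assumptions: the tuple encoding of trees and the names RULE_PROB and tree_prob are hypothetical, words are lower-cased, and the entry P → with (probability 1.0) is taken from the parse-tree figures because it is not in the rule list.

```python
import math

RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5,
    ("NP", ("NNS",)): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6,
    ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("gunman",)): 0.5,
    ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0,
    ("NNS", ("bullets",)): 1.0,
    ("P", ("with",)): 1.0,  # assumption: read off the parse-tree figures
}


def tree_prob(tree):
    """tree = (label, child, ...); a child is a subtree or a word string."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob


subj = ("NP", ("DT", "the"), ("NN", "gunman"))
obj = ("NP", ("DT", "the"), ("NN", "building"))
pp = ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))
t1 = ("S", subj, ("VP", ("VP", ("VBD", "sprayed"), obj), pp))  # PP attached to VP
t2 = ("S", subj, ("VP", ("VBD", "sprayed"), ("NP", obj, pp)))  # PP attached to NP

p1, p2 = tree_prob(t1), tree_prob(t2)
print(p1, p2)
```

With these rule probabilities the products come to about 0.0045 for t1 and 0.0015 for t2, so the VP attachment is three times as probable as the NP attachment.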
Illustrating CYK [Cocke, Younger, Kasami] Algo
S → NP VP      1.0
NP → DT NN     0.5
NP → NNS       0.3
NP → NP PP     0.2
PP → P NP      1.0
VP → VP PP     0.6
VP → VBD NP    0.4
DT → the       1.0
NN → gunman    0.5
NN → building  0.5
VBD → sprayed  1.0
NNS → bullets  1.0
Slide25
CYK: Start with (0,1)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT
1        --
2        --      --
3        --      --      --
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide26
CYK: Keep filling diagonals
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT
1        --      NN
2        --      --
3        --      --      --
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide27
CYK: Try getting higher level structures
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP
1        --      NN
2        --      --
3        --      --      --
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide28
CYK: Diagonal continues
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP
1        --      NN
2        --      --      VBD
3        --      --      --
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide29
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --
1        --      NN      --
2        --      --      VBD
3        --      --      --
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide30
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --
1        --      NN      --
2        --      --      VBD
3        --      --      --      DT
4        --      --      --      --
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide31
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --
1        --      NN      --      --
2        --      --      VBD     --
3        --      --      --      DT
4        --      --      --      --      NN
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide32
CYK: starts filling the 5th column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --
1        --      NN      --      --
2        --      --      VBD     --
3        --      --      --      DT      NP
4        --      --      --      --      NN
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide33
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --
1        --      NN      --      --
2        --      --      VBD     --      VP
3        --      --      --      DT      NP
4        --      --      --      --      NN
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide34
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --
1        --      NN      --      --      --
2        --      --      VBD     --      VP
3        --      --      --      DT      NP
4        --      --      --      --      NN
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide35
CYK: S found, but NO termination!
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S
1        --      NN      --      --      --
2        --      --      VBD     --      VP
3        --      --      --      DT      NP
4        --      --      --      --      NN
5        --      --      --      --      --
6        --      --      --      --      --      --
Slide36
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S
1        --      NN      --      --      --
2        --      --      VBD     --      VP
3        --      --      --      DT      NP
4        --      --      --      --      NN
5        --      --      --      --      --      P
6        --      --      --      --      --      --
Slide37
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --
2        --      --      VBD     --      VP      --
3        --      --      --      DT      NP      --
4        --      --      --      --      NN      --
5        --      --      --      --      --      P
6        --      --      --      --      --      --
Slide38
CYK: Control moves to last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --
2        --      --      VBD     --      VP      --
3        --      --      --      DT      NP      --
4        --      --      --      --      NN      --
5        --      --      --      --      --      P
6        --      --      --      --      --      --      NP,NNS
Slide39
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --
2        --      --      VBD     --      VP      --
3        --      --      --      DT      NP      --
4        --      --      --      --      NN      --
5        --      --      --      --      --      P       PP
6        --      --      --      --      --      --      NP,NNS
Slide40
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --
2        --      --      VBD     --      VP      --
3        --      --      --      DT      NP      --      NP
4        --      --      --      --      NN      --      --
5        --      --      --      --      --      P       PP
6        --      --      --      --      --      --      NP,NNS
Slide41
CYK (cont…)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --
2        --      --      VBD     --      VP      --      VP
3        --      --      --      DT      NP      --      NP
4        --      --      --      --      NN      --      --
5        --      --      --      --      --      P       PP
6        --      --      --      --      --      --      NP,NNS
Slide42
CYK: filling the last column
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --
1        --      NN      --      --      --      --      --
2        --      --      VBD     --      VP      --      VP
3        --      --      --      DT      NP      --      NP
4        --      --      --      --      NN      --      --
5        --      --      --      --      --      P       PP
6        --      --      --      --      --      --      NP,NNS
Slide43
CYK: terminates with S in (0,7)
0 The 1 gunman 2 sprayed 3 the 4 building 5 with 6 bullets 7 .

From\To  1       2       3       4       5       6       7
0        DT      NP      --      --      S       --      S
1        --      NN      --      --      --      --      --
2        --      --      VBD     --      VP      --      VP
3        --      --      --      DT      NP      --      NP
4        --      --      --      --      NN      --      --
5        --      --      --      --      --      P       PP
6        --      --      --      --      --      --      NP,NNS
Slide44
CYK: Extracting the Parse Tree
The parse tree is obtained by keeping back pointers.
S (0-7)
  NP (0-2)
    DT (0-1) The
    NN (1-2) gunman
  VP (2-7)
    VBD (2-3) sprayed
    NP (3-7)
      NP (3-5)
        DT (3-4) the
        NN (4-5) building
      PP (5-7)
        P (5-6) with
        NP (6-7)
          NNS (6-7) bullets
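Keeping back pointers can be sketched as follows: each table entry records how it was built, and trees are read off by following those records from S over the whole input. This is a minimal sketch under stated assumptions, not the lecture's code. The names BINARY, UNARY, LEXICON, cky_chart and trees are illustrative, the entry P → with is taken from the parse-tree figures, the unary rule NP → NNS is applied only directly above a word's tag, and every parse is enumerated rather than only the most probable one.

```python
from collections import defaultdict

BINARY = [  # (A, B, C) for each rule A -> B C
    ("S", "NP", "VP"), ("NP", "DT", "NN"), ("NP", "NP", "PP"),
    ("PP", "P", "NP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
]
UNARY = [("NP", "NNS")]  # NP -> NNS
LEXICON = {
    "the": ("DT",), "gunman": ("NN",), "building": ("NN",),
    "sprayed": ("VBD",), "bullets": ("NNS",),
    "with": ("P",),  # assumption: taken from the parse-tree figures
}


def cky_chart(words):
    """back[i, j][A] lists the ways A was built over span (i, j)."""
    n = len(words)
    back = defaultdict(lambda: defaultdict(list))
    for j in range(1, n + 1):
        word = words[j - 1]
        for tag in LEXICON.get(word.lower(), ()):
            back[j - 1, j][tag].append(word)            # lexical entry
        for parent, child in UNARY:
            if child in back[j - 1, j]:
                back[j - 1, j][parent].append((child,))  # unary entry
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    if left in back[i, k] and right in back[k, j]:
                        back[i, j][parent].append((k, left, right))
    return back


def trees(back, label, i, j):
    """Yield every parse of span (i, j) rooted in label, as a bracketing."""
    for bp in back[i, j][label]:
        if isinstance(bp, str):        # a word
            yield f"({label} {bp})"
        elif len(bp) == 1:             # unary back pointer
            for sub in trees(back, bp[0], i, j):
                yield f"({label} {sub})"
        else:                          # binary back pointer with split k
            k, left, right = bp
            for lsub in trees(back, left, i, k):
                for rsub in trees(back, right, k, j):
                    yield f"({label} {lsub} {rsub})"


words = "The gunman sprayed the building with bullets".split()
back = cky_chart(words)
parses = list(trees(back, "S", 0, len(words)))
for parse in parses:
    print(parse)
```

For this sentence the VP cell over (2-7) holds two back pointers, one for each attachment of the PP, so two trees are produced; the tree with NP (3-7) is one of them.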