Slide 1
Context-Free Grammar
CSCI-GA.2590 – Lecture 3
Ralph Grishman
NYU
Slide 2
A Grammar Formalism
We have informally described the basic constructs of English grammar. Now we want to introduce a formalism for representing these constructs: a formalism that we can use as input to a parsing procedure.
1/16/14  NYU
Slide 3
Context-Free Grammar
A context-free grammar consists of:
- a set of non-terminal symbols A, B, C, … ∈ N
- a set of terminal symbols a, b, c, … ∈ T
- a start symbol S ∈ N
- a set of productions P of the form N → (N ∪ T)*
Slide 4
A Simple Context-Free Grammar
A simple CFG:
S → NP VP
NP → cats
NP → the cats
NP → the old cats
NP → mice
VP → sleep
VP → chase NP
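The grammar above can be written down directly as a data structure. Below is a minimal sketch (the dictionary representation and the `expand` helper are my own, not the lecture's code) that enumerates the strings this CFG generates by repeatedly rewriting the leftmost non-terminal:

```python
# The slide's simple CFG as a Python dict mapping each non-terminal
# to its list of alternative right-hand sides (representation assumed).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["cats"], ["the", "cats"], ["the", "old", "cats"], ["mice"]],
    "VP": [["sleep"], ["chase", "NP"]],
}
NONTERMINALS = set(GRAMMAR)

def expand(symbols, depth=5):
    """Enumerate terminal strings derivable from a symbol sequence
    (depth-bounded, since a CFG's language may be infinite)."""
    if depth < 0:
        return
    i = next((j for j, s in enumerate(symbols) if s in NONTERMINALS), None)
    if i is None:                 # all terminals: a sentence of the language
        yield " ".join(symbols)
        return
    for rhs in GRAMMAR[symbols[i]]:       # rewrite the leftmost non-terminal
        yield from expand(symbols[:i] + rhs + symbols[i + 1:], depth - 1)

sentences = set(expand(["S"]))
# "cats sleep" and "the old cats chase mice" are among the 20 strings
```

Since this grammar has no recursive productions, its language is finite: 4 noun-phrase choices times 5 verb-phrase choices gives 20 sentences.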
Slide 5
Derivation and Language
If A → β is a production of the grammar, we can rewrite α A γ ⇒ α β γ.
A derivation is a sequence of rewrite operations S ⇒ … ⇒ … ⇒ w, where w ∈ T*.
For example:
S ⇒ NP VP ⇒ cats VP ⇒ cats chase NP ⇒ cats chase mice
The language generated by a CFG is the set of strings (sequences of terminals) which can be derived from the start symbol.
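The derivation on this slide can be replayed as a sequence of explicit leftmost rewrites. A small sketch (the `rewrite` helper is illustrative, not from the lecture):

```python
# Replay the slide's derivation
#   S => NP VP => cats VP => cats chase NP => cats chase mice
# as explicit rewrites of the leftmost occurrence of a non-terminal.

def rewrite(sentential_form, nonterminal, rhs):
    """Replace the leftmost occurrence of `nonterminal` with `rhs`."""
    i = sentential_form.index(nonterminal)
    return sentential_form[:i] + rhs + sentential_form[i + 1:]

steps = [["S"]]
steps.append(rewrite(steps[-1], "S", ["NP", "VP"]))
steps.append(rewrite(steps[-1], "NP", ["cats"]))
steps.append(rewrite(steps[-1], "VP", ["chase", "NP"]))
steps.append(rewrite(steps[-1], "NP", ["mice"]))

print(" => ".join(" ".join(form) for form in steps))
# S => NP VP => cats VP => cats chase NP => cats chase mice
```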
Slide 6
Preterminals
It is convenient to include a set of symbols called preterminals (corresponding to the parts of speech) which can be directly rewritten as terminals (words).
This allows us to separate the productions into a set which generates sequences of preterminals (the "grammar") and a set which rewrites the preterminals as terminals (the "dictionary").
Slide 7
A Grammar with Preterminals
grammar:
S → NP VP
NP → N
NP → ART N
NP → ART ADJ N
VP → V
VP → V NP
dictionary:
N → cats
N → mice
ADJ → old
ART → the
V → sleep
V → chase
Slide 8
Grouping Alternates
To make the grammar more compact, we group productions with the same left-hand side:
S → NP VP
NP → N | ART N | ART ADJ N
VP → V | V NP
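The grammar/dictionary split is easy to mirror in code. A sketch (the data layout and the `tag` helper are assumptions for illustration): the grammar maps non-terminals to alternative right-hand sides of preterminals, while the dictionary maps words to their possible parts of speech.

```python
# The "grammar" rewrites non-terminals to preterminal sequences;
# the "dictionary" maps each word to its parts of speech.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["N"], ["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
DICTIONARY = {
    "cats": ["N"], "mice": ["N"], "old": ["ADJ"],
    "the": ["ART"], "sleep": ["V"], "chase": ["V"],
}

def tag(words):
    """Look up the possible parts of speech for each word."""
    return [DICTIONARY[w] for w in words]

print(tag(["the", "old", "cats", "sleep"]))
# [['ART'], ['ADJ'], ['N'], ['V']]
```

Each dictionary value is a list because, as a later slide notes, many English words have several parts of speech.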
Slide 9
A grammar can be used to:
- generate
- recognize
- parse
Why parse? Parsing assigns the sentence a structure that may be helpful in determining its meaning.
Slide 10
CFG vs Finite State Language
CFGs are more powerful than finite-state grammars (regular expressions):
- an FSG cannot generate center embeddings: S → ( S ) | x
- even if an FSG can capture the language, it may be unable to assign the nested structures we want
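The center-embedding grammar S → ( S ) | x is small enough to recognize with a few lines of recursion; a sketch (my own, not from the slides), illustrating why no finite-state machine suffices: the recognizer must track unbounded nesting depth.

```python
# Recognizer for the language of S -> ( S ) | x, i.e. x, (x), ((x)), ...
def recognize(s):
    """True if s is derivable from S in the grammar S -> ( S ) | x."""
    if s == "x":
        return True
    return (len(s) >= 3 and s[0] == "(" and s[-1] == ")"
            and recognize(s[1:-1]))

print(recognize("((x))"), recognize("((x)"))
# True False
```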
Slide 11
A slightly bigger CFG
sentence → np vp
np → ngroup | ngroup pp
ngroup → n | art n | art adj n
vp → v | v np | v vp | v np pp (auxiliary)
pp → p np
(pp = prepositional phrase)
Slide 12
Ambiguity
Most sentences will have more than one parse.
Generally different parses will reflect different meanings, e.g. "I saw the man with a telescope."
We can attach the pp ("with a telescope") under np or vp.
Slide 13
A CFG with just 2 nonterminals
S → NP V | NP V NP
NP → N | ART N | ART ADJ N
We will use this grammar for tracing our parsers.
Slide 14
Top-down parser
repeat:
- expand the leftmost non-terminal using the first production (save any alternative productions on the backtrack stack)
- if we have matched the entire sentence, quit (success)
- if we have generated a terminal which doesn't match the sentence, pop a choice point from the stack (if the stack is empty, quit (failure))
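The loop above can be sketched compactly in code. This is a minimal illustration, not the lecture's implementation: it uses recursion (Python's call stack) in place of an explicit backtrack stack, with the grammar and a small lexicon based on slide 13.

```python
# Backtracking top-down recognition: expand the leftmost non-terminal,
# trying each alternative production; fail and back up when a generated
# preterminal doesn't match the next input word.
GRAMMAR = {
    "S":  [["NP", "V"], ["NP", "V", "NP"]],
    "NP": [["N"], ["ART", "N"], ["ART", "ADJ", "N"]],
}
LEXICON = {"the": "ART", "old": "ADJ", "cat": "N",
           "cats": "N", "mice": "N", "chases": "V"}

def parse(symbols, words):
    """True if `symbols` can be rewritten to exactly `words`."""
    if not symbols:
        return not words                     # success iff input consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                     # leftmost non-terminal:
        return any(parse(rhs + rest, words)  # try each alternative
                   for rhs in GRAMMAR[first])
    # preterminal: must match the next word's part of speech
    return (bool(words) and LEXICON.get(words[0]) == first
            and parse(rest, words[1:]))

print(parse(["S"], ["the", "cat", "chases", "mice"]))   # True
```

The `any(...)` call plays the role of the backtrack stack: if one alternative fails, the next is tried from the same state.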
Slides 15–23
Top-down parser: tracing "the cat chases mice"
[The original slides show the growing parse tree and the backtrack table at each step; the step sequence is summarized here.]
- Slide 15: start with node 0: S.
- Slide 16: expand S → NP V, creating nodes 1: NP and 2: V (saving S → NP V NP in the backtrack table).
- Slide 17: expand NP → N (saving NP → ART N and NP → ART ADJ N); N does not match "the".
- Slide 18: backtrack: expand NP → ART N; ART matches "the" and N matches "cat", but after V matches "chases" the input is not exhausted.
- Slide 19: backtrack: expand NP → ART ADJ N; ADJ does not match "cat".
- Slide 20: backtrack: expand S → NP V NP, creating nodes 1: NP, 2: V, 3: NP (backtrack table now empty).
- Slide 21: expand the first NP → N (saving NP → ART N and NP → ART ADJ N); N does not match "the".
- Slide 22: backtrack: expand NP → ART N; "the cat" matches, and V matches "chases".
- Slide 23: expand the second NP → N; N matches "mice": parse!
Slide 24
Bottom-up parser
Builds a table where each row represents a parse tree node spanning the words from start up to end:

symbol  start  end  constituents
N       0      1    -
Slide 25
Bottom-up parser
We initialize the table with the parts of speech of each word …

symbol  start  end  constituents
ART     0      1    -
N       1      2    -
V       2      3    -
N       3      4    -
Slide 26
Bottom-up parser
We initialize the table with the parts of speech of each word … remembering that many English words have several parts of speech:

symbol  start  end  constituents
ART     0      1    -
N       1      2    -
V       2      3    -
N       2      3    -
N       3      4    -
Slide 27
Bottom-up parser
Then if there is a production A → B C and we have entries for B and C with endB = startC, we add an entry for A with start = startB and end = endC.
[see lecture notes for handling general productions]

node #  symbol  start  end  constituents
0       ART     0      1    -
1       N       1      2    -
2       V       2      3    -
3       N       2      3    -
4       N       3      4    -
5       NP      0      2    [0, 1]
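The table-filling rule can be sketched as a fixpoint loop: keep applying productions to adjacent table entries until nothing new is added. This is an illustrative simplification, not the lecture's code; in particular, the ternary rule S → NP V NP is binarized here through a helper symbol S1 (my assumption), since the slides defer general productions to the lecture notes.

```python
# Bottom-up table filling: an entry is (symbol, start, end). Whenever a
# production A -> B C has entries for B and C with end(B) == start(C),
# add an entry for A spanning both.
PRODUCTIONS = [
    ("NP", ("N",)),
    ("NP", ("ART", "N")),
    ("S1", ("NP", "V")),   # S1 is a helper symbol binarizing S -> NP V NP
    ("S",  ("S1", "NP")),
]

# Parts of speech for "the cat chases mice" (the N reading of "chases"
# from slide 26 is omitted here for brevity).
table = {("ART", 0, 1), ("N", 1, 2), ("V", 2, 3), ("N", 3, 4)}

changed = True
while changed:                       # apply rules until a fixpoint
    changed = False
    for lhs, rhs in PRODUCTIONS:
        if len(rhs) == 1:            # unary rule A -> B
            new = {(lhs, s, e) for (sym, s, e) in table if sym == rhs[0]}
        else:                        # binary rule A -> B C, adjacent spans
            new = {(lhs, s1, e2)
                   for (b, s1, e1) in table if b == rhs[0]
                   for (c, s2, e2) in table if c == rhs[1] and e1 == s2}
        if not new <= table:
            table |= new
            changed = True

print(("S", 0, 4) in table)   # True: the whole sentence parses
```

Entries only record (symbol, start, end); the constituents column from the slides would additionally record which child entries built each row.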
Slide 28
Bottom-up parser

node #  symbol  start  end  constituents
0       ART     0      1    -
1       N       1      2    -
2       V       2      3    -
3       N       2      3    -
4       N       3      4    -
5       NP      0      2    [0, 1]
6       NP      1      2    [1]
7       NP      2      3    [3]
8       NP      3      4    [4]
9       S       0      4    [5, 2, 8]
10      S       1      4    [6, 2, 8]
… several more S’s

parse!