Slide1
Parsing with Context-Free Grammars for ASR
Julia Hirschberg
CS 4706
Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin
Slide2
What is Syntax?
Structure of language
How words are arranged together and related to one another
Goal of syntactic analysis: relate surface form (what someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means)
Syntactic representation: typically a tree structure
Slide3
Structure in Strings
A set of words, or a lexicon:
the a small nice big very boy girl sees likes
Some 'good' (grammatical) sentences:
the boy likes a girl
the small girl likes the big girl
a very small nice boy sees a very nice boy
Some bad (ungrammatical) sentences:
* the boy the girl
* small boy likes nice girl
Can we find a way of distinguishing between the two kinds of sequences?
Can we identify similarities among grammatical subsequences?
Slide4
One Version of Constituent Structure
Lexicon:
the a small nice big very boy girl sees likes
Grammatical sentences:
(the) boy (likes a girl)
(the small) girl (likes the big girl)
(a very small nice) boy (sees a very nice boy)
Ungrammatical sentences:
* (the) boy (the girl)
* (small) boy (likes the nice girl)
Slide5
Another Constituency Hypothesis
Lexicon:
the a small nice big very boy girl sees likes
Grammatical sentences:
(the boy) likes (a girl)
(the small girl) likes (the big girl)
(a very small nice boy) sees (a very nice boy)
Ungrammatical sentences:
* (the boy) (the girl)
* (small boy) likes (the nice girl)
Better: fewer types of constituents (blue and red are of the same type)
Slide6
Even More Structures
Lexicon:
the a small nice big very boy girl sees likes
Grammatical sentences:
((the) boy) likes ((a) girl)
((the) (small) girl) likes ((the) (big) girl)
((a) ((very) small) (nice) boy) sees ((a) ((very) nice) boy)
Ungrammatical sentences:
* ((the) boy) ((the) girl)
* ((small) boy) likes ((the) (nice) girl)
Slide7
From Substrings to Trees
(((the) boy) likes ((a) girl))
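A minimal sketch of the step from a bracketed string to a tree, assuming the tree is stored as nested Python lists. The function names `parse_brackets` and `leaves` are illustrative and not part of the original slides.

```python
def parse_brackets(text):
    """Turn a bracketed string into nested lists, one list per bracket pair."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[0] collects the top-level items
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new constituent
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            done = stack.pop()        # close it and attach it to its parent
            stack[-1].append(done)
        else:
            stack[-1].append(tok)     # a word is a leaf of the current constituent
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(tree):
    """Read the words back off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


# The outermost bracket pair is the whole sentence, so take item 0.
TREE = parse_brackets("(((the) boy) likes ((a) girl))")[0]
```

Here `TREE` has three daughters (a constituent, the word `likes`, and another constituent), and `leaves(TREE)` gives back the original word sequence.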
[Figure: the bracketing drawn as an unlabeled tree over the words the, boy, likes, a, girl]
Slide8
How do we Label the Nodes?
(((the) boy) likes ((a) girl))
Choose constituents so each one has one non-bracketed word: the head
Group words by distribution of constituents they head: part of speech (POS)
Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det)
Category of constituent: XP, where X is POS
NP, S, AdjP, AdvP, DetP
Slide9
Types of Nodes
(((the/Det) boy/N) likes/V ((a/Det) girl/N))
[Figure: phrase-structure tree (S (NP (DetP the) boy) likes (NP (DetP a) girl))]
Phrase-structure tree
nonterminal symbols = constituents
terminal symbols = words
Slide10
Determining Part-of-Speech
A blue seat / a child seat: noun or adjective?
Syntax:
a blue seat          a child seat
a very blue seat     *a very child seat
this seat is blue    *this seat is child
Morphology:
bluer                *childer
blue and child are not the same POS
blue is Adj, child is Noun
Slide11
Determining Part-of-Speech
Preposition or particle?
A: he threw out the garbage
B: he threw the garbage out the door
A: he threw the garbage out
B: *he threw the garbage the door out
The two occurrences of "out" are not the same POS
A is particle, B is Preposition
Slide12
Constituency
Some Noun phrases (NPs)
A red dog on a blue tree
A blue dog on a red tree
Some big dogs and some little dogs
A dog
I
Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs
How do we know these form a constituent?
Slide13
NP Constituency
NPs can all appear before a verb:
Some big dogs and some little dogs are going around in cars…
Big dogs, little dogs, red dogs, blue dogs, yellow dogs, green dogs, black dogs, and white dogs are all at a dog party!
I do not
But individual words can’t always appear before verbs:
* little are going…
* blue are…
* and are
Must be able to state generalizations like:
Noun phrases occur before verbs
Slide14
PP Constituency
Preposing and postposing:
Under a tree is a yellow dog.
A yellow dog is under a tree.
But not:
* Under, is a yellow dog a tree.
* Under a is a yellow dog tree.
Prepositional phrases are notable for ambiguity in attachment
I saw a man on a hill with a telescope.
Slide15
Phrase Structure and Dependency Structure
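[Figure: dependency tree with likes/V at the root, daughters boy/N and girl/N, the/Det under boy/N and a/Det under girl/N]
All nodes are labeled with words!
[Figure: phrase-structure tree (S (NP (DetP the) boy) likes (NP (DetP a) girl))]
Only leaf nodes labeled with words!
Slide16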
Phrase Structure and Dependency Structure
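[Figure: dependency tree with likes/V at the root, daughters boy/N and girl/N, the/Det under boy/N and a/Det under girl/N]
[Figure: phrase-structure tree (S (NP (DetP the) boy) likes (NP (DetP a) girl))]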
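A minimal sketch of how the dependency tree can be read off the phrase-structure tree, assuming every constituent has exactly one word as a direct daughter (its head). The tuple encoding and the name `to_dependency` are illustrative choices, not part of the original slides.

```python
# Phrase-structure tree: (label, [children]); a child is a subtree or a word (str).
PS_TREE = (
    "S",
    [
        ("NP", [("DetP", ["the"]), "boy"]),
        "likes",
        ("NP", [("DetP", ["a"]), "girl"]),
    ],
)


def to_dependency(node):
    """Return (head_word, [dependents]) by promoting each constituent's head word."""
    label, children = node
    heads = [c for c in children if isinstance(c, str)]
    if len(heads) != 1:
        raise ValueError(f"{label} needs exactly one lexical daughter, found {len(heads)}")
    # Every non-word daughter becomes a dependent of the head word.
    dependents = [to_dependency(c) for c in children if not isinstance(c, str)]
    return (heads[0], dependents)


DEP_TREE = to_dependency(PS_TREE)
```

The conversion fails with a `ValueError` as soon as a constituent has no word, or more than one word, among its daughters.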
Representationally equivalent if each nonterminal node has one lexical daughter (its head)
Slide17
Types of Dependency
[Figure: dependency tree rooted in likes/V over the words boy/N, girl/N, a/Det, small/Adj, the/Det, very/Adv, sometimes/Adv, with arcs labeled Subj, Obj, Adj(unct), Fw, Fw, Adj, Adj]
Slide18
Grammatical Relations
Types of relations between words
Arguments: subject, object, indirect object, prepositional object
Adjuncts: temporal, locative, causal, manner, …
Function Words
Slide19
Subcategorization
List of arguments of a word (typically, a verb), with features about realization (POS, perhaps case, verb form etc.)
In canonical order Subject-Object-IndObj
Example:
like: N-N, N-V(to-inf)
see: N, N-N, N-N-V(inf)
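A minimal sketch of the two example entries as a lookup table, assuming each frame is stored as a tuple of argument realizations in the order given above. The names `SUBCAT` and `licenses` are illustrative, not part of the original slides.

```python
# Subcategorization frames copied from the example; each tuple is one frame.
SUBCAT = {
    "like": [("N", "N"), ("N", "V(to-inf)")],
    "see": [("N",), ("N", "N"), ("N", "N", "V(inf)")],
}


def licenses(verb, args):
    """True if the verb has a frame matching this exact sequence of arguments."""
    return tuple(args) in SUBCAT.get(verb, [])
```

For instance, `licenses("see", ["N"])` holds while `licenses("like", ["N"])` does not, since the example lists no one-argument frame for like.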
NB: J&M talk about subcategorization only within VP
Slide20
VP Constituency
[Figure: flat tree (S (NP (DetP the) boy) likes (NP (DetP a) girl))]
[Figure: tree with a VP node (S (NP (DetP the) boy) (VP likes (NP (DetP a) girl)))]
Slide21
VP Constituency
Existence of VP is a linguistic (i.e., empirical) claim, not a methodological claim
Syntactic evidence
VP-fronting (and quickly clean the carpet he did!)
VP-ellipsis (He cleaned the carpet quickly, and so did she)
Adjuncts can occur before and after VP, but not in VP (He often eats beans, *he eats often beans)
NB: VP cannot be represented in a dependency representation
Slide22
Context-Free Grammars
Defined in formal language theory
Terminals: e.g. cat
Non-terminal symbols: e.g. NP, VP
Start symbol: e.g. S
Rewriting rules: e.g. S → NP VP
Start with start symbol, rewrite using rules, done when only terminals left
Slide23
A Fragment of English
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
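A minimal sketch of this fragment as data, with a small top-down recognizer, assuming the rules are stored as a Python dict from each non-terminal to its alternative right-hand sides. The names `GRAMMAR`, `terminals`, `expand` and `accepts` are illustrative; the simple recursion would loop on a left-recursive grammar, which this fragment is not.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "PP"]],
    "NP": [["DetP", "N"]],
    "N": [["cat"], ["mat"]],
    "V": [["is"]],
    "PP": [["Prep", "NP"]],
    "Prep": [["on"]],
    "DetP": [["the"]],
}
START = "S"


def terminals(grammar):
    """Symbols that appear on a right-hand side but are never rewritten."""
    return {s for rhss in grammar.values() for rhs in rhss for s in rhs if s not in grammar}


def expand(symbols, words, i):
    """Yield every position reachable after matching `symbols` from words[i] on."""
    if not symbols:
        yield i
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Non-terminal: try each rewriting rule, then match what follows it.
        for rhs in GRAMMAR[first]:
            for j in expand(rhs, words, i):
                yield from expand(rest, words, j)
    elif i < len(words) and words[i] == first:
        # Terminal: it must equal the next input word.
        yield from expand(rest, words, i + 1)


def accepts(sentence):
    """True if the start symbol can be rewritten into exactly this word sequence."""
    words = sentence.split()
    return any(j == len(words) for j in expand([START], words, 0))
```

Under these assumptions `accepts` returns a yes/no answer only; it does not build the tree.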
Input: the cat is on the mat
Slide24
Derivations in a CFG
S
[Figure: tree so far: S]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide25
Derivations in a CFG
NP VP
[Figure: tree so far: (S NP VP)]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide26
Derivations in a CFG
DetP N VP
[Figure: tree so far: (S (NP DetP N) VP)]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide27
Derivations in a CFG
the cat VP
[Figure: tree so far: (S (NP (DetP the) (N cat)) VP)]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide28
Derivations in a CFG
the cat V PP
[Figure: tree so far: (S (NP (DetP the) (N cat)) (VP V PP))]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide29
Derivations in a CFG
the cat is Prep NP
[Figure: tree so far: (S (NP (DetP the) (N cat)) (VP (V is) (PP Prep NP)))]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide30
Derivations in a CFG
the cat is on DetP N
[Figure: tree so far: (S (NP (DetP the) (N cat)) (VP (V is) (PP (Prep on) (NP DetP N))))]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide31
Derivations in a CFG
the cat is on the mat
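A minimal sketch that reproduces this derivation by always rewriting the leftmost non-terminal, assuming the input sentence is in the language and that a symbol with several alternatives (here only N) is resolved by looking at the input word in that position. The function name `leftmost_derivation` is illustrative. The code rewrites one symbol per step, so it lists more intermediate forms than the slides, which combine some steps.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "PP"]],
    "NP": [["DetP", "N"]],
    "N": [["cat"], ["mat"]],
    "V": [["is"]],
    "PP": [["Prep", "NP"]],
    "Prep": [["on"]],
    "DetP": [["the"]],
}


def leftmost_derivation(words, start="S"):
    """Return the list of sentential forms from `start` down to `words`."""
    form = [start]
    steps = [list(form)]
    while True:
        # Position of the leftmost symbol that can still be rewritten.
        idx = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            break  # only terminals left: done
        options = GRAMMAR[form[idx]]
        if len(options) == 1:
            rhs = options[0]
        else:
            # Everything left of idx is already terminal, so idx lines up with the input.
            rhs = next(o for o in options if o == [words[idx]])
        form = form[:idx] + rhs + form[idx + 1:]
        steps.append(list(form))
    return steps


STEPS = leftmost_derivation("the cat is on the mat".split())
for step in STEPS:
    print(" ".join(step))
```

The first printed form is the start symbol and the last is the input sentence.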
[Figure: complete tree: (S (NP (DetP the) (N cat)) (VP (V is) (PP (Prep on) (NP (DetP the) (N mat)))))]
S → NP VP
VP → V PP
NP → DetP N
N → cat | mat
V → is
PP → Prep NP
Prep → on
DetP → the
Slide32
A More Complicated Fragment of English
S → NP VP
S → VP
VP → V PP
VP → V NP
VP → V
NP → DetP NP
NP → N NP
NP → N
PP → Prep NP
N → cat | mat | food | bowl | Mary
V → is | likes | sits
Prep → on | in | under
DetP → the | a
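A minimal sketch of a recognizer for this larger fragment, assuming a memoized check of whether a symbol can derive a given span of the input. The names `GRAMMAR` and `recognizes` are illustrative, and lowercasing the words and stripping punctuation are assumptions made here so that sentence-initial capitals and the final period do not matter.

```python
from functools import lru_cache

GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "VP": [("V", "PP"), ("V", "NP"), ("V",)],
    "NP": [("DetP", "NP"), ("N", "NP"), ("N",)],
    "PP": [("Prep", "NP")],
    "N": [("cat",), ("mat",), ("food",), ("bowl",), ("Mary",)],
    "V": [("is",), ("likes",), ("sits",)],
    "Prep": [("on",), ("in",), ("under",)],
    "DetP": [("the",), ("a",)],
}


def recognizes(sentence, start="S"):
    """True if `start` can derive the sentence (case and punctuation ignored)."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    words = [w for w in words if w]

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        """True if `symbol` can derive exactly words[i:j]."""
        if symbol not in GRAMMAR:  # terminal: must be that single word
            return j == i + 1 and words[i] == symbol.lower()
        return any(covers(rhs, i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def covers(symbols, i, j):
        """True if the symbol sequence can derive exactly words[i:j]."""
        if len(symbols) == 1:
            return derives(symbols[0], i, j)
        # Each symbol covers at least one word, so leave room for the rest.
        return any(
            derives(symbols[0], i, k) and covers(symbols[1:], k, j)
            for k in range(i + 1, j - len(symbols) + 2)
        )

    return bool(words) and derives(start, 0, len(words))
```

Because no rule in this fragment is empty and the unary rules form no cycle, the recursion always works on a smaller span or a lower symbol and therefore terminates.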
Mary likes the cat bowl.
The cat ate the tasty food.
Hello. Nice talking to you.
Slide33
Pocket Sphinx Grammar Format
Variables go in angle brackets, e.g. <city>
Terminals must appear in your pronunciation dictionary (case sensitive)
X Y is concatenation -- e.g., I WANT
(X | Y) means X or Y -- e.g., (WANT | NEED)
Square brackets mean optional -- e.g., [ON] FRIDAY
* means that the expansion may be spoken zero or more times -- e.g. <digit>*
+ means one or more times -- e.g. <digit>+
Slide34
Example
<city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE;
<time> = MORNING | EVENING;
<day> = FRIDAY | MONDAY;
public <query> = (((WHAT TRAINS LEAVE) | (WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]);
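A minimal sketch for checking by hand which word sequences the <query> rule covers, assuming the rule is transcribed into a Python regular expression. This works only because the example grammar has no recursion; it is not how Pocket Sphinx itself loads or applies a grammar, and the name `in_grammar` is illustrative.

```python
import re

# Each variable becomes a non-capturing alternation.
CITY = r"(?:BOSTON|NEWYORK|WASHINGTON|BALTIMORE)"
TIME = r"(?:MORNING|EVENING)"
DAY = r"(?:FRIDAY|MONDAY)"

# Concatenation becomes a space, (X|Y) stays an alternation, [X] becomes (?:X)?
QUERY = re.compile(
    r"(?:WHAT TRAINS LEAVE|WHAT TIME CAN I TRAVEL|IS THERE A TRAIN)"
    rf" (?:FROM|TO) {CITY}"
    rf"(?: (?:FROM|TO) {CITY})?"
    rf" ON {DAY}"
    rf"(?: {TIME})?"
)


def in_grammar(utterance):
    """True if the utterance, uppercased and stripped of punctuation, matches <query>."""
    words = " ".join(re.findall(r"[A-Za-z]+", utterance.upper()))
    return QUERY.fullmatch(words) is not None
```

For instance, `in_grammar("Is there a train to Baltimore on Monday")` holds, while an utterance that names a day outside <day> does not match.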
Hello. No. I want to go on Tuesday. When does the train leave?
Slide35
Next Class
Language modeling for large vocabulary applications: N-grams