
Data-Driven Dependency Parsing
Kenji Sagae
CSCI-544

Background: Natural Language Parsing
Syntactic analysis: string to (tree) structure.
[Figure: the input "He likes fish" goes into the PARSER, which outputs the phrase-structure tree [S [NP [Prn He]] [VP [V likes] [NP [N fish]]]]]


Parsing is useful in:
- Natural Language Understanding: NL interfaces, conversational agents
- Language technology applications: machine translation, question answering, information extraction
- The scientific study of language: syntax, language processing models

One approach: parse with a hand-written GRAMMAR
S → NP VP,  NP → N,  NP → NP PP,  VP → V NP,  VP → V NP PP,  VP → VP PP,  …
Problems: not enough coverage, too much ambiguity

The data-driven alternative: learn from a TREEBANK, a collection of sentences annotated with their trees (e.g., parses for "The boy runs fast", "Dogs run fast", "Dogs run").
Charniak (1996); Collins (1996); Charniak (1997)


Phrase Structure Tree (Constituent Structure) vs. Dependency Structure
[Figure: "The boy ate the cheese sandwich" shown both as a phrase-structure tree and as a dependency structure, with links such as ate → boy and ate → sandwich]
In a dependency structure, each link connects a HEAD to a DEPENDENT and carries a LABEL:
SUBJ(ate, boy), DET(boy, The), OBJ(ate, sandwich), DET(sandwich, the), MOD(sandwich, cheese)
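As a concrete illustration (not from the slides), a labeled dependency structure can be stored as a list of (head, dependent, label) triples over word positions; this is a minimal sketch with made-up variable names:

```python
# A labeled dependency tree for "The boy ate the cheese sandwich",
# stored as (head, dependent, label) triples over word indices.
# Index 0 is a hypothetical artificial root node.
words = ["<root>", "The", "boy", "ate", "the", "cheese", "sandwich"]

dependencies = [
    (3, 2, "SUBJ"),  # ate -> boy        (SUBJ)
    (2, 1, "DET"),   # boy -> The        (DET)
    (3, 6, "OBJ"),   # ate -> sandwich   (OBJ)
    (6, 4, "DET"),   # sandwich -> the   (DET)
    (6, 5, "MOD"),   # sandwich -> cheese (MOD)
    (0, 3, "ROOT"),  # root -> ate
]

for head, dep, label in dependencies:
    print(f"{label}({words[head]}, {words[dep]})")
```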

Background: Linear Classification with the Perceptron
- Classification: given an input x, predict an output y
- Example: x is a document, y ∈ {Sports, Politics, Science}
- x is represented as a feature vector f(x). For a document beginning "Wednesday night, when the Lakers play the Mavericks at American Airlines Center, they get to see first hand …", f(x) might be (# games: 5, # Lakers: 4, # said: 3, # rebounds: 3, # democrat: 0, # republican: 0, # science: 0), with y = Sports
- To score a class, just add the feature weights given in a vector w

Multiclass Perceptron
Learn a vector of feature weights w_c for each class c:
  w_c = 0 for every class c
  for N iterations:
    for each training example (x_i, y_i):
      z_i = argmax_z  w_z · f(x_i)
      if z_i ≠ y_i:
        w_{z_i} = w_{z_i} - f(x_i)
        w_{y_i} = w_{y_i} + f(x_i)
In words: try to classify each example; if a mistake is made, update the weights.
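A minimal runnable sketch of this update rule, with feature vectors as sparse dicts (the toy documents and feature names are made up for illustration):

```python
from collections import defaultdict

def dot(w, f):
    """Dot product of a weight dict and a sparse feature dict."""
    return sum(w[feat] * value for feat, value in f.items())

def train_multiclass_perceptron(examples, classes, n_iterations=10):
    """examples: list of (feature_dict, gold_class) pairs."""
    # One weight vector per class, all initialized to zero.
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(n_iterations):
        for f_x, y in examples:
            # Predict the highest-scoring class under the current weights.
            z = max(classes, key=lambda c: dot(weights[c], f_x))
            if z != y:
                # Mistake: move weight away from the predicted class
                # and toward the correct class.
                for feat, value in f_x.items():
                    weights[z][feat] -= value
                    weights[y][feat] += value
    return weights

# Toy usage with made-up documents:
examples = [
    ({"games": 5, "Lakers": 4, "rebounds": 3}, "Sports"),
    ({"democrat": 3, "republican": 2}, "Politics"),
    ({"science": 4, "data": 2}, "Science"),
]
w = train_multiclass_perceptron(examples, ["Sports", "Politics", "Science"])
```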

Shift-Reduce Dependency Parsing
Two main data structures:
- Stack S (initially empty)
- Queue Q (initialized to contain each word of the input sentence)
Two types of actions:
- Shift: removes a word from Q and pushes it onto S
- Reduce: pops two items from S and pushes a new item onto S; the new item is a tree that contains the two popped items
This can be applied to either dependencies (Nivre, 2004) or constituents (Sagae & Lavie, 2005). (A code sketch follows the Shift and Reduce examples below.)

Shift
A shift action removes the next token from the input list and pushes this new item onto the stack.
[Figure: before SHIFT, the stack holds the partial tree "Under a proposal…" (with a PMOD link) and the input string is "to expand IRAs a …"; after SHIFT, the token "to" has moved from the input onto the stack]

Reduce
A reduce action pops the top two items from the stack and pushes the new item built from them.
[Figure: before REDUCE-RIGHT-VMOD, the top two stack items are the partial trees for "Under a proposal…" (PMOD) and "to expand"; after the reduce, they form a single item joined by a VMOD link, while the input "IRAs a $2000 …" is unchanged]
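A minimal sketch of this transition system in Python, under the (assumed) convention visible in the next slide that REDUCE-RIGHT makes the top (right) stack item the head and REDUCE-LEFT makes the second (left) item the head; choose_action is a stand-in for the classifier introduced later:

```python
from collections import deque

def parse(words, choose_action):
    """Shift-reduce dependency parsing skeleton.
    choose_action(stack, queue) must return one of:
      ("SHIFT",), ("REDUCE-LEFT", label), ("REDUCE-RIGHT", label)."""
    stack = []            # Stack S, initially empty
    queue = deque(words)  # Queue Q, initialized with the input words
    arcs = []             # collected (head, dependent, label) triples

    while queue or len(stack) > 1:
        action = choose_action(stack, queue)
        if action[0] == "SHIFT":
            # Shift: remove the next word from Q, push it onto S.
            stack.append(queue.popleft())
        else:
            # Reduce: pop two items from S, push the combined item.
            right = stack.pop()
            left = stack.pop()
            if action[0] == "REDUCE-RIGHT":
                head, dep = right, left   # right item is the head
            else:                         # "REDUCE-LEFT"
                head, dep = left, right   # left item is the head
            arcs.append((head, dep, action[1]))
            stack.append(head)            # the head stands for the new subtree
    return arcs
```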

Parsing "He likes fish"
Parser actions, in order: SHIFT, SHIFT, REDUCE-RIGHT-SUBJ, SHIFT, REDUCE-LEFT-OBJ
Result: SUBJ(likes, He) and OBJ(likes, fish)
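Feeding this hand-scripted action sequence to the parse() sketch above reproduces the trace:

```python
actions = iter([("SHIFT",), ("SHIFT",), ("REDUCE-RIGHT", "SUBJ"),
                ("SHIFT",), ("REDUCE-LEFT", "OBJ")])
arcs = parse(["He", "likes", "fish"], lambda s, q: next(actions))
print(arcs)  # [('likes', 'He', 'SUBJ'), ('likes', 'fish', 'OBJ')]
```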

Choosing Parser Actions
- No grammar, no action table
- Learn to associate stack/queue configurations with appropriate parser actions
- Classifier treated as a black box: perceptron, SVM, maximum entropy, memory-based learning, etc.
- Features: top two items on the stack, next input token, context, lookahead, …
- Classes: parser actions

Example configuration while parsing "He likes fish" (stack: He, likes; queue: fish):
Features:
  stack(0) = likes   stack(0).POS = VBZ
  stack(1) = He      stack(1).POS = PRP
  stack(2) = 0       stack(2).POS = 0
  queue(0) = fish    queue(0).POS = NN
  queue(1) = 0       queue(1).POS = 0
  queue(2) = 0       queue(2).POS = 0
Class: Reduce-Right-SUBJ
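A sketch of how such features could be extracted from a configuration (the word/POS items and the 0 padding follow the slide; the function itself is illustrative):

```python
def extract_features(stack, queue, n=3):
    """Map a stack/queue configuration to a feature dict for the classifier.
    Items are (word, pos) pairs; absent positions get the value 0."""
    features = {}
    for i in range(n):
        word, pos = stack[-1 - i] if i < len(stack) else (0, 0)
        features[f"stack({i})"] = word
        features[f"stack({i}).POS"] = pos
        word, pos = queue[i] if i < len(queue) else (0, 0)
        features[f"queue({i})"] = word
        features[f"queue({i}).POS"] = pos
    return features

# The configuration from the slide:
print(extract_features(stack=[("He", "PRP"), ("likes", "VBZ")],
                       queue=[("fish", "NN")]))
```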

After the classifier chooses Reduce-Right-SUBJ, the two stack items are combined: the stack now holds a single tree in which He is the SUBJ dependent of likes, and the queue still holds fish.

Accurate Parsing with Greedy Search
- Experiments: WSJ Penn Treebank, 1M words of WSJ text
- Accuracy: ~90% (unlabeled dependency links)
- Other languages (CoNLL 2006 and 2007 shared tasks): Arabic, Basque, Chinese, Czech, Japanese, Greek, Hungarian, Turkish, …: about 75% to 92%
- Good accuracy, fast (linear time), easy to implement!

Maximum Spanning Tree Parsing (McDonald et al., 2005)
- A dependency tree is a graph (obviously): words are vertices, dependency links are edges
- Imagine instead a fully connected weighted graph, where each weight is the score for a dependency link
- Each score is independent of the other dependencies: an edge-factored model
- Find the Maximum Spanning Tree; the score for the tree is the sum of the scores of its individual dependencies
- How are edge weights determined?

[Figures: the words of "I ate a sandwich" as nodes 0 (root), 1 (I), 2 (ate), 3 (a), 4 (sandwich) of a fully connected directed graph; each candidate dependency edge carries a score, and the highest-scoring spanning tree is highlighted]

Structured Classification
- x is a sentence, G is a dependency tree, f(G) is a vector of features for the entire tree
- Features for "I ate a sandwich": h(ate):d(sandwich), hPOS(VBD):dPOS(NN), h(ate):d(I), hPOS(VBD):dPOS(PRP), h(sandwich):d(a), hPOS(NN):dPOS(DT), hPOS(VBD), hPOS(NN), dPOS(NN), dPOS(DT), dPOS(PRP), h(ate), h(sandwich), d(sandwich), … (many more)
- To assign edge weights, we learn a feature weight vector w
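Because the model is edge-factored, f(G) decomposes into a sum of per-edge features. A sketch (the feature naming follows the slide; the function itself is illustrative):

```python
def edge_features(head_word, head_pos, dep_word, dep_pos):
    """Features for a single dependency link; in an edge-factored model,
    f(G) is the sum of these dicts over all edges in G."""
    return {
        f"h({head_word}):d({dep_word})": 1,
        f"hPOS({head_pos}):dPOS({dep_pos})": 1,
        f"h({head_word})": 1, f"d({dep_word})": 1,
        f"hPOS({head_pos})": 1, f"dPOS({dep_pos})": 1,
    }

print(edge_features("ate", "VBD", "sandwich", "NN"))
# {'h(ate):d(sandwich)': 1, 'hPOS(VBD):dPOS(NN)': 1, ...}
```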

Structured Perceptron
Learn a vector of feature weights w:
  w = 0
  for N iterations:
    for each training example (x_i, G_i):
      G'_i = argmax_{G' ∈ GEN(x_i)}  w · f(G')
      if G'_i ≠ G_i:
        w = w + f(G_i) - f(G'_i)
The same as before, but to find the argmax we use MST, since each G is a tree (which also contains the corresponding input x). If G'_i is not the right tree, update the feature vector.
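The same loop as a runnable sketch (argmax_tree stands in for MST decoding over GEN(x); representing f as a sparse dict and trees as comparable edge sets are assumptions):

```python
from collections import defaultdict

def train_structured_perceptron(examples, features, argmax_tree, n_iterations=10):
    """examples: list of (sentence, gold_tree) pairs.
    features(tree): sparse dict f(G) over the whole tree.
    argmax_tree(sentence, w): highest-scoring tree under w (e.g., via MST)."""
    w = defaultdict(float)
    for _ in range(n_iterations):
        for x, gold in examples:
            predicted = argmax_tree(x, w)
            if predicted != gold:
                # Update toward the gold tree, away from the prediction.
                for feat, value in features(gold).items():
                    w[feat] += value
                for feat, value in features(predicted).items():
                    w[feat] -= value
    return w
```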

Question: are there trees that an MST parser can find, but a shift-reduce parser* can't?
(*the shift-reduce parser as described in slides 13-19)

Accurate Parsing with Edge-Factored Models
- The Maximum Spanning Tree algorithm for directed trees (Chu & Liu, 1965; Edmonds, 1967) runs in quadratic time
- It finds the best of exponentially many trees: exact inference!
- Edge-factored: each dependency link is considered independently of the others
- Compare to shift-reduce parsing: greedy inference, but a rich set of features that includes partially built trees
- McDonald and Nivre (2007) show that shift-reduce and MST parsing reach similar accuracy, but have different strengths
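A sketch of the decoding step, leaning on networkx's implementation of the Chu-Liu/Edmonds algorithm (the scoring function and its toy values are made up; in a real parser, score(head, dep) would come from w · f(edge)):

```python
import networkx as nx  # pip install networkx

def mst_parse(words, score):
    """Edge-factored decoding: build a fully connected digraph over the
    words (plus node 0 for the artificial root) and extract the best
    directed spanning tree with Chu-Liu/Edmonds."""
    G = nx.DiGraph()
    n = len(words)
    for head in range(n + 1):        # node 0 is the artificial root
        for dep in range(1, n + 1):  # the root can never be a dependent
            if head != dep:
                G.add_edge(head, dep, weight=score(head, dep))
    tree = nx.maximum_spanning_arborescence(G)
    return sorted(tree.edges)

# Toy usage with a made-up scoring function:
words = ["I", "ate", "a", "sandwich"]
toy_scores = {(0, 2): 12, (2, 1): 8, (2, 4): 9, (4, 3): 7}
print(mst_parse(words, lambda h, d: toy_scores.get((h, d), -1)))
# Expected: [(0, 2), (2, 1), (2, 4), (4, 3)]
```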

Parser Ensembles
- By using different types of classifiers and algorithms, we get several different parsers
- Ensemble idea: combine the output of several parsers to obtain a single, more accurate result
[Figure: Parsers A, B, and C each parse "I like cheese"; their three output trees are combined into one final tree]

Parser Ensembles with Maximum Spanning Trees (Sagae and Lavie, 2006)
- First, build a graph: create a node for each word in the input sentence (plus one extra "root" node)
- Each dependency proposed by any of the parsers is a weighted edge; if multiple parsers propose the same dependency, add weight to the corresponding edge
- Then simply find the MST: this maximizes the votes, and the structure is guaranteed to be a dependency tree (a code sketch follows the figures below)

[Figures: the ensemble graph for "I ate a sandwich", nodes 0 (root) through 4 (sandwich); Parser A, Parser B, and Parser C each contribute their proposed dependency edges as votes, and the maximum spanning tree of the combined weighted graph is the final parse]
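A sketch of the voting scheme, reusing the hypothetical mst_parse() from above (each parser's output is a list of (head, dependent) pairs; the disagreement below is invented):

```python
from collections import Counter

def ensemble_parse(words, parser_outputs):
    """Combine several parsers' dependency edges by weighted voting,
    then take the MST of the vote graph."""
    votes = Counter()
    for edges in parser_outputs:
        for head, dep in edges:
            votes[(head, dep)] += 1  # the same edge from several parsers adds weight
    return mst_parse(words, lambda h, d: votes.get((h, d), 0))

# Three hypothetical parser outputs for "I ate a sandwich":
parser_a = [(0, 2), (2, 1), (2, 4), (4, 3)]
parser_b = [(0, 2), (2, 1), (2, 4), (2, 3)]  # disagrees on the head of "a"
parser_c = [(0, 2), (2, 1), (2, 4), (4, 3)]
print(ensemble_parse(["I", "ate", "a", "sandwich"],
                     [parser_a, parser_b, parser_c]))
# Majority edges win: [(0, 2), (2, 1), (2, 4), (4, 3)]
```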

MST Parser Ensembles Are Very Accurate
- Highest accuracy in the CoNLL 2007 shared task on multilingual dependency parsing, a parser bake-off with 22 teams: Nilsson et al. (2007); Sagae and Tsujii (2007)
- The improvement depends on the selection of parsers for the ensemble
- With four parsers with accuracies between 89% and 91%, ensemble accuracy = 92.7%