Presentation Transcript

Parsing
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa

Question Answering at TREC
Consists of answering a set of 500 fact-based questions, e.g. "When was Mozart born?"
Systems were allowed to return 5 ranked answer snippets to each question (an IR-style approach).
Mean Reciprocal Rank (MRR) scoring: 1, 0.5, 0.33, 0.25, 0.2 for ranks 1-5; 0 for rank 6 or later.
Mainly Named Entity answers (person, place, date, ...).
From 2002, systems are only allowed to return a single exact answer.

TREC 2000 Results (long) Watson Team

Falcon
The Falcon system from SMU was by far the best-performing system at TREC 2000.
It used NLP and performed deep semantic processing.

Question parse
Parse tree for "Who was the first Russian astronaut to walk in space", with POS tags (WP, VBD, DT, JJ, NNP, TO, VB, IN, NN) and phrase labels (NP, PP, VP, S).

Question semantic form
Question logic form: first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
Answer type: PERSON

Parsing in QA
Top systems in TREC 2005 perform parsing of queries and answer paragraphs.
Some use a specially built parser.
Parsers are slow: ~1 minute per sentence.

Practical Uses of Parsing
The Google Knowledge Graph is enriched with relations extracted from dependency trees.
The Google index parses all documents.
Google Translator applies dependency parsing to sentences.
Sentiment analysis improves with dependency parsing.

Statistical Methods in NLP
Some NLP problems:
- Information extraction: named entities, relationships between entities, etc.
- Finding linguistic structure: part-of-speech tagging, chunking, parsing
These can be cast as learning a mapping:
- strings to hidden state sequences (NE extraction, POS tagging)
- strings to strings (machine translation)
- strings to trees (parsing)
- strings to relational data structures (information extraction)

Techniques
Log-linear (Maximum Entropy) taggers
Probabilistic context-free grammars (PCFGs)
Discriminative methods: conditional MRFs, Perceptron, kernel methods

Learning a mapping
Strings to hidden state sequences: NE extraction, POS tagging
Strings to strings: machine translation
Strings to trees: parsing
Strings to relational data structures: information extraction

POS as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT: Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ./.

NE as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT: Profits/O soared/O at/O Boeing/BC Co./IC ,/O easily/O topping/O forecasts/O on/NA Wall/BL Street/IL ./O

Parsing Technology

Constituent Parsing

Constituent Parsing
Requires a Phrase Structure Grammar: CFG, PCFG, Unification Grammar.
Produces a phrase structure parse tree, e.g. for "Rolls-Royce Inc. said it expects its sales to remain steady" (tree with NP, VP, ADJP and S nodes).

Statistical Parsers
Probabilistic generative models of language which include the parse structure (e.g. Collins 1997).
Learning consists in estimating the parameters of the model with simple likelihood-based techniques.
Conditional parsing models (Charniak 2000; McDonald 2005).

Results
Method                                              Accuracy
PCFGs (Charniak 97)                                 73.0%
Conditional Models - Decision Trees (Magerman 95)   84.2%
Lexical Dependencies (Collins 96)                   85.5%
Conditional Models - Logistic (Ratnaparkhi 97)      86.9%
Generative Lexicalized Model (Charniak 97)          86.7%
Generative Lexicalized Model (Collins 97)           88.2%
Logistic-inspired Model (Charniak 99)               89.6%
Boosting (Collins 2000)                             89.8%

Linear Models for Parsing and Tagging
Three components:
GEN is a function from a string to a set of candidates.
Φ maps a candidate to a feature vector.
W is a parameter vector.

Component 1: GEN
GEN enumerates a set of candidates for a sentence, e.g. all candidate parse trees for "She announced a program to promote safety in trucks and vans".

Examples of GEN
A context-free grammar
A finite-state machine
The top N most probable analyses from a probabilistic grammar

Component 2: Φ
Φ maps a candidate to a feature vector in ℝ^d.
Φ defines the representation of a candidate, e.g. Φ(x) = <1, 0, 2, 0, 0, 15, 5>.

Features
A "feature" is a function on a structure, e.g. h(x) = number of times the subtree [A → B C] is seen in x.
Feature vector: a set of functions h_1 ... h_d defines a feature vector Φ(x) = <h_1(x), h_2(x), ..., h_d(x)>.

Component 3: W
W is a parameter vector in ℝ^d.
Φ(x) · W maps a candidate to a real-valued score.

Putting it all together
X is a set of sentences, Y is a set of possible outputs (e.g. trees).
We need to learn a function F: X → Y.
GEN, Φ, W define F(x) = argmax_{y ∈ GEN(x)} Φ(y) · W
i.e. choose the highest-scoring candidate as the most plausible structure.
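As an illustration only (a minimal sketch, not the original implementation), the three components can be wired together as follows; gen, phi and the weight values here are hypothetical stand-ins:

```python
import numpy as np

def predict(sentence, gen, phi, w):
    """Linear-model prediction: F(x) = argmax over GEN(x) of phi(y) . w."""
    candidates = gen(sentence)                          # GEN: string -> set of candidates
    scores = [np.dot(phi(y), w) for y in candidates]    # Phi: candidate -> R^d, w in R^d
    return candidates[int(np.argmax(scores))]           # highest-scoring candidate

# Toy usage with hypothetical components:
gen = lambda s: ["tree-A", "tree-B"]                    # enumerate candidate analyses
phi = lambda y: np.array([len(y), y.count("A")])        # toy feature vector in R^2
w = np.array([0.1, 2.0])                                # learned parameter vector
print(predict("She announced a program ...", gen, phi, w))
```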

Dependency Parsing

Dependency Parsing
Produces dependency trees: word-word dependency relations.
Easier to understand and to annotate than constituent trees, e.g. "Rolls-Royce Inc. said it expects its sales to remain steady" with arcs labeled SUBJ, OBJ, MOD, TO.

Data-Driven Dependency Parsing
Graph-based: consider the possible dependency graphs, define a score and select the graph with the highest score.
Transition-based: define a transition system that leads to a parse tree while analyzing a sentence one word at a time.

Transition-based Shift-Reduce Parsing
Example: "He/PP saw/VVD a/DT girl/NN with/IN a/DT telescope/NN ./SENT", parsed by choosing Left, Shift or Right actions on the top of the stack and the next input token.

Interactive Simulator
Play with it at: http://medialab.di.unipi.it/Project/QA/Parser/sim.html

Shift/Reduce Dependency Parser
Traditional statistical parsers are trained directly on the task of tagging a sentence.
Instead, a Shift/Reduce parser is trained to learn the sequence of parse actions required to build the parse tree.

Grammar Not Required
A traditional parser requires a grammar for generating candidate trees.
An inductive parser needs no grammar.

Parsing as Classification
Inductive dependency parsing: parsing based on Shift/Reduce actions.
Learn from an annotated corpus which action to perform at each step.

Dependency Graph
Let R = {r1, ..., rm} be the set of permissible dependency types.
A dependency graph for a string of words W = w1 ... wn is a labeled directed graph D = (W, A), where:
(a) W is the set of nodes, i.e. word tokens in the input string;
(b) A is a set of labeled arcs (wi, wj, r), with wi, wj ∈ W and r ∈ R;
(c) for every wj ∈ W, there is at most one arc (wi, wj, r) ∈ A.
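As a small illustration (a sketch added here, not from the slides), condition (c), the single-head constraint, can be checked directly on a list of arcs represented as (head, dependent, relation) triples:

```python
def satisfies_single_head(arcs):
    """Check condition (c): every token has at most one incoming arc.
    arcs is a list of (head, dependent, relation) triples."""
    seen_dependents = set()
    for head, dependent, relation in arcs:
        if dependent in seen_dependents:
            return False          # this dependent already has a head
        seen_dependents.add(dependent)
    return True

# Toy usage on word indices: token 2 heads tokens 1 and 4
print(satisfies_single_head([(2, 1, "SUBJ"), (2, 4, "OBJ")]))   # True
print(satisfies_single_head([(2, 1, "SUBJ"), (4, 1, "MOD")]))   # False: token 1 has two heads
```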

Parser State
The parser state is a triple ⟨S, B, A⟩, where:
S is a stack of partially processed tokens;
B is a buffer of (remaining) input tokens;
A is the arc relation for the dependency graph.
(h, d, r) ∈ A represents an arc h −r→ d, tagged with dependency r.

Transition Systems
Sets of rules used by a transition parser:
Nivre's arc-standard
Nivre's arc-eager
Attardi's non-projective
Nivre's swap

Arc Standard
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S, s|B, A ∪ {(s, n, r)}⟩

Parser Algorithm
The parsing algorithm is fully deterministic and works as follows:
Input sentence: (w1, w2, ..., wn)
S = ⟨⟩
B = ⟨w1, w2, ..., wn⟩
while B ≠ ⟨⟩ do
    x = getContext(S, B)
    y = selectAction(model, x)
    performAction(y, S, B)
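As an illustration only (not DeSR's actual code), the deterministic loop with arc-standard actions might look like this in Python; select_action is a hypothetical stand-in for the trained classifier, and getContext/performAction are folded into the loop:

```python
def parse(words, select_action):
    """Deterministic arc-standard parsing loop (minimal sketch).
    select_action plays the role of the trained model: given the current
    stack and buffer it returns 'shift', 'left-arc' or 'right-arc'."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer:
        action = select_action(stack, buffer)
        if action == "shift" or not stack:
            stack.append(buffer.pop(0))              # Shift: push the next input token
        elif action == "left-arc":
            arcs.append((buffer[0], stack.pop()))    # arc (head = next token, dependent = top of stack)
        else:                                        # "right-arc"
            s = stack.pop()
            arcs.append((s, buffer[0]))              # arc (head = top of stack, dependent = next token)
            buffer[0] = s                            # the head replaces its dependent in the buffer
    return arcs
```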

Oracle
An algorithm that, given the gold tree for a sentence, produces a sequence of actions that a parser may use to obtain that gold tree from the input sentence.
Used during training.

Arc Standard Oracle
Emulate the parser.
Input sentence: (w1, w2, ..., wn)
S = ⟨⟩
B = ⟨w1, w2, ..., wn⟩
while B ≠ ⟨⟩ do
    if B[0] −r→ S[0] and all children of S[0] are attached:
        perform Left-Arc_r
    else if S[0] −r→ B[0] and all children of B[0] are attached:
        perform Right-Arc_r
    else:
        perform Shift
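A minimal Python sketch of this oracle, under the assumption that the gold tree is given as a head array (heads[i] is the index of the head of token i, -1 for the root); the state handling mirrors the parsing loop shown earlier:

```python
def arc_standard_oracle(heads):
    """Yield the arc-standard action sequence that reconstructs the gold tree.
    heads[i] is the gold head of token i (or -1 for the root)."""
    n = len(heads)
    remaining = [sum(1 for j in range(n) if heads[j] == i) for i in range(n)]  # unattached children
    stack, buffer, actions = [], list(range(n)), []
    while buffer:
        s = stack[-1] if stack else None
        b = buffer[0]
        if s is not None and heads[s] == b and remaining[s] == 0:
            actions.append("left-arc"); remaining[b] -= 1; stack.pop()
        elif s is not None and heads[b] == s and remaining[b] == 0:
            actions.append("right-arc"); remaining[s] -= 1; stack.pop(); buffer[0] = s
        else:
            actions.append("shift"); stack.append(buffer.pop(0))
    return actions

# Toy usage: "He saw her", heads = [1, -1, 1]
print(arc_standard_oracle([1, -1, 1]))
# ['shift', 'left-arc', 'shift', 'right-arc', 'shift']
```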

Projectivity
An arc wi → wk is projective iff for every j with i < j < k (or i > j > k), wi →* wj.
A dependency tree is projective iff every arc is projective.
Intuitively: the arcs can be drawn on a plane without intersections.
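A minimal sketch of the check, again assuming a head-array representation of the tree: an arc (h, d) is projective if every token strictly between h and d is a descendant of h.

```python
def is_projective(heads):
    """Check whether the tree given by heads[i] (head of token i, -1 for root) is projective."""
    def descends_from(j, h):
        while j != -1:
            if j == h:
                return True
            j = heads[j]
        return False

    for d, h in enumerate(heads):
        if h == -1:
            continue
        lo, hi = min(h, d), max(h, d)
        for j in range(lo + 1, hi):
            if not descends_from(j, h):
                return False        # a word between h and d is not dominated by h
    return True

# "I saw a girl yesterday wearing a ring" (0-indexed heads): the arc girl -> wearing crosses "yesterday"
print(is_projective([1, -1, 3, 1, 1, 3, 7, 5]))   # False
```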

Non-Projectivity
Example: "I saw a girl yesterday wearing a ring", where "wearing a ring" modifies "girl" but is separated from it by "yesterday".

Non-Projectivity
Czech example: "Většinu těchto přístrojů lze take používat nejen jako fax, ale ..."
Addressed by special actions: Right2, Left2, Right3, Left3.

Arc Standard Properties
Does not deal with non-projectivity.
Every transition sequence produces a projective dependency tree (soundness).
Every projective tree is produced by some transition sequence (completeness).
Parsing n words requires 2n transitions.

Arc Eager Transitions
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩   if there is no (k, s, l) ∈ A
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S|s|n, B, A ∪ {(s, n, r)}⟩
Reduce:       ⟨S|s, B, A⟩ ⇒ ⟨S, B, A⟩   if there is some (k, s, r) ∈ A

Arc Eager Oracle
Emulate the parser.
Input sentence: (w1, w2, ..., wn)
S = ⟨⟩
B = ⟨w1, w2, ..., wn⟩
while B ≠ ⟨⟩ do
    if B[0] −r→ S[0]:
        perform Left-Arc_r
    else if S[0] −r→ B[0]:
        perform Right-Arc_r
    else if all children of S[0] are attached and S[0] is attached:
        perform Reduce
    else:
        perform Shift
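For comparison with the arc-standard sketch above, a minimal arc-eager oracle under the same head-array assumption (the bookkeeping for attached tokens and remaining children is mine, not from the slides):

```python
def arc_eager_oracle(heads):
    """Static arc-eager oracle sketch; heads[i] is the gold head of token i (-1 for root)."""
    n = len(heads)
    remaining = [sum(1 for j in range(n) if heads[j] == i) for i in range(n)]  # unattached children
    attached = [False] * n
    stack, buffer, actions = [], list(range(n)), []
    while buffer:
        s = stack[-1] if stack else None
        b = buffer[0]
        if s is not None and heads[s] == b:
            actions.append("left-arc"); attached[s] = True; remaining[b] -= 1; stack.pop()
        elif s is not None and heads[b] == s:
            actions.append("right-arc"); attached[b] = True; remaining[s] -= 1
            stack.append(buffer.pop(0))       # arc-eager pushes the dependent onto the stack
        elif s is not None and attached[s] and remaining[s] == 0:
            actions.append("reduce"); stack.pop()
        else:
            actions.append("shift"); stack.append(buffer.pop(0))
    return actions

# "They told him a story", heads = [1, -1, 1, 4, 1]
print(arc_eager_oracle([1, -1, 1, 4, 1]))
# ['shift', 'left-arc', 'shift', 'right-arc', 'reduce', 'shift', 'left-arc', 'right-arc']
```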

Arc Eager Parsing
Action    Stack             Buffer
          []                They told him a story
Shift     They              told him a story
LA-subj   []                told him a story
Shift     told              him a story
RA-obj    told him          a story
Reduce    told              a story
Shift     told a            story
LA-det    told              story
RA-obj    told story
Reduce    told

Non-Projective Transitions

Actions for non-projective arcs (Attardi)
Right2_r:  ⟨S|s2|s1, n|B, A⟩ ⇒ ⟨S|s1, n|B, A ∪ {(s2, n, r)}⟩
Left2_r:   ⟨S|s2|s1, n|B, A⟩ ⇒ ⟨S|s2, s1|B, A ∪ {(n, s2, r)}⟩
Right3_r:  ⟨S|s3|s2|s1, n|B, A⟩ ⇒ ⟨S|s2|s1, n|B, A ∪ {(s3, n, r)}⟩
Left3_r:   ⟨S|s3|s2|s1, n|B, A⟩ ⇒ ⟨S|s3|s2, s1|B, A ∪ {(n, s3, r)}⟩

Example
Right2 (nejen ← ale), Left3 (Většinu → fax) in "Většinu těchto přístrojů lze take používat nejen jako fax, ale".

Examples
Dutch: "zou gemaakt moeten worden in" reordered as "zou moeten worden gemaakt in": an Extract followed by an Insert.

Example
Right2 (nejen → ale), Left3 (fax → Většinu) in "Většinu těchto přístrojů lze take používat nejen jako fax, ale".

Example
Intermediate parse state of the Czech sentence at the Right2 step [diagram].

Example
Intermediate parse state of the Czech sentence at the Left3 step [diagram].

Example
Final parse state of the Czech sentence [diagram].

Effectiveness for Non-Projectivity
Training data for Czech contains 28,081 non-projective relations.
26,346 (93%) can be handled by Left2/Right2.
1,683 (6%) by Left3/Right3.

Non-Projective Accuracy
Language     Total   DeSR   MaltParser
Czech        104     77     79
Slovene      88      34     21
Portuguese   54      26     24
Danish       35      10     9

Alternative: Swap
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩   if there is no (k, s, l) ∈ A
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S, s|B, A ∪ {(s, n, r)}⟩
Swap:         ⟨S|s, n|B, A⟩ ⇒ ⟨S|n, s|B, A⟩   if s < n

Arc Standard Swap Oracle
Emulate the parser.
Input sentence: (w1, w2, ..., wn)
S = ⟨⟩
B = ⟨w1, w2, ..., wn⟩
while B ≠ ⟨⟩ do
    if B[0] −r→ S[0] and all children of S[0] are attached:
        perform Left-Arc_r
    else if S[0] −r→ B[0] and all children of B[0] are attached:
        perform Right-Arc_r
    else if S[0] < B[0] according to the inorder traversal:
        perform Swap
    else:
        perform Shift

Example
Inorder traversal of the gold tree for "A hearing is scheduled on the issue today" yields the projective order: A hearing on the issue is scheduled today (original positions 1 2 5 6 7 3 4 8).

Parsing with Swap
Action   Stack                          Buffer
         []                             A hearing is scheduled on the issue today
Shift    A                              hearing is scheduled on the issue today
Shift    A hearing                      is scheduled on the issue today
LA-ATT   hearing                        is scheduled on the issue today
Shift    hearing is                     scheduled on the issue today
Shift    hearing is scheduled           on the issue today
Shift    hearing is scheduled on        the issue today
Swap     hearing is on                  scheduled the issue today
Swap     hearing on                     is scheduled the issue today
Shift    hearing on is                  scheduled the issue today
Shift    hearing on is scheduled        the issue today
Shift    hearing on is scheduled the    issue today
Swap     hearing on is the              scheduled issue today
Swap     hearing on the                 is scheduled issue today

Learning Phase

Features
Feature ID   Value
F            form of token
L            lemma of token
P            part of speech (POS) tag
M            morphology
/F           form of the leftmost child node
/L           lemma of the leftmost child node
/P           POS tag of the leftmost child node, if present
F\           form of the rightmost child node
L\           lemma of the rightmost child node
P\           POS tag of the rightmost child node, if present
M\           morphology of the rightmost child node

Learning Event
Example context (Italian), tokens with POS tags: Sosteneva/VER ... che/PRO leggi/NOM (le/DET) anti/ADV Serbia/NOM che/PRO erano/VER discusse/ADJ ,/PUN, divided into left context, target nodes and right context.
Extracted features: (-3, F, che), (-3, P, PRO), (-2, F, leggi), (-2, P, NOM), (-2, M, P), (-2, /F, le), (-2, /P, DET), (-2, /M, P), (-1, F, anti), (-1, P, ADV), (0, F, Serbia), (0, P, NOM), (0, M, S), (+1, F, che), (+1, P, PRO), (+1, F\, erano), (+1, P\, VER), (+1, M\, P), (+2, F, ,), (+2, P, PUN)
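A minimal sketch of how such (position, feature-ID, value) tuples could be collected from the parser state; the token layout (dicts with 'form' and 'pos' keys) and the window size are hypothetical simplifications, not DeSR's internals:

```python
def extract_context_features(stack, buffer, window=3):
    """Collect (relative position, feature ID, value) tuples around the stack/buffer boundary.
    Negative positions index the stack (top = -1), non-negative positions the buffer (front = 0)."""
    features = []
    for offset in range(1, window + 1):              # stack side: -1, -2, -3 ...
        if offset <= len(stack):
            tok = stack[-offset]
            features.append((-offset, 'F', tok['form']))
            features.append((-offset, 'P', tok['pos']))
    for offset in range(0, window):                  # buffer side: 0, +1, +2 ...
        if offset < len(buffer):
            tok = buffer[offset]
            features.append((offset, 'F', tok['form']))
            features.append((offset, 'P', tok['pos']))
    return features

# Toy usage
stack = [{'form': 'leggi', 'pos': 'NOM'}, {'form': 'anti', 'pos': 'ADV'}]
buffer = [{'form': 'Serbia', 'pos': 'NOM'}, {'form': 'che', 'pos': 'PRO'}]
print(extract_context_features(stack, buffer))
```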

DeSR (Dependency Shift Reduce)
Multilanguage statistical transition-based dependency parser.
Linear algorithm.
Capable of handling non-projectivity.
Trained on 28 languages.
Available from: http://desr.sourceforge.net/

Parser Architecture
Modular learner architecture: MLP, Maximum Entropy, SVM, Perceptron.
Features can be configured.

Available Classifiers
Maximum Entropy: fast, not very accurate.
SVM: slow, very accurate.
Multilayer Perceptron: fast, very accurate.
Deep Learning: word embeddings as features.

The Simplest ANN: the Perceptron
[Diagram: input layer, output layer, destinations D0, D1, D2; perceptron activation functions and weight-update learning rule.]
Slide by Geoffrey Hinton

Multilayer Perceptron
[Diagram: input vector, hidden layers, outputs; compare the outputs with the correct answer to get an error signal, then back-propagate the error signal to get derivatives for learning.]
Slide by Geoffrey Hinton

Feature Model (notation used in DeSR configuration files)
LEMMA     -2 -1 0 1 2 3 prev(0) leftChild(-1) leftChild(0) rightChild(-1) rightChild(0)
POSTAG    -2 -1 0 1 2 3 next(-1) leftChild(-1) leftChild(0) rightChild(-1) rightChild(0)
CPOSTAG   -1 0 1
FEATS     -1 0 1
DEPREL    leftChild(-1) leftChild(0) rightChild(-1)

2nd, 3rd Order Features
2nd   CPOSTAG(-1) CPOSTAG(0)
2nd   CPOSTAG(0) CPOSTAG(1)
2nd   LEMMA(0) POSTAG(leftChild(0))
3rd   POSTAG(leftChild(0)) LEMMA(0) POSTAG(rightChild(0))

CoNLL-X Shared Task
Task: assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser.
Input: tokenized and tagged sentences.
Tags: token, lemma, POS, morphological features, reference to head, dependency label.
For each token, the parser must output its head and the corresponding dependency relation.

CoNLL-X: Data Format
N   WORD         LEMMA        CPOS  POS       FEATS                HEAD  DEPREL  PHEAD  PDEPREL
1   A            o            art   art       <artd>|F|S           2     >N      _      _
2   direcção     direcção     n     n         F|S                  4     SUBJ    _      _
3   já           já           adv   adv       _                    4     ADVL    _      _
4   mostrou      mostrar      v     v-fin     PS|3S|IND            0     STA     _      _
5   boa_vontade  boa_vontade  n     n         F|S                  4     ACC     _      _
6   ,            ,            punc  punc      _                    4     PUNC    _      _
7   mas          mas          conj  conj-c    <co-vfin>|<co-fmc>   4     CO      _      _
8   a            o            art   art       <artd>|F|S           9     >N      _      _
9   greve        greve        n     n         F|S                  10    SUBJ    _      _
10  prossegue    prosseguir   v     v-fin     PR|3S|IND            4     CJT     _      _
11  em           em           prp   prp       _                    10    ADVL    _      _
12  todas_as     todo_o       pron  pron-det  <quant>|F|P          13    >N      _      _
13  delegações   delegação    n     n         F|P                  11    P<      _      _
14  de           de           prp   prp       <sam->               13    N<      _      _
15  o            o            art   art       <-sam>|<artd>|M|S    16    >N      _      _
16  país         país         n     n         M|S                  14    P<      _      _
17  .            .            punc  punc      _                    4     PUNC    _      _
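A minimal reader for this format, assuming tab-separated columns and blank lines between sentences (a sketch for illustration, not an official CoNLL tool):

```python
def read_conllx(path):
    """Read CoNLL-X sentences as lists of token dicts (minimal sketch, no error handling)."""
    fields = ["id", "form", "lemma", "cpos", "pos", "feats",
              "head", "deprel", "phead", "pdeprel"]
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                        # a blank line ends a sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            values = line.split("\t")           # columns are tab-separated
            token = dict(zip(fields, values))
            token["id"] = int(token["id"])
            token["head"] = int(token["head"])
            current.append(token)
    if current:
        sentences.append(current)
    return sentences
```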

CoNLL: Evaluation Metrics
Unlabeled Attachment Score (UAS): proportion of tokens that are assigned the correct head.
Labeled Attachment Score (LAS): proportion of tokens that are assigned both the correct head and the correct dependency relation label.
Content-word Labeled Attachment Score (CLAS): like LAS, but disregards attachments of punctuation and function words, i.e. determiners (det), classifiers (clf), adpositions (case), auxiliaries (aux, cop), and conjunctions (cc, mark).
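For concreteness, a small sketch computing UAS and LAS over token dicts like those produced by the reader above:

```python
def attachment_scores(gold, pred):
    """Compute UAS and LAS over parallel token lists.
    Each token is a dict with 'head' and 'deprel' keys."""
    total = correct_head = correct_both = 0
    for g, p in zip(gold, pred):
        total += 1
        if g["head"] == p["head"]:
            correct_head += 1
            if g["deprel"] == p["deprel"]:
                correct_both += 1
    return correct_head / total, correct_both / total   # (UAS, LAS)
```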

Annotation Issues
A dependency graph D = (W, A) is a directed rooted tree:
D is (weakly) connected: if i, j ∈ W, then i ↔* j.
D is acyclic: if i → j, then not j →* i.
D obeys the single-head constraint: if i → j, then not i′ → j, for any i′ ≠ i.
The single-head constraint causes problems in handling certain linguistic phenomena.

Anomalous Cases
"Il governo garantirà sussidi a coloro che cercheranno lavoro" (the government will guarantee subsidies to those who will look for work)
"He designs and develops programs"

Solution
[Diagrams of the same two sentences with the shared arguments made explicit through arcs labeled PREDREL, SUBJ, SUBJ, OBJ, OBJ.]

Evalita 2009 Results
Evaluation of Italian linguistic tools for parsing, POS tagging and NER tagging.
Corpus            DeSR    Best
Turin TreeBank    88.67   88.73
ISST              83.38   83.38

Evalita 2014 Results
Metric            LAS     UAS
Parser accuracy   87.89   90.16

Metric              Prec.   Rec.    F1
Relation accuracy   81.89   90.45   85.95

Problem with Oracles
They only suggest the correct path.
If a parser makes mistakes, it finds itself in a state never seen in training and does not know how to recover.
This causes error propagation.

Spurious Ambiguities
Two possible parsing sequences for the same tree:
SH LA RA SH RA SH LA RE RA RE RA
SH LA RA SH RA RE SH LA RA RE RA

Error Propagation
Standard oracle: SH LA RA SH SH LA SH SH — errors: 3
Dynamic oracle:  SH LA RA SH LA LA RA RE RA — errors: 1

Dynamic Oracle
Allows more than one transition sequence.
Makes optimal predictions in all configurations, i.e. does not introduce any further errors.
See: Y. Goldberg, J. Nivre. 2012. A Dynamic Oracle for Arc-Eager Dependency Parsing. COLING 2012. www.aclweb.org/anthology/C12-1059

Dependency Parser using Neural Networks
Chen & Manning. A Fast and Accurate Dependency Parser using Neural Networks. EMNLP 2014.

State Representation
Extract a set of tokens from the stack/buffer (e.g. S1, S2, B1, lc(S1), rc(S1), lc(S2), rc(S2)) and concatenate their vector embeddings for words, POS tags and dependency labels (e.g. words "has", "good", "control", "He"; POS tags JJ, VBZ, NN, PRP; dependency label nsubj).
Embeddings express similarities: POS NN is similar to NNS; dependency amod is similar to num.
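As a sketch of the idea (the table sizes, dimensions and selected positions are arbitrary stand-ins, not the paper's configuration):

```python
import numpy as np

# Hypothetical embedding tables
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(5000, 50))   # word embeddings
pos_emb  = rng.normal(size=(50, 50))     # POS-tag embeddings
dep_emb  = rng.normal(size=(40, 50))     # dependency-label embeddings

def state_vector(word_ids, pos_ids, dep_ids):
    """Concatenate the embeddings of the selected stack/buffer tokens into one input vector."""
    parts = [word_emb[word_ids].ravel(), pos_emb[pos_ids].ravel(), dep_emb[dep_ids].ravel()]
    return np.concatenate(parts)

# e.g. 3 word positions, 3 POS positions, 2 dependency-label positions -> a (3+3+2)*50 = 400-dim vector
x = state_vector([12, 7, 0], [3, 5, 9], [2, 4])
print(x.shape)   # (400,)
```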

Accuracy (UAS)
Parser     Penn TB   Chinese TB   Sent/sec
Standard   89.9      82.7
Malt       90.1      82.4         470
MST        92.0      83.0         10
NN         92.0      83.9         650

Further Developments
Improvements:
Bigger, deeper networks with better-tuned hyperparameters
Beam search
Bidirectional LSTMs

Graph-based Dependency Parsing

Graph-based Parsing
For an input sentence x define a graph Gx = (Vx, Ax), where Vx = {0, 1, ..., n} and Ax = {(i, j, k) | i, j ∈ Vx and k ∈ L}.
Key observation: valid dependency trees for x = directed spanning trees of Gx.
The score of a dependency tree T is the sum of the scores of its arcs: s(T) = Σ_{(i,j,k) ∈ T} s(i, j, k).
Learning: a scoring function s(i, j, k) for each arc (i, j, k).
Inference: search for the maximum spanning tree T of Gx given s(·).

Finding the Maximum Spanning Tree
Basic idea: choose the arc with the highest score entering each node.
The result is not necessarily a tree!
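A minimal numpy sketch of this greedy step and of detecting when it fails; the cycle contraction of Chu-Liu-Edmonds (described next) is deliberately omitted, and the score matrix is toy data:

```python
import numpy as np

def greedy_heads(scores):
    """Pick, for every node except the root (node 0), the incoming arc with the highest score.
    scores[h, d] is the score of the arc h -> d."""
    n = scores.shape[0]
    heads = np.zeros(n, dtype=int)
    for d in range(1, n):
        candidates = [h for h in range(n) if h != d]
        heads[d] = max(candidates, key=lambda h: scores[h, d])
    return heads

def find_cycle(heads):
    """Return a cycle (list of nodes) reached by following heads, or None if the result is a tree."""
    n = len(heads)
    for start in range(1, n):
        seen, node = [], start
        while node != 0 and node not in seen:
            seen.append(node)
            node = heads[node]
        if node != 0:                        # a node was revisited: a cycle
            return seen[seen.index(node):]
    return None

scores = np.array([[0., 5., 1.],
                   [0., 0., 4.],
                   [0., 8., 0.]])            # toy scores; 1 -> 2 and 2 -> 1 form a cycle
heads = greedy_heads(scores)
print(heads, find_cycle(heads))              # [0 2 1] [1, 2] -> needs cycle contraction
```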

Chu-Liu-Edmonds
If the result is not a tree, identify a cycle and contract it.
Recalculate the arc weights into and out of the cycle.
O(n²) complexity for non-projective trees.

Neural Network Graph Parser
Biaffine attention model (Dozat & Manning): http://aclweb.org/anthology/K18-2016

NN Graph-based Parser
Revived graph-based dependency parsing in a neural world.
Designs a biaffine scoring model for neural dependency parsing.
Uses a neural sequence model.
Great results! But slower than simple neural transition-based parsers: there are n² possible dependencies in a sentence of length n.

Method                  UAS (PTB 3.3)   LAS (PTB 3.3)
Chen & Manning 2014     92.0            89.7
Weiss et al. 2015       93.99           92.05
Andor et al. 2016       94.61           92.79
Dozat & Manning 2017    95.74           94.08

Architecture Overview
Bidirectional LSTM over word and POS-tag embeddings.

Parser
Deps: two separate FC (fully connected) ReLU layers — one representing each token as a dependent trying to determine its label, and one representing each token as a head trying to determine its dependents' labels.
Arcs: two separate FC ReLU layers — one representing each token as a dependent trying to find (attend to) its head, and one representing each token as a head trying to find (be attended to by) its dependents.

Self Attention
Biaffine self-attention layer to score the possible heads for each dependent (an n × n matrix of scores):
s_i = H^(arc-head) (W h_i^(arc-dep) + b), where H^(arc-head) = [h_1^(arc-head) | ... | h_n^(arc-head)]
Train with cross-entropy.
Apply a spanning tree algorithm at inference time.
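A minimal numpy sketch of this scoring (shapes and values are hypothetical; the real model computes the hidden states with the BiLSTM and FC layers above):

```python
import numpy as np

def arc_scores(H_head, H_dep, W, b):
    """Biaffine arc scoring: for each dependent i, s_i = H_head @ (W @ h_dep_i + b).
    H_head and H_dep are (n, d) matrices of head/dependent representations; W is (d, d); b is (d,)."""
    # (n, d) @ (d, d) -> (n, d); add the bias; transpose and multiply -> (n, n) matrix of scores
    return H_head @ (H_dep @ W.T + b).T

n, d = 5, 8
rng = np.random.default_rng(0)
S = arc_scores(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
               rng.normal(size=(d, d)), rng.normal(size=(d,)))
print(S.shape)   # (5, 5): S[j, i] is the score of head j for dependent i
```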

Classifier for Labels
Biaffine layer to score the possible relations for each best-head/dependent pair (n × c matrix).
Train with softmax cross-entropy, added to the loss of the unlabeled parser.
Note: this is just a linear model with interaction effects: label.scores ~ head.state * dep.state

Universal Dependencies SoTA 2017
TreeBank   LAS     CLAS    System
ru         92.60   90.11   Stanford
hi         91.59   87.92   Stanford
sl         91.51   88.98   Stanford
pt         91.36   87.48   Stanford
ja         91.13   83.18   TRL
ca         90.70   86.70   Stanford
it         90.68   86.18   Stanford
cs         90.43   88.31   Stanford
pl         90.32   87.94   Stanford
cs         90.17   88.44   Stanford

Dependencies Encode Relational Structure
Relation Extraction with Universal Dependencies

Dependency paths identify relations
Example: protein interaction [Erkan et al. EMNLP 07, Fundel et al. 2007]
"The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB."
KaiC ←nsubj− interacts −prep_with→ SasA
KaiC ←nsubj− interacts −prep_with→ SasA −conj_and→ KaiA
KaiC ←nsubj− interacts −prep_with→ SasA −conj_and→ KaiB
Slide by C. Manning

Universal Dependencies [de Marneffe et al. LREC 2006]
Joint international effort to standardize dependency annotation for many (>70) languages: universaldependencies.org
Example: "the little boy jumped over the fence", with arcs labeled nsubj, det, amod, prep, pobj, det.
Slide by C. Manning

Graph modification to facilitate semantic analysis
"Bell, based in LA, makes and distributes electronic and computer products."
[Basic dependency graph of the sentence, with relations nsubj, dobj, amod, partmod, prep, pobj, cc, conj.]
Slide by C. Manning

Triple notation
nsubj(makes-8, Bell-1)
nsubj(distributes-10, Bell-1)
partmod(Bell-1, based-3)
nn(Angeles-6, Los-5)
prep_in(based-3, Angeles-6)
conj_and(makes-8, distributes-10)
amod(products-16, electronic-11)
conj_and(electronic-11, computer-13)
amod(products-16, computer-13)
conj_and(electronic-11, building-15)
amod(products-16, building-15)
dobj(makes-8, products-16)

Graph modification to facilitate semantic analysis
"Bell, based in LA, makes and distributes electronic and computer products."
[Collapsed dependency graph, with the nsubj relation propagated to both conjuncts and relations dobj, amod, partmod, prep_in, conj_and.]
Slide by C. Manning

BioNLP relation extraction shared tasks [Björne et al. 2009]
Slide by C. Manning

Input: Universal Dependencies
"Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocyte by gp41 ..."
[Dependency graph with relations prep_of, prep_in, prep_by, nn connecting involvement, activation, p70(S6)-kinase, up-regulation, IL-10, human monocyte and gp41.]
Slide by H. Poon

Joint Predictions
[Same dependency graph; for each node (involvement, activation, up-regulation, IL-10, human monocyte, p70(S6)-kinase, gp41), jointly predict: trigger word? event type?]
Slide by H. Poon

References
G. Attardi. Experiments with a Multilanguage Non-Projective Dependency Parser. Proc. of the Tenth Conference on Natural Language Learning, New York, NY, 2006.
G. Attardi, F. Dell'Orletta. Reverse Revision and Linear Tree Combination for Dependency Parsing. Proc. of NAACL HLT 2009, 2009.
G. Attardi, F. Dell'Orletta, M. Simi, J. Turian. Accurate Dependency Parsing with a Stacked Multilayer Perceptron. Proc. of Workshop Evalita 2009, ISBN 978-88-903581-1-1, 2009.
H. Yamada, Y. Matsumoto. Statistical Dependency Analysis with Support Vector Machines. Proc. of IWPT, 2003.
M. T. Kromann. Optimality parsing and local cost functions in discontinuous grammars. Proc. of FG-MOL, 2001.

References
D. Cer, M. de Marneffe, D. Jurafsky, C. Manning. Parsing to Stanford Dependencies: Trade-offs between speed and accuracy. Proc. of LREC 2010, 2010.