Parsing
Giuseppe Attardi
Dipartimento di Informatica
Università di Pisa
Question Answering at TREC
- Consists of answering a set of 500 fact-based questions, e.g. "When was Mozart born?"
- Systems were allowed to return 5 ranked answer snippets per question (an IR-style setting)
- Mean Reciprocal Rank (MRR) scoring: 1, 0.5, 0.33, 0.25, 0.2 for a correct answer at rank 1, 2, 3, 4, 5; 0 for rank 6+
- Mainly Named Entity answers (person, place, date, …)
- From 2002 systems were only allowed to return a single exact answer
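The MRR scoring above can be sketched in a few lines of Python (a sketch; a rank of None means the correct answer was not among the snippets, and the exact reciprocals 1/3, 1/4, 1/5 correspond to the rounded 0.33, 0.25, 0.2 on the slide):

```python
def mrr(ranks):
    """Mean Reciprocal Rank: ranks holds, per question, the rank of the
    first correct answer (None if not found in the top 5 snippets)."""
    return sum(0.0 if r is None or r > 5 else 1.0 / r
               for r in ranks) / len(ranks)

# Three questions: correct at rank 1, correct at rank 2, not found.
score = mrr([1, 2, None])   # (1 + 0.5 + 0) / 3 = 0.5
```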
TREC 2000 Results (long) Watson Team
Falcon
- The Falcon system from SMU was by far the best performing system at TREC 2000
- It used NLP and performed deep semantic processing
Question parse
(constituent parse tree of "Who was the first Russian astronaut to walk in space", with POS tags WP VBD DT JJ NNP TO VB IN NN and phrase nodes NP, PP, VP, S)
Question semantic form
Answer type: PERSON
Question logic form: first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
Parsing in QA
- Top systems in TREC 2005 perform parsing of queries and answer paragraphs
- Some use a specially built parser
- Parsers are slow: ~1 min/sentence
Practical Uses of Parsing
- Google Knowledge Graph is enriched with relations extracted from dependency trees
- Google's index parses all documents
- Google Translator applies dependency parsing to sentences
- Sentiment analysis improves with dependency parsing
Statistical Methods in NLP
Some NLP problems:
- Information extraction: named entities, relationships between entities, etc.
- Finding linguistic structure: part-of-speech tagging, chunking, parsing
These can be cast as learning a mapping:
- strings to hidden state sequences (NE extraction, POS tagging)
- strings to strings (machine translation)
- strings to trees (parsing)
- strings to relational data structures (information extraction)
Techniques
- Log-linear (Maximum Entropy) taggers
- Probabilistic context-free grammars (PCFGs)
- Discriminative methods: conditional MRFs, Perceptron, kernel methods
Learning mapping
- Strings to hidden state sequences: NE extraction, POS tagging
- Strings to strings: machine translation
- Strings to trees: parsing
- Strings to relational data structures: information extraction
POS as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT: Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ./.
NE as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street.
OUTPUT: Profits/O soared/O at/O Boeing/BC Co./IC ,/O easily/O topping/O forecasts/O on/NA Wall/BL Street/IL ./O
Parsing Technology
Constituent Parsing
Constituent Parsing
- Requires a Phrase Structure Grammar: CFG, PCFG, Unification Grammar
- Produces a phrase structure parse tree
(parse tree of "Rolls-Royce Inc. said it expects its sales to remain steady" with nodes S, NP, VP, ADJP)
Statistical Parsers
- Probabilistic generative models of language which include parse structure (e.g. Collins 1997); learning consists in estimating the parameters of the model with simple likelihood-based techniques
- Conditional parsing models (Charniak 2000; McDonald 2005)
Results

Method                                             Accuracy
PCFGs (Charniak 97)                                73.0%
Conditional Models – Decision Trees (Magerman 95)  84.2%
Lexical Dependencies (Collins 96)                  85.5%
Conditional Models – Logistic (Ratnaparkhi 97)     86.9%
Generative Lexicalized Model (Charniak 97)         86.7%
Generative Lexicalized Model (Collins 97)          88.2%
Logistic-inspired Model (Charniak 99)              89.6%
Boosting (Collins 2000)                            89.8%
Linear Models for Parsing and Tagging
Three components:
- GEN is a function from a string to a set of candidates
- F maps a candidate to a feature vector
- W is a parameter vector
Component 1: GEN
GEN enumerates a set of candidates for a sentence, e.g. all candidate parses of "She announced a program to promote safety in trucks and vans"
Examples of GEN
- A context-free grammar
- A finite-state machine
- Top N most probable analyses from a probabilistic grammar
Component 2: F
- F maps a candidate to a feature vector ∈ ℝ^d
- F defines the representation of a candidate, e.g. F(x) = ⟨1, 0, 2, 0, 0, 15, 5⟩
Feature
- A "feature" is a function on a structure, e.g. h(x) = number of times the subtree (A → B C) is seen in x
- Feature vector: a set of functions h_1 … h_d defines a feature vector F(x) = ⟨h_1(x), h_2(x), …, h_d(x)⟩
Component 3: W
- W is a parameter vector ∈ ℝ^d
- F(x) · W maps a candidate to a real-valued score
Putting it all together
- X is a set of sentences, Y is a set of possible outputs (e.g. trees)
- We need to learn a function from X to Y
- GEN, F and W define the mapping
    x ↦ argmax_{y ∈ GEN(x)} F(y) · W
- Choose the highest scoring tree as the most plausible structure
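The three components compose into one argmax. A minimal sketch (the candidates and feature vectors here are toy stand-ins, not a real parser):

```python
def predict(x, GEN, F, W):
    """Return the candidate in GEN(x) with the highest score F(y) . W."""
    return max(GEN(x), key=lambda y: sum(wi * fi for wi, fi in zip(W, F(y))))

# Toy instance: two candidate "trees" with hand-made feature vectors.
GEN = lambda x: ["tree1", "tree2"]
F = lambda y: [1, 0] if y == "tree1" else [0, 2]
W = [3.0, 1.0]
best = predict("a sentence", GEN, F, W)  # tree1 scores 3.0, tree2 scores 2.0
```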
Dependency Parsing
Dependency Parsing
- Produces dependency trees: word-word dependency relations
- Easier to understand and to annotate than constituent trees
(dependency tree of "Rolls-Royce Inc. said it expects its sales to remain steady" with relations SUBJ, OBJ, MOD, TO)
Data-Driven Dependency Parsing
- Graph based: consider possible dependency graphs, define a score and select the graph with the highest score
- Transition based: define a transition system that leads to a parse tree while analyzing a sentence one word at a time
Transition-based Shift-Reduce Parsing
(figure: parser configuration over the tagged sentence "He/PP saw/VVD a/DT girl/NN with/IN a/DT telescope/NN ./SENT", with pointers top and next and the actions Left, Shift, Right)
Interactive Simulator Play with it at:http://medialab.di.unipi.it/Project/QA/Parser/sim.html
Shift/Reduce Dependency Parser
- Traditional statistical parsers are trained directly on the task of tagging a sentence
- Instead an SR parser is trained on, and learns, the sequence of parse actions required to build the parse tree
Grammar Not Required
- A traditional parser requires a grammar for generating candidate trees
- An inductive parser needs no grammar
Parsing as Classification
- Inductive dependency parsing
- Parsing based on Shift/Reduce actions
- Learn from an annotated corpus which action to perform at each step
Dependency Graph
Let R = {r_1, …, r_m} be the set of permissible dependency types.
A dependency graph for a string of words W = w_1 … w_n is a labeled directed graph D = (W, A), where:
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (w_i, w_j, r), with w_i, w_j ∈ W and r ∈ R,
(c) for every w_j ∈ W, there is at most one arc (w_i, w_j, r) ∈ A.
Parser State
The parser state is a triple ⟨S, B, A⟩, where:
- S is a stack of partially processed tokens
- B is a buffer of (remaining) input tokens
- A is the arc relation for the dependency graph
(h, d, r) ∈ A represents an arc h −r→ d, tagged with dependency r
Transition Systems
Sets of rules used by a transition parser:
- Nivre's arc-standard
- Nivre's arc-eager
- Attardi's non-projective
- Nivre's swap
Arc Standard
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S, s|B, A ∪ {(s, n, r)}⟩
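The three arc-standard transitions can be sketched directly in Python (a sketch, not DeSR's code; tokens are integer positions, arcs are (head, dependent, label) triples, and preconditions are left to the caller):

```python
def shift(S, B, A):
    # <S, n|B, A>  =>  <S|n, B, A>
    S.append(B.pop(0))

def left_arc(S, B, A, r):
    # <S|s, n|B, A>  =>  <S, n|B, A + {(n, s, r)}>: n becomes head of s
    s = S.pop()
    A.add((B[0], s, r))

def right_arc(S, B, A, r):
    # <S|s, n|B, A>  =>  <S, s|B, A + {(s, n, r)}>: s becomes head of n
    # and s is moved back to the front of the buffer
    s = S.pop()
    n = B.pop(0)
    A.add((s, n, r))
    B.insert(0, s)
```

For example, on "He saw her" (tokens 1, 2, 3) the sequence Shift, Left-arc(subj), Shift, Right-arc(obj), Shift leaves the root "saw" on the stack with arcs (2, 1, subj) and (2, 3, obj).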
Parser Algorithm
The parsing algorithm is fully deterministic and works as follows:

Input sentence: (w_1, w_2, …, w_n)
S = ⟨⟩
B = ⟨w_1, w_2, …, w_n⟩
while B ≠ ⟨⟩ do
  x = getContext(S, B)
  y = selectAction(model, x)
  performAction(y, S, B)
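The deterministic loop can be sketched as below; getContext, selectAction and performAction are the slide's components, here passed in as plain functions (hypothetical stand-ins, not DeSR's actual interfaces):

```python
def parse(words, model, get_context, select_action, perform_action):
    """Deterministic shift-reduce loop: one classifier call per step."""
    S, B, A = [], list(words), set()
    while B:
        x = get_context(S, B)          # features of the current configuration
        y = select_action(model, x)    # the trained classifier picks an action
        perform_action(y, S, B, A)     # apply it, updating stack/buffer/arcs
    return A

# Toy run: a "classifier" that always shifts simply empties the buffer.
arcs = parse(["He", "saw", "her"], None,
             lambda S, B: (len(S), len(B)),
             lambda model, x: "shift",
             lambda y, S, B, A: S.append(B.pop(0)))
```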
Oracle
An algorithm that, given the gold tree for a sentence, produces a sequence of actions that a parser may use to obtain that gold tree from the input sentence. Used during training.
Arc Standard Oracle
Emulate the parser:

Input sentence: (w_1, w_2, …, w_n)
S = ⟨⟩
B = ⟨w_1, w_2, …, w_n⟩
while B ≠ ⟨⟩ do
  if B[0] −r→ S[0] and all children of S[0] are attached then
    perform Left-Arc_r
  else if S[0] −r→ B[0] and all children of B[0] are attached then
    perform Right-Arc_r
  else
    perform Shift
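The arc-standard oracle can be sketched as follows (a sketch under the conventions of these slides: heads[d] is the gold head of token d, tokens are numbered from 1 and the artificial root is 0; labels are omitted):

```python
def arc_standard_oracle(heads):
    """Emulate the parser on the gold tree and emit SH/LA/RA actions."""
    n_children = {}
    for d in heads:
        n_children[heads[d]] = n_children.get(heads[d], 0) + 1
    attached = {w: 0 for w in heads}       # children attached so far
    S, B, actions = [], sorted(heads), []
    while B:
        top, nxt = (S[-1] if S else None), B[0]
        if (top is not None and heads[top] == nxt
                and attached[top] == n_children.get(top, 0)):
            actions.append("LA")           # arc nxt -> top; pop top
            S.pop()
            attached[nxt] += 1
        elif (top is not None and heads.get(nxt) == top
                and attached[nxt] == n_children.get(nxt, 0)):
            actions.append("RA")           # arc top -> nxt; top back to buffer
            B.pop(0)
            B.insert(0, S.pop())
            attached[top] += 1
        else:
            actions.append("SH")
            S.append(B.pop(0))
    return actions
```

On "He saw her" with gold heads {1: 2, 2: 0, 3: 2} this yields SH, LA, SH, RA, SH.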
Projectivity
- An arc w_i → w_k is projective iff for every j with i < j < k (or i > j > k), w_i →* w_j
- A dependency tree is projective iff every arc is projective
- Intuitively: arcs can be drawn on a plane without intersections
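The definition translates directly into a check (a sketch; heads[d] is the head of token d, 1-based, with 0 as the artificial root):

```python
def is_projective(heads):
    """A tree is projective iff, for every arc (h, d), every token strictly
    between h and d is a descendant of h."""
    def descends(w, h):
        # climb the head chain from w; reaching h (or the root, if h is 0)
        # means w is in h's subtree
        while w != 0:
            if w == h:
                return True
            w = heads[w]
        return h == 0

    for d, h in heads.items():
        lo, hi = min(d, h), max(d, h)
        for j in range(lo + 1, hi):
            if not descends(j, h):
                return False
    return True
```

For "I saw a girl yesterday wearing a ring", the arc girl → wearing crosses "yesterday" (a dependent of "saw"), so the tree is non-projective.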
Non Projectivity I saw a girl yesterday wearing a ring
Non Projectivity
Většinu těchto přístrojů lze take používat nejen jako fax, ale … (Czech)
Addressed by special actions: Right2, Left2, Right3, Left3
Arc Standard Properties
- Does not deal with non-projectivity
- Soundness: every transition sequence produces a projective dependency tree
- Completeness: every projective tree is produced by some transition sequence
- Parsing n words requires 2n transitions
Arc Eager Transitions
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩   if there is no (k, s, l) ∈ A
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S|s|n, B, A ∪ {(s, n, r)}⟩
Reduce:       ⟨S|s, B, A⟩ ⇒ ⟨S, B, A⟩   if some (k, s, r) ∈ A
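A Python sketch of the four arc-eager transitions (same conventions as before: integer tokens, (head, dependent, label) arcs; the side conditions on having/lacking a head are left as preconditions for the caller or oracle):

```python
def shift(S, B, A):
    S.append(B.pop(0))

def left_arc(S, B, A, r):
    # precondition: the stack top s has no head yet
    s = S.pop()
    A.add((B[0], s, r))

def right_arc(S, B, A, r):
    # attach n to the stack top and push n: unlike arc-standard,
    # n stays available so its own dependents can still be attached
    n = B.pop(0)
    A.add((S[-1], n, r))
    S.append(n)

def reduce_(S, B, A):
    # precondition: the stack top already has a head in A
    S.pop()
```

Replaying "They told him a story" (tokens 1-5) with Shift, Left-arc(subj), Shift, Right-arc(obj), Reduce, Shift, Left-arc(det), Right-arc(obj), Reduce leaves only "told" on the stack.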
Arc Eager Oracle
Emulate the parser:

Input sentence: (w_1, w_2, …, w_n)
S = ⟨⟩
B = ⟨w_1, w_2, …, w_n⟩
while B ≠ ⟨⟩ do
  if B[0] −r→ S[0] then
    perform Left-Arc_r
  else if S[0] −r→ B[0] then
    perform Right-Arc_r
  else if all children of S[0] are attached and S[0] is attached then
    perform Reduce
  else
    perform Shift
Arc Eager Parsing

Action   Stack        Buffer
         []           They told him a story
Shift    They         told him a story
LA-subj  []           told him a story
Shift    told         him a story
RA-obj   told him     a story
Reduce   told         a story
Shift    told a       story
LA-det   told         story
RA-obj   told story   (empty)
Reduce   told         (empty)
Non Projective Transitions
Actions for non-projective arcs (Attardi)
Right2_r:  ⟨S|s2|s1, n|B, A⟩ ⇒ ⟨S|s1, n|B, A ∪ {(s2, n, r)}⟩
Left2_r:   ⟨S|s2|s1, n|B, A⟩ ⇒ ⟨S|s2, s1|B, A ∪ {(n, s2, r)}⟩
Right3_r:  ⟨S|s3|s2|s1, n|B, A⟩ ⇒ ⟨S|s2|s1, n|B, A ∪ {(s3, n, r)}⟩
Left3_r:   ⟨S|s3|s2|s1, n|B, A⟩ ⇒ ⟨S|s3|s2, s1|B, A ∪ {(n, s3, r)}⟩
Example Right2 (nejen ← ale)Left3 (Většinu → fax) Většinu těchto přístrojů lze take používat nejen jako fax , ale
Examples
zou gemaakt moeten worden → zou moeten worden gemaakt (Dutch)
Extract followed by Insert
Example Right2 (nejen → ale) Left3 (fax → Většinu) Většinu těchto přístrojů lze take používat nejen jako fax , ale
Example
(three figure slides showing successive parser states on the Czech sentence "Většinu těchto přístrojů lze take používat nejen jako fax , ale": the tree before and after the Right2 action involving "nejen"/"ale", and after the Left3 action involving "fax"/"Většinu")
Effectiveness for Non-Projectivity
- Training data for Czech contains 28081 non-projective relations
- 26346 (93%) can be handled by Left2/Right2
- 1683 (6%) by Left3/Right3
Non-Projective Accuracy

Language    Total  DeSR  MaltParser
Czech       104    77    79
Slovene     88     34    21
Portuguese  54     26    24
Danish      35     10    9
Alternative: Swap
Shift:        ⟨S, n|B, A⟩ ⇒ ⟨S|n, B, A⟩
Left-arc_r:   ⟨S|s, n|B, A⟩ ⇒ ⟨S, n|B, A ∪ {(n, s, r)}⟩   if there is no (k, s, l) ∈ A
Right-arc_r:  ⟨S|s, n|B, A⟩ ⇒ ⟨S, s|B, A ∪ {(s, n, r)}⟩
Swap:         ⟨S|s, n|B, A⟩ ⇒ ⟨S|n, s|B, A⟩   if s < n
Arc Standard Swap Oracle
Emulate the parser:

Input sentence: (w_1, w_2, …, w_n)
S = ⟨⟩
B = ⟨w_1, w_2, …, w_n⟩
while B ≠ ⟨⟩ do
  if B[0] −r→ S[0] and all children of S[0] are attached then
    perform Left-Arc_r
  else if S[0] −r→ B[0] and all children of B[0] are attached then
    perform Right-Arc_r
  else if S[0] < B[0] according to the inorder traversal then
    perform Swap
  else
    perform Shift
Example
Inorder traversal of the gold tree for "A hearing is scheduled on the issue today" yields the projective order (original surface positions in parentheses):
A(1) hearing(2) on(5) the(6) issue(7) is(3) scheduled(4) today(8)
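The projective order can be computed by an inorder traversal of the gold tree, visiting each head between its left and right dependents. A sketch (the heads dict below is an assumed gold tree for the example sentence, tokens 1-8 in surface order, root = 0):

```python
def projective_order(heads):
    """Inorder traversal of a (possibly non-projective) gold tree.
    heads[d] = head of token d (1-based); 0 is the artificial root.
    Returns the tokens in projective order."""
    children = {}
    for d in sorted(heads):
        children.setdefault(heads[d], []).append(d)
    order = []
    def visit(h):
        for c in children.get(h, []):   # left dependents first
            if c < h:
                visit(c)
        if h:                           # skip the artificial root
            order.append(h)
        for c in children.get(h, []):   # then right dependents
            if c > h:
                visit(c)
    visit(0)
    return order

# "A hearing is scheduled on the issue today", with the non-projective
# arc hearing(2) -> on(5):
heads = {1: 2, 2: 4, 3: 4, 4: 0, 5: 2, 6: 7, 7: 5, 8: 4}
order = projective_order(heads)   # tokens 1 2 5 6 7 3 4 8
```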
Parsing with Swap

Action   Stack                         Buffer
         []                            A hearing is scheduled on the issue today
Shift    A                             hearing is scheduled on the issue today
Shift    A hearing                     is scheduled on the issue today
LA-ATT   hearing                       is scheduled on the issue today
Shift    hearing is                    scheduled on the issue today
Shift    hearing is scheduled          on the issue today
Shift    hearing is scheduled on       the issue today
Swap     hearing is on                 scheduled the issue today
Swap     hearing on                    is scheduled the issue today
Shift    hearing on is                 scheduled the issue today
Shift    hearing on is scheduled       the issue today
Shift    hearing on is scheduled the   issue today
Swap     hearing on is the             scheduled issue today
Swap     hearing on the                is scheduled issue today
…
Learning Phase
Features

Feature ID  Value
F           form of token
L           lemma of token
P           part of speech (POS) tag
M           morphology
/F          form of the leftmost child node
/L          lemma of the leftmost child node
/P          POS tag of the leftmost child node, if present
/M          morphology of the leftmost child node
F\          form of the rightmost child node
L\          lemma of the rightmost child node
P\          POS tag of the rightmost child node, if present
M\          morphology of the rightmost child node
Learning Event
(figure: parser context over Italian tokens, divided into left context, target nodes and right context: Sosteneva/VER, che/PRO, leggi/NOM with left child le/DET, anti/ADV, Serbia/NOM, che/PRO with right child erano/VER, discusse/ADJ, ,/PUN)

Extracted features:
(-3, F, che), (-3, P, PRO), (-2, F, leggi), (-2, P, NOM), (-2, M, P), (-2, /F, le), (-2, /P, DET), (-2, /M, P), (-1, F, anti), (-1, P, ADV), (0, F, Serbia), (0, P, NOM), (0, M, S), (+1, F, che), (+1, P, PRO), (+1, F\, erano), (+1, P\, VER), (+1, M\, P), (+2, F, ,), (+2, P, PUN)
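Feature extraction of this kind can be sketched as follows (a simplification, not DeSR's actual extractor: a two-token window on each side, only F and P features, and a hypothetical token table mapping index to attributes):

```python
def context_features(stack, buffer, tokens):
    """Emit (offset, feature-code, value) tuples around the focus point.
    Negative offsets index the stack top-down, non-negative ones the buffer.
    tokens: index -> {'F': form, 'P': pos}."""
    feats = []
    for off in (-2, -1):                      # stack context
        if len(stack) >= -off:
            tok = tokens[stack[off]]
            feats += [(off, 'F', tok['F']), (off, 'P', tok['P'])]
    for off in (0, 1):                        # buffer context
        if off < len(buffer):
            tok = tokens[buffer[off]]
            feats += [(off, 'F', tok['F']), (off, 'P', tok['P'])]
    return feats
```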
DeSR (Dependency Shift Reduce)
- Multilanguage statistical transition-based dependency parser
- Linear algorithm
- Capable of handling non-projectivity
- Trained on 28 languages
- Available from: http://desr.sourceforge.net/
Parser Architecture
- Modular learners architecture: MLP, Maximum Entropy, SVM, Perceptron
- Features can be configured
Available Classifiers
- Maximum Entropy: fast, not very accurate
- SVM: slow, very accurate
- Multilayer Perceptron: fast, very accurate
- Deep Learning: word embeddings as features
The simplest ANN: Perceptron
(figure: a perceptron with input layer, output layer and destination units D0, D1, D2, showing its activation functions and weight-update learning rule)
Slide by Geoffrey Hinton
Multilayer Perceptron
(figure: input vector, hidden layers, outputs; compare outputs with the correct answer to get an error signal, then back-propagate the error signal to get derivatives for learning)
Slide by Geoffrey Hinton
Feature Model
LEMMA    -2 -1 0 1 2 3 prev(0) leftChild(-1) leftChild(0) rightChild(-1) rightChild(0)
POSTAG   -2 -1 0 1 2 3 next(-1) leftChild(-1) leftChild(0) rightChild(-1) rightChild(0)
CPOSTAG  -1 0 1
FEATS    -1 0 1
DEPREL   leftChild(-1) leftChild(0) rightChild(-1)
Notation used in DeSR configuration files
2nd, 3rd Order Features
2nd  CPOSTAG(-1) CPOSTAG(0)
2nd  CPOSTAG(0) CPOSTAG(1)
2nd  LEMMA(0) POSTAG(leftChild(0))
3rd  POSTAG(leftChild(0)) LEMMA(0) POSTAG(rightChild(0))
CoNLL-X Shared Task
- Task: assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser
- Input: tokenized and tagged sentences
- Tags: token, lemma, POS, morphological features, reference to head, dependency label
- For each token, the parser must output its head and the corresponding dependency relation
CoNLL-X: Data Format

N   WORD         LEMMA        CPOS  POS       FEATS                HEAD  DEPREL  PHEAD  PDEPREL
1   A            o            art   art       <artd>|F|S           2     >N      _      _
2   direcção     direcção     n     n         F|S                  4     SUBJ    _      _
3   já           já           adv   adv       _                    4     ADVL    _      _
4   mostrou      mostrar      v     v-fin     PS|3S|IND            0     STA     _      _
5   boa_vontade  boa_vontade  n     n         F|S                  4     ACC     _      _
6   ,            ,            punc  punc      _                    4     PUNC    _      _
7   mas          mas          conj  conj-c    <co-vfin>|<co-fmc>   4     CO      _      _
8   a            o            art   art       <artd>|F|S           9     >N      _      _
9   greve        greve        n     n         F|S                  10    SUBJ    _      _
10  prossegue    prosseguir   v     v-fin     PR|3S|IND            4     CJT     _      _
11  em           em           prp   prp       _                    10    ADVL    _      _
12  todas_as     todo_o       pron  pron-det  <quant>|F|P          13    >N      _      _
13  delegações   delegaçõo    n     n         F|P                  11    P<      _      _
14  de           de           prp   prp       <sam->               13    N<      _      _
15  o            o            art   art       <-sam>|<artd>|M|S    16    >N      _      _
16  país         país         n     n         M|S                  14    P<      _      _
17  .            .            punc  punc      _                    4     PUNC    _      _
CoNLL: Evaluation Metrics
- Unlabeled Attachment Score (UAS): proportion of tokens that are assigned the correct head
- Labeled Attachment Score (LAS): proportion of tokens that are assigned both the correct head and the correct dependency relation label
- Content Word Labeled Attachment Score (CLAS): like LAS but disregards attachments of punctuation and function words, i.e. determiners (det), classifiers (clf), adpositions (case), auxiliaries (aux, cop), and conjunctions (cc, mark)
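UAS and LAS are simple token-level proportions; a minimal sketch (gold and pred are per-token (head, label) pairs in the same order):

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two aligned lists of (head, label) pairs."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # head correct
    las = sum(g == p for g, p in zip(gold, pred)) / n          # head + label
    return uas, las

gold = [(2, "subj"), (0, "root"), (2, "obj")]
pred = [(2, "subj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)   # all heads right, one label wrong
```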
Annotation Issues
A dependency graph D = (W, A) is a directed rooted tree:
- D is (weakly) connected: if i, j ∈ W, then i ↔* j
- D is acyclic: if i → j, then not j →* i
- D obeys the single-head constraint: if i → j, then not i′ → j for any i′ ≠ i
The single-head constraint causes problems in handling certain linguistic phenomena
Anomalous Cases
Il governo garantirà sussidi a coloro che cercheranno lavoro ("The government will guarantee subsidies to those who will seek work")
He designs and develops programs
Solution
(dependency trees for "Il governo garantirà sussidi a coloro che cercheranno lavoro" and "He designs and develops programs", using SUBJ, OBJ and PREDREL arcs to share arguments)
Evalita 2009 Results
Evaluation of Italian linguistic tools for parsing, POS tagging, NER tagging

Corpus          DeSR   Best
Turin TreeBank  88.67  88.73
ISST            83.38  83.38
Evalita 2014 Results

Metric           LAS    UAS
Parser accuracy  87.89  90.16

Metric             Prec.  Rec.   F1
Relation Accuracy  81.89  90.45  85.95
Problem with Oracles
- An oracle only suggests the correct path
- If a parser makes a mistake, it finds itself in a state never seen in training and does not know how to recover
- This causes error propagation
Spurious ambiguities
Two possible parsing sequences for the same tree:
SH LA RA SH RA SH LA RE RA RE RA
SH LA RA SH RA RE SH LA RA RE RA
Error Propagation
Standard oracle: SH LA RA SH SH LA SH SH (errors: 3)
Dynamic oracle:  SH LA RA SH LA LA RA RE RA (errors: 1)
Dynamic Oracle
- Allows more than one transition sequence
- Makes optimal predictions in all configurations, i.e. does not introduce any further errors
- See: Y. Goldberg, J. Nivre. 2012. A Dynamic Oracle for Arc-Eager Dependency Parsing. Coling 2012. www.aclweb.org/anthology/C12-1059
Dependency Parser using Neural Networks Chen & Manning. A fast and accurate dependency parser using NN. EMNLP 2014.
State Representation
- Extract a set of tokens from the stack/buffer: S1, S2, B1, lc(S1), rc(S1), lc(S2), rc(S2)
- Concatenate their word, POS and dependency-label embeddings
- Embeddings express similarities: POS NN is similar to NNS; dependency amod is similar to num
Accuracy (UAS)

Parser    Penn TB  Chinese TB  Sent/sec
Standard  89.9     82.7
Malt      90.1     82.4        470
MST       92.0     83.0        10
NN        92.0     83.9        650
Further Developments Improvements:Bigger, deeper networks with better tuned hyperparametersBeam searchBidirectional LSTM
Graph-based Dependency Parsing
Graph-based Parsing
For an input sentence x define a graph G_x = (V_x, A_x), where
- V_x = {0, 1, …, n}
- A_x = {(i, j, k) | i, j ∈ V_x and k ∈ L}
Key observation: valid dependency trees for x = directed spanning trees of G_x
The score of a dependency tree T is the sum of the scores of its arcs:
  s(T) = Σ_{(i,j,k) ∈ T} s(i, j, k)
Learning: a scoring function s(i, j, k) for each arc (i, j, k)
Inference: search for the maximum spanning tree T of G_x given s(·)
Finding the Maximum Spanning Tree
Basic idea: choose for each node the incoming arc with the highest score. The result, however, may not be a tree!
Chu-Liu-Edmonds
- If the result is not a tree, identify a cycle and contract it
- Recalculate arc weights into and out of the cycle
- O(n²) complexity for non-projective trees
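The first two ingredients, greedy head selection and cycle detection, can be sketched as follows (a partial sketch only: the contraction and re-weighting step of Chu-Liu-Edmonds is omitted; score is any arc-scoring function and 0 denotes the root):

```python
def greedy_heads(score, n):
    """For each token d in 1..n pick the highest-scoring head in 0..n.
    This is the greedy first step; the result may contain cycles."""
    return {d: max((h for h in range(n + 1) if h != d),
                   key=lambda h: score(h, d))
            for d in range(1, n + 1)}

def find_cycle(heads):
    """Return the set of nodes on some cycle of the head graph, or None."""
    for start in heads:
        seen, w = [], start
        while w in heads and w not in seen:
            seen.append(w)
            w = heads[w]            # follow the head chain
        if w in seen:               # walked back into the current path
            return set(seen[seen.index(w):])
    return None
```

If find_cycle returns a cycle, the full algorithm contracts it into a single node, adjusts the scores of arcs entering and leaving it, and recurses.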
Neural Network Graph Parser
Biaffine Attention Model (Dozat & Manning)
http://aclweb.org/anthology/K18-2016
NN graph-based parser
- Revived graph-based dependency parsing in a neural world
- Designed a biaffine scoring model for neural dependency parsing
- Uses a neural sequence model
- Great results! But slower than simple neural transition-based parsers: there are n² possible dependencies in a sentence of length n

Method                UAS    LAS (PTB 3.3)
Chen & Manning 2014   92.0   89.7
Weiss et al. 2015     93.99  92.05
Andor et al. 2016     94.61  92.79
Dozat & Manning 2017  95.74  94.08
Architecture Overview
Bidirectional LSTM over word/POS-tag embeddings
Parser
Arcs: two separate FC (fully connected) ReLU layers
- one representing each token as a dependent trying to find (attend to) its head
- one representing each token as a head trying to find (be attended to by) its dependents
Labels: two separate FC ReLU layers
- one representing each token as a dependent trying to determine its label
- one representing each token as a head trying to determine its dependents' labels
Self Attention
- Biaffine self-attention layer to score possible heads for each dependent (an n × n score matrix):
    s_i = H^(arc-head) (W h_i^(arc-dep) + b)
  where H^(arc-head) = [h_1^(arc-head) | … | h_n^(arc-head)]
- Train with cross-entropy
- Apply a spanning tree algorithm at inference time
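The biaffine arc scorer is a single batched matrix product. A NumPy sketch (shapes and variable names are illustrative, not the authors' code):

```python
import numpy as np

def biaffine_arc_scores(H_dep, H_head, W, b):
    """scores[i, j] = h_head[j] . (W h_dep[i] + b): how plausible token j
    is as the head of token i.
    Shapes: H_dep, H_head (n, d); W (d, d); b (d,). Returns (n, n)."""
    return (H_dep @ W.T + b) @ H_head.T

# The predicted head of each token, before the spanning-tree step,
# is scores.argmax(axis=1).
```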
Classifier for Labels
- Biaffine layer to score possible relations for each best-head/dependent pair (n × c)
- Train with softmax cross-entropy, added to the loss of the unlabeled parser
- Note: this is just a linear model with interaction effects! label.scores ~ head.state * dep.state
Universal Dependencies SoTA 2017

TreeBank  LAS    CLAS   System
ru        92.60  90.11  Stanford
hi        91.59  87.92  Stanford
sl        91.51  88.98  Stanford
pt        91.36  87.48  Stanford
ja        91.13  83.18  TRL
ca        90.70  86.70  Stanford
it        90.68  86.18  Stanford
cs        90.43  88.31  Stanford
pl        90.32  87.94  Stanford
cs        90.17  88.44  Stanford
Dependencies encode relational structure Relation Extraction with Universal Dependencies
Dependency paths identify relations
Example: protein interaction [Erkan et al. EMNLP 07, Fundel et al. 2007]
- KaiC ←nsubj– interacts –prep_with→ SasA
- KaiC ←nsubj– interacts –prep_with→ SasA –conj_and→ KaiA
- KaiC ←nsubj– interacts –prep_with→ SasA –conj_and→ KaiB
(dependency tree of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA and KaiB", with relations det, nsubj, ccomp, compl, advmod, prep_with, conj_and)
slide by C. Manning
Universal Dependencies [de Marneffe et al. LREC 2006]
Joint international effort to standardize dependency annotation for many [>70] languages: universaldependencies.org
(dependency tree of "The little boy jumped over the fence" with relations nsubj, det, amod, prep, pobj)
slide by C. Manning
Graph modification to facilitate semantic analysis
Bell, based in LA, makes and distributes electronic and computer products.
(basic dependency tree with arcs nsubj, dobj, cc, conj, amod, partmod, prep, pobj)
slide by C. Manning
Triple notation
nsubj(makes-8, Bell-1)
nsubj(distributes-10, Bell-1)
partmod(Bell-1, based-3)
nn(Angeles-6, Los-5)
prep_in(based-3, Angeles-6)
conj_and(makes-8, distributes-10)
amod(products-16, electronic-11)
conj_and(electronic-11, computer-13)
amod(products-16, computer-13)
conj_and(electronic-11, building-15)
amod(products-16, building-15)
dobj(makes-8, products-16)
Graph modification to facilitate semantic analysis
Bell, based in LA, makes and distributes electronic and computer products.
(collapsed dependency tree: prepositions folded into prep_in, conjunctions into conj_and, and nsubj propagated to both "makes" and "distributes")
slide by C. Manning
BioNLP Relation extraction shared tasks [Björne et al. 2009] slide by C. Manning
Input: Universal Dependencies
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocyte by gp41 …
(dependency arcs: prep_of, prep_in, prep_by, nn)
slide by H. Poon
Joint Predictions
(figure: for each candidate word — involvement, activation, up-regulation, IL-10, p70(S6)-kinase, human monocyte, gp41 — jointly predict "trigger word?" and "event type?" over the dependency arcs prep_in, prep_of, prep_by, nn)
slide by H. Poon
References
- G. Attardi. Experiments with a Multilanguage Non-Projective Dependency Parser. Proc. of the Tenth Conference on Natural Language Learning, New York (NY), 2006.
- G. Attardi, F. Dell'Orletta. Reverse Revision and Linear Tree Combination for Dependency Parsing. Proc. of NAACL HLT 2009, 2009.
- G. Attardi, F. Dell'Orletta, M. Simi, J. Turian. Accurate Dependency Parsing with a Stacked Multilayer Perceptron. Proc. of Workshop Evalita 2009, ISBN 978-88-903581-1-1, 2009.
- H. Yamada, Y. Matsumoto. Statistical Dependency Analysis with Support Vector Machines. In Proc. IWPT, 2003.
- M. T. Kromann. Optimality Parsing and Local Cost Functions in Discontinuous Grammars. In Proc. FG-MOL, 2001.
References
- D. Cer, M. de Marneffe, D. Jurafsky, C. Manning. Parsing to Stanford Dependencies: Trade-offs between Speed and Accuracy. In Proc. of LREC 2010, 2010.