Fine-grained prediction of syntactic typology: Discovering latent structure with supervised learning
  Dingquan Wang and Jason Eisner
Grammar Induction is Broken!
  This work tries to fix it!
  - Starting with syntactic typology induction
  - Just do supervised learning!
  Unsupervised methods (like EM):
  - Only locally optimal
  - Hard to harness linguistic knowledge & conventions
  - Unusable performance in practice
Turn a Corpus into a Parser
  Corpus u: Papa ate with the caviar a spoon . Mama ate with the caviar a spoon . Papa ate caviar a spoon . Mama ate a spoon . …
  Input sentence x: Spoon ate a caviar
  Tree y: drawn over the same sentence, Spoon ate a caviar
Turn a Corpus into a Parser
  Corpus u: Yer amos yjja Ajjx aat orrr . Per anni inn se in hahh wee . Con per aat Ajjx “tat “yue han . Per anni inn se in hahh wee . …
  Input sentence x: Yer amos yjja Ajjx aat orr
  Tree y: drawn over the same sentence, Yer amos yjja Ajjx aat orr
Turn a Corpus into a Parser
  Corpus u: 我早上吃了饭。 我中午吃了饭。 明天踢足球。 早上好! …
  Input sentence x: 愿 原力 与 你 同在
  Tree y: drawn over the same sentence, 愿 原力 与 你 同在
Previous Work: Grammar Induction
  Corpus u → Optimization (EM) over hand-crafted CFG rules (S → NP VP, VP → VP PP, …)
Previous Work: Grammar Induction
  Corpus u → Optimization (EM) → probabilistic CFG (S → NP VP 0.9, VP → VP PP 0.2, …) → Parser (CKY): input sentence x → tree y
  - Non-convex objective (e.g., MAP): Hard! Hard!
  - Might not even model syntax
Generalization
  Corpus u → Mapping → sufficient statistic (S → NP VP 0.9, VP → VP PP 0.2, …) → Parser: input sentence x → tree y
Previous Work: Grammar Induction
  The same pipeline, with Optimization (EM) as the mapping from the corpus u to the sufficient statistic
Our Proposal
  Corpus u → Learned Mapping (in place of Optimization (EM)) → sufficient statistic (S → NP VP 0.9, VP → VP PP 0.2, …) → Parser: input sentence x → tree y
  Learned with supervision!
Prediction of Syntactic Typology
  Corpus u → Learned Mapping p → syntactic typology (rather than the rule weights S → NP VP 0.9, VP → VP PP 0.2, …)
  p(u) := syntactic typology
  p := a learned function with trainable parameters
Syntactic Typology
  A set of word-order facts about a language
Syntactic Typology (of English)
  Example: Papa ate a red apple at home (relations shown: nsubj, dobj, amod, case)
  - Subject-Verb-Object: nsubj with N before V ✔, N after V ✘; dobj with N after V ✔, N before V ✘
  - Prepositional: case with ADP before N ✔, ADP after N ✘
  - Adj-Noun: amod with A before N ✔, A after N ✘
Fine-grained Syntactic Typology (of English)
  The same four relations (nsubj, dobj, case, amod) and the same ✔/✘ judgments, shown without the example sentence
Fine-grained Syntactic Typology (of English)
  Relation (word-order fact)      dependent right of head   dependent left of head
  nsubj (Subject-Verb-Object)     0.04                      0.96
  dobj  (Subject-Verb-Object)     0.96                      0.04
  case  (Prepositional)           0.04                      0.96
  amod  (Adj-Noun)                0.03                      0.97
Fine-grained Syntactic Typology (of English)
  Vector of length 57: nsubj 0.04, dobj 0.96, case 0.04, amod 0.03, …
Fine-grained Syntactic Typology (of Japanese)
  Subject-Object-Verb, Postpositional, Adj-Noun
  Vector of length 57: nsubj 0.0, dobj 0.0, case 1.0, amod 0.0, …
Fine-grained Syntactic Typology (of Hindi)
  Subject-Object-Verb, Postpositional, Adj-Noun
  Vector of length 57: nsubj 0.01, dobj 0.25, case 0.98, amod 0.03, …
Fine-grained Syntactic Typology (of French)
  Subject-Verb-Object, Prepositional, Noun-Adj
  Vector of length 57: nsubj 0.03, dobj 0.76, case 0.01, amod 0.73, …
Fine-grained Syntactic Typology
  Language    nsubj   dobj   case   amod   …
  English     0.04    0.96   0.04   0.03   …
  Japanese    0.0     0.0    1.0    0.0    …
  Hindi       0.01    0.25   0.98   0.03   …
  French      0.03    0.76   0.01   0.73   …
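The typology vectors above can be read as per-relation directionality counts over a treebank. The following is a minimal sketch under that reading: each entry is taken to be the fraction of dependencies with a given relation whose dependent stands to the right of its head. This interpretation matches the English values shown (for example, nsubj 0.04 and dobj 0.96) but is not spelled out on the slides. The function name `directionality`, the input encoding, and the one-sentence toy treebank are illustrative assumptions.

```python
from collections import defaultdict


def directionality(treebank):
    """Fraction of right-directed dependencies for each relation.

    `treebank` is a list of sentences. Each sentence is a list of
    (head_position, relation) pairs, one per token, with 1-based
    positions and head_position 0 for the root (a hypothetical,
    simplified encoding chosen for this sketch).
    """
    right = defaultdict(int)
    total = defaultdict(int)
    for sentence in treebank:
        for position, (head, relation) in enumerate(sentence, start=1):
            if head == 0:
                continue  # the root has no head, hence no direction
            total[relation] += 1
            if position > head:  # dependent is to the right of its head
                right[relation] += 1
    return {relation: right[relation] / total[relation] for relation in total}


# "Papa ate a red apple at home", annotated by hand for illustration.
toy_treebank = [[
    (2, "nsubj"),  # Papa  <- ate
    (0, "root"),   # ate
    (5, "det"),    # a     <- apple
    (5, "amod"),   # red   <- apple
    (2, "dobj"),   # apple <- ate
    (7, "case"),   # at    <- home
    (2, "nmod"),   # home  <- ate
]]

typology = directionality(toy_treebank)
print(typology["nsubj"], typology["dobj"], typology["case"], typology["amod"])
```

On a single sentence every entry is 0 or 1; fractional values such as those in the table only appear once many sentences are counted.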
The Task
  Given a corpus u, predict the typology (nsubj, dobj, case, amod, …):
  Papa ate with the caviar a spoon . Mama ate with the caviar a spoon . …  →  0.04 0.96 0.04 0.03 …
  パパはキャビアとスプーンを 食。 ママはキャビアとスプーンを 食。 …  →  0.0 0.0 1.0 0.0 …
  पापा ने कैवियार के साथ एक चम्मच माँ कैवियार एक चम्मच के साथ …  →  0.01 0.25 0.98 0.03 …
  Papa a mangé au caviar cuillère . Maman a mangé au caviar cuillère . …  →  0.03 0.76 0.01 0.73 …
Challenge 1: Difference of Lexicon
  must You feel around you the Force
  必须 你 感受 周围 的 到 你 原力
  devez Vous ressentir autour de la Force vous
  अपने आपको आसपास महसूस करना फोर्स को चाहिए
  は あなた あなた を感じ なけれ の周りの 力 ばなりません
  უნდა თქვენ იგრძნოთ ძალა ირგვლივ
  ave jin sekke m'orvikoon verven anni
Delexicalization
  Replace every word with its part-of-speech tag, for example:
  must You feel around you the Force  →  AUX PRON VERB ADP PRON DET PROPN
  Tags for the remaining six example sentences, in slide order: AUX PRON VERB ADP PROPN PART PRON PART AUX PRON VERB ADP PROPN DET PRON PART PRON PRON ADP VERB PART PROPN AUX AUX AUX PRON PRON ADP PROPN PART VERB AUX PRON VERB ADP PROPN DET AUX VERB ADP PROPN DET ADP PROPN VERB PRON PRON PART AUX AUX PRON
  Let’s cheat! (like other grammar induction systems)
The Task (Delex)
  Given a corpus of tags ũ, predict the typology (nsubj, dobj, case, amod, …):
  NOUN VERB ADP NOUN PUNCT NOUN VERB PART NOUN PUNCT …  →  0.04 0.96 0.04 0.03 …
  NOUN DET NOUN VERB PUNCT NOUN NOUN VERB PART …  →  0.0 0.0 1.0 0.0 …
  NOUN AUX NOUN ADP PUNCT AUX NOUN NUM NOUN VERB …  →  0.01 0.25 0.98 0.03 …
  NOUN VERB ADP NOUN PUNCT NOUN VERB NOUN PUNCT …  →  0.03 0.76 0.01 0.73 …
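Delexicalization itself is a one-line transformation once each word carries a part-of-speech tag. The sketch below assumes the corpus is already tagged (the "cheat" the slides mention); the function name `delexicalize` and the list-of-pairs encoding are illustrative choices, and the example pairs are the word/tag pairs that appear elsewhere in this deck.

```python
def delexicalize(tagged_corpus):
    """Drop the words and keep only the tag sequence of every sentence.

    `tagged_corpus` is a list of sentences, each a list of (word, tag)
    pairs (an assumed input format for this sketch).
    """
    return [[tag for _word, tag in sentence] for sentence in tagged_corpus]


tagged_corpus = [[
    ("must", "AUX"), ("You", "PRON"), ("feel", "VERB"), ("around", "ADP"),
    ("you", "PRON"), ("the", "DET"), ("Force", "PROPN"),
]]

tag_corpus = delexicalize(tagged_corpus)
print(" ".join(tag_corpus[0]))
```

After this step, corpora from different languages share one vocabulary of tags, so a single model can read all of them.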
Intuition
  Tag corpus: NOUN VERB DET ADJ NOUN ADP NOUN NOUN VERB PART NOUN DET ADJ NOUN VERB PRON VERB ADP DET NOUN …
  Relation in question: nsubj (N, V)
Surface Cues to Structure
  Triggers for Principles & Parameters
  Same tag corpus, with cues highlighted for nsubj (N, V)
Surface Cues to Structure
  Triggers for Principles & Parameters
  Same tag corpus, with cues highlighted for case (ADP, V)
Surface Cues to Structure
  Triggers for Principles & Parameters
  Same tag corpus, with cues highlighted for amod (A, N)
Surface Cues to Structure
  Triggers for Principles & Parameters
  Tag corpus: NOUN DET ADJ NOUN VERB ADP NOUN NOUN NOUN VERB DET ADJ NOUN VERB PRON ADP DET NOUN VERB …
  Cues highlighted for dobj (N, V)
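One simple way to turn such surface cues into a number is to ask how often one tag precedes another when the two occur close together. The sketch below is a hypothetical hand-made cue of this kind, not the feature set used in the talk: `order_cue`, its `window` parameter, and the segmentation of the tag stream into one toy sentence are all assumptions made for illustration.

```python
def order_cue(tag_sentences, tag_a, tag_b, window=3):
    """Fraction of nearby (tag_a, tag_b) co-occurrences with tag_a first.

    Two tokens count as "nearby" if they are at most `window` positions
    apart within a sentence. Returns 0.5 when the tags never co-occur,
    meaning there is no evidence either way.
    """
    a_first = 0
    pairs = 0
    for tags in tag_sentences:
        for i, left in enumerate(tags):
            for right in tags[i + 1 : i + 1 + window]:
                if left == tag_a and right == tag_b:
                    a_first += 1
                    pairs += 1
                elif left == tag_b and right == tag_a:
                    pairs += 1
    return a_first / pairs if pairs else 0.5


# Hypothetical sentence segmentation of the tag stream on the slide.
sentences = [["NOUN", "VERB", "DET", "ADJ", "NOUN", "ADP", "NOUN"]]

adj_before_noun = order_cue(sentences, "ADJ", "NOUN", window=1)
noun_before_verb = order_cue(sentences, "NOUN", "VERB", window=3)
print(adj_before_noun, noun_before_verb)
```

In this toy sentence the adjective cue is clean, while the noun-verb cue is ambiguous because subjects and objects are both nouns. That ambiguity is one reason to let a supervised learner combine many cues rather than to trust any single one.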
Prediction of Syntactic Typology
  Corpus of tags ũ → p → syntactic typology (rather than the rule weights S → NP VP 0.9, VP → VP PP 0.2, …)
  p(ũ) := syntactic typology
  p := a learned function with trainable parameters
  How to estimate the parameters?
Supervised Training
  Each training example is a pair (POS corpus ũ, true typology):
  - Language 1: (PRON AUX … VERB PROPN … …, true typology)
  - Language 2: (VERB NOUN … NOUN DET … NOUN ADJ … …, true typology)
  - …
  The true typology is a directionality vector of length 57
Supervised Training
  - Could use a convex objective (in principle)
  - Allows feature-rich discriminative model
  - Imitates how linguists annotated the training languages
  - Trained system is like a human baby (we hope)
    - Knows the surface cues to deep structure (cf. Chomsky)
    - In contrast to standard unsupervised learners, which have only a few hyperparameters to tune
  - If we have enough training languages
Data
  Universal Dependencies version 1.2: a collection of dependency treebanks for 37 languages
  Train (UD: 20 languages): cs, es, fr, hi, de, it, la_itt, no, ar, pt, la, hr, ga, he, hu, fa, ta, cu, el, ro
  Test: en, nl, da, fi, got, grc, et, la_proiel, grc_proiel, bg, sl, ja_ktc, sv, fi_ftb, id, eu, pl
Supervised Training
  - Could use a convex objective (in principle)
  - Allows feature-rich discriminative model
  - Imitates how linguists annotated the training languages
  - Trained system is like a human baby (we hope)
    - Knows the surface cues to deep structure (cf. Chomsky)
    - In contrast to standard unsupervised learners, which have only a few hyperparameters to tune
  - If we have enough training languages ???
Challenge 2: Data Sparsity
  Each language gives only ONE training example!
  - Language 1: (PRON AUX … VERB PROPN … …, typology)
  - Language 2: (VERB NOUN … NOUN DET … NOUN ADJ … …, typology)
  - …
Wang and Eisner (2016)
  - More than 50,000 synthetic languages!
  - Resemble real languages, but not found on Earth
  - We call them the Galactic Dependencies Treebanks
  Example tree: must/AUX You/PRON feel/VERB around/ADP you/PRON the/DET Force/PROPN, with relations nsubj, nmod, aux, det, dobj, case
  The final build of this slide shows around/ADP (case) moved to a new position
Data
  Universal Dependencies version 1.2: a collection of dependency treebanks for 37 languages
  Galactic Dependencies version 1.0
  Train (UD: 20 languages + GD: about 8000 languages by mix-and-match): cs, es, fr, hi, de, it, la_itt, no, ar, pt, la, hr, ga, he, hu, fa, ta, cu, el, ro
  Test: en, nl, da, fi, got, grc, et, la_proiel, grc_proiel, bg, sl, ja_ktc, sv, fi_ftb, id, eu, pl
Prediction of Syntactic Typology
  Corpus of tags ũ → p → syntactic typology (rather than the rule weights S → NP VP 0.9, VP → VP PP 0.2, …)
  p(ũ) := syntactic typology
  p := a learned function with trainable parameters
  Wang and Eisner (2016)
Architecture
  Corpus of tags ũ (PRON AUX … VERB PROPN … …) → surface-form feature → sigmoid → p(ũ)
Surface-form Feature (Neural)
  The tag corpus ũ is read as sentences ũ1, ũ2, …, ũK, for example:
  - NOUN VERB NOUN … AUX VERB PUNCT
  - PRON ADV VERB … VERB PUNCT
  - NOUN AUX VERB … ADJ NOUN PUNCT
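The overall shape of the architecture, a feature vector computed from the tag corpus followed by a sigmoid that yields one directionality per relation, can be sketched in a few lines. This is only an illustration of that shape: the neural feature extractor on the slide is replaced here by hand-made tag-bigram frequencies, and the weights are made-up numbers rather than trained values. All names (`tag_bigram_features`, `predict_typology`, `vocabulary`) are assumptions.

```python
import math
from collections import Counter


def tag_bigram_features(tag_sentences, vocabulary):
    """Relative frequency of each tag bigram listed in `vocabulary`."""
    counts = Counter()
    total = 0
    for tags in tag_sentences:
        for pair in zip(tags, tags[1:]):
            counts[pair] += 1
            total += 1
    return [counts[pair] / total if total else 0.0 for pair in vocabulary]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_typology(features, weights, biases):
    """One sigmoid output per relation: sigmoid(w . features + b)."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, biases)
    ]


vocabulary = [("ADJ", "NOUN"), ("NOUN", "ADJ")]
features = tag_bigram_features([["ADJ", "NOUN", "VERB"]], vocabulary)

# Made-up weights for a single output (say, amod); not trained values.
untrained = predict_typology(features, [[0.0, 0.0]], [0.0])
illustrative = predict_typology(features, [[-4.0, 4.0]], [0.0])
print(features, untrained, illustrative)
```

Because every output passes through a sigmoid, each predicted directionality stays strictly between 0 and 1, matching the range of the typology entries.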
[Figure: nmod (head noun → nominal modifier). Axes: True Directionality, Predicted Directionality]
[Figure: dobj (head verb → direct object). Axes: True Directionality, Predicted Directionality]
[Figure: nsubj (head verb → subject). Axes: True Directionality, Predicted Directionality]
[Figure: case (head noun → adposition). Axes: True Directionality, Predicted Directionality]
[Figure: case, trained on 20 real languages. Axes: True Directionality, Predicted Directionality]
Evaluation: ε-insensitive loss
  [Plot: loss as a function of p* − p̂, where p* is the true value and p̂ the prediction, both between 0 and 1.0. No loss when −ε ≤ p* − p̂ ≤ ε; loss outside that band.]
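The slide gives the shape of the loss but not a formula. A minimal sketch, assuming the standard linear ε-insensitive loss max(0, |p* − p̂| − ε) applied to each relation separately; the function name, the choice ε = 0.1 in the example, and the sample predictions are illustrative assumptions.

```python
def epsilon_insensitive_loss(p_true, p_pred, epsilon):
    """Zero inside the band |p_true - p_pred| <= epsilon, linear outside it."""
    return max(0.0, abs(p_true - p_pred) - epsilon)


# Illustrative values only: a true dobj directionality of 0.96 and two guesses.
close_guess = epsilon_insensitive_loss(0.96, 0.90, epsilon=0.1)  # inside the band
far_guess = epsilon_insensitive_loss(0.96, 0.50, epsilon=0.1)    # outside the band
print(close_guess, far_guess)
```

The band means a prediction is not penalized for small deviations from the treebank-derived value, so only clear disagreements about word order contribute to the score.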
Compared to
  - Simple baselines: Heuristic, Base Rate
  - Grammar induction: state-of-the-art dependency grammar induction systems (MS13, N10)
  - Supervised training on n languages: n = 20, n = 8000
  Doesn’t even look at the corpus!
Summary: Training the System
  20 langs (hi, en, fr, …) → permute → ~8000 langs (en~hi@N~fr@V, en~fr@N~hi@V, hi~fr@N~en@V, …)
  Treebanks → count → typology
  Treebanks → discard trees → POS corpus
  Train on (POS corpus, typology) pairs; then pl POS corpus → prediction
Summary and Future Work
  - Old: standalone “good” analysis (max likelihood)
  - New: learn how linguists analyze (mimic them!)
    - Find surface cues that predict deeper structure
  Future work
  - Use our predicted syntactic typology for grammar induction and parsing
  - Predict syntactic typology from raw word sequences
    - Learning universal word representations
Thanks!