Fine-grained prediction of syntactic typology: Discovering latent structure with supervised learning

Uploaded On 2019-11-02


Presentation Transcript

Slide 1: Fine-grained prediction of syntactic typology: Discovering latent structure with supervised learning
Dingquan Wang and Jason Eisner

Slide 2: Grammar Induction is Broken!
This work tries to fix it!
- Starting with syntactic typology induction
- Just do supervised learning!
Unsupervised methods (like EM):
- Only locally optimal
- Hard to harness linguistic knowledge & conventions
- Unusable performance in practice

Slide 3: Turn a Corpus into a Parser
Corpus u: Papa ate with the caviar a spoon . Mama ate with the caviar a spoon . Papa ate caviar a spoon . Mama ate a spoon . …
Input sentence x: Spoon ate a caviar
Tree y: [dependency tree drawn over "Spoon ate a caviar"]

Slide 4: Turn a Corpus into a Parser
Corpus u: Yer amos yjja Ajjx aat orrr . Per anni inn se in hahh wee . Con per aat Ajjx “tat “yue han . Per anni inn se in hahh wee . …
Input sentence x: Yer amos yjja Ajjx aat orr
Tree y: [dependency tree drawn over "Yer amos yjja Ajjx aat orr"]

Slide 5: Turn a Corpus into a Parser
Corpus u: 我早上吃了饭。 我中午吃了饭。 明天踢足球。 早上好! …
Input sentence x: 愿 原力 与 你 同在
Tree y: [dependency tree drawn over "愿 原力 与 你 同在"]

Slide 6: Previous Work: Grammar Induction
[Diagram: corpus u; optimization (EM); hand-crafted CFG rules S → NP VP, VP → VP PP, …]

Slide 7: Previous Work: Grammar Induction
[Diagram: corpus u; optimization (EM); probabilistic CFG with S → NP VP (0.9), VP → VP PP (0.2), …; parser (CKY) taking input sentence x to tree y]
Annotations on the diagram:
- Non-convex objective (e.g., MAP): Hard!
- Hard!
- Might not even model syntax

Slide 8: Generalization
[Diagram: corpus u; sufficient statistic; mapping; grammar with S → NP VP (0.9), VP → VP PP (0.2), …; parser taking input sentence x to tree y]

Slide 9: Previous Work: Grammar Induction
[Same diagram as slide 8]

Slide 10: Previous Work: Grammar Induction
[Same diagram, with the mapping labeled "Optimization (EM)"]

Slide 11: Our Proposal
[Same diagram, with "Learned Mapping" shown alongside "Optimization (EM)"]

Slide 12: Our Proposal
[Same diagram as slide 11]
Learned with supervision!

Slide 13: Prediction of Syntactic Typology
[Diagram: corpus u; learned mapping; grammar with S → NP VP (0.9), VP → VP PP (0.2), …]
p(u) := syntactic typology
p := a learned function with trainable parameters

Slide 14: Syntactic Typology
A set of word order facts of a language

Slide 15: Syntactic Typology (of English)
Subject-Verb-Object
Example: "Papa ate a red apple at home", annotated with nsubj and dobj. For each relation the slide draws both possible orders of N and V.

Slide 16: Syntactic Typology (of English)
The example is now annotated with nsubj, dobj, amod and case, and each possible order is marked ✔ or ✘:
Subject-Verb-Object (nsubj, dobj): subject before verb, object after verb
Prepositional (case): adposition before noun
Adj-Noun (amod): adjective before noun

Slide 17: Fine-grained Syntactic Typology (of English)
[The same ✔/✘ chart for nsubj, dobj, case and amod]

Slide 18: Fine-grained Syntactic Typology (of English)
The ✔/✘ marks become proportions, one for each of the two orders:
Subject-Verb-Object: nsubj 0.04 / 0.96, dobj 0.96 / 0.04
Prepositional: case 0.04 / 0.96
Adj-Noun: amod 0.03 / 0.97

Slide 19: Fine-grained Syntactic Typology (of English)
Subject-Verb-Object, Prepositional, Adj-Noun
Vector of length 57:
nsubj  dobj  case  amod  …
0.04   0.96  0.04  0.03  …

Slide 20: Fine-grained Syntactic Typology (of Japanese)
Subject-Object-Verb, Postpositional, Adj-Noun
Vector of length 57:
nsubj  dobj  case  amod  …
0.0    0.0   1.0   0.0   …

Slide 21: Fine-grained Syntactic Typology (of Hindi)
Subject-Object-Verb, Postpositional, Adj-Noun
Vector of length 57:
nsubj  dobj  case  amod  …
0.01   0.25  0.98  0.03  …

Slide 22: Fine-grained Syntactic Typology (of French)
Subject-Verb-Object, Prepositional, Noun-Adj
Vector of length 57:
nsubj  dobj  case  amod  …
0.03   0.76  0.01  0.73  …

Slide 23: Fine-grained Syntactic Typology
Language   nsubj  dobj  case  amod  …
English    0.04   0.96  0.04  0.03  …
Japanese   0.0    0.0   1.0   0.0   …
Hindi      0.01   0.25  0.98  0.03  …
French     0.03   0.76  0.01  0.73  …
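
The proportions in these typology vectors can be read as directionalities: for each relation, the fraction of its tokens in which the dependent follows its head. That reading is consistent with the English values on the slides (dobj near 1, nsubj, case and amod near 0), but the slides do not spell out the computation. The following is a minimal sketch under that assumption; the input format (triples of dependent position, head position, relation) and the function name `directionality` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def directionality(treebank):
    """Return {relation: fraction of tokens whose dependent follows its head}.

    `treebank` is a list of sentences. Each sentence is a list of
    (dependent_position, head_position, relation) triples with 1-based
    positions and head_position 0 for the root (a hypothetical format,
    loosely modelled on CoNLL-U).
    """
    right = defaultdict(int)
    total = defaultdict(int)
    for sentence in treebank:
        for dep_pos, head_pos, rel in sentence:
            if head_pos == 0:  # the root has no head, so it has no direction
                continue
            total[rel] += 1
            if dep_pos > head_pos:
                right[rel] += 1
    return {rel: right[rel] / total[rel] for rel in total}


# "Papa ate a red apple at home", word positions 1..7
toy_treebank = [[
    (1, 2, "nsubj"),  # Papa  <- ate
    (2, 0, "root"),   # ate
    (3, 5, "det"),    # a     <- apple
    (4, 5, "amod"),   # red   <- apple
    (5, 2, "dobj"),   # apple <- ate
    (6, 7, "case"),   # at    <- home
    (7, 2, "nmod"),   # home  <- ate
]]
vec = directionality(toy_treebank)
```

On a one-sentence treebank every entry is 0 or 1; the fractional values on the slides would come from counting over a whole treebank.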

Slide 24: The Task
Map each language's corpus u to its typology (nsubj, dobj, case, amod, …):
Papa ate with the caviar a spoon . Mama ate with the caviar a spoon . … → 0.04 0.96 0.04 0.03 …
パパはキャビアとスプーンを 食。 ママはキャビアとスプーンを 食。 … → 0.0 0.0 1.0 0.0 …
पापा ने कैवियार के साथ एक चम्मच माँ कैवियार एक चम्मच के साथ … → 0.01 0.25 0.98 0.03 …
Papa a mangé au caviar cuillère . Maman a mangé au caviar cuillère . … → 0.03 0.76 0.01 0.73 …

Slide 25: Challenge 1: Difference of Lexicon
One sentence rendered in several languages:
must You feel around you the Force
必须 你 感受 周围 的 到 你 原力
devez Vous ressentir autour de la Force vous
अपने आपको आसपास महसूस करना फोर्स को चाहिए
は あなた あなた を感じ なけれ の周りの 力 ばなりません
უნდა თქვენ იგრძნოთ ძალა ირგვლივ
ave jin sekke m'orvikoon verven anni

Slide 26: Delexicalization
Every word of the multilingual example sentences is replaced by its part-of-speech tag. For instance, "must You feel around you the Force" becomes AUX PRON VERB ADP PRON DET PROPN.
Let’s cheat! (like other grammar induction systems)

Slide 28: The Task (Delex)
Map each language's corpus of tags ũ to its typology (nsubj, dobj, case, amod, …):
NOUN VERB ADP NOUN PUNCT NOUN VERB PART NOUN PUNCT … → 0.04 0.96 0.04 0.03 …
NOUN DET NOUN VERB PUNCT NOUN NOUN VERB PART … → 0.0 0.0 1.0 0.0 …
NOUN AUX NOUN ADP PUNCT AUX NOUN NUM NOUN VERB … → 0.01 0.25 0.98 0.03 …
NOUN VERB ADP NOUN PUNCT NOUN VERB NOUN PUNCT … → 0.03 0.76 0.01 0.73 …
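
Delexicalization itself is a simple transformation: drop the words and keep the tags. A minimal sketch, assuming the corpus is already tagged and stored as lists of (word, tag) pairs; that storage format and the function name `delexicalize` are illustrative choices, not something the slides specify.

```python
def delexicalize(tagged_corpus):
    """Replace every (word, tag) pair by its tag alone."""
    return [[tag for _word, tag in sentence] for sentence in tagged_corpus]


# The English example sentence from the slides, in the word order shown there.
tagged = [[
    ("must", "AUX"), ("You", "PRON"), ("feel", "VERB"), ("around", "ADP"),
    ("you", "PRON"), ("the", "DET"), ("Force", "PROPN"),
]]
tags_only = delexicalize(tagged)
```

The result is a corpus of tag sequences, which is the input ũ used in the rest of the talk.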

Slide 29: Intuition
Tag corpus: NOUN VERB DET ADJ NOUN ADP NOUN NOUN VERB PART NOUN DET ADJ NOUN VERB PRON VERB ADP DET NOUN …
Relation shown: nsubj, with both possible orders of N and V

Slide 30: Surface Cues to Structure
Same tag corpus, with the cues for nsubj highlighted ("Cues!")
Triggers for Principles & Parameters

Slide 31: Surface Cues to Structure
Same tag corpus, with the cues for case highlighted ("Cues!")
Triggers for Principles & Parameters

Slide 32: Surface Cues to Structure
Same tag corpus, with the cues for amod highlighted ("Cues!")
Triggers for Principles & Parameters

Slide 33: Surface Cues to Structure
Tag corpus: NOUN DET ADJ NOUN VERB ADP NOUN NOUN NOUN VERB DET ADJ NOUN VERB PRON ADP DET NOUN VERB …
Cues for dobj highlighted ("Cues!")
Triggers for Principles & Parameters
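
One way to make the idea of a surface cue concrete is to measure, from tags alone, how often one tag appears shortly before another. The sketch below is only an illustration of that idea: the window size, the fallback value of 0.5 and the function name `precedes_rate` are assumptions, and the slides do not say which statistics the actual system uses.

```python
def precedes_rate(tag_corpus, a, b, window=3):
    """Among occurrences of tags `a` and `b` within `window` positions of
    each other, return the fraction in which `a` comes first.

    Returns 0.5 (no evidence either way) if the two tags never co-occur.
    """
    a_first = 0
    b_first = 0
    for sentence in tag_corpus:
        for i, tag in enumerate(sentence):
            if tag != a:
                continue
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j == i or sentence[j] != b:
                    continue
                if j > i:
                    a_first += 1
                else:
                    b_first += 1
    total = a_first + b_first
    return a_first / total if total else 0.5


toy_tags = [
    ["NOUN", "VERB", "DET", "ADJ", "NOUN"],
    ["PRON", "VERB", "ADP", "DET", "NOUN"],
]
adj_before_noun = precedes_rate(toy_tags, "ADJ", "NOUN", window=1)
adp_before_noun = precedes_rate(toy_tags, "ADP", "NOUN", window=2)
```

A high value for ADJ before NOUN hints at Adj-Noun order, and a high value for ADP before NOUN hints at prepositions, without looking at any tree.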

Slide 34: Prediction of Syntactic Typology
[Diagram: corpus of tags ũ; grammar with S → NP VP (0.9), VP → VP PP (0.2), …]
p(ũ) := syntactic typology
p := a learned function with trainable parameters
How to estimate the parameters?

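Slide 35: Supervised Training
Each training example is a pair (POS corpus ũ, true typology), one per language:
Language 1: (PRON AUX … VERB PROPN … …, true typology)
Language 2: (VERB NOUN … NOUN DET … NOUN ADJ … …, true typology)
The true typology is a vector of length 57, each entry being the directionality of one relation.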

Slide 36: Supervised Training
- Could use a convex objective (in principle)
- Allows feature-rich discriminative model
- Imitates how linguists annotated the training languages
- Trained system is like a human baby (we hope)
- Knows the surface cues to deep structure (cf. Chomsky)
- In contrast to standard unsupervised learners, which have only a few hyperparameters to tune
- If we have enough training languages

Slide 37: Data
Universal Dependencies version 1.2: a collection of dependency treebanks for 37 languages
Train (UD: 20 languages): cs, es, fr, hi, de, it, la_itt, no, ar, pt, en, nl, da, fi, got, grc, et, la_proiel, grc_proiel, bg
Test: la, hr, ga, he, hu, fa, ta, cu, el, ro, sl, ja_ktc, sv, fi_ftb, id, eu, pl

Slide 38: Supervised Training
[Same points as slide 36; the last one, "If we have enough training languages", is now marked "???"]

Slide 39: Challenge 2: Data Sparsity
Each language gives only ONE training example!
Language 1: (PRON AUX … VERB PROPN … …, typology)
Language 2: (VERB NOUN … NOUN DET … NOUN ADJ … …, typology)

Slide 40: Wang and Eisner (2016)
- More than 50,000 synthetic languages!
- Resemble real languages, but not found on Earth
- We call them the Galactic Dependencies Treebanks
Example tree: must/AUX You/PRON feel/VERB around/ADP you/PRON the/DET Force/PROPN, with relations nsubj, nmod, aux, det, dobj, case

Slide 41: Wang and Eisner (2016)
[Same as slide 40]

Slide 42: Wang and Eisner (2016)
[Same bullets; in the example tree, the adposition "around" (case) has been moved out of its original position]

Slide 43: Data
Universal Dependencies version 1.2: a collection of dependency treebanks for 37 languages
Galactic Dependencies version 1.0
Train (UD: 20 languages + GD: about 8000 languages by mix-and-match): cs, es, fr, hi, de, it, la_itt, no, ar, pt, en, nl, da, fi, got, grc, et, la_proiel, grc_proiel, bg
Test: la, hr, ga, he, hu, fa, ta, cu, el, ro, sl, ja_ktc, sv, fi_ftb, id, eu, pl

Slide 44: Prediction of Syntactic Typology
[Diagram: corpus of tags ũ; grammar with S → NP VP (0.9), VP → VP PP (0.2), …]
Wang and Eisner (2016)
p(ũ) := syntactic typology
p := a learned function with trainable parameters

Slide 45: Architecture
[Diagram: corpus of tags ũ (PRON AUX … VERB PROPN … …); surface-form feature; sigmoid; output p(ũ)]
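
The architecture slide names two components, a surface-form feature and a sigmoid. A minimal sketch of how they could fit together, assuming a single linear layer between them: the feature vector is mapped to one score per relation, and the sigmoid squashes each score into a directionality between 0 and 1. The single linear layer, the parameter names `weights` and `bias`, and the function name `predict_typology` are assumptions; the slides do not state the layer structure.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_typology(features, weights, bias):
    """Predict one directionality per relation from a feature vector.

    `weights` holds one row per relation, each row as long as `features`;
    `bias` holds one offset per relation. Both are hypothetical parameters
    that training would have to estimate.
    """
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(weights, bias)
    ]


# Two toy features, three relations, untrained (all-zero) parameters.
untrained = predict_typology([0.9, 0.1], [[0.0, 0.0]] * 3, [0.0] * 3)
# Hand-picked parameters pushing the first relation up and the second down.
pushed = predict_typology([0.9, 0.1], [[5.0, 0.0], [-5.0, 0.0]], [0.0, 0.0])
```

With all-zero parameters every relation is predicted at 0.5, i.e. no preference for either order.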

Slide 46: Surface-form Feature (Neural)
[Diagram: the tag corpus ũ shown as pieces ũ1, ũ2, …, ũK, each a tag sequence, e.g. "NOUN VERB NOUN … AUX VERB PUNCT", "PRON ADV VERB … VERB PUNCT", "NOUN AUX VERB … ADJ NOUN PUNCT"]

Slide 47: nmod: Head Noun -> Nominal Modifier
[Plot: true directionality vs. predicted directionality]

Slide 48: dobj: Head Verb -> Direct Object
[Plot: true directionality vs. predicted directionality]

Slide 49: nsubj: Head Verb -> Subject
[Plot: true directionality vs. predicted directionality]

Slide 50: case: Head Noun -> Adposition
[Plot: true directionality vs. predicted directionality]

Slide 51: case (Trained on 20 Real Languages)
[Plot: true directionality vs. predicted directionality]

Slide 52: Evaluation
ε-insensitive loss
[Plot: prediction p̂ against true value p*, both from 0 to 1.0. There is no loss while p* − p̂ lies between −ε and ε, and loss outside that band.]
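
The ε-insensitive loss in its standard form is max(0, |p* − p̂| − ε): zero inside the band of width ε around the true value, and growing linearly outside it. A minimal sketch follows; the default ε = 0.1 and the plain unweighted average over relations are illustrative assumptions, since the slide gives neither the value of ε nor how the per-relation losses are combined.

```python
def eps_insensitive_loss(p_true, p_pred, eps=0.1):
    """Zero if the prediction is within `eps` of the truth, linear beyond."""
    return max(0.0, abs(p_true - p_pred) - eps)


def typology_loss(true_vec, pred_vec, eps=0.1):
    """Unweighted mean of the per-relation losses (an assumed aggregation)."""
    losses = [eps_insensitive_loss(t, p, eps) for t, p in zip(true_vec, pred_vec)]
    return sum(losses) / len(losses)


# English values for nsubj, dobj, case, amod from the slides, compared with
# a coarse all-or-nothing guess and with a guess that gets dobj backwards.
english = [0.04, 0.96, 0.04, 0.03]
coarse_guess = typology_loss(english, [0.0, 1.0, 0.0, 0.0])
wrong_dobj = typology_loss(english, [0.0, 0.0, 0.0, 0.0])
```

Under these assumptions the coarse guess incurs no loss at all, because every entry is within 0.1 of the truth, while the guess with dobj reversed is penalized only for that one relation.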

Slide 53
Simple baselines: Heuristic, Base Rate
Compared to grammar induction: state-of-the-art dependency grammar induction systems MS13 and N10
Supervised training on n languages: n = 20, n = 8000
Callout on the chart: "Doesn’t even look at the corpus!"

Slide 54: Summary: Training the System
[Diagram labels, in order:]
20 langs: hi, en, fr, …
permute
~8000 langs: en~hi@N~fr@V, …, en~fr@N~hi@V, hi~fr@N~en@V, …
train treebanks: count gives the typology; discard trees gives the POS corpus
pl: POS corpus, prediction of typology
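
Names such as en~hi@N~fr@V suggest how mix-and-match multiplies the languages: a base language whose noun dependents are reordered like one language and whose verb dependents are reordered like another. The sketch below enumerates such names under the assumption that each of the two slots may be left empty or filled by any language in the pool, including the base language itself; that assumption and the function names are illustrative. Under it, 20 languages give 20 × 21 × 21 = 8,820 combinations, the same order as the "about 8000" on the slides, and 37 languages give 37 × 38 × 38 = 53,428, consistent with "more than 50,000".

```python
from itertools import product


def galactic_names(substrates, superstrates):
    """Enumerate names of the form base[~X@N][~Y@V].

    Each of the noun slot (@N) and verb slot (@V) is either empty or one of
    `superstrates` (assumed rule; the name format follows the slide).
    """
    options = [None] + list(superstrates)
    names = []
    for base, n_sup, v_sup in product(substrates, options, options):
        name = base
        if n_sup is not None:
            name += f"~{n_sup}@N"
        if v_sup is not None:
            name += f"~{v_sup}@V"
        names.append(name)
    return names


def count_languages(num_substrates, num_superstrates):
    """Number of combinations under the same assumed rule."""
    return num_substrates * (num_superstrates + 1) ** 2


pool = ["en", "hi", "fr"]
names = galactic_names(pool, pool)
train_count = count_languages(20, 20)
full_count = count_languages(37, 37)
```

With a pool of three languages this yields 48 names, including the unpermuted "en" and the slide's own example "en~hi@N~fr@V".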

Slide 55: Summary and Future Work
Summary:
- Old: standalone “good” analysis (max likelihood)
- New: learn how linguists analyze (mimic them!)
- Find surface cues that predict deeper structure
Future work:
- Use our predicted syntactic typology for grammar induction and parsing
- Predict syntactic typology from raw word sequences
- Learning universal word representations

Slide 56: Thanks!